same input but different result/convergence on different nodes

yujia_teng
Newbie
Posts: 10
Joined: Thu May 25, 2023 6:24 pm

same input but different result/convergence on different nodes

#1 Post by yujia_teng » Sun Jun 09, 2024 12:29 am

I repeated one calculation several times on a 48-core node and a 64-core node. The inputs are exactly the same; the only difference is the node the job was submitted to. It converged in only two cases: with my usual NPAR=4 on the 48-core node, and with the default NPAR on the 64-core node (though that run needed more steps). I use NPAR=4 in my usual calculations, which gives NBANDS=56 in OUTCAR on both nodes. I've also tried commenting out NPAR to use the default value, and they both don't converge.

I'm not sure why this happens. I've searched this forum, but it doesn't look like a problem related to NBANDS, and I didn't find a solution. I've attached the 4 outputs here.

svijay
Global Moderator
Posts: 73
Joined: Fri Aug 04, 2023 11:07 am

Re: same input but different result/convergence on different nodes

#2 Post by svijay » Mon Jun 10, 2024 8:24 am

Dear yujia_teng,

It does indeed look like a few of the calculations take more than a few steps to converge (using ALGO=All, see wiki/index.php/ALGO, might help here). It is difficult to say anything specific without the input and output files (see here: wiki/index.php/Minimal_reproducible_exa ... 20possible.)

Sudarshan

yujia_teng
Newbie
Posts: 10
Joined: Thu May 25, 2023 6:24 pm

Re: same input but different result/convergence on different nodes

#3 Post by yujia_teng » Wed Jun 12, 2024 2:05 am

Dear admin,
I've attached the input file below. Only NPAR and the number of nodes differ between those 4 calculations.

svijay
Global Moderator
Posts: 73
Joined: Fri Aug 04, 2023 11:07 am

Re: same input but different result/convergence on different nodes

#4 Post by svijay » Wed Jun 12, 2024 7:11 am

Could you please send all inputs, i.e. all INCAR with the corresponding NPAR and number of nodes and all OUTCAR files in accordance with the forum guidelines (forum/viewtopic.php?t=17928)? Thanks!

Sudarshan

yujia_teng
Newbie
Posts: 10
Joined: Thu May 25, 2023 6:24 pm

Re: same input but different result/convergence on different nodes

#5 Post by yujia_teng » Wed Jun 12, 2024 11:25 pm

Dear admin,
I've attached the files below as requested. All of them ran on one node.

svijay
Global Moderator
Posts: 73
Joined: Fri Aug 04, 2023 11:07 am

Re: same input but different result/convergence on different nodes

#6 Post by svijay » Thu Jun 13, 2024 10:24 am

Dear yujia_teng,

I am not sure I understand the issue. With 48 and 64 cores and NBANDS fixed to 56, it looks like you converge to the same energy. All calculations that you sent me seem to reach the required EDIFF. Is there a specific error you are facing beyond this?

Sudarshan

yujia_teng
Newbie
Posts: 10
Joined: Thu May 25, 2023 6:24 pm

Re: same input but different result/convergence on different nodes

#7 Post by yujia_teng » Thu Jun 13, 2024 1:12 pm

Dear admin,
Only 2 of the 4 calculations converge. With 48 and 64 cores and the same NBANDS (I didn't fix it), the 48-core run converges while the 64-core run doesn't. EDIFF is 1E-08. The 64-core run only reaches 1E-06 at the maximum electronic step. So the issue is that a different number of cores gives a different convergence result: one run converges and the other does not.

The same happens with the default NPAR: the 64-core run converges while the 48-core run does not.

svijay
Global Moderator
Posts: 73
Joined: Fri Aug 04, 2023 11:07 am

Re: same input but different result/convergence on different nodes

#8 Post by svijay » Fri Jun 14, 2024 8:44 am

I am still not sure I understand. For the files that you attached it looks like all calculations reached electronic convergence and EDIFF = 0.1E-07 (grep for EDIFF on all OUTCARs). Are there some other files you are referring to?
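
A grep for EDIFF can match both the echoed input parameter and the message written when the electronic loop actually converges, so it may help to separate the two. The following is a minimal Python sketch, assuming OUTCAR echoes the tag as a line of the form "EDIFF  = 0.1E-07 ..." and prints a line containing "EDIFF is reached" on electronic convergence. Both line formats, and the function name summarize_ediff, are assumptions made for illustration and are not taken from the attached files.

```python
import re


def summarize_ediff(outcar_text):
    """Return (ediff_setting, reached) for the given OUTCAR text.

    ediff_setting: the EDIFF value echoed in the parameter block, or None.
    reached: True if a line containing 'EDIFF is reached' is present.
    Both line formats are assumptions about the OUTCAR layout.
    """
    setting = None
    # 'EDIFF\s*=' does not match 'EDIFFG =', so only the electronic tag is read.
    match = re.search(r"^\s*EDIFF\s*=\s*(\S+)", outcar_text, flags=re.MULTILINE)
    if match:
        setting = float(match.group(1))
    reached = "EDIFF is reached" in outcar_text
    return setting, reached
```

A run that only echoes EDIFF = 0.1E-07 but never prints the "reached" message would return (1e-08, False) here, which is the situation where the setting is present but the loop stopped for another reason such as hitting NELM.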

Sudarshan

yujia_teng
Newbie
Posts: 10
Joined: Thu May 25, 2023 6:24 pm

Re: same input but different result/convergence on different nodes

#9 Post by yujia_teng » Fri Jun 14, 2024 1:36 pm

Dear admin,
Only 2 of the 4 calculations converge. Please look at the output.log files rather than the other files, since that is where the problem is visible. Only two of them reach 1E-08 at the end. The other two only reach 1E-06. They couldn't converge within the default number of electronic steps.
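
The check described above can be sketched in Python. This assumes each electronic step in output.log is printed as a tag such as "DAV:" followed by the step number N, the energy E, the energy change dE, and the eigenvalue change d eps in Fortran exponent notation, and that convergence means both |dE| and |d eps| are below EDIFF. The column layout and the helper names last_step and is_converged are assumptions for illustration.

```python
import re

# One Fortran-style float, e.g. -0.20000E-05 or .123E+02
NUM = r"[-+]?\d*\.\d+[EeDd][-+]?\d+"
# Assumed step line: "<TAG>:  N   E   dE   d eps  ..."
STEP = re.compile(rf"^\s*[A-Z]+\s*:\s+(\d+)\s+({NUM})\s+({NUM})\s+({NUM})")


def last_step(log_text):
    """Return (N, dE, d_eps) of the last electronic step found, or None."""
    result = None
    for line in log_text.splitlines():
        m = STEP.match(line)
        if m:
            # Fortran may print 'D' exponents; Python's float() needs 'E'.
            de = float(m.group(3).upper().replace("D", "E"))
            deps = float(m.group(4).upper().replace("D", "E"))
            result = (int(m.group(1)), de, deps)
    return result


def is_converged(log_text, ediff):
    """True if the last step has |dE| and |d eps| both below ediff."""
    step = last_step(log_text)
    if step is None:
        return False
    _, de, deps = step
    return abs(de) < ediff and abs(deps) < ediff
```

For a single-point run, calling is_converged(open("output.log").read(), 1e-8) on each of the four logs would separate the runs that end near 1E-08 from those that stop near 1E-06.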

svijay
Global Moderator
Posts: 73
Joined: Fri Aug 04, 2023 11:07 am

Re: same input but different result/convergence on different nodes

#10 Post by svijay » Fri Jun 14, 2024 3:25 pm

Yes, indeed, I see it now. There are a couple of things to try here: (1) Does the calculation end up converging with an increased NELM? (2) If not, since it is very near convergence anyway, have you tried another ALGO (say, ALGO=All)?

Sudarshan

yujia_teng
Newbie
Posts: 10
Joined: Thu May 25, 2023 6:24 pm

Re: same input but different result/convergence on different nodes

#11 Post by yujia_teng » Sat Jun 15, 2024 12:05 am

Dear admin,
I didn't try that. I believe the unconverged runs could reach convergence with an increased NELM, tuned mixing parameters, or ALGO = All.

But the main point is this: with the exact same input for all 4 runs and only the number of cores different, why does the convergence behavior differ? Is it because the way parallelization is implemented in VASP affects this?

svijay
Global Moderator
Posts: 73
Joined: Fri Aug 04, 2023 11:07 am

Re: same input but different result/convergence on different nodes

#12 Post by svijay » Mon Jun 17, 2024 2:26 pm

Yes, in this context it only makes sense to me to compare converged calculations. Recall that NBANDS is altered by the parallelization settings (wiki/index.php/NBANDS), so the comparison requires calculations that are converged both electronically and with respect to the input parameters.
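
How the parallelization settings can alter NBANDS may be sketched as follows. This is a simplified model, assuming NBANDS is rounded up to a multiple of the number of band groups, that this number equals NPAR when NPAR is set, and that leaving NPAR unset means one band group per MPI rank in each k-point group. The function name rounded_nbands and the starting value of 54 bands are hypothetical; the actual default for this system is not stated in the thread.

```python
import math


def rounded_nbands(nbands_requested, ncores, npar=None, kpar=1):
    """Round NBANDS up to a multiple of the number of band groups.

    Simplified model (an assumption, see the NBANDS wiki page for the
    exact rule): with NPAR unset, every rank in a k-point group forms
    its own band group.
    """
    ranks_per_kgroup = ncores // kpar
    groups = npar if npar is not None else ranks_per_kgroup
    return math.ceil(nbands_requested / groups) * groups


# Hypothetical unrounded value of 54 bands:
print(rounded_nbands(54, ncores=48, npar=4))  # NPAR=4 on 48 cores
print(rounded_nbands(54, ncores=64, npar=4))  # NPAR=4 on 64 cores
print(rounded_nbands(54, ncores=48))          # default NPAR on 48 cores
print(rounded_nbands(54, ncores=64))          # default NPAR on 64 cores
```

Under these assumptions the two NPAR=4 runs end up with the same NBANDS regardless of core count, while the two default-NPAR runs do not, so only the former pair is a like-for-like comparison in terms of NBANDS.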

yujia_teng
Newbie
Posts: 10
Joined: Thu May 25, 2023 6:24 pm

Re: same input but different result/convergence on different nodes

#13 Post by yujia_teng » Mon Jun 17, 2024 4:11 pm

Dear admin,
I'm still confused. Why does it only make sense to compare converged calculations? Just look at the 2 calculations with the same NBANDS, which is 56 (I didn't set it; it's the default value generated by the code). They have exactly the same input and the same NBANDS. The only difference is the number of cores used: one uses 48 and the other 64. But their outputs differ: one converges and the other does not. Why can this happen? From the VASP wiki, it looks like the same NBANDS should give the same result, but that is not the case here.
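
The thread does not settle this question. One general mechanism that may contribute is that floating-point addition is not associative, so a sum split across a different number of ranks can round differently even for identical data. The Python sketch below illustrates only that general effect with a deliberately extreme set of values; whether it explains the difference between these particular VASP runs is an assumption, not something established in the thread. The function names and the chunking scheme are hypothetical and do not represent VASP's actual reduction.

```python
import math


def sequential_sum(values):
    """Plain left-to-right floating-point sum (no compensation)."""
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values, nranks):
    """Mimic a parallel reduction: each 'rank' sums a contiguous chunk,
    then the partial sums are added together in rank order."""
    size = math.ceil(len(values) / nranks)
    partials = [sequential_sum(values[i:i + size])
                for i in range(0, len(values), size)]
    return sequential_sum(partials)


# Same data, different number of 'ranks'
values = [1.0, 1e16, -1e16, 1.0]
print(chunked_sum(values, 4))  # 1.0
print(chunked_sum(values, 2))  # 0.0
```

In real calculations the differences are usually far smaller than in this contrived case, but an iterative procedure with a tight threshold such as EDIFF = 1E-08 can amplify small rounding differences into a different number of steps.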
