same input but different result/convergence on different nodes

Posted: Sun Jun 09, 2024 12:29 am
by yujia_teng
I repeated one calculation several times, on a node with 48 cores and on a node with 64 cores. The inputs are exactly the same; the only difference is the node the job was submitted to. It converged in only two cases: with my usual NPAR=4 on the 48-core node, and with the default NPAR on the 64-core node, although the latter needed more steps. I use NPAR=4 in my usual calculations, which gives NBANDS=56 in OUTCAR on both nodes. I also tried commenting out NPAR to use the default value, and again the two nodes do not both converge.

I'm not sure why this happens. I've searched this forum, but it doesn't look like a problem related to NBANDS, and I didn't find a solution. I've attached the 4 outputs here.

Re: same input but different result/convergence on different nodes

Posted: Mon Jun 10, 2024 8:24 am
by svijay
Dear yujia_teng,

It does indeed look like a few of the calculations take more than a few steps to converge (using ALGO=All wiki/index.php/ALGO might help here). It is difficult to say anything specific without the input and output files (see here: wiki/index.php/Minimal_reproducible_exa ... 20possible.)

Sudarshan

Re: same input but different result/convergence on different nodes

Posted: Wed Jun 12, 2024 2:05 am
by yujia_teng
Dear admin,
I've attached the input file below. Only NPAR and the number of nodes differ between those 4 calculations.

Re: same input but different result/convergence on different nodes

Posted: Wed Jun 12, 2024 7:11 am
by svijay
Could you please send all inputs, i.e. all INCAR with the corresponding NPAR and number of nodes and all OUTCAR files in accordance with the forum guidelines (forum/viewtopic.php?t=17928)? Thanks!

Sudarshan

Re: same input but different result/convergence on different nodes

Posted: Wed Jun 12, 2024 11:25 pm
by yujia_teng
Dear admin,
I've attached the file below as requested. All of the calculations ran on one node.

Re: same input but different result/convergence on different nodes

Posted: Thu Jun 13, 2024 10:24 am
by svijay
Dear yujia_teng,

I am not sure I understand the issue. With 48 and 64 cores and NBANDS fixed to 56, it looks like you converge to the same energy. All the calculations that you sent me seem to reach the required EDIFF. Is there a specific error you are facing beyond this?

Sudarshan

Re: same input but different result/convergence on different nodes

Posted: Thu Jun 13, 2024 1:12 pm
by yujia_teng
Dear admin,
Only 2 of the 4 converge. With 48 and 64 cores and the same NBANDS (I didn't fix it), the 48-core run converges while the 64-core run doesn't. EDIFF is 1E-08, but the 64-core run only reaches 1E-06 by the maximum number of electronic steps. So the issue is that a different number of cores gives a different convergence result: one run converges and the other does not.

The same happens with the default NPAR: the 64-core run converges while the 48-core run does not.

Re: same input but different result/convergence on different nodes

Posted: Fri Jun 14, 2024 8:44 am
by svijay
I am still not sure I understand. For the files that you attached it looks like all calculations reached electronic convergence and EDIFF = 0.1E-07 (grep for EDIFF on all OUTCARs). Are there some other files you are referring to?
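The grep mentioned above can also be scripted. This is only a minimal sketch: it assumes each OUTCAR contains a line of the form `EDIFF  = 0.1E-07 ...`, and the function name `read_ediff` and the sample text are made up for illustration. Note that 0.1E-07 and 1E-08 are the same number, so this value only shows the requested threshold, not whether a run actually reached it.

```python
import re

# Matches a line such as "   EDIFF  = 0.1E-07   stopping-criterion for ELM".
# The exact OUTCAR layout is an assumption and may differ between versions.
# "EDIFFG" is not matched because "EDIFF" must be followed by spaces or "=".
EDIFF_RE = re.compile(r"^\s*EDIFF\s*=\s*([0-9.+\-EeDd]+)", re.MULTILINE)


def read_ediff(outcar_text):
    """Return the EDIFF value found in the text as a float, or None."""
    match = EDIFF_RE.search(outcar_text)
    if match is None:
        return None
    # Fortran output may use 'D' as the exponent marker.
    return float(match.group(1).replace("D", "E").replace("d", "e"))


# Hypothetical excerpt, not taken from the attached files.
sample = (
    "   EDIFF  = 0.1E-07   stopping-criterion for ELM\n"
    "   EDIFFG = -.1E-01   stopping-criterion for IOM\n"
)
print(read_ediff(sample))
```

To apply it to real files, read each OUTCAR with `open(path).read()` and pass the text to `read_ediff`.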

Sudarshan

Re: same input but different result/convergence on different nodes

Posted: Fri Jun 14, 2024 1:36 pm
by yujia_teng
Dear admin,
Only 2 of the 4 calculations converge. Please look at the output.log files rather than the others, since that is where the problem is visible. Only two of the runs reach 1E-08 at the end; the other two only reach 1E-06. They could not converge within the default number of electronic steps.
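The check described above can be sketched in a few lines. This assumes output.log holds VASP's usual per-step lines (a tag such as `DAV:` followed by the step number, E, dE and d eps), that the run has a single ionic step, and that "converged" means both |dE| and |d eps| of the last step are below EDIFF. The function names and the two sample lines are made up for illustration and are not taken from the attached files.

```python
import re

# Assumed line layout: "DAV:  20  -0.100E+02  -0.500E-08  -0.300E-08  500 ..."
# Captured groups: step number, E, dE, d eps.
STEP_RE = re.compile(r"^\s*[A-Z]{2,3}\s*:\s*(\d+)\s+(\S+)\s+(\S+)\s+(\S+)")


def last_step(log_text):
    """Return (step, dE, d_eps) of the last electronic step, or None."""
    last = None
    for line in log_text.splitlines():
        match = STEP_RE.match(line)
        if match:
            last = (int(match.group(1)),
                    float(match.group(3)),
                    float(match.group(4)))
    return last


def reached_ediff(log_text, ediff):
    """True if the last step has |dE| and |d eps| both below ediff."""
    step = last_step(log_text)
    if step is None:
        return False
    _, d_energy, d_eps = step
    return abs(d_energy) < ediff and abs(d_eps) < ediff


# Hypothetical final lines of a converged and a stalled run.
converged_log = "DAV:  20    -0.100000000000E+02   -0.50000E-08   -0.30000E-08   500   0.100E-04    0.200E-05\n"
stalled_log = "DAV:  60    -0.100000000000E+02   -0.20000E-05   -0.10000E-05   500   0.100E-02    0.200E-03\n"

print(reached_ediff(converged_log, 1e-8))
print(reached_ediff(stalled_log, 1e-8))
```

The step number returned by `last_step` can also be compared with NELM to see whether a run stopped because it hit the step limit.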

Re: same input but different result/convergence on different nodes

Posted: Fri Jun 14, 2024 3:25 pm
by svijay
Yes indeed, I see it now. There are a couple of things to try here: (1) Does the calculation end up converging with an increased NELM? (2) If not, since it is very near convergence anyway, have you tried another ALGO (say, ALGO=All)?

Sudarshan

Re: same input but different result/convergence on different nodes

Posted: Sat Jun 15, 2024 12:05 am
by yujia_teng
Dear admin,
I didn't try that. I believe the unconverged runs can be made to converge by increasing NELM, tuning the mixing parameters, or using ALGO = All.

But the main point here is this: with exactly the same input for all 4 runs and only the number of cores different, why does the convergence behavior differ? Is it simply that the way parallelization is implemented in VASP affects this?

Re: same input but different result/convergence on different nodes

Posted: Mon Jun 17, 2024 2:26 pm
by svijay
Yes, in this context it only makes sense to me to compare converged calculations. Recall that NBANDS is altered due to the parallelization settings (wiki/index.php/NBANDS) and so comparison for a converged (both in terms of parameters and electronically) calculation is required.
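As a rough illustration of how the parallelization can change NBANDS, the sketch below assumes that VASP rounds the requested NBANDS up to the next multiple of the number of band groups (NPAR), and that without NPAR or NCORE set there is one band group per MPI rank. The starting value of 56 is used only because it is the NBANDS reported with NPAR=4 in this thread; the actual default before rounding is not known here, and `round_up_nbands` is a hypothetical helper name.

```python
def round_up_nbands(nbands_requested, n_band_groups):
    """Smallest multiple of n_band_groups that is >= nbands_requested."""
    return -(-nbands_requested // n_band_groups) * n_band_groups


# Assumed starting value; see the note above.
requested = 56

for label, groups in [("NPAR=4", 4),
                      ("default NPAR, 48 ranks", 48),
                      ("default NPAR, 64 ranks", 64)]:
    print(label, "->", round_up_nbands(requested, groups))
```

Under these assumptions the NPAR=4 runs keep 56 bands on both nodes, while the default-NPAR runs end up with different band counts on 48 and 64 ranks. The NBANDS line in each OUTCAR shows the value actually used.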

Re: same input but different result/convergence on different nodes

Posted: Mon Jun 17, 2024 4:11 pm
by yujia_teng
Dear admin,
I'm still confused here. Why does it only make sense to compare converged calculations? Just look at the 2 calculations with the same NBANDS, which is 56 (I didn't set that; it's the default value generated by the code). They have exactly the same input and the same NBANDS. The only difference is the number of cores used: one uses 48 and the other 64. But their outputs differ: one converges and the other does not. How can this happen? From the VASP wiki, it looks like the same NBANDS should give the same result, but that's not the case here.