Continuing a HSE06 band structure calculation in time constrained HPC facility

Posted: Fri Jul 28, 2023 5:30 pm
by hatedark1
Dear fellows,

I am facing a time constraint when calculating the band structure of a material using the HSE06 functional on an HPC facility. The time limit per job is 7 days and I reached it the first time I tried, resulting in a killed job in electronic step 11 (see snippet of the last lines of stdout below).

[...]
the WAVECAR file was read successfully
initial charge from wavefunction
entering main loop
N E dE d eps ncg rms ort
gam= 0.000 g(H,U,f)= 0.798E+00 0.177E+00 0.236E-17 ort(H,U,f) = 0.000E+00 0.000E+00 0.000E+00
SDA: 1 -0.221003576234E+03 -0.22100E+03 -0.39012E+00 15232 0.975E+00 0.000E+00
gam= 0.382 g(H,U,f)= 0.220E+00 0.768E-01 0.810E-60 ort(H,U,f) = 0.304E+00 0.802E-01 0.573E-60
DMP: 2 -0.221282101018E+03 -0.27852E+00 -0.17737E+00 15232 0.297E+00 0.384E+00
gam= 0.382 g(H,U,f)= 0.600E-01 0.428E-01 0.676-168 ort(H,U,f) = 0.653E-01 0.649E-01 0.908-168
DMP: 3 -0.221403766823E+03 -0.12167E+00 -0.60993E-01 15232 0.103E+00 0.130E+00
gam= 0.382 g(H,U,f)= 0.320E-01 0.243E-01 0.338E-32 ort(H,U,f) = 0.128E-01 0.497E-01 0.700E-32
DMP: 4 -0.221450059456E+03 -0.46293E-01 -0.32079E-01 15232 0.563E-01 0.626E-01
gam= 0.382 g(H,U,f)= 0.141E-01 0.135E-01 0.458E-37 ort(H,U,f) = 0.140E-01 0.334E-01 0.127E-36
DMP: 5 -0.221477077149E+03 -0.27018E-01 -0.18294E-01 15232 0.277E-01 0.473E-01
gam= 0.382 g(H,U,f)= 0.568E-02 0.779E-02 0.000E+00 ort(H,U,f) = 0.735E-02 0.211E-01 0.000E+00
DMP: 6 -0.221492571805E+03 -0.15495E-01 -0.97377E-02 15232 0.135E-01 0.285E-01
gam= 0.382 g(H,U,f)= 0.284E-02 0.464E-02 0.188-124 ort(H,U,f) = 0.295E-02 0.133E-01 0.732-124
DMP: 7 -0.221500960146E+03 -0.83883E-02 -0.54818E-02 15232 0.748E-02 0.163E-01
gam= 0.382 g(H,U,f)= 0.138E-02 0.280E-02 0.116E-43 ort(H,U,f) = 0.140E-02 0.848E-02 0.508E-43
DMP: 8 -0.221505772571E+03 -0.48124E-02 -0.31794E-02 15232 0.417E-02 0.988E-02
gam= 0.382 g(H,U,f)= 0.569E-03 0.166E-02 0.324E-15 ort(H,U,f) = 0.653E-03 0.527E-02 0.151E-14
DMP: 9 -0.221508582782E+03 -0.28102E-02 -0.17965E-02 15232 0.223E-02 0.592E-02
gam= 0.382 g(H,U,f)= 0.200E-03 0.937E-03 0.682E-71 ort(H,U,f) = 0.237E-03 0.310E-02 0.315E-70
DMP: 10 -0.221510165123E+03 -0.15823E-02 -0.96542E-03 15232 0.114E-02 0.334E-02
gam= 0.382 g(H,U,f)= 0.675E-04 0.491E-03 0.302E-23 ort(H,U,f) = 0.656E-04 0.169E-02 0.136E-22
DMP: 11 -0.221511007259E+03 -0.84214E-03 -0.49241E-03 15232 0.559E-03 0.176E-02
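
The per-step lines in the snippet above can be read mechanically to see how fast the energy change is shrinking. Below is a minimal sketch, assuming only the column order given in the header (`N E dE d eps ncg rms ort`); `parse_steps` and `STEP_RE` are hypothetical names, not part of VASP, and the sample is copied from steps 8 to 11 above.

```python
import re

# Matches lines such as "DMP: 11 -0.221511007259E+03 -0.84214E-03 ..."
# Captured groups: algorithm label, step number N, total energy E, change dE.
STEP_RE = re.compile(r"^\s*(SDA|DMP|DAV|RMM)\s*:\s*(\d+)\s+(\S+)\s+(\S+)")

# Steps 8-11 copied from the stdout snippet in the post
SAMPLE = """\
DMP: 8 -0.221505772571E+03 -0.48124E-02 -0.31794E-02 15232 0.417E-02 0.988E-02
DMP: 9 -0.221508582782E+03 -0.28102E-02 -0.17965E-02 15232 0.223E-02 0.592E-02
DMP: 10 -0.221510165123E+03 -0.15823E-02 -0.96542E-03 15232 0.114E-02 0.334E-02
DMP: 11 -0.221511007259E+03 -0.84214E-03 -0.49241E-03 15232 0.559E-03 0.176E-02
"""


def parse_steps(text):
    """Return (step, energy, dE) for every electronic-step line in text."""
    steps = []
    for line in text.splitlines():
        match = STEP_RE.match(line)
        if match:
            steps.append(
                (int(match.group(2)), float(match.group(3)), float(match.group(4)))
            )
    return steps


if __name__ == "__main__":
    steps = parse_steps(SAMPLE)
    # Compare each step with the previous one to see how quickly |dE| decays
    for (_, _, prev_de), (n, energy, de) in zip(steps, steps[1:]):
        ratio = abs(de / prev_de)
        print(f"step {n}: E = {energy:.6f}, dE = {de:.3e}, |dE| ratio = {ratio:.2f}")
```

A ratio that stays well below 1 means the energy change is still shrinking from one electronic step to the next.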
I started the calculation again last week and had the idea of stopping it with a STOPCAR file a little more than a day before the time limit would be reached, then continuing the calculation later. Yesterday I generated a STOPCAR file with the following content:
LABORT = .TRUE.
to stop the calculation at the next electronic step. The calculation stopped today and the stdout file shows:
[...]
reading WAVECAR
the WAVECAR file was read successfully
initial charge from wavefunction
entering main loop
N E dE d eps ncg rms ort
gam= 0.000 g(H,U,f)= 0.278E+03 0.996E+02 0.116E-46 ort(H,U,f) = 0.000E+00 0.000E+00 0.000E+00
SDA: 1 -0.470604565658E+02 -0.47060E+02 -0.15093E+03 15232 0.377E+03 0.000E+00
gam= 0.382 g(H,U,f)= 0.833E+02 0.278E+02 0.531-183 ort(H,U,f) = 0.134E+03 0.307E+02-0.105-182
DMP: 2 -0.155432281255E+03 -0.10837E+03 -0.69587E+02 15232 0.111E+03 0.165E+03
gam= 0.382 g(H,U,f)= 0.181E+02 0.164E+02 0.282-122 ort(H,U,f) = 0.380E+02 0.187E+02-0.251-122
DMP: 3 -0.201416887500E+03 -0.45985E+02 -0.22501E+02 15232 0.346E+02 0.568E+02
gam= 0.382 g(H,U,f)= 0.610E+01 0.385E+01 0.516-139 ort(H,U,f) = 0.474E+00 0.100E+02-0.225-139
DMP: 4 -0.214800266613E+03 -0.13383E+02 -0.55850E+01 15232 0.995E+01 0.105E+02
gam= 0.382 g(H,U,f)= 0.399E+01 0.176E+01 0.141E-55 ort(H,U,f) =-0.130E+01 0.369E+01 0.561E-56
DMP: 5 -0.218088999089E+03 -0.32887E+01 -0.26631E+01 15232 0.574E+01 0.239E+01
gam= 0.382 g(H,U,f)= 0.176E+01 0.672E+00 0.942E-18 ort(H,U,f) = 0.473E+00 0.192E+01 0.160E-17
DMP: 6 -0.219898347762E+03 -0.18093E+01 -0.13392E+01 15232 0.244E+01 0.239E+01
gam= 0.382 g(H,U,f)= 0.693E+00 0.306E+00 0.259E-12 ort(H,U,f) = 0.410E+00 0.866E+00 0.724E-12
DMP: 7 -0.220823103308E+03 -0.92476E+00 -0.59470E+00 15232 0.999E+00 0.128E+01
gam= 0.382 g(H,U,f)= 0.230E+00 0.132E+00 0.947E-22 ort(H,U,f) = 0.219E+00 0.405E+00 0.215E-21
DMP: 8 -0.221245480715E+03 -0.42238E+00 -0.24024E+00 15232 0.362E+00 0.625E+00
gam= 0.382 g(H,U,f)= 0.614E-01 0.583E-01 0.296E-15 ort(H,U,f) = 0.559E-01 0.176E+00 0.727E-15
DMP: 9 -0.221411860667E+03 -0.16638E+00 -0.83269E-01 15232 0.120E+00 0.232E+00
hard stop encountered! aborting job ...
soft stop encountered! aborting job ...
1 F= -.22141186E+03 E0= -.22141186E+03 d E =-.289231E-22
Start KPOINTS_OPT (optional k-point list driver)
k-point batch [1-119\150]
N E dE ncg
DAV: 1 0.207055104358E+05 -0.30297E+06 45696
However, the WAVECAR and CHG* files needed to continue the calculation were not generated. Did I do something wrong?
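
For reference, the STOPCAR step described above could be scripted along the following lines. This is only a sketch: `write_stopcar` and `time_to_stop` are hypothetical helpers, the 30-hour margin and the elapsed time in the demo are assumed values, and the only VASP-specific parts are the two STOPCAR tags `LABORT` and `LSTOP`.

```python
import tempfile
from pathlib import Path

# STOPCAR tags documented by VASP:
#   LABORT = .TRUE.  -> stop at the next electronic step
#   LSTOP  = .TRUE.  -> stop at the next ionic step
STOP_TAGS = {"electronic": "LABORT", "ionic": "LSTOP"}


def write_stopcar(run_dir, stop_at="electronic"):
    """Create a STOPCAR file in run_dir (hypothetical helper, not part of VASP)."""
    if stop_at not in STOP_TAGS:
        raise ValueError(f"stop_at must be one of {sorted(STOP_TAGS)}")
    path = Path(run_dir) / "STOPCAR"
    path.write_text(f"{STOP_TAGS[stop_at]} = .TRUE.\n")
    return path


def time_to_stop(elapsed_hours, walltime_hours, margin_hours):
    """Return True once the job is within margin_hours of the walltime limit."""
    return elapsed_hours >= walltime_hours - margin_hours


if __name__ == "__main__":
    # 7-day limit as in the post; elapsed time and margin are assumed values
    with tempfile.TemporaryDirectory() as run_dir:
        if time_to_stop(elapsed_hours=140, walltime_hours=7 * 24, margin_hours=30):
            stopcar = write_stopcar(run_dir, stop_at="electronic")
            print(stopcar.read_text(), end="")
```

The sketch only covers creating the file at the right moment. Whether the stopped run then leaves a usable WAVECAR behind is a separate matter that it does not address.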

Best regards,
Lira.

Re: Continuing a HSE06 band structure calculation in time constrained HPC facility

Posted: Mon Jul 31, 2023 6:56 pm
by pedro_melo
Dear Lira,

It seems strange that your job took 7 days to compute a band structure. Could you provide me with the input files (INCAR, POTCAR, POSCAR, KPOINTS) that you are using?

Kind regards,
Pedro Melo

Re: Continuing a HSE06 band structure calculation in time constrained HPC facility

Posted: Thu Aug 03, 2023 2:37 pm
by hatedark1
Dear Pedro Melo,

The requested files are attached to this message. Thanks for looking into it. The system is an ABC-stacked bulk material with 21 atoms in the unit cell, which is probably why the hybrid band structure calculation takes so long. When I did the same calculation for a monolayer of this material, it took 11 days on my local cluster, which is older than the one I'm using now.

However, I still don't know what went wrong when I tried to stop and continue the calculation.

Best regards,
Lira.

Re: Continuing a HSE06 band structure calculation in time constrained HPC facility

Posted: Tue Aug 15, 2023 5:43 pm
by hatedark1
Dear fellows,

Is anyone able to assist me regarding the continuation of the calculations? Does the continuation of a calculation that has been stopped at a certain electronic step work?

Best regards,
Lira.

Re: Continuing a HSE06 band structure calculation in time constrained HPC facility

Posted: Wed Aug 16, 2023 6:35 am
by alex
Dear Lira,

a) Your job script suggests that you are on AMD 128-core CPUs, but only one of them, and that you are using all of its cores. Due to the very(!) limited memory bandwidth of this CPU, this is usually a very bad idea. For plain DFT I normally use just half of the cores. Assuming HSE takes a bit more memory, you are probably better off with fewer, but that is up to you to figure out.
b) Your smearing of sigma=0.01 is even below room temperature. Is this what you want? It will affect convergence drastically (in a bad way if it stays like that).
c) You are starting off with about 120 k-points. Your cell has small lattice constants a and b, OK. However, what does plain DFT say? Is the system metallic, so that you really need a dense mesh?

Happy crunching

alex

Re: Continuing a HSE06 band structure calculation in time constrained HPC facility

Posted: Wed Aug 30, 2023 9:46 pm
by hatedark1
Dear Alex,

Thanks for the reply.
alex wrote: "Your job script suggests that you are on AMD 128-core CPUs, but only one of them, and that you are using all of its cores. Due to the very(!) limited memory bandwidth of this CPU, this is usually a very bad idea. For plain DFT I normally use just half of the cores. Assuming HSE takes a bit more memory, you are probably better off with fewer, but that is up to you to figure out."
I'll take your advice regarding the number of CPU cores, thank you.

alex wrote: "Your smearing of sigma=0.01 is even below room temperature. Is this what you want?"
Could you please elaborate on this?

alex wrote: "It will affect convergence drastically (in a bad way if it stays like that)."
I assume I should increase sigma, then? What value do you suggest?

alex wrote: "You are starting off with about 120 k-points. Your cell has small lattice constants a and b, OK. However, what does plain DFT say? Is the system metallic, so that you really need a dense mesh?"
PBE results show the system is a semiconductor. I'll try to reduce the number of k-points.

Best regards,
Lira.

Re: Continuing a HSE06 band structure calculation in time constrained HPC facility

Posted: Fri Sep 01, 2023 12:57 pm
by alex
Hello Lira,
alex wrote: "Your smearing of sigma=0.01 is even below room temperature. Is this what you want?"
hatedark1 wrote: "Could you please elaborate on this?"
Smearing effectively gives the electrons a temperature, and room temperature corresponds to about 0.03 eV of energy. Normally, convergence is better with higher smearing, but too much of it might give unphysical results. So with your semiconductor you should be safe with about 0.1 to 0.2 eV smearing and a far less dense k-point mesh for starters.
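
The link between smearing width and temperature mentioned above is just T = sigma / k_B. Here is a small sketch of the arithmetic, assuming sigma is read as a thermal energy k_B*T (strictly, this reading applies to Fermi-Dirac smearing; for other smearing schemes it is only a rough guide). The function names are hypothetical.

```python
# Boltzmann constant in eV/K (CODATA 2018 value)
K_B_EV_PER_K = 8.617333262e-5


def sigma_to_kelvin(sigma_ev):
    """Temperature in K whose thermal energy k_B*T equals sigma_ev (in eV)."""
    return sigma_ev / K_B_EV_PER_K


def kelvin_to_sigma(temperature_k):
    """Thermal energy k_B*T in eV for a temperature given in K."""
    return temperature_k * K_B_EV_PER_K


if __name__ == "__main__":
    # Values discussed in the thread: the current 0.01 eV and the suggested 0.1-0.2 eV
    for sigma in (0.01, 0.1, 0.2):
        print(f"sigma = {sigma:.2f} eV  ->  T = {sigma_to_kelvin(sigma):.0f} K")
    print(f"T = 300 K  ->  k_B*T = {kelvin_to_sigma(300.0):.4f} eV")
```

With these numbers, sigma = 0.01 eV corresponds to roughly 116 K, and 300 K to roughly 0.026 eV, which is the "about 0.03 eV" quoted above.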

However, I don't know how this would help with your band structure simulation.

Good luck!

alex

Re: Continuing a HSE06 band structure calculation in time constrained HPC facility

Posted: Thu Sep 07, 2023 1:45 pm
by hatedark1
Thank you for the help Alex.

I was able to complete the calculation using your suggestions. I'll now try to improve the precision by adjusting the parameters as far as my time constraint permits.

Best regards,
Lira.

Re: Continuing a HSE06 band structure calculation in time constrained HPC facility

Posted: Wed Oct 04, 2023 2:03 pm
by alex
You are welcome, Lira.

The overall convergence of, e.g., the total energy is only one side of the coin: the quantity you are actually interested in might already be converged with computationally much(!) cheaper settings!

Good luck!

alex