Continuous timeout for Phonon Calculation (Hessian Matrix and Dielectric function)
Moderators: Global Moderator, Moderator
-
- Newbie
- Posts: 3
- Joined: Tue Mar 08, 2022 9:13 pm
Hi all,
My goal is to model the refractive index of a few organic crystals in the terahertz and infrared ranges, so I would like to calculate the Hessian matrix and the dielectric function with VASP. I submitted a job with IBRION = 6 (my setup parameters are listed at the end). However, the calculation keeps timing out at the wall-time limit (50 hours) of the supercomputer I use, and I have now resubmitted it four times with the existing WAVECAR. Does this kind of calculation really take this long, or am I not restarting it correctly? I also saw a comment on ResearchGate saying that this type of calculation cannot be restarted. Is that the case? (Bogdanovski, Dimitri. (2020). Re: Restart VASP job with IBRION=6?. Retrieved from: https://www.researchgate.net/post/Resta ... n/download.)
Below are my calculation steps:
I first converted a CIF file to a POSCAR file with VESTA by exporting the current unit cell. Then I performed four consecutive relaxations (IBRION = 2) until the external pressure was below 1 kB. After that, I submitted the job to calculate the Hessian matrix and dielectric function with a 3x3x3 k-point mesh and the setup below:
ISYM = 0
PREC = Accurate
ENCUT = 400
ISIF = 7
IBRION = 6
EDIFFG = -0.0001
IALGO = 38
NFREE = 2
POTIM = 0.015
ISMEAR = 0;
SIGMA = 0.1
GGA = PS
LEPSILON = .TRUE.
NCORE = 2
NSIM = 4
LREAL = Auto
LOPTICS = .TRUE.
I would appreciate any feedback or suggestions, thanks.
-
- Administrator
- Posts: 282
- Joined: Mon Sep 24, 2018 9:39 am
Dear enochho,
Please submit POSCAR, the incomplete OUTCAR and your job script.
If you run vasp_std or vasp_ncl, please also include the KPOINTS file in your upload.
-
- Newbie
- Posts: 3
- Joined: Tue Mar 08, 2022 9:13 pm
Thanks for your response. Here are the files for one of the jobs I tried to run. The job timed out again this morning, and this is the sixth time I have submitted it (I first submitted it on the 8th of Feb). I also attached the generated WAVECAR file because I noticed that it is empty (0 kB), and I guess that might be the reason the job does not restart correctly. Thanks.
-
- Global Moderator
- Posts: 236
- Joined: Mon Apr 26, 2021 7:40 am
Hello!
Sorry for the delayed answer! I had a look at your files and I think there is one major mistake in the job script which causes this surprisingly bad performance: you are not running VASP with MPI parallelization but only with OpenMP threading. This can be seen from the OUTCAR lines at the very beginning:
Code:
running 1 mpi-ranks, with 64 threads/rank
Also, in your job script you specify:
Code:
export OMP_NUM_THREADS=$SLURM_CPUS_ON_NODE
...
/fslhome/enochho/fsl_groups/fslg_jsl_vasp/VASP_code/vasp.6.3.2/bin/vasp_std
Hence you set the number of OpenMP threads to 64 and then invoke VASP without mpirun or srun. So VASP uses only a single MPI rank and 64 OpenMP threads. This is highly inefficient because VASP relies on MPI as its main parallelization method. OpenMP can optionally be added on top in a hybrid parallelization setup as explained here.
For the moment I would suggest not using any OpenMP threading and instead fixing the MPI parallelization. I do not know how your HPC cluster is set up, so I can only guess how this should be done in your job script, but maybe this will get you started:
Code:
export OMP_NUM_THREADS=1
...
mpirun -np 64 /fslhome/enochho/fsl_groups/fslg_jsl_vasp/VASP_code/vasp.6.3.2/bin/vasp_std
Or, alternatively, just use srun instead of mpirun -np 64. The first lines of the OUTCAR file should then look like this:
Code:
running 64 mpi-ranks, with 1 threads/rank
If this works, you can check the timings in the OUTCAR file (search for the strings LOOP and LOOP+) and compare them to your previous attempts. There should be a major improvement! In any case, please consult the documentation of your HPC center to find the correct way of running MPI-parallelized applications.
Hope this helps, all the best,
Andreas Singraber
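To compare the LOOP+ timings mentioned above between an old and a new run, a small script along these lines may help. It is only a sketch: the function names `loop_plus_times` and `summarize` are made up for this example, and the regular expression assumes OUTCAR timing lines of the usual form `LOOP+:  cpu time   62.9328: real time   63.1224`, which is worth checking against your own file.

```python
import re

# Assumed OUTCAR timing line, e.g.
#   "     LOOP+:  cpu time   62.9328: real time   63.1224"
# The pattern requires the "+" so plain "LOOP:" lines are ignored.
LOOP_PLUS = re.compile(r"LOOP\+:\s+cpu time\s*([\d.]+):\s+real time\s*([\d.]+)")


def loop_plus_times(outcar_text):
    """Return the real (wall-clock) time in seconds of every LOOP+ entry."""
    return [float(m.group(2)) for m in LOOP_PLUS.finditer(outcar_text)]


def summarize(outcar_text):
    """Number of finished ionic steps plus mean and total LOOP+ wall time."""
    times = loop_plus_times(outcar_text)
    if not times:
        return None  # no ionic step has completed yet
    total = sum(times)
    return {"steps": len(times), "mean_s": total / len(times), "total_s": total}


# Usage (hypothetical file names):
#   with open("OUTCAR.old") as fh:
#       print(summarize(fh.read()))
#   with open("OUTCAR.new") as fh:
#       print(summarize(fh.read()))
```

The `steps` value also shows how many displacements finished before the wall-time limit, which gives a rough idea of how far a timed-out run got.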
-
- Newbie
- Posts: 3
- Joined: Tue Mar 08, 2022 9:13 pm
Thank you very much for your help! I have tested many calculations over the past months following your advice, and it did indeed speed them up! Many calculations with a 3 x 3 x 3 k-point mesh have now finished.
I am now attempting to run the same kind of calculation with a regular 4 x 4 x 4 k-point mesh. I have tried multiple settings on the cluster I can access (136 nodes, 128 cores/node, 512 GB memory/node, wall time = 3 days):
1. 64 cores, 1 node, 8 GB memory, 1 OpenMP thread, 64 MPI ranks
2. 128 cores, 1 node, 8 GB memory, 2 OpenMP threads, 64 MPI ranks
3. 128 cores, 4 nodes, 4 GB memory, 4 OpenMP threads, 128 MPI ranks
However, they all failed with the same issue (empty WAVECAR). I could keep increasing the number of nodes and OpenMP threads, but I am not sure whether that helps. How should I approach this, or what else could I consider?
-
- Global Moderator
- Posts: 503
- Joined: Mon Nov 04, 2019 12:41 pm
It is correct that IBRION=6 calculations cannot be restarted.
These calculations work by making finite ionic displacements and computing the forces for each one; the forces are then used to compute the force constants by finite differences.
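As a rough illustration of that finite-difference step, here is a toy sketch in plain Python. It is not VASP's implementation: the harmonic model, the names `forces`, `hessian_central` and `k_toy`, and the two-coordinate system are all invented for the example. Each coordinate is displaced by plus and minus a small step (central differences, which is what NFREE = 2 corresponds to), and the force differences give one column of the Hessian.

```python
def forces(x, k):
    """Forces of a toy harmonic model: F_i = -sum_j k[i][j] * x[j]."""
    n = len(x)
    return [-sum(k[i][j] * x[j] for j in range(n)) for i in range(n)]


def hessian_central(force_fn, x0, step):
    """Hessian from central differences of the forces:
    H[i][j] = -(F_i(x0 + step*e_j) - F_i(x0 - step*e_j)) / (2*step)
    """
    n = len(x0)
    hess = [[0.0] * n for _ in range(n)]
    for j in range(n):
        plus = list(x0)
        plus[j] += step
        minus = list(x0)
        minus[j] -= step
        # Two force evaluations per coordinate; in a real run each one
        # is a full self-consistent calculation.
        f_plus = force_fn(plus)
        f_minus = force_fn(minus)
        for i in range(n):
            hess[i][j] = -(f_plus[i] - f_minus[i]) / (2.0 * step)
    return hess


# Toy force-constant matrix; the step mirrors POTIM = 0.015 from the thread.
k_toy = [[2.0, -1.0], [-1.0, 2.0]]
hess = hessian_central(lambda x: forces(x, k_toy), [0.0, 0.0], 0.015)
```

For this harmonic toy model the recovered `hess` matches `k_toy` up to rounding. The point of the sketch is the cost: two force evaluations per degree of freedom, all of which must finish within a single run.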
For a restart functionality, we would need to store the forces from each ionic step, as well as which ionic displacements have already been performed.
This is not implemented, but it is on our list because it is an often-requested feature.
The reason you get an empty WAVECAR is that it is only written at the end of all the ionic displacements; since your calculation reaches the timeout before that, it is never written.
Anyway, the information in the WAVECAR is not sufficient to restart the calculation, as I mentioned above.
In these cases, we recommend using phonopy to generate POSCARs with ionic displacements, then running VASP to get the forces for each of these POSCARs and building the Hessian matrix using phonopy.
https://phonopy.github.io/phonopy/
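A minimal sketch of how the displaced structures could be staged as independent jobs, so that each one fits within the wall-time limit. It assumes the displacements were already generated with `phonopy -d --dim="..."`, which writes files named POSCAR-001, POSCAR-002, and so on. The helper name `stage_displacements` and the `disp-NNN` directory layout are choices made for this example, and the INCAR copied into each directory is assumed to describe a single static force calculation rather than the IBRION = 6 setup.

```python
import re
import shutil
from pathlib import Path


def stage_displacements(workdir, shared=("INCAR", "KPOINTS", "POTCAR")):
    """Create one run directory per POSCAR-NNN file found in workdir.

    Each directory disp-NNN receives the displaced structure as POSCAR
    plus copies of the shared input files that exist in workdir.
    Returns the created directories, sorted by displacement index.
    """
    workdir = Path(workdir)
    created = []
    for poscar in sorted(workdir.glob("POSCAR-*")):
        match = re.fullmatch(r"POSCAR-(\d+)", poscar.name)
        if match is None:
            continue  # skip anything that is not a numbered displacement
        run_dir = workdir / f"disp-{match.group(1)}"
        run_dir.mkdir(exist_ok=True)
        shutil.copy(poscar, run_dir / "POSCAR")
        for name in shared:
            src = workdir / name
            if src.exists():
                shutil.copy(src, run_dir / name)
        created.append(run_dir)
    return created
```

After VASP has finished in every directory, the forces can be collected with `phonopy -f disp-001/vasprun.xml disp-002/vasprun.xml ...`. A job that times out then only needs that one displacement to be rerun.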