Is this memory problem or something else ..?


askhetan
Jr. Member
Posts: 81
Joined: Wed Sep 28, 2011 4:15 pm
License Nr.: 5-1441
Location: Germany

Is this memory problem or something else ..?

#1 Post by askhetan » Mon May 16, 2016 1:14 pm

Hello all,
I am running a GW calculation on a slab with a 7x6x1 k-point mesh and the following POSCAR:
-------------------------------------------------------------------------------------------------------------------
Li24 O24
1.00000000000000
6.3607230185999999 0.0000000000000000 0.0000000000000000
0.0000000000000000 7.7035222053999997 0.0000000000000000
0.0000000000000000 0.0000000000000000 22.0000000000000000
Li O
24 24
Selective dynamics
Direct
0.2500000000550244 0.7499999999935127 0.0417315059090910 F F F
0.7500000000078586 0.7499999999935127 0.0417315059090910 F F F
0.2500000000550244 0.0000000000000000 0.1251942677727271 F F F
0.7500000000078586 0.0000000000000000 0.1251942677727271 F F F
0.2500000000550244 0.5000000000389448 0.1251942677727271 F F F
0.7500000000078586 0.5000000000389448 0.1251942677727271 F F F
0.4999999999528342 0.7499999999935127 0.1669257699999989 F F F
0.0000000000000000 0.7499999999935127 0.1669257699999989 F F F
0.2500000980048540 0.2500000167121783 0.2052688645769223 T T T
0.7500001022620424 0.2500000194902157 0.2052688661991127 T T T
0.9999999831027253 0.0027676309394735 0.2462410069040288 T T T
0.4999999909262058 0.0027676283595071 0.2462410084879991 T T T
0.9999999896075167 0.4972323759992250 0.2462411043113875 T T T
0.4999999815450380 0.4972323738417330 0.2462411026440279 T T T
0.2500000410409200 0.7499999901229160 0.2874166347982197 T T T
0.7500000420981721 0.7499999885483817 0.2874166357634280 T T T
0.4999999384522340 0.2500000067084471 0.3264903720489727 T T T
0.9999999405782631 0.2500000010045227 0.3264903711083491 T T T
0.2499998593155297 0.9937402371787556 0.3626863213947971 T T T
0.7499998582454452 0.9937402434208664 0.3626863189092759 T T T
0.2499998194365745 0.5062597140084932 0.3626864592192405 T T T
0.7499998218586796 0.5062597186970592 0.3626864627341675 T T T
0.2500005626001496 0.2499995198580081 0.4389166628746324 T T T
0.7500005627464361 0.2499995209723096 0.4389166662839799 T T T
0.4999999999528342 0.8506900071510515 0.0834627581363705 F F F
0.0000000000000000 0.8506900071510515 0.0834627581363705 F F F
0.4999999999528342 0.6493099928359669 0.0834627581363705 F F F
0.0000000000000000 0.6493099928359669 0.0834627581363705 F F F
0.4999999999528342 0.3506900072419157 0.1669257699999989 F F F
0.0000000000000000 0.3506900072419157 0.1669257699999989 F F F
0.4999999999528342 0.1493099927970221 0.1669257699999989 F F F
0.0000000000000000 0.1493099927970221 0.1669257699999989 F F F
0.2499999210757338 0.8541036604356407 0.2044632896614758 T T T
0.7499999183219401 0.8541036767916808 0.2044632890318212 T T T
0.2500000058068181 0.6458963384943885 0.2044632928234833 T T T
0.7500000079276177 0.6458963550621775 0.2044632934887858 T T T
0.2499999271126256 0.3539066321619018 0.2869552670766993 T T T
0.7499999277660763 0.3539066663930086 0.2869552674341023 T T T
0.2499999751363404 0.1460933376666347 0.2869552552217272 T T T
0.7499999765354630 0.1460933715138495 0.2869552555206454 T T T
0.4999999794324523 0.8536104715263875 0.3287724576926365 T T T
0.9999999840773839 0.8536104758010623 0.3287724586312137 T T T
0.4999999300647673 0.6463895117591960 0.3287724665739020 T T T
0.9999999276011309 0.6463895160164554 0.3287724672324472 T T T
0.5000002622113087 0.3547397610432625 0.4080437671185635 T T T
0.0000002603261606 0.3547397665217744 0.4080437670026242 T T T
0.5000003221861320 0.1452601102673654 0.4080437404683650 T T T
0.0000003194025666 0.1452601155010242 0.4080437398718999 T T T
-------------------------------------------------------------------------------------------------------------------

As you can see, I have a slab system with 24 Li and 24 O atoms (Li24O24), i.e. 216 electrons (108 occupied bands). In the GW part I use the following INCAR:
-------------------------------------------------------------------------------------------------------------------
ALGO = SCGW0
ENCUTGW = 150
NOMEGA = 72
ISMEAR = -5
ISPIN = 2
SIGMA = 0.01
LREAL = .FALSE.
NELM = 2
LORBIT = 11
PRECFOCK = FAST
LWANNIER90=.TRUE.
LPEAD = .TRUE.
MAXMEM = 25500
KPAR = 2
NBANDS = 240
-------------------------------------------------------------------------------------------------------------------
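The electron count quoted above the INCAR can be checked with a few lines of Python. This is only a sketch: the valence electrons per atom are an assumption (3 for Li, 6 for O, which is what reproduces the stated 216), since the post does not say which potentials were used, and the occupied-band count assumes doubly occupied bands.

```python
# Atom counts come from the POSCAR, NBANDS from the INCAR.
atoms = {"Li": 24, "O": 24}
nbands = 240

# ASSUMPTION: valence electrons per atom (not stated in the post).
# Li = 3 means the 1s shell is treated as valence; O = 6 is 2s2 2p4.
valence = {"Li": 3, "O": 6}

n_electrons = sum(valence[species] * count for species, count in atoms.items())
n_occupied = n_electrons // 2      # assumes doubly occupied bands
n_empty = nbands - n_occupied      # unoccupied bands available to GW

print(f"electrons: {n_electrons}, occupied bands: {n_occupied}, empty bands: {n_empty}")
```

Under these assumptions, NBANDS = 240 leaves 132 empty bands on top of the 108 occupied ones.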

The run always reaches the point where the end of the stdlog shows:
-------------------------------------------------------------------------------------------------------------------
energies w=
0.00 0.00 0.45 0.00 0.90 0.00 1.34 0.00 1.79 0.00
2.23 0.00 2.67 0.00 3.10 0.00 3.54 0.00 3.96 0.00
4.38 0.00 4.80 0.00 5.21 0.00 5.62 0.00 6.03 0.00
6.42 0.00 6.82 0.00 7.21 0.00 7.60 0.00 7.98 0.00
8.36 0.00 8.74 0.00 9.11 0.00 9.49 0.00 9.86 0.00
10.23 0.00 10.60 0.00 10.97 0.00 11.35 0.00 11.72 0.00
12.10 0.00 12.48 0.00 12.87 0.00 13.26 0.00 13.65 0.00
14.06 0.00 14.47 0.00 14.88 0.00 15.31 0.00 15.75 0.00
16.20 0.00 16.66 0.00 17.14 0.00 17.64 0.00 18.15 0.00
18.69 0.00 19.25 0.00 19.83 0.00 20.44 0.00 21.09 0.00
21.77 0.00 22.50 0.00 23.27 0.00 24.09 0.00 24.97 0.00
25.92 0.00 26.95 0.00 28.06 0.00 29.28 0.00 30.62 0.00
32.09 0.00 33.73 0.00 35.55 0.00 37.61 0.00 39.95 0.00
42.64 0.00 45.74 0.00 49.38 0.00 53.70 0.00 58.91 0.00
65.32 0.00 73.40 0.00
responsefunction array rank= 4560
LDA part: xc-table for Pade appr. of Perdew
allocating 1 responsefunctions rank= 4560
shmem allocating 36 responsefunctions rank= 4560
response function shared by NCSHMEM nodes 1
Doing 1 frequencies on each core in blocks of 36
NQ= 1 0.0000 0.0000 0.0000,
|.........|.........
-------------------------------------------------------------------------------------------------------------------

After that it crashes, giving a segmentation fault error that ALWAYS looks like:
-------------------------------------------------------------------------------------------------------------------
Stack trace terminated abnormally.
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
vasp_std 00000000015168B5 Unknown Unknown Unknown
libpthread.so.0 00002AF0B878D100 Unknown Unknown Unknown
-------------------------------------------------------------------------------------------------------------------

When I check my memory requirements, they are no more than 13 GB per core, whereas I have made 25 GB available per core. I was hoping this much memory would be sufficient, but apparently it is not. There are several posts about exactly this problem on the VASP forums without any solution. Can someone please suggest where to look for the source of the problem? Could it be the installation? In some other example cases I noticed that the memory distribution across cores is quite horrendous (e.g. 12-13 GB on 22 out of 24 cores of a given node and 25-27 GB on the other 2 cores of that same node). Could this imbalance be what leads to the crashes?
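As a rough cross-check of the numbers in the log, the storage for the response functions can be estimated from the reported rank. This is a back-of-the-envelope sketch, not a statement of how VASP allocates memory: it assumes each response function is a dense rank x rank matrix of complex double-precision numbers (16 bytes per element), and it ignores wavefunctions, work arrays, and MPI buffers.

```python
# Values taken from the stdlog shown above.
rank = 4560        # "responsefunction array rank= 4560"
n_shared = 36      # "shmem allocating 36 responsefunctions"

# ASSUMPTION: dense complex double-precision storage, 16 bytes per element.
bytes_per_element = 16

bytes_one = rank * rank * bytes_per_element
gib_one = bytes_one / 2**30
gib_shared = n_shared * gib_one

print(f"one response function:   {gib_one:.2f} GiB")
print(f"{n_shared} response functions: {gib_shared:.1f} GiB")
```

Under this assumption, a single response function takes about 0.31 GiB and the block of 36 about 11 GiB. Whether that block is held once per node or counted against individual cores is not something the log shown here settles.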

support_vasp
Global Moderator
Posts: 1817
Joined: Mon Nov 18, 2019 11:00 am

Re: Is this memory problem or something else ..?

#2 Post by support_vasp » Thu Sep 12, 2024 6:58 am

Hi,

We're sorry that we didn’t answer your question. This does not live up to the quality of support that we aim to provide. The team has since expanded. If we can still help with your problem, please ask again in a new post, linking to this one, and we will answer as quickly as possible.

Best wishes,

VASP

