Performance loss due to context switching

Posted: Wed Aug 10, 2016 3:50 am
by jun
Hi all,

VASP sometimes runs much slower, seemingly at random, on our clusters. Looking at the timing information in OUTCAR, I noticed that the slow jobs report a very high number of voluntary context switches. At first I suspected that MKL routines were spawning too many threads, so I recompiled VASP with sequential MKL and also tried explicitly exporting MKL_NUM_THREADS=1 to see whether that helped. However, the oversubscription still persists. I don't know whether this is caused by my compilation or by the configuration of our clusters.
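For anyone who wants to compare slow and fast runs the same way, here is a minimal sketch of how the counters could be pulled out of OUTCAR. It assumes the accounting block at the end of the file uses the labels "Elapsed time (sec):" and "Voluntary context switches:"; the function names are my own and not part of VASP.

```python
import re

# Labels assumed to match the accounting block at the end of OUTCAR.
_FIELDS = {
    "elapsed_sec": r"Elapsed time \(sec\):\s*([0-9.]+)",
    "voluntary_ctx_switches": r"Voluntary context switches:\s*([0-9]+)",
}


def read_accounting(text):
    """Return the last occurrence of each accounting field found in the text."""
    result = {}
    for key, pattern in _FIELDS.items():
        matches = re.findall(pattern, text)
        if matches:
            result[key] = float(matches[-1])
    return result


def switches_per_second(text):
    """Voluntary context switches per second of elapsed time, or None if unavailable."""
    acc = read_accounting(text)
    elapsed = acc.get("elapsed_sec")
    switches = acc.get("voluntary_ctx_switches")
    if not elapsed or switches is None:
        return None
    return switches / elapsed
```

Reading each OUTCAR with `open(path).read()` and passing the text to `switches_per_second` gives a rate that can be compared between jobs of different length, which a raw count cannot.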

Here is the makefile.include I used:
# Precompiler options
CPP_OPTIONS= -DMPI -DHOST=\"IFC91_ompi_phoenix\" -DIFC \
-DCACHE_SIZE=16000 -DPGF90 -Davoidalloc \
-DMPI_BLOCK=8000 -Duse_collective \
-DnoAugXCmeta -Duse_bse_te \
-Duse_shmem -Dtbdyn

CPP = fpp -f_com=no -free -w0 $*$(FUFFIX) $*$(SUFFIX) $(CPP_OPTIONS)

# Changed to libmkl_sequential.a
FC = mpifort -I${MKLROOT}/include
FCL = mpifort -mkl=sequential -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_lp64.a \
${MKLROOT}/lib/intel64/libmkl_core.a \
${MKLROOT}/lib/intel64/libmkl_sequential.a -Wl,--end-group

FREE = -free -names lowercase

FFLAGS = -assume byterecl -heap-arrays 64

MKL_PATH = $(MKLROOT)/lib/intel64

OBJECTS = fftmpiw.o fftmpi_map.o fftw3d.o fft3dlib.o
INCS =-I$(MKLROOT)/include/fftw

LLIBS = $(SCALAPACK) $(LAPACK) $(BLAS) -lpthread -lm -ldl

OBJECTS_O1 += fft3dfurth.o fftw3d.o fftmpi.o fftmpiw.o
OBJECTS_O2 += fft3dlib.o

# For what used to be vasp.5.lib
FC_LIB = $(FC)
CC_LIB = icc

OBJECTS_LIB= linpack_double.o getshmem.o

# Normally no need to change this
SRCDIR = ../../src
BINDIR = ../../bin

Our clusters run SLURM, so I think computational resources are assigned automatically and shouldn't be a problem. Am I right? Does anyone have experience with avoiding oversubscription?
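To make the question concrete, this is a rough sketch of the check I have in mind: compare the cores a process is actually allowed to run on with the number of MPI ranks times threads per rank. It assumes Linux (for `os.sched_getaffinity`) and that the thread count comes from MKL_NUM_THREADS or OMP_NUM_THREADS; the helper names are hypothetical, and the number of ranks per node has to be supplied from the job script.

```python
import os


def threads_per_rank(env):
    """Thread count requested through the environment, or None if unset or not a plain integer."""
    for name in ("MKL_NUM_THREADS", "OMP_NUM_THREADS"):
        value = env.get(name, "")
        if value.isdigit():
            return int(value)
    return None  # unset: a threaded library may choose its own default


def visible_cores():
    """Number of cores this process may be scheduled on."""
    if hasattr(os, "sched_getaffinity"):  # Linux only
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def is_oversubscribed(ranks_on_node, threads, cores):
    """True if ranks times threads exceeds the cores available on the node."""
    return ranks_on_node * threads > cores
```

Note that `visible_cores()` reports the affinity mask of the calling process only, so if the scheduler binds each rank to its own cores, the per-rank thread count should be compared with that value rather than with the whole node.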

Thanks in advance.


Re: Performance loss due to context switching

Posted: Tue Sep 10, 2024 2:44 pm
by support_vasp


We're sorry that we didn’t answer your question. This does not live up to the quality of support that we aim to provide. The team has since expanded. If we can still help with your problem, please ask again in a new post, linking to this one, and we will answer as quickly as possible.

Best wishes,