[molpro-user] location of scratch files in parallel runs
Dr. Anatoliy Volkov
avolkov at mtsu.edu
Wed Jun 17 19:00:46 BST 2009
Greetings,
I have a question about the location of scratch files when running the parallel
(64-bit MPP) version of Molpro 2008.1 on a cluster of SMP nodes.
In my PBS/TORQUE script I create a temporary directory on each
of the nodes for Molpro to store its scratch files:
set SCR = "/scratch/$PBS_JOBID"
foreach node ($NODES)
    rsh $node mkdir -p $SCR
end
where $NODES is the list of nodes on which the job will be executed
and $PBS_JOBID is the unique ID assigned to the job by PBS/TORQUE.
For example, in the job I am running right now PBS_JOBID =
7840.voltron.mtsu.edu,
so the scratch directory /scratch/7840.voltron.mtsu.edu is created on each
of the nodes.
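For reference, $NODES is built from $PBS_NODEFILE along the following lines,
and the scratch directories are removed again when the job finishes
(a rough sketch of my script, assuming passwordless rsh to every node;
the exact commands may differ slightly):

# unique list of nodes assigned to this job, one entry per node
set NODES = `sort -u $PBS_NODEFILE`

# ... create the scratch directories and run Molpro as shown above ...

# remove the per-job scratch directory on every node at the end
foreach node ($NODES)
    rsh $node rm -rf /scratch/$PBS_JOBID
end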
I then invoke the molpro script and let it deal with $PBS_NODEFILE:
/usr/local/molpro/molpro -o $ofile -d $SCR $ifile
where $ifile is the Molpro input file name and $ofile is the Molpro output file name.
I use the -d option to tell Molpro to use the newly created temporary
directories for its scratch files, but I have decided not to use the -N and -n
options, as the molpro script seems to be able to extract all the necessary
information from $PBS_NODEFILE by itself.
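For completeness, the explicit alternative I decided against would presumably
look something like the following (a sketch only, assuming the -n option takes
the total number of parallel processes; I have not actually tested this):

# total number of parallel processes = number of lines in the PBS node file
set NPROC = `wc -l < $PBS_NODEFILE`
/usr/local/molpro/molpro -n $NPROC -o $ofile -d $SCR $ifile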
In the Molpro output file I get:
-----------------------------------------------------------------
Primary working directories : /scratch/7840.voltron.mtsu.edu
Secondary working directories : /scratch/7840.voltron.mtsu.edu
Wavefunction directory : /home/avolkov/wfu/
Main file repository : /scratch/7840.voltron.mtsu.edu/
cpu : Intel(R) Xeon(R) CPU E5320 @ 1.86GHz 1862.023 MHz
FC : /opt/intel/fce/10.1.008/bin/ifort
FCVERSION : 10.1
BLASLIB : -L/opt/intel/mkl/9.1/lib/em64t -lmkl_em64t -lguide -lpthread -openmp
id : mtsu
MPP nodes nproc
tron09 8
tron08 8
tron07 8
tron06 8
ga_uses_ma=false, calling ma_init with nominal heap.
GA-space will be limited to 64.0 MW (determined by -G option)
MPP tuning parameters: Latency= 0 Microseconds, Broadcast speed= 0 MB/sec
default implementation of scratch files=ga
-----------------------------------------------------------------
It does seem that the program reads the names of the scratch directories correctly.
In fact, I can see that Molpro created some Fortran temporary
files in /scratch/7840.voltron.mtsu.edu/ on each of the nodes, but these
files do not grow in size as the calculation proceeds:
avolkov at tron09:/scratch/7840.voltron.mtsu.edu> ls -ltrh
total 164K
-rw-r--r-- 1 avolkov chem 2.9K 2009-06-17 08:30 procgrp.30568
-rw------- 1 avolkov chem 30 2009-06-17 08:31 fortZgGExZ
-rw------- 1 avolkov chem 30 2009-06-17 08:31 fortvM4eyZ
-rw------- 1 avolkov chem 2.9K 2009-06-17 08:31 fortOUJ0Dk
-rw------- 1 avolkov chem 30 2009-06-17 08:31 fortoo0HxZ
-rw------- 1 avolkov chem 2.6K 2009-06-17 08:31 fortNfhGvX
-rw------- 1 avolkov chem 30 2009-06-17 08:31 fortkIsLxZ
-rw------- 1 avolkov chem 30 2009-06-17 08:31 fortk5eiyZ
-rw------- 1 avolkov chem 30 2009-06-17 08:31 fortHmYPxZ
-rw------- 1 avolkov chem 30 2009-06-17 08:31 fortehGExZ
-rw------- 1 avolkov chem 0 2009-06-17 08:31 fort8IuJm5
-rw------- 1 avolkov chem 73 2009-06-17 08:31 fortWTWm0H
-rw------- 1 avolkov chem 116K 2009-06-17 12:03 forth156Is
avolkov at tron08:/scratch/7840.voltron.mtsu.edu> ls -ltrh
total 32K
-rw------- 1 avolkov chem 30 2009-06-17 08:31 fortVsDUGy
-rw------- 1 avolkov chem 30 2009-06-17 08:31 forttBroHy
-rw------- 1 avolkov chem 30 2009-06-17 08:31 fortrC8eHy
-rw------- 1 avolkov chem 30 2009-06-17 08:31 fortMBroHy
-rw------- 1 avolkov chem 30 2009-06-17 08:31 fortLYrmHy
-rw------- 1 avolkov chem 30 2009-06-17 08:31 fortHXwJGy
-rw------- 1 avolkov chem 30 2009-06-17 08:31 fortDaB5Gy
-rw------- 1 avolkov chem 30 2009-06-17 08:31 fortB6XqHy
etc
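To double-check that nothing substantial ever lands in those directories,
I run a quick loop like this from the head node (same assumptions about
rsh access as above):

# report the size of the per-job scratch directory on every node
foreach node ($NODES)
    echo $node
    rsh $node du -sh /scratch/$PBS_JOBID
end

and every node reports less than a megabyte.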
These files seem far too small to be the integral files that Molpro reports writing:
-----------------------------------------------------------------
Contracted 2-electron integrals neglected if value below 1.0D-11
AO integral compression algorithm 1 Integral accuracy 1.0D-11
11909.464 MB (compressed) written to integral file ( 18.7%)
Node minimum: 336.331 MB, node maximum: 403.177 MB
NUMBER OF SORTED TWO-ELECTRON INTEGRALS: 827943072. BUFFER LENGTH: 32768
NUMBER OF SEGMENTS: 58 SEGMENT LENGTH: 14581568 RECORD LENGTH: 131072
Memory used in sort: 14.75 MW
SORT1 READ 7975193852. AND WROTE 104517941. INTEGRALS IN 1223 RECORDS. CPU TIME: 187.08 SEC, REAL TIME: 630.37 SEC
SORT2 READ 3323600146. AND WROTE 26491761471. INTEGRALS IN 49216 RECORDS. CPU TIME: 19.95 SEC, REAL TIME: 117.89 SEC
Node minimum: 827756054. Node maximum: 827979042. integrals
OPERATOR DM FOR CENTER 0 COORDINATES: 0.000000 0.000000 0.000000
**********************************************************************************************************************************
DATASETS  *  FILE   NREC   LENGTH (MB)   RECORD NAMES
                1     18        20.03    500   610   700   900   950   970  1000   129   960  1100  1400  1410  1200  1210  1080  1600  1650  1700
                                         VAR  BASINP GEOM SYMINP ZMAT AOBASIS BASIS P2S ABASIS  S     T     V    H0   H01  AOSYM  SMH  MOLCAS OPER
PROGRAMS * TOTAL INT
CPU TIMES * 253.96 253.78
REAL TIME * 809.18 SEC
DISK USED * 53.14 GB
GA USED * 0.11 MB (max) 0.00 MB (current)
**********************************************************************************************************************************
As I understand the output, the disk usage should be on the order of tens of
gigabytes in total (DISK USED reports 53.14 GB), while the
/scratch/7840.voltron.mtsu.edu directory on each of the nodes
contains less than a megabyte of data.
Does this really mean that all of these scratch files are allocated in GA,
as stated at the beginning of the Molpro output file, i.e.
-----------------------------------------------------------------
default implementation of scratch files=ga
-----------------------------------------------------------------
It does seem so, because according to 'top' each process uses about 900 MB
of virtual memory, even though only about 160 MB are resident;
for example, on the master node tron09:
for example, on master node tron09:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
30899 avolkov 25 0 888m 159m 7236 R 101 1.0 215:09.04 molprop_2008_1_
30900 avolkov 25 0 888m 159m 7240 R 100 1.0 215:50.31 molprop_2008_1_
30901 avolkov 25 0 888m 159m 7172 R 100 1.0 215:57.17 molprop_2008_1_
30903 avolkov 25 0 888m 159m 7096 R 100 1.0 215:10.51 molprop_2008_1_
30914 avolkov 25 0 888m 159m 7068 R 100 1.0 214:08.87 molprop_2008_1_
30904 avolkov 25 0 888m 159m 7092 R 99 1.0 215:11.66 molprop_2008_1_
30912 avolkov 25 0 888m 159m 7088 R 80 1.0 215:13.30 molprop_2008_1_
30898 avolkov 15 0 892m 160m 7504 R 37 1.0 118:15.30 molprop_2008_1_
and similarly on all the slave nodes...
However, this seems to contradict what is given in the Molpro output:
DISK USED * 53.14 GB
GA USED * 0.11 MB (max) 0.00 MB (current)
which suggests that it is disk space that is being used, not GA.
Am I doing something wrong when submitting the job, or am I misinterpreting
the Molpro output?
I do want all temporary files either to be written to the node-local
/scratch disk space or to be kept in GA.
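If keeping everything in GA is in fact the intended behaviour, I suppose I
would have to raise the GA limit well beyond the current 64 MW, presumably via
the -G option mentioned in the output header, e.g. something like
(the value and its units here are only my guess):

# raise the GA space limit from 64 MW; value chosen purely for illustration
/usr/local/molpro/molpro -G 1000 -o $ofile -d $SCR $ifile

but I would rather understand first where the data is actually going.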
Any help would be very much appreciated.
Thank you,
Anatoliy
--
Anatoliy Volkov, Ph.D.
Associate Professor
Department of Chemistry
Middle Tennessee State University
239 Davis Science Bldg.
MTSU Box 68
Murfreesboro, TN 37132
E-mail: avolkov at mtsu.edu
Fax: (615) 898-5182
Phone: (615) 494-8655