[molpro-user] test job hangs on FC2 Xeon Node
Nick Wilson
WilsonNT at Cardiff.ac.uk
Tue Nov 23 16:52:27 GMT 2004
Dear Seth,
Using a combination of dual Xeon, Red Hat 7.3, ifc 7.0 and ATLAS, I think I
might have reproduced a similar race condition.
It went away when I used the mkl_ia32 BLAS library by editing CONFIG:
BLASLIB_p4="-L/opt/intel/mkl61/lib/32 -lmkl_ia32 -Wl,-rpath,/opt/intel/mkl61/lib/32 "
LAPACKLIB_p4="-L/opt/intel/mkl61/lib/32 -lmkl_lapack -Wl,-rpath,/opt/intel/mkl61/lib/32 "
(you might also need -lguide if you don't have -openmp) and then running "make".
It also went away when I used the BLAS shipped with Molpro, by editing the
CONFIG file thus:
FTCFLAGS="mpp eaf blas0"
BLASLIB=""
LAPACKLIB=""
BLASLIB_p4=""
LAPACKLIB_p4=""
Then doing:
rm lib/libmolpro.a
make
Could you test whether either Intel's or Molpro's BLAS/LAPACK fixes your
problem?
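On your last question, about how the *.TMP files can be deleted and still
open: on Unix-like systems, deleting (unlinking) a file only removes its
directory entry. The data stays on disk until the last open descriptor is
closed, and lsof marks such files "(deleted)". Programs often do this with
scratch files so that they are cleaned up automatically if the process dies,
so I would guess that part of the lsof output is not itself a symptom of the
hang. Below is a minimal Python sketch of the mechanism; the function name
write_read_after_unlink is purely illustrative and has nothing to do with
Molpro's own I/O code.

```python
import os
import tempfile


def write_read_after_unlink(payload):
    """Create a scratch file, unlink it while open, then keep using it.

    Returns (name_still_exists, data_read_back). Assumes a POSIX system;
    the name and return shape are illustrative only.
    """
    fd, path = tempfile.mkstemp(suffix=".TMP")
    try:
        os.unlink(path)                      # removes the directory entry only
        name_exists = os.path.exists(path)   # the name is gone from the directory
        os.write(fd, payload)                # the open descriptor is still usable
        os.lseek(fd, 0, os.SEEK_SET)         # rewind to read the data back
        data = os.read(fd, len(payload))
    finally:
        os.close(fd)                         # last reference closed: space is freed
    return name_exists, data


if __name__ == "__main__":
    exists, data = write_read_after_unlink(b"scratch integrals")
    print("name still in directory:", exists)
    print("data read back through the descriptor:", data)
```

While the descriptor is open, running lsof against such a process should show
the scratch file with the "(deleted)" suffix, much as in your listing.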
Best wishes,
Nick
Dr Seth OLSEN wrote:
> Hello Molpro-Users,
>
> As outlined in previous communiques, I have been having no luck in getting molpro2002.6 to run on a Dual Xeon node with Fedora Core 2, either as the installed rpm or as a self-compiled version done with ifc7 or ifc8. The problem is as follows. After the integral sort, the process writes no more to output but becomes unkillable with 99.9%CPU and 1.0%Mem as given by 'top'.
>
> In order to help diagnose the problem, I have turned the 'gprint,io,cpu' directive on in a given failing job (bccd_opt.test). The following are the last lines written to output for that job with the io printing turned on:
>
> EXTENDING RECORD 1300.1 BY 34949. WORDS TO 38820. IMPLEMENTATION=df EXTENSION 0
>
> NUMBER OF SORTED TWO-ELECTRON INTEGRALS: 34949. BUFFER LENGTH: 32768
> NUMBER OF SEGMENTS: 1 SEGMENT LENGTH: 34949 RECORD LENGTH: 524288
>
> Memory used in sort: 0.59 MW
> OPENW FILE 24 NAME=/scratch/root/eaf_T2400002627.TMP IMPLEMENTATION=eaf STATUS=scratch HANDLE= 2
> OPEN EAF FILE 24 NAME= IMPLEMENTATION=eaf
> CLOSEW FILE 21 NAME=eaf_T2100002627.TMP IMPLEMENTATION=eaf HANDLE= 1
> CLOSE EAF FILE 21
>
> To determine what files molpro might have open at the time the program stops functioning, I issued a 'lsof | grep molpro' command while the program was in its 'unkillable' final state. The following is the output of that command:
>
> bash 2210 root cwd DIR 8,1 12288 295282 /opt/molpro/testjobs
> molpro 2624 root cwd DIR 8,2 4096 2796193 /scratch/root
> molpro 2624 root rtd DIR 8,1 4096 2 /
> molpro 2624 root txt REG 8,1 41552 491923 /usr/local/lib/molpro-mpp-Linux-i686-i4-2002.6/molpro
> molpro 2624 root mem REG 8,1 1455084 82119 /lib/tls/libc-2.3.3.so
> molpro 2624 root mem REG 8,1 106892 375519 /lib/ld-2.3.3.so
> molpro 2624 root 0u CHR 136,1 3 /dev/pts/1
> molpro 2624 root 1u CHR 136,1 3 /dev/pts/1
> molpro 2624 root 2u CHR 136,1 3 /dev/pts/1
> molpro 2624 root 4u REG 8,1 823 18013 /tmp/tmpfuX2LXr (deleted)
> parallel 2626 root txt REG 8,1 30180 491926 /usr/local/lib/molpro-mpp-Linux-i686-i4-2002.6/parallel
> parallel 2626 root 1u REG 8,1 12113 295235 /opt/molpro/testjobs/bccd_opt.out
> molprop_2 2627 root cwd DIR 8,2 4096 2796193 /scratch/root
> molprop_2 2627 root rtd DIR 8,1 4096 2 /
> molprop_2 2627 root txt REG 8,1 19346064 491925 /usr/local/lib/molpro-mpp-Linux-i686-i4-2002.6/molprop_2002_6_p4_tcgmsg.exe
> molprop_2 2627 root mem REG 8,1 96248 375542 /lib/libnsl-2.3.3.so
> molprop_2 2627 root mem REG 8,1 106892 375519 /lib/ld-2.3.3.so
> molprop_2 2627 root mem REG 8,1 1455084 82119 /lib/tls/libc-2.3.3.so
> molprop_2 2627 root mem REG 8,1 214796 82121 /lib/tls/libm-2.3.3.so
> molprop_2 2627 root mem REG 8,1 43528 375552 /lib/libnss_nis-2.3.3.so
> molprop_2 2627 root mem REG 8,1 50944 375549 /lib/libnss_files-2.3.3.so
> molprop_2 2627 root 0u REG 8,1 823 18013 /tmp/tmpfuX2LXr (deleted)
> molprop_2 2627 root 1u REG 8,1 12113 295235 /opt/molpro/testjobs/bccd_opt.out
> molprop_2 2627 root 2u CHR 136,1 3 /dev/pts/1
> molprop_2 2627 root 3u IPv4 4464 TCP sphinx128.giza:32846->sphinx128.giza:32844 (ESTABLISHED)
> molprop_2 2627 root 4u REG 8,1 1457 18014 /tmp/forttempG1uyhO
> molprop_2 2627 root 5u REG 8,1 74 18015 /tmp/forttempfYdJb0
> molprop_2 2627 root 6u REG 8,1 0 18016 /tmp/forttemp2hFU5b
> molprop_2 2627 root 7u REG 8,1 0 18017 /tmp/forttemp9p96Zn
> molprop_2 2627 root 8u REG 8,2 3006888 2796194 /scratch/root/df_T0100002627.TMP (deleted)
> molprop_2 2627 root 9u REG 8,2 182344 2796195 /scratch/root/df_T0200002627.TMP (deleted)
> molprop_2 2627 root 10u REG 8,2 182344 2796196 /scratch/root/df_T0300002627.TMP (deleted)
> molprop_2 2627 root 11u REG 8,2 0 2796197 /scratch/root/df_T0400002627.TMP (deleted)
> molprop_2 2627 root 12r REG 8,1 476967 491914 /usr/local/lib/molpro-mpp-Linux-i686-i4-2002.6/libmol.index
> molprop_2 2627 root 13u REG 8,2 3428352 2796199 /scratch/root/eaf_T2400002627.TMP (deleted)
>
> So, it appears that the *.TMP files that molpro has most recently opened and closed are listed as deleted but still open. I cannot find these files in the specified directory, which makes sense if they are deleted, but if they are deleted, then how can they still be open?
>
> Cheers,
>
> Seth Olsen
>
>
>
> ccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccms
>
> Dr Seth Olsen, PhD
> Postdoctoral Fellow, Computational Systems Biology Group
> Centre for Computational Molecular Science
> Chemistry Building,
> The University of Queensland
> Qld 4072, Brisbane, Australia
>
> tel (617) 33653732
> fax (617) 33654623
> email: s.olsen1 at uq.edu.au
> Web: www.ccms.uq.edu.au
>
> ccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccms