[molpro-user] test job hangs on FC2 Xeon Node

Nick Wilson WilsonNT at Cardiff.ac.uk
Tue Nov 23 16:52:27 GMT 2004


Dear Seth,

Using a combination of dual Xeon, Red Hat 7.3, ifc 7.0 and ATLAS, I think I 
might have reproduced a similar race condition.

It went away when I used the mkl_ia32 blas library by editing CONFIG:

BLASLIB_p4="-L/opt/intel/mkl61/lib/32 -lmkl_ia32 -Wl,-rpath,/opt/intel/mkl61/lib/32"
LAPACKLIB_p4="-L/opt/intel/mkl61/lib/32 -lmkl_lapack -Wl,-rpath,/opt/intel/mkl61/lib/32"

(you might also need -lguide if you don't link with -openmp) and then running "make",

or when I used the blas shipped with molpro by editing the CONFIG file thus:

FTCFLAGS="mpp eaf blas0"
BLASLIB=""
LAPACKLIB=""
BLASLIB_p4=""
LAPACKLIB_p4=""

Then doing:

rm lib/libmolpro.a
make
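
For the MKL variant, if -lguide does turn out to be needed, the CONFIG 
lines might look something like the sketch below. This assumes libguide 
sits in the same MKL directory as the other libraries, which may not be 
true on your installation, so adjust the path to suit.

```shell
# Sketch only: assumes libguide lives alongside the other MKL libraries
# in /opt/intel/mkl61/lib/32 (an assumption, check your own install).
BLASLIB_p4="-L/opt/intel/mkl61/lib/32 -lmkl_ia32 -lguide -Wl,-rpath,/opt/intel/mkl61/lib/32"
LAPACKLIB_p4="-L/opt/intel/mkl61/lib/32 -lmkl_lapack -Wl,-rpath,/opt/intel/mkl61/lib/32"
```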


Can you test whether either Intel's or Molpro's BLAS/LAPACK fixes your 
problem?
Best wishes,
Nick


Dr Seth OLSEN wrote:
> Hello Molpro-Users,
> 
> As outlined in previous communiques, I have been having no luck in getting molpro2002.6 to run on a Dual Xeon node with Fedora Core 2, either as the installed rpm or as a self-compiled version done with ifc7 or ifc8.  The problem is as follows.  After the integral sort, the process writes no more to output but becomes unkillable with 99.9%CPU and 1.0%Mem as given by 'top'.
> 
> In order to help diagnose the problem, I have turned the 'gprint,io,cpu' directive on in a given failing job (bccd_opt.test).  The following are the last lines written to output for that job with the io printing turned on:
> 
>  EXTENDING RECORD    1300.1 BY        34949. WORDS TO      38820. IMPLEMENTATION=df    EXTENSION 0
>  
>  NUMBER OF SORTED TWO-ELECTRON INTEGRALS:      34949.     BUFFER LENGTH:  32768
>  NUMBER OF SEGMENTS:   1  SEGMENT LENGTH:      34949      RECORD LENGTH: 524288
>  
>  Memory used in sort:       0.59 MW
>  OPENW FILE 24  NAME=/scratch/root/eaf_T2400002627.TMP  IMPLEMENTATION=eaf   STATUS=scratch   HANDLE=     2
>  OPEN EAF FILE 24  NAME=  IMPLEMENTATION=eaf
>  CLOSEW FILE 21  NAME=eaf_T2100002627.TMP  IMPLEMENTATION=eaf   HANDLE=     1
>  CLOSE EAF FILE 21
> 
> To determine what files might be opened by molpro at the time that the program stops functioning, I issue a 'lsof | grep molpro' command while the program is running in its 'unkillable' final state.  The following is the output of that command:
> 
> bash      2210     root  cwd    DIR        8,1    12288     295282 /opt/molpro/testjobs
> molpro    2624     root  cwd    DIR        8,2     4096    2796193 /scratch/root
> molpro    2624     root  rtd    DIR        8,1     4096          2 /
> molpro    2624     root  txt    REG        8,1    41552     491923 /usr/local/lib/molpro-mpp-Linux-i686-i4-2002.6/molpro
> molpro    2624     root  mem    REG        8,1  1455084      82119 /lib/tls/libc-2.3.3.so
> molpro    2624     root  mem    REG        8,1   106892     375519 /lib/ld-2.3.3.so
> molpro    2624     root    0u   CHR      136,1                   3 /dev/pts/1
> molpro    2624     root    1u   CHR      136,1                   3 /dev/pts/1
> molpro    2624     root    2u   CHR      136,1                   3 /dev/pts/1
> molpro    2624     root    4u   REG        8,1      823      18013 /tmp/tmpfuX2LXr (deleted)
> parallel  2626     root  txt    REG        8,1    30180     491926 /usr/local/lib/molpro-mpp-Linux-i686-i4-2002.6/parallel
> parallel  2626     root    1u   REG        8,1    12113     295235 /opt/molpro/testjobs/bccd_opt.out
> molprop_2 2627     root  cwd    DIR        8,2     4096    2796193 /scratch/root
> molprop_2 2627     root  rtd    DIR        8,1     4096          2 /
> molprop_2 2627     root  txt    REG        8,1 19346064     491925 /usr/local/lib/molpro-mpp-Linux-i686-i4-2002.6/molprop_2002_6_p4_tcgmsg.exe
> molprop_2 2627     root  mem    REG        8,1    96248     375542 /lib/libnsl-2.3.3.so
> molprop_2 2627     root  mem    REG        8,1   106892     375519 /lib/ld-2.3.3.so
> molprop_2 2627     root  mem    REG        8,1  1455084      82119 /lib/tls/libc-2.3.3.so
> molprop_2 2627     root  mem    REG        8,1   214796      82121 /lib/tls/libm-2.3.3.so
> molprop_2 2627     root  mem    REG        8,1    43528     375552 /lib/libnss_nis-2.3.3.so
> molprop_2 2627     root  mem    REG        8,1    50944     375549 /lib/libnss_files-2.3.3.so
> molprop_2 2627     root    0u   REG        8,1      823      18013 /tmp/tmpfuX2LXr (deleted)
> molprop_2 2627     root    1u   REG        8,1    12113     295235 /opt/molpro/testjobs/bccd_opt.out
> molprop_2 2627     root    2u   CHR      136,1                   3 /dev/pts/1
> molprop_2 2627     root    3u  IPv4       4464                 TCP sphinx128.giza:32846->sphinx128.giza:32844 (ESTABLISHED)
> molprop_2 2627     root    4u   REG        8,1     1457      18014 /tmp/forttempG1uyhO
> molprop_2 2627     root    5u   REG        8,1       74      18015 /tmp/forttempfYdJb0
> molprop_2 2627     root    6u   REG        8,1        0      18016 /tmp/forttemp2hFU5b
> molprop_2 2627     root    7u   REG        8,1        0      18017 /tmp/forttemp9p96Zn
> molprop_2 2627     root    8u   REG        8,2  3006888    2796194 /scratch/root/df_T0100002627.TMP (deleted)
> molprop_2 2627     root    9u   REG        8,2   182344    2796195 /scratch/root/df_T0200002627.TMP (deleted)
> molprop_2 2627     root   10u   REG        8,2   182344    2796196 /scratch/root/df_T0300002627.TMP (deleted)
> molprop_2 2627     root   11u   REG        8,2        0    2796197 /scratch/root/df_T0400002627.TMP (deleted)
> molprop_2 2627     root   12r   REG        8,1   476967     491914 /usr/local/lib/molpro-mpp-Linux-i686-i4-2002.6/libmol.index
> molprop_2 2627     root   13u   REG        8,2  3428352    2796199 /scratch/root/eaf_T2400002627.TMP (deleted)
> 
> So, it appears that the *.TMP files that molpro has most recently opened and closed are listed as deleted but still open.  I cannot find these files in the specified directory, which makes sense if they are deleted, but if they are deleted then how can they be currently open files?
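
On the deleted-but-open question: this is ordinary behaviour on Linux and 
other POSIX systems, so by itself it need not indicate a fault. Removing a 
file only deletes its directory entry; the data stays on disk until every 
process that has the file open closes it, which is why lsof marks such 
scratch files "(deleted)". A minimal shell sketch of the same effect 
follows (the function name and file contents are made up for illustration 
and nothing here is Molpro-specific):

```shell
# Hypothetical demo: a file removed with rm stays readable through a
# descriptor that was opened before the removal.
demo_unlinked_fd() {
    tmp=$(mktemp) || return 1
    echo "scratch data" > "$tmp"
    exec 3< "$tmp"                        # open descriptor 3 for reading
    rm "$tmp"                             # remove the name; data survives while fd 3 is open
    [ -e "$tmp" ] || echo "name is gone"  # the directory entry no longer exists
    cat <&3                               # the contents are still readable
    exec 3<&-                             # closing the descriptor lets the kernel free the space
}

demo_unlinked_fd
```

Running it prints "name is gone" followed by "scratch data". While the 
descriptor is open, lsof would list the temporary file as "(deleted)", 
just as in the output above.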
> 
> Cheers,
> 
> Seth Olsen
> 
> 
> 
> ccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccms
> 
> Dr Seth Olsen, PhD
> Postdoctoral Fellow, Computational Systems Biology Group
> Centre for Computational Molecular Science
> Chemistry Building,
> The University of Queensland
> Qld 4072, Brisbane, Australia
> 
> tel (617) 33653732
> fax (617) 33654623
> email: s.olsen1 at uq.edu.au
> Web: www.ccms.uq.edu.au 
> 
> ccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccms
> 
> 
> 
> 


