[molpro-user] test job hangs on FC2 Xeon Node
Dr Seth OLSEN
s.olsen1 at uq.edu.au
Tue Dec 7 05:42:22 GMT 2004
Hi Nick,
The MKL libraries did not work either. In the end, I got around the problem by installing RedHat on the nodes in question. I am not sure that the problem here is with MolPro: I have had similar problems installing other program suites on these nodes, and the general symptoms are identical. Processes become unkillable, taking up all the %CPU (all of which, it appears, is SYSTEM, not USER) and almost no %mem. It appears that the problem was with Fedora on this particular architecture.
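For anyone seeing the same symptom, one way to check whether a stuck process is spending its time in the kernel rather than in user code is to read its state and tick counters from /proc. This is only a minimal sketch, assuming a Linux /proc filesystem laid out as in proc(5); the variable names are illustrative, and the PID defaults to the current shell so that the snippet runs as written:

```shell
# Hypothetical diagnostic: pass the PID of the stuck process as $1.
pid="${1:-$$}"

# Drop "pid (comm) " so that a command name containing spaces cannot shift
# the fields; what remains starts at field 3 (state) of /proc/<pid>/stat.
stat_rest="$(sed 's/^.*) //' "/proc/$pid/stat")"

state="$(echo "$stat_rest" | awk '{print $1}')"   # field 3: R, S, D, Z, ...
utime="$(echo "$stat_rest" | awk '{print $12}')"  # field 14: user-mode clock ticks
stime="$(echo "$stat_rest" | awk '{print $13}')"  # field 15: kernel-mode clock ticks

echo "pid=$pid state=$state user_ticks=$utime system_ticks=$stime"
```

If system_ticks keeps growing while user_ticks stays flat, the time is being spent inside the kernel, which would match the SYSTEM-only load described above.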
Cheers,
Seth
ccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccms
Dr Seth Olsen, PhD
Postdoctoral Fellow, Computational Systems Biology Group
Centre for Computational Molecular Science
Chemistry Building,
The University of Queensland
Qld 4072, Brisbane, Australia
tel (617) 33653732
fax (617) 33654623
email: s.olsen1 at uq.edu.au
Web: www.ccms.uq.edu.au
ccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccms
----- Original Message -----
From: Nick Wilson <WilsonNT at Cardiff.ac.uk>
Date: Wednesday, November 24, 2004 2:52 am
Subject: Re: [molpro-user] test job hangs on FC2 Xeon Node
> Dear Seth,
>
> Using a combination of dual xeon, redhat7.3, ifc7.0 and atlas I think I
> might have reproduced a similar race condition.
>
> It went away when I used the mkl_ia32 blas library by editing CONFIG:
>
> BLASLIB_p4="-L/opt/intel/mkl61/lib/32 -lmkl_ia32 -Wl,-rpath,/opt/intel/mkl61/lib/32 "
> LAPACKLIB_p4="-L/opt/intel/mkl61/lib/32 -lmkl_lapack -Wl,-rpath,/opt/intel/mkl61/lib/32 "
>
> (you might also need -lguide if you don't have -openmp) and doing "make",
> or when I used the blas shipped with molpro by editing the CONFIG file thus:
>
> FTCFLAGS="mpp eaf blas0"
> BLASLIB=""
> LAPACKLIB=""
> BLASLIB_p4=""
> LAPACKLIB_p4=""
>
> Then doing:
>
> rm lib/libmolpro.a
> make
>
>
> Can you test whether either intel's or molpro's blas/lapack fixes your
> problem?
> Best wishes,
> Nick
>
>
> Dr Seth OLSEN wrote:
> > Hello Molpro-Users,
> >
> > As outlined in previous communiques, I have been having no luck in
> > getting molpro2002.6 to run on a Dual Xeon node with Fedora Core 2,
> > either as the installed rpm or as a self-compiled version done with
> > ifc7 or ifc8. The problem is as follows. After the integral sort, the
> > process writes no more to output but becomes unkillable with 99.9%CPU
> > and 1.0%Mem as given by 'top'.
> >
> > In order to help diagnose the problem, I have turned the
> > 'gprint,io,cpu' directive on in a given failing job (bccd_opt.test).
> > The following are the last lines written to output for that job with
> > the io printing turned on:
> >
> > EXTENDING RECORD 1300.1 BY 34949. WORDS TO 38820. IMPLEMENTATION=df EXTENSION 0
> >
> > NUMBER OF SORTED TWO-ELECTRON INTEGRALS: 34949. BUFFER LENGTH: 32768
> > NUMBER OF SEGMENTS: 1 SEGMENT LENGTH: 34949 RECORD LENGTH: 524288
> >
> > Memory used in sort: 0.59 MW
> > OPENW FILE 24 NAME=/scratch/root/eaf_T2400002627.TMP IMPLEMENTATION=eaf STATUS=scratch HANDLE= 2
> > OPEN EAF FILE 24 NAME= IMPLEMENTATION=eaf
> > CLOSEW FILE 21 NAME=eaf_T2100002627.TMP IMPLEMENTATION=eaf HANDLE= 1
> > CLOSE EAF FILE 21
> >
> > To determine what files might be opened by molpro at the time that
> > the program stops functioning, I issue a 'lsof | grep molpro' command
> > while the program is running in its 'unkillable' final state. The
> > following is the output of that command:
> >
> > bash      2210 root cwd DIR    8,1    12288  295282 /opt/molpro/testjobs
> > molpro    2624 root cwd DIR    8,2     4096 2796193 /scratch/root
> > molpro    2624 root rtd DIR    8,1     4096       2 /
> > molpro    2624 root txt REG    8,1    41552  491923 /usr/local/lib/molpro-mpp-Linux-i686-i4-2002.6/molpro
> > molpro    2624 root mem REG    8,1  1455084   82119 /lib/tls/libc-2.3.3.so
> > molpro    2624 root mem REG    8,1   106892  375519 /lib/ld-2.3.3.so
> > molpro    2624 root  0u CHR  136,1                3 /dev/pts/1
> > molpro    2624 root  1u CHR  136,1                3 /dev/pts/1
> > molpro    2624 root  2u CHR  136,1                3 /dev/pts/1
> > molpro    2624 root  4u REG    8,1      823   18013 /tmp/tmpfuX2LXr (deleted)
> > parallel  2626 root txt REG    8,1    30180  491926 /usr/local/lib/molpro-mpp-Linux-i686-i4-2002.6/parallel
> > parallel  2626 root  1u REG    8,1    12113  295235 /opt/molpro/testjobs/bccd_opt.out
> > molprop_2 2627 root cwd DIR    8,2     4096 2796193 /scratch/root
> > molprop_2 2627 root rtd DIR    8,1     4096       2 /
> > molprop_2 2627 root txt REG    8,1 19346064  491925 /usr/local/lib/molpro-mpp-Linux-i686-i4-2002.6/molprop_2002_6_p4_tcgmsg.exe
> > molprop_2 2627 root mem REG    8,1    96248  375542 /lib/libnsl-2.3.3.so
> > molprop_2 2627 root mem REG    8,1   106892  375519 /lib/ld-2.3.3.so
> > molprop_2 2627 root mem REG    8,1  1455084   82119 /lib/tls/libc-2.3.3.so
> > molprop_2 2627 root mem REG    8,1   214796   82121 /lib/tls/libm-2.3.3.so
> > molprop_2 2627 root mem REG    8,1    43528  375552 /lib/libnss_nis-2.3.3.so
> > molprop_2 2627 root mem REG    8,1    50944  375549 /lib/libnss_files-2.3.3.so
> > molprop_2 2627 root  0u REG    8,1      823   18013 /tmp/tmpfuX2LXr (deleted)
> > molprop_2 2627 root  1u REG    8,1    12113  295235 /opt/molpro/testjobs/bccd_opt.out
> > molprop_2 2627 root  2u CHR  136,1                3 /dev/pts/1
> > molprop_2 2627 root  3u IPv4  4464              TCP sphinx128.giza:32846->sphinx128.giza:32844 (ESTABLISHED)
> > molprop_2 2627 root  4u REG    8,1     1457   18014 /tmp/forttempG1uyhO
> > molprop_2 2627 root  5u REG    8,1       74   18015 /tmp/forttempfYdJb0
> > molprop_2 2627 root  6u REG    8,1        0   18016 /tmp/forttemp2hFU5b
> > molprop_2 2627 root  7u REG    8,1        0   18017 /tmp/forttemp9p96Zn
> > molprop_2 2627 root  8u REG    8,2  3006888 2796194 /scratch/root/df_T0100002627.TMP (deleted)
> > molprop_2 2627 root  9u REG    8,2   182344 2796195 /scratch/root/df_T0200002627.TMP (deleted)
> > molprop_2 2627 root 10u REG    8,2   182344 2796196 /scratch/root/df_T0300002627.TMP (deleted)
> > molprop_2 2627 root 11u REG    8,2        0 2796197 /scratch/root/df_T0400002627.TMP (deleted)
> > molprop_2 2627 root 12r REG    8,1   476967  491914 /usr/local/lib/molpro-mpp-Linux-i686-i4-2002.6/libmol.index
> > molprop_2 2627 root 13u REG    8,2  3428352 2796199 /scratch/root/eaf_T2400002627.TMP (deleted)
> >
> > So, it appears that the *.TMP files that molpro has most recently
> > opened and closed are listed as deleted but still open. I cannot find
> > these files in the specified directory, which makes sense if they are
> > deleted, but if they are deleted then how can they be currently open
> > files?
> >
> > Cheers,
> >
> > Seth Olsen
> >
> >
> >
> > ccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccms
> >
> > Dr Seth Olsen, PhD
> > Postdoctoral Fellow, Computational Systems Biology Group
> > Centre for Computational Molecular Science
> > Chemistry Building,
> > The University of Queensland
> > Qld 4072, Brisbane, Australia
> >
> > tel (617) 33653732
> > fax (617) 33654623
> > email: s.olsen1 at uq.edu.au
> > Web: www.ccms.uq.edu.au
> >
> > ccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccms
> >
> >
> >
> >
>
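
On the question in the quoted message of how a deleted file can still be open: on Unix-like systems, removing a file only removes its directory entry. The data stays on disk until the last open descriptor is closed, which is why lsof marks such files "(deleted)". This is a common pattern for scratch files, so the listing may well be normal behaviour rather than a symptom of the hang. The following is a minimal sketch of the effect, assuming a Linux /proc filesystem; the file and variable names are illustrative and have nothing to do with Molpro itself:

```shell
# Create a scratch file and hold it open read/write on descriptor 3.
tmpfile="$(mktemp)"
exec 3<>"$tmpfile"
echo "scratch data" >&3

# Remove the name. The inode survives because descriptor 3 still refers to it.
rm "$tmpfile"
if [ ! -e "$tmpfile" ]; then
    echo "directory entry is gone"
fi

# The kernel reports the old path with a "(deleted)" suffix, as lsof does.
link_target="$(readlink "/proc/$$/fd/3")"
echo "fd 3 -> $link_target"

# The contents can still be read back through the open descriptor.
content="$(cat "/proc/$$/fd/3")"
echo "still readable: $content"

# Closing the last descriptor is what finally frees the disk space.
exec 3>&-
```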