[molpro-user] test job hangs on FC2 Xeon Node
H. -J. Werner
werner at theochem.uni-stuttgart.de
Tue Dec 7 09:15:39 GMT 2004
The dsyev problem is known, and apparently due to a bug in mkl.
On some systems I have trouble with mkl701, but mkl61 works fine.
You can probably avoid the problem by setting ftcflag "olddiag2".
Joachim Werner
On Tue, 07 Dec 2004, Dr Seth OLSEN wrote:
>
>
>Hi Nick,
>
>The MKL libraries did not work either. In the end, the way I got around the problem was to install RedHat on the nodes in question. I am not sure that the problem here is with MolPro. I have similar problems installing other program suites on these nodes and the general symptomology is identical - processes become unkillable, taking up all the %CPU (all of which, it appears, is SYSTEM, not USER) and almost %mem. It appears that the problem was with Fedora on this particular architecture.
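One way to confirm the SYSTEM-versus-USER observation for a single process, rather than reading it off 'top', is to look at /proc/<pid>/stat directly. The following is a minimal sketch, assuming a Linux /proc filesystem; the function names are made up for illustration. Field 3 of that file is the process state and fields 14 and 15 are the user and system CPU times in clock ticks. A process stuck inside the kernel typically shows state D (uninterruptible sleep), or stays in R while stime keeps growing and utime does not.

```python
import os


def parse_proc_stat(line):
    """Extract state, user ticks and system ticks from one /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split after the LAST closing parenthesis.
    rparen = line.rindex(")")
    fields = line[rparen + 2:].split()
    # fields[0] is field 3 of the file (state); utime and stime are
    # fields 14 and 15, i.e. indices 11 and 12 counted from the state.
    return {
        "state": fields[0],
        "utime": int(fields[11]),
        "stime": int(fields[12]),
    }


def cpu_split(pid):
    """Return state plus user/system CPU seconds for a running process."""
    with open(f"/proc/{pid}/stat") as f:
        info = parse_proc_stat(f.read())
    ticks = os.sysconf("SC_CLK_TCK")  # clock ticks per second
    return {
        "state": info["state"],
        "user_s": info["utime"] / ticks,
        "system_s": info["stime"] / ticks,
    }


if __name__ == "__main__" and os.path.exists("/proc/self/stat"):
    # Inspect this Python process itself as a harmless demonstration.
    print(cpu_split(os.getpid()))
```

Calling cpu_split() twice a few seconds apart on the hung process and comparing the two results shows which of the counters is actually advancing.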
>
>Cheers,
>
>Seth
>ccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccms
>
>Dr Seth Olsen, PhD
>Postdoctoral Fellow, Computational Systems Biology Group
>Centre for Computational Molecular Science
>Chemistry Building,
>The University of Queensland
>Qld 4072, Brisbane, Australia
>
>tel (617) 33653732
>fax (617) 33654623
>email: s.olsen1 at uq.edu.au
>Web: www.ccms.uq.edu.au
>
>ccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccms
>
>
>
>
>----- Original Message -----
>From: Nick Wilson <WilsonNT at Cardiff.ac.uk>
>Date: Wednesday, November 24, 2004 2:52 am
>Subject: Re: [molpro-user] test job hangs on FC2 Xeon Node
>
>> Dear Seth,
>>
>> Using a combination of dual xeon, redhat7.3, ifc7.0 and atlas I think I
>> might have reproduced a similar race condition.
>>
>> It went away when I used the mkl_ia32 blas library by editing CONFIG:
>>
>> BLASLIB_p4="-L/opt/intel/mkl61/lib/32 -lmkl_ia32 -Wl,-rpath,/opt/intel/mkl61/lib/32 "
>> LAPACKLIB_p4="-L/opt/intel/mkl61/lib/32 -lmkl_lapack -Wl,-rpath,/opt/intel/mkl61/lib/32 "
>>
>> (you might also need -lguide if you don't have -openmp) and doing "make",
>> or when I used the blas shipped with molpro by editing the CONFIG file thus:
>>
>> FTCFLAGS="mpp eaf blas0"
>> BLASLIB=""
>> LAPACKLIB=""
>> BLASLIB_p4=""
>> LAPACKLIB_p4=""
>>
>> Then doing:
>>
>> rm lib/libmolpro.a
>> make
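If the same CONFIG change has to be repeated on several nodes, it can be scripted. The following is only a sketch: it assumes CONFIG consists of single-line shell-style NAME="value" assignments like the ones quoted above, and set_config_vars and BUNDLED_BLAS are hypothetical names, not part of Molpro. It writes nothing to disk, so the result can be inspected before replacing the real file and rebuilding.

```python
import re

# The settings quoted above for falling back to the blas shipped with molpro.
BUNDLED_BLAS = {
    "FTCFLAGS": "mpp eaf blas0",
    "BLASLIB": "",
    "LAPACKLIB": "",
    "BLASLIB_p4": "",
    "LAPACKLIB_p4": "",
}


def set_config_vars(text, updates):
    """Return CONFIG text with each NAME="value" assignment in `updates` replaced.

    Assumes one assignment per line; variables not yet present are appended.
    """
    out = []
    seen = set()
    for line in text.splitlines():
        match = re.match(r"^([A-Za-z_][A-Za-z0-9_]*)=", line)
        if match and match.group(1) in updates:
            name = match.group(1)
            out.append(f'{name}="{updates[name]}"')
            seen.add(name)
        else:
            out.append(line)  # leave every other line untouched
    for name, value in updates.items():
        if name not in seen:
            out.append(f'{name}="{value}"')
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    example = 'BLASLIB="-lf77blas -latlas"\nFTCFLAGS="mpp eaf"\n'
    print(set_config_vars(example, BUNDLED_BLAS), end="")
```

Note that this overwrites FTCFLAGS wholesale with the value from the message; if the existing CONFIG carries other flags, they would need to be merged by hand.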
>>
>>
>> Can you test whether either Intel's or Molpro's blas/lapack fixes your
>> problem?
>> Best wishes,
>> Nick
>>
>>
>> Dr Seth OLSEN wrote:
>> > Hello Molpro-Users,
>> >
>> > As outlined in previous communiques, I have been having no luck in
>> > getting molpro2002.6 to run on a Dual Xeon node with Fedora Core 2,
>> > either as the installed rpm or as a self-compiled version done with
>> > ifc7 or ifc8. The problem is as follows. After the integral sort, the
>> > process writes no more to output but becomes unkillable with 99.9%CPU
>> > and 1.0%Mem as given by 'top'.
>> >
>> > In order to help diagnose the problem, I have turned the
>> > 'gprint,io,cpu' directive on in a given failing job (bccd_opt.test).
>> > The following are the last lines written to output for that job with
>> > the io printing turned on:
>> >
>> > EXTENDING RECORD 1300.1 BY 34949. WORDS TO 38820. IMPLEMENTATION=df EXTENSION 0
>> >
>> > NUMBER OF SORTED TWO-ELECTRON INTEGRALS: 34949. BUFFER LENGTH: 32768
>> > NUMBER OF SEGMENTS: 1 SEGMENT LENGTH: 34949 RECORD LENGTH: 524288
>> >
>> > Memory used in sort: 0.59 MW
>> > OPENW FILE 24 NAME=/scratch/root/eaf_T2400002627.TMP IMPLEMENTATION=eaf STATUS=scratch HANDLE= 2
>> > OPEN EAF FILE 24 NAME= IMPLEMENTATION=eaf
>> > CLOSEW FILE 21 NAME=eaf_T2100002627.TMP IMPLEMENTATION=eaf HANDLE= 1
>> > CLOSE EAF FILE 21
>> >
>> > To determine what files might be opened by molpro at the time that
>> > the program stops functioning, I issue a 'lsof | grep molpro' command
>> > while the program is running in its 'unkillable' final status. The
>> > following is the output of that command:
>> >
>> > bash      2210 root cwd DIR    8,1    12288  295282 /opt/molpro/testjobs
>> > molpro    2624 root cwd DIR    8,2     4096 2796193 /scratch/root
>> > molpro    2624 root rtd DIR    8,1     4096       2 /
>> > molpro    2624 root txt REG    8,1    41552  491923 /usr/local/lib/molpro-mpp-Linux-i686-i4-2002.6/molpro
>> > molpro    2624 root mem REG    8,1  1455084   82119 /lib/tls/libc-2.3.3.so
>> > molpro    2624 root mem REG    8,1   106892  375519 /lib/ld-2.3.3.so
>> > molpro    2624 root  0u CHR  136,1                3 /dev/pts/1
>> > molpro    2624 root  1u CHR  136,1                3 /dev/pts/1
>> > molpro    2624 root  2u CHR  136,1                3 /dev/pts/1
>> > molpro    2624 root  4u REG    8,1      823   18013 /tmp/tmpfuX2LXr (deleted)
>> > parallel  2626 root txt REG    8,1    30180  491926 /usr/local/lib/molpro-mpp-Linux-i686-i4-2002.6/parallel
>> > parallel  2626 root  1u REG    8,1    12113  295235 /opt/molpro/testjobs/bccd_opt.out
>> > molprop_2 2627 root cwd DIR    8,2     4096 2796193 /scratch/root
>> > molprop_2 2627 root rtd DIR    8,1     4096       2 /
>> > molprop_2 2627 root txt REG    8,1 19346064  491925 /usr/local/lib/molpro-mpp-Linux-i686-i4-2002.6/molprop_2002_6_p4_tcgmsg.exe
>> > molprop_2 2627 root mem REG    8,1    96248  375542 /lib/libnsl-2.3.3.so
>> > molprop_2 2627 root mem REG    8,1   106892  375519 /lib/ld-2.3.3.so
>> > molprop_2 2627 root mem REG    8,1  1455084   82119 /lib/tls/libc-2.3.3.so
>> > molprop_2 2627 root mem REG    8,1   214796   82121 /lib/tls/libm-2.3.3.so
>> > molprop_2 2627 root mem REG    8,1    43528  375552 /lib/libnss_nis-2.3.3.so
>> > molprop_2 2627 root mem REG    8,1    50944  375549 /lib/libnss_files-2.3.3.so
>> > molprop_2 2627 root  0u REG    8,1      823   18013 /tmp/tmpfuX2LXr (deleted)
>> > molprop_2 2627 root  1u REG    8,1    12113  295235 /opt/molpro/testjobs/bccd_opt.out
>> > molprop_2 2627 root  2u CHR  136,1                3 /dev/pts/1
>> > molprop_2 2627 root  3u IPv4  4464              TCP sphinx128.giza:32846->sphinx128.giza:32844 (ESTABLISHED)
>> > molprop_2 2627 root  4u REG    8,1     1457   18014 /tmp/forttempG1uyhO
>> > molprop_2 2627 root  5u REG    8,1       74   18015 /tmp/forttempfYdJb0
>> > molprop_2 2627 root  6u REG    8,1        0   18016 /tmp/forttemp2hFU5b
>> > molprop_2 2627 root  7u REG    8,1        0   18017 /tmp/forttemp9p96Zn
>> > molprop_2 2627 root  8u REG    8,2  3006888 2796194 /scratch/root/df_T0100002627.TMP (deleted)
>> > molprop_2 2627 root  9u REG    8,2   182344 2796195 /scratch/root/df_T0200002627.TMP (deleted)
>> > molprop_2 2627 root 10u REG    8,2   182344 2796196 /scratch/root/df_T0300002627.TMP (deleted)
>> > molprop_2 2627 root 11u REG    8,2        0 2796197 /scratch/root/df_T0400002627.TMP (deleted)
>> > molprop_2 2627 root 12r REG    8,1   476967  491914 /usr/local/lib/molpro-mpp-Linux-i686-i4-2002.6/libmol.index
>> > molprop_2 2627 root 13u REG    8,2  3428352 2796199 /scratch/root/eaf_T2400002627.TMP (deleted)
>> >
>> > So, it appears that the *.TMP files that molpro has most recently
>> > opened and closed are listed as deleted but still open. I cannot find
>> > these files in the specified directory, which makes sense if they are
>> > deleted, but if they are deleted then how can they be currently open
>> > files?
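On the "deleted but still open" question: this is normal Unix behaviour and probably not a symptom of the hang. Deleting a file only removes its directory entry; the data stays on disk and remains fully usable through any descriptor that was already open, and the space is released only when the last descriptor is closed. Presumably the scratch files are unlinked right after being opened so that they disappear automatically when the program exits, but that is an assumption about Molpro and not something the output above confirms. A minimal sketch of the mechanism on a POSIX system (the function name is made up for illustration):

```python
import os
import tempfile


def write_after_unlink(directory=None):
    """Create a file, delete its name, then keep using the open descriptor."""
    fd, path = tempfile.mkstemp(suffix=".TMP", dir=directory)
    os.unlink(path)                    # remove the directory entry
    gone = not os.path.exists(path)    # the name can no longer be found
    os.write(fd, b"scratch data")      # the open descriptor still works
    os.lseek(fd, 0, os.SEEK_SET)       # rewind to read back what was written
    data = os.read(fd, 100)
    os.close(fd)                       # only now is the disk space released
    return gone, data


if __name__ == "__main__":
    print(write_after_unlink())
```

While the descriptor is open, lsof reports such a file with "(deleted)" after its name, which matches the df_T*.TMP and eaf_T*.TMP entries in the listing.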
>> >
>> > Cheers,
>> >
>> > Seth Olsen
>>
--
Prof. Hans-Joachim Werner
Institute for Theoretical Chemistry
University of Stuttgart
Pfaffenwaldring 55
D-70569 Stuttgart, Germany
Tel.: (0049) 711 / 685 4400
Fax.: (0049) 711 / 685 4442
e-mail: werner at theochem.uni-stuttgart.de