[molpro-user] I/O error on large ccsd(t) job
Gert von Helden
helden at fhi-berlin.mpg.de
Thu Dec 21 16:12:22 GMT 2006
Dear all,
I am trying to run a large CCSD(T) calculation with Molpro
2006.1 on an Opteron system. I compiled with pgf 6.1 (and also tried
ifort) and built both a serial and an MPI-parallel version, linked with
ga 4.0.1 and mpich. All program versions passed the quicktests.
In all cases, the job fails with an I/O-related error. Lack of disk
space is very unlikely to be the cause (I also tried using a SAN with
very large capacity as scratch).
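One quick way to rule out a full scratch disk is to query the free space of the scratch filesystem shortly before and during the run. The following is only a minimal Python sketch; the helper names `free_gib` and `has_room` are illustrative, and the path in the usage note is hypothetical.

```python
import shutil


def free_gib(path):
    """Return the free space, in GiB, of the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.free / 2**30


def has_room(path, needed_gib):
    """True if the filesystem holding `path` has at least `needed_gib` GiB free."""
    return free_gib(path) >= needed_gib
```

For example, `has_room("/scratch", 300)` would report whether a (hypothetical) `/scratch` directory still has 300 GiB available. This only checks capacity; it says nothing about per-file size limits or quotas.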
I would appreciate any help!
...Gert
For a parallel job running on 4 CPUs, I get the following (using 2 CPUs
or only 1 gives variations of the same error):
>
> 108174.246 MB (compressed) written to integral file ( 43.0%)
>
> Node minimum: 26165.903 MB, node maximum: 27669.299 MB
>
>
> NUMBER OF SORTED TWO-ELECTRON INTEGRALS: 7385262466. BUFFER
> LENGTH: 32768
> NUMBER OF SEGMENTS: 155 SEGMENT LENGTH: 47998888 RECORD
> LENGTH: 262144
>
> Memory used in sort: 48.29 MW
>
> SORT1 READ31451116099. AND WROTE 5877056503. INTEGRALS IN33710
> RECORDS. CPU TIME: 4100.09 SEC, REAL TIME: 5524.84 SEC
> 3:3:fehler on processor 3:: 4921228
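Because the fixed-width output runs labels and numbers together (for example `READ31451116099.`), comparing the SORT1 lines of different runs by eye is error-prone. The sketch below pulls the figures out of such a line with a regular expression. It is an illustration only: the pattern is written against the two SORT1 lines quoted in this message, and the function name `parse_sort1` is made up here.

```python
import re

# Pattern based only on the SORT1 lines shown in this message.
SORT1_RE = re.compile(
    r"SORT1 READ\s*(\d+)\.\s+AND WROTE\s*(\d+)\.\s+INTEGRALS IN\s*(\d+)\s+RECORDS\."
    r"\s+CPU TIME:\s*([\d.]+) SEC, REAL TIME:\s*([\d.]+) SEC"
)


def parse_sort1(text):
    """Extract the SORT1 statistics from (possibly quoted and wrapped) output."""
    # Drop mail quote markers and rejoin wrapped lines before matching.
    joined = " ".join(line.lstrip("> ").strip() for line in text.splitlines())
    match = SORT1_RE.search(joined)
    if match is None:
        return None
    return {
        "read": int(match.group(1)),
        "wrote": int(match.group(2)),
        "records": int(match.group(3)),
        "cpu_sec": float(match.group(4)),
        "real_sec": float(match.group(5)),
    }


PARALLEL_SAMPLE = (
    "> SORT1 READ31451116099. AND WROTE 5877056503. INTEGRALS IN33710\n"
    "> RECORDS. CPU TIME: 4100.09 SEC, REAL TIME: 5524.84 SEC"
)
```

Calling `parse_sort1(PARALLEL_SAMPLE)` yields a dictionary with the integral counts as integers and the timings as floats, which makes it easy to compare the parallel and serial runs side by side.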
The serial version gives the following (there were certainly a few
hundred GB of scratch space still available):
> Contracted 2-electron integrals neglected if value below 1.0D-13
> AO integral compression algorithm 1 Integral accuracy 1.0D-13
>
> 108173.984 MB (compressed) written to integral file ( 43.0%)
>
>
> NUMBER OF SORTED TWO-ELECTRON INTEGRALS:29541429126. BUFFER
> LENGTH: 32768
> NUMBER OF SEGMENTS: 185 SEGMENT LENGTH: 159999781 RECORD
> LENGTH: 524288
>
> Memory used in sort: 160.56 MW
>
> SORT1 READ31451116099. AND WROTE23507657552. INTEGRALS IN67352
> RECORDS. CPU TIME: 1633.81 SEC, REAL TIME: 6599.01 SEC
> Read error in iow_direct_read; fd=13, l=32768, p=30540206080; read
> returns -1
>
> ERROR READING 32768 WORDS AT OFFSET30540206080. FROM FILE 4
> IMPLEMENTATION=df FILE HANDLE= 13 IERR=******
>
> Records on file 4
>
> IREC  NAME  TYPE  OFFSET  LENGTH      IMPLEMENTATION  EXT  PREV  PARENT  MPP_STATE
>    1  1350  BUCK  4096.   **********  df                0     0       0          0
>
> ? Error
> ? I/O error
> ? The problem occurs in readw
>
> ERROR EXIT
> CURRENT STACK: MAIN
>
>
> **********************************************************************
> ************************************************************
> DATASETS  * FILE   NREC   LENGTH (MB)   RECORD NAMES
>                1     18     104522.63   500     610     700     900     950     970
>                                         VAR     BASINP  GEOM    SYMINP  ZMAT    AOBASIS
>                                         1000    1100    1400    1410    1200    1210
>                                         BASIS   S       T       V       H0      H01
>                                         1080    1600    129     960     1650    1300
>                                         AOSYM   SMH     P2S     ABASIS  MOLCAS  ERIS
>
>                2      4          1.33   500     610     700     1000
>                                         VAR     BASINP  GEOM    BASIS
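For orientation, the numbers in the serial error message can be checked with a little arithmetic. The sketch below assumes that the reported offset is counted in 8-byte words, as the wording "ERROR READING 32768 WORDS AT OFFSET..." suggests; if the offset is actually in bytes, the size is eight times smaller. The function names are illustrative only.

```python
WORD_BYTES = 8  # assumption: one word in the output is 8 bytes

FAILED_OFFSET = 30540206080  # offset from the "ERROR READING" line
RECORD_BASE = 4096           # offset listed for record 1350 on file 4
BUFFER_LEN = 32768           # words per read, from the same error line


def words_to_gib(words, word_bytes=WORD_BYTES):
    """Convert a count of words to GiB under the stated word size."""
    return words * word_bytes / 2**30


def split_offset(offset, base, buffer_len):
    """Write offset as base + n * buffer_len + remainder; return (n, remainder)."""
    return divmod(offset - base, buffer_len)
```

With the values above, `split_offset(FAILED_OFFSET, RECORD_BASE, BUFFER_LEN)` gives a remainder of zero, so the failing read starts a whole number of 32768-word buffers after the start of record 1350. Under the 8-byte assumption, `words_to_gib(FAILED_OFFSET)` places it a little over 227 GiB into file 4.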