[molpro-user] parallel molpro behavior
Nick Wilson
WilsonNT at Cardiff.ac.uk
Thu Sep 2 15:32:42 BST 2004
Dear Sigismondo,
The scratch files 1 (basis set, geometry, and the one- and two-electron
integrals), 2 (orbitals, CI coefficients, and density matrices) and 3
(auxiliary restart file) ignore the "default implementation of scratch
files" setting and are always set to direct files (df) unless explicitly
set otherwise using:
file,1,implementation=sf;
file,2,implementation=sf;
file,3,implementation=sf;
where sf stands for shared files, ga for global arrays, etc.
I'm not sure whether using the ga or sf implementation for these scratch
files will work in all cases, as it has never been tested. I have run a
simple test job within a Regatta node using the sf implementation, which
appeared to work, and another very simple test job between two Regatta
nodes, which also appeared to work. Larger tests are waiting in the
queue.
For debugging I/O information you can use:
gprint,io
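
For example, an input along these lines should select the sf
implementation for files 1-3 and print the I/O information. This is only
a sketch: the title, memory value, geometry, basis and method below are
placeholders, and, as said above, the sf implementation for these files
is not fully tested.

***,ccsd(t) scratch file test           ! title card (placeholder)
memory,200,m                            ! working memory per process (placeholder value)
file,1,implementation=sf;               ! override the direct-file (df) default for file 1
file,2,implementation=sf;               ! override the default for file 2
file,3,implementation=sf;               ! override the default for file 3
gprint,io                               ! print I/O debugging information
geometry={o;h,1,0.96;h,1,0.96,2,104.5}  ! placeholder water geometry
basis=vdz                               ! placeholder basis set
hf
ccsd(t)

I would put the file cards near the top of the input, before any
computation steps, so that they take effect before the files are first
opened.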
Best wishes,
Nick Wilson
Sigismondo Boschi wrote:
> I am running Molpro in parallel for the first time on an IBM
> Regatta (p690) system with a Colony switch.
>
> The typical target calculations of our users are CCSD(T) optimizations.
>
>
> Running one of them, I would like to use as much memory as possible
> for the integrals and, only if that is not possible, the disk, which is
> a shared GPFS filesystem: there is no benefit in concurrent access to
> it, and it becomes the bottleneck for the code.
>
> With typical options I have found that the tasks do not use much
> memory, neither standard memory nor GA memory.
>
> Running on 16 CPUs, at the very beginning of the output I found:
>
>
> **********
> ARMCI configured for 2 cluster nodes
>
> MPP nodes nproc
> sp154 8
> sp152 8
> ga_uses_ma=false, calling ma_init with nominal heap. Any -G option will be ignored.
>
> Primary working directories: /scratch_ssa/abc0
> Secondary working directories: /scratch_ssa/abc0
>
> blaslib=default
>
> MPP tuning parameters: Latency= 84 Microseconds, Broadcast speed= 233 MB/sec
> default implementation of scratch files=ga
> **********
>
> Only if I use one task (or maybe one node) do I find ga_uses_ma=true.
> On the other hand, the statement "default implementation of scratch
> files=ga" would lead me to think that they are "in-memory files";
> however, what happens at run time does not correspond to this:
>
> In fact, I observe a lot of I/O, and the memory used is about 200 MB
> (of 2 GB) for each task.
>
> After the CCSD I get:
> DISK USED * 9.10 GB
> GA USED * 120.58 MB (max) .00 MB (current)
>
> And indeed, at the beginning I set:
> memory,200,M
>
> (that is not the GA memory, but the -G option is ignored... I do not
> understand why).
>
>
> Can anybody explain some of these facts and give some suggestions
> for parallel runs?
>
> For instance, I also tried direct calculations, but:
> 1. they were very slow
> 2. they terminated with the error:
>
> ******
> FILE 5 RECORD 1380 OFFSET= 0. NOT FOUND
>
> Records on file 5
>
>  IREC   NAME  TYPE     OFFSET      LENGTH  IMPLEMENTATION  EXT  PREV  PARENT  MPP_STATE
>     1   4000            4096.      21301.  df                0     0       0          1
>     2   4001           25397.     166404.  df                0     0       0          1
>     3   4002          191801.      10725.  df                0     0       0          0
>     4   4003          202526.     178782.  df                0     0       0          1
>     5  35020          381308.      10496.  df                0     0       0          1
>     6   3600          391804.        273.  df                0     0       0          1
>     7   3601          392077.        273.  df                0     0       0          1
>     8  35000          392350.         10.  df                0     0       0          1
>     9  35001          392360.         10.  df                0     0       0          1
>    10  35010          392370.        320.  df                0     0       0          1
>    11  35011          392690.        320.  df                0     0       0          1
>    12   7005          393010.     314964.  df                0     0       0          1
>    13   8005          707974.     314964.  df                0     0       0          1
>    14   9101         1022938.    9567696.  df                0     0       0          0
>    15   9103        10590634.    9567696.  df                0     0       0          0
>
> ? Error
> ? Record not found
> ? The problem occurs in readm
>
> ERROR EXIT
> CURRENT STACK: CIPRO MAIN
> *******
>
> Many thanks for any help
>
> Regards
>
>
> Sigismondo Boschi