File header error running Molpro on multiple cluster nodes
Karen Haskell
khaskell at atcc.necsys.com
Mon Aug 25 22:24:02 BST 2003
I've built Molpro2002.6 on our PC cluster:
8 nodes, each with 2 Pentium CPUs
RedHat 9
I'm using GA3.2.6 (built with ARMCI_NETWORK=SOCKETS and tested OK with all
processors), Intel ifc7.1, and mpich-1.2.5.
It runs fine with -n1, and also with -n2 as long as both processes are on one
node.
When I try to run on multiple nodes, e.g. -n4, the processes start okay, and I
can see 2 processes on each of 2 nodes. The job produces some output, then
fails with a file header error. The processes remain and must be killed.
Here is the start and end of the output file (h2o_vdz.out):
----------------------------------------------------------------------------------
ARMCI configured for 2 cluster nodes

MPP nodes  nproc
r2d2       2
obiwan     2
ga_uses_ma=false, calling ma_init with nominal heap. Any -G option will be ignored.

Primary working directories:   /tmp/molpro
Secondary working directories: /tmp/molpro
...
etc.
...
Variable memory set to 1000000 words, buffer space 230000 words

Using spherical harmonics

Bad seek in iow_direct_write; fd=-1, p=4096
Bad seek in iow_direct_write; fd=-1, p=4096
-10000(s):armci_rcv_req: failed to receive header : 2
0:Child process terminated prematurely, status=: 256
Bad seek in iow_direct_write; fd=-1, p=4135
-10002(s):armci_rcv_req: failed to receive header : 2
Bad seek in iow_direct_write; fd=-1, p=4135
----------------------------------------------------------------------------------
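The fd=-1 in the "Bad seek" lines is an invalid file descriptor, which usually means an earlier file open failed before the write was attempted. One thing that could be checked is whether the working directory named in the output, /tmp/molpro, exists and is writable on every node. This is only a guess at the cause, not a confirmed diagnosis. Below is a minimal Python sketch of that check; the function names are hypothetical helpers and are not part of Molpro, GA, or mpich.

```python
import os
import re
import socket
import tempfile

# Matches lines such as "Bad seek in iow_direct_write; fd=-1, p=4096".
BAD_SEEK = re.compile(r"Bad seek in iow_direct_write; fd=(-?\d+), p=(\d+)")


def find_bad_seeks(log_text):
    """Return (fd, offset) pairs for every bad-seek line found in log_text."""
    return [(int(fd), int(p)) for fd, p in BAD_SEEK.findall(log_text)]


def workdir_is_usable(path):
    """Return True if path is an existing directory where a file can be created."""
    if not os.path.isdir(path):
        return False
    try:
        # Create and immediately discard a scratch file inside the directory.
        with tempfile.TemporaryFile(dir=path):
            pass
    except OSError:
        return False
    return True


if __name__ == "__main__":
    # Assumption: /tmp/molpro is the directory reported in the output above.
    path = "/tmp/molpro"
    state = "usable" if workdir_is_usable(path) else "NOT usable"
    print(socket.gethostname(), path, state)
```

The script checks only the machine it runs on, so it would need to be run once on each node (here r2d2 and obiwan) as the same user that runs the job.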
Is this a problem with how the cluster is configured, how mpich
is configured, or how Molpro is configured? Or something else?
Any help would be appreciated.
Karen Haskell
khaskell at atcc.necsys.com