[molpro-user] MRCI problem when running on multiple nodes
Andy May
MayAJ1 at cardiff.ac.uk
Thu Dec 10 10:46:54 GMT 2009
Hi,
Could you send me the input file used and a copy of the CONFIG file from
the build?
Thanks,
Andy
On 03/12/09 13:13, aristotle Papakondylis wrote:
> Dear all,
> I am trying to run an MRCI calculation with Molpro 2009.1 on two nodes
> (4 Itanium processors each) of my system, but Molpro crashes with the
> error message attached below. However, if I run the same calculation on
> a single node, using for example 4 processors, the job finishes without
> any problems. Molpro was built with ga-4-2 and tcgmsg, and I use
> InfiniBand. Any suggestions would be appreciated.
> Thanks
>
> A. Papakondylis
> Laboratory of Physical Chemistry
> Department of Chemistry
> University of Athens
>
>
>
> The output:
>
> ......................................................................................................................................
> Number of blocks in overlap matrix: 20 Smallest eigenvalue: 0.30D-06
> Number of N-2 electron functions: 210
> Number of N-1 electron functions: 139698
>
> Number of internal configurations: 20627
> Number of singly external configurations: 6446250
> Number of doubly external configurations: 894852
> Total number of contracted configurations: 7361729
> Total number of uncontracted configurations: 636698630
>
> Diagonal Coupling coefficients finished. Storage: 9109747 words, CPU-Time: 5.02 seconds.
> Energy denominators for pairs finished in 1 passes. Storage: 893537 words, CPU-time: 0.09 seconds.
>
> ITER.  STATE  ROOT  SQ.NORM     CORR.ENERGY  TOTAL ENERGY     ENERGY CHANGE  DEN1         VAR(S)    VAR(P)    TIME
>   1      1      1   1.00000000  0.00000000   -1384.06452996   0.00000000     -0.14494506  0.17D-01  0.37D-01  58.08
>   1      2      2   1.00000000  0.00000000   -1384.03046431   0.00000000     -0.18878279  0.16D-01  0.56D-01  58.08
>
> GLOBAL ERROR fehler on processor 5
>
> GLOBAL ERROR fehler on processor 4
>
> GLOBAL ERROR fehler on processor 7
> Last System Error Message from Task 7:: Invalid argument
> Last System Error Message from Task 5:: Invalid argument
> 7:7:fehler:: 1010707757
> (rank:7 hostname:nodeib_08 pid:32674):ARMCI DASSERT fail.
> armci.c:ARMCI_Error():260 cond:0
> Last System Error Message from Task 4:: Invalid argument
> 5:5:fehler:: 1010707757
> (rank:5 hostname:nodeib_08 pid:32622):ARMCI DASSERT fail.
> armci.c:ARMCI_Error():260 cond:0
> 5: ARMCI aborting 0 (0).
> system error message: Invalid argument
> 5: ARMCI aborting 0 (0).
> 7: ARMCI aborting 0 (0).
> 7: ARMCI aborting 0 (0).
> system error message: Invalid argument
>
> GLOBAL ERROR fehler on processor 6
> Last System Error Message from Task 6:: Invalid argument
> 6:6:fehler:: 1010707757
> (rank:6 hostname:nodeib_08 pid:32648):ARMCI DASSERT fail.
> armci.c:ARMCI_Error():260 cond:0
> 6: ARMCI aborting 0 (0).
> 6: ARMCI aborting 0 (0).
> system error message: Invalid argument
> 8: interrupt(1)
> 2:SigIntHandler: interrupt signal was caught: 2
> 1:SigIntHandler: interrupt signal was caught: 2
> 3:SigIntHandler: interrupt signal was caught: 2
> Last System Error Message from Task 2:: Numerical result out of range
> Last System Error Message from Task 1:: Numerical result out of range
> Last System Error Message from Task 3:: Numerical result out of range
> 2:SigIntHandler: abort signal was caught: cleaning up: 2
> 2: ARMCI aborting 0 (0).
> agonal Coupling coefficients finished. Storage: 9109747
> words, CPU-Time: 5.02 seconds.
> Energy denominators for pairs Diagonal Coupling coefficients
> finished. Storage: 9109747 words, CPU-Time: 5.02 seconds.
> Energy denominators for pairs
> system error message: Illegal seek
> 1:SigIntHandler: abort signal was caught: cleaning up: 2
> 1: ARMCI aborting 0 (0).
> 1: ARMCI aborting 0 (0).
> system error message: Illegal seek
> 0:SigIntHandler: interrupt signal was caught: 2
> (rank:0 hostname:nodeib_07 pid:4468):ARMCI DASSERT fail.
> signaltrap.c:SigIntHandler():69 cond:0
> 3:SigIntHandler: abort signal was caught: cleaning up: 2
> 3: ARMCI aborting 0 (0).
> 3: ARMCI aborting 0 (0).
> system error message: Illegal seek
> Last System Error Message from Task 0:: Inappropriate ioctl for device
> 4:4:fehler:: 1010707757
> (rank:4 hostname:nodeib_08 pid:32596):ARMCI DASSERT fail.
> armci.c:ARMCI_Error():260 cond:0
> 4:SigIntHandler: interrupt signal was caught: 2
> 4:SigIntHandler: abort signal was caught: cleaning up: 2
> 4: ARMCI aborting 0 (0).
> 4: ARMCI aborting 0 (0).
> system error message: Transport endpoint is not connected
> WaitAll: Child (4471) finished, status=0x100 (exited with code 1).
> WaitAll: Child (4470) finished, status=0x100 (exited with code 1).
> WaitAll: Child (4469) finished, status=0x100 (exited with code 1).
>
> _______________________________________________
> Molpro-user mailing list
> Molpro-user at molpro.net
> http://www.molpro.net/mailman/listinfo/molpro-user
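
When triaging interleaved output like the log quoted above, it can help to tally which ranks on which hosts reported an ARMCI assertion failure. The following is only a minimal Python sketch, assuming the output has been saved as plain text; the function name `ranks_by_host` and the regular expression are illustrative and are not part of Molpro, Global Arrays, or ARMCI.

```python
import re
from collections import defaultdict

# Matches lines of the form seen in the quoted log, e.g.
#   (rank:7 hostname:nodeib_08 pid:32674):ARMCI DASSERT fail.
# The pattern is an assumption based on that log, not a documented format.
DASSERT_RE = re.compile(
    r"\(rank:(\d+) hostname:(\S+) pid:(\d+)\):ARMCI DASSERT fail"
)


def ranks_by_host(log_text):
    """Return {hostname: sorted ranks} for every ARMCI DASSERT line found."""
    hosts = defaultdict(set)
    for line in log_text.splitlines():
        match = DASSERT_RE.search(line)
        if match:
            rank, hostname = int(match.group(1)), match.group(2)
            hosts[hostname].add(rank)
    return {host: sorted(ranks) for host, ranks in hosts.items()}


# Three lines copied from the quoted output, including the "> " quote prefix.
sample = """\
> (rank:7 hostname:nodeib_08 pid:32674):ARMCI DASSERT fail.
> (rank:5 hostname:nodeib_08 pid:32622):ARMCI DASSERT fail.
> (rank:0 hostname:nodeib_07 pid:4468):ARMCI DASSERT fail.
"""

print(ranks_by_host(sample))
```

Run over the full quoted output, such a tally would show ranks 4 to 7 on nodeib_08 failing in `armci.c:ARMCI_Error()`, while rank 0 on nodeib_07 fails only in the signal handler (`signaltrap.c`).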