[molpro-user] MPI parallel jobs over TCP/IP
Kirk Peterson
kipeters at wsu.edu
Mon Nov 7 22:02:31 GMT 2005
Hi,
I'm hoping someone has run into this problem too and has found where
it lies. We have a small Opteron cluster consisting of 5 dual-
processor nodes with a simple GigE network. While parallel Molpro
built with tcgmsg works ok as long as we run just 2-way parallel on a
single node, we often run into problems if we run large jobs across
nodes. To perhaps bypass this, we wanted to build a version of
Molpro using an MPI implementation. With the latest GA tools (3-4b)
and MPICH (1.2.7p1), the standard Molpro testjobs work just fine.
The problem occurs for large open-shell CCSD(T) jobs where the
amount of GA memory gets large (~500 MB). (Note that large MRCI jobs
seem to work fine.) For example, if we modify the standard Molpro
benchmark normal_ccsd.com by removing the MP4 step and replacing CCSD
by UCCSD(T) and then run this across 2 nodes, the CCSD energy is
correct, but the contribution due to triples is completely wrong by
many mEh. I've tried the same job on a Myrinet-based Opteron
cluster (similar build, but of course with the Myrinet-based MPICH
software) and it worked just fine, so I don't think it's anything
intrinsic to the Opteron.
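
The comparison described above (CCSD energy agrees, triples contribution off by many mEh) could be checked with something like the sketch below. This is only an illustration: the function names, the tolerance, and all energy values are hypothetical placeholders, not numbers from the actual runs, and the energies are assumed to have been copied by hand from the two outputs.

```python
HARTREE_TO_MEH = 1000.0  # 1 Eh = 1000 mEh


def diff_meh(reference_eh, test_eh):
    """Difference (test - reference) converted from Hartree to mEh."""
    return (test_eh - reference_eh) * HARTREE_TO_MEH


def compare_runs(reference, test, tol_meh=0.001):
    """Map each energy component to (difference in mEh, within tolerance?)."""
    report = {}
    for name, ref_value in reference.items():
        delta = diff_meh(ref_value, test[name])
        report[name] = (delta, abs(delta) <= tol_meh)
    return report


# Placeholder values in Hartree, NOT real results from these jobs.
single_node = {"ccsd": -100.000000, "triples": -0.010000}
two_nodes = {"ccsd": -100.000000, "triples": -0.004000}

report = compare_runs(single_node, two_nodes)
for name, (delta, ok) in report.items():
    status = "ok" if ok else "MISMATCH"
    print(f"{name:8s} diff = {delta:+.3f} mEh  {status}")
```

Comparing the components separately, rather than only the total energy, makes it obvious whether the discrepancy is confined to the (T) step.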
I'd really appreciate any help.
best regards,
Kirk
--------------------------------------------
Kirk A. Peterson
Professor of Chemistry and Materials Science
Washington State University
Pullman, WA 99164-4630
Office: (509) 335-7867
Fax: (509) 335-8867
kipeters at wsu.edu
http://tyr0.chem.wsu.edu/~kipeters/