[molpro-user] Permanent installation of dependencies (was: Non-reproducible stuck state when running Molpro on NFS drive)
Gregory Magoon
gmagoon at MIT.EDU
Sun Jul 24 04:12:19 BST 2011
After some work I was finally able to trace this to some sort of issue between
NFSv4 and MPICH2; I can get this to work properly when I mount the NFS drives
as NFSv3 (as opposed to NFSv4), so the issue is now more-or-less resolved.
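In case anyone else hits this: forcing the v3 protocol can be done at mount time. A minimal sketch, assuming standard Linux nfs(5) mount options; the server name and export path below are placeholders, not our actual layout:

```shell
# /etc/fstab entry (server/export/mountpoint are examples):
# headnode:/usr/local  /usr/local  nfs  ro,vers=3  0  0

# or as a one-off from the command line:
mount -t nfs -o vers=3 headnode:/usr/local /usr/local
```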
A quick follow-up question: is there a recommended approach for permanently
installing the mpich2 dependency (and perhaps also GA?) when using the auto
build approach? By default, the installation scripts seem to leave mpiexec in
the compile directory. I saw that the makefile/installation scripts mention an
option called REDIST, which looked like it might allow this, but they don't
seem to make use of it (REDIST=NO).
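To frame what I mean by a permanent install, here is a sketch using MPICH2's
own standard configure/make install flow. The version number and prefix are
only examples, and I don't know whether Molpro's auto-build can be pointed at
a tree installed this way:

```shell
# Build MPICH2 from its own tarball and install to a permanent prefix
# (version and prefix are examples, not what the Molpro scripts use):
tar xzf mpich2-1.4.tar.gz
cd mpich2-1.4
./configure --prefix=/usr/local/mpich2
make
make install          # puts mpiexec under /usr/local/mpich2/bin

# then make the installed mpiexec visible to users:
export PATH=/usr/local/mpich2/bin:$PATH
```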
Thanks,
Greg
Quoting Gregory Magoon <gmagoon at MIT.EDU>:
> Hi,
> I have successfully compiled molpro (with Global Arrays/TCGMSG; mpich2 from
> the Ubuntu package) on one of our compute nodes for our new server, and
> installed it in an NFS directory on our head node. The initial tests on the
> compute node ran fine, but since the installation I've had issues running
> molpro on the compute nodes (it seems to work fine on the head node).
> Sometimes (sorry I can't be more precise, but it does not seem to be
> reproducible), when running on a compute node, the job will get stuck in the
> early stages, producing a lot of NFS traffic (~14+ Mbps outbound to the head
> node and ~7 Mbps inbound from it) and causing fairly high nfsd process CPU%
> usage on the head node. Molpro processes in the stuck state are shown in the
> "top" output at the bottom of this e-mail. I have also attached example
> verbose output for a case that works and a case that gets stuck.
>
> Some notes:
> -/usr/local is mounted as an NFS read-only file system; /home is mounted as
> an NFS rw file system
> -Runs with fewer processors (e.g. 6) seem more likely to run successfully
>
> I've tried several approaches to address the issue, including 1. mounting
> /usr/local as a rw file system, and 2. changing the rsize and wsize
> parameters for the NFS file system, but none seem to work. We also tried
> redirecting stdin from /dev/null when invoking the process, which seemed to
> help at first, but later tests suggested that it wasn't actually helping.
>
> If anyone has any tips or ideas to help diagnose the issue, it would be
> greatly appreciated. If there are any additional details I can provide to
> help describe the problem, I'd be happy to provide them.
>
> Thanks very much,
> Greg
>
> Top processes in "top" output in stuck state:
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 10 root 20 0 0 0 0 S 10 0.0 0:16.50 kworker/0:1
> 2 root 20 0 0 0 0 S 6 0.0 0:10.86 kthreadd
> 1496 root 20 0 0 0 0 S 1 0.0 0:04.73 kworker/0:2
> 3 root 20 0 0 0 0 S 1 0.0 0:00.93 ksoftirqd/0
>
> Processes in "top" output for user in stuck state:
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 29961 user 20 0 19452 1508 1072 R 0 0.0 0:00.05 top
> 1176 user 20 0 91708 1824 868 S 0 0.0 0:00.01 sshd
> 1177 user 20 0 24980 7620 1660 S 0 0.0 0:00.41 bash
> 1289 user 20 0 91708 1824 868 S 0 0.0 0:00.00 sshd
> 1290 user 20 0 24980 7600 1640 S 0 0.0 0:00.32 bash
> 1386 user 20 0 4220 664 524 S 0 0.0 0:00.01 molpro
> 1481 user 20 0 18764 1196 900 S 0 0.0 0:00.00 mpiexec
> 1482 user 20 0 18828 1092 820 S 0 0.0 0:00.00 hydra_pmi_proxy
> 1483 user 20 0 18860 488 212 D 0 0.0 0:00.00 hydra_pmi_proxy
> 1484 user 20 0 18860 488 212 D 0 0.0 0:00.00 hydra_pmi_proxy
> 1485 user 20 0 18860 488 212 D 0 0.0 0:00.00 hydra_pmi_proxy
> 1486 user 20 0 18860 488 212 D 0 0.0 0:00.00 hydra_pmi_proxy
> 1487 user 20 0 18860 488 212 D 0 0.0 0:00.00 hydra_pmi_proxy
> 1488 user 20 0 18860 488 212 D 0 0.0 0:00.00 hydra_pmi_proxy
> 1489 user 20 0 18860 488 212 D 0 0.0 0:00.00 hydra_pmi_proxy
> 1490 user 20 0 18860 488 208 D 0 0.0 0:00.00 hydra_pmi_proxy
> 1491 user 20 0 18860 488 208 D 0 0.0 0:00.00 hydra_pmi_proxy
> 1492 user 20 0 18860 488 208 D 0 0.0 0:00.00 hydra_pmi_proxy
> 1493 user 20 0 18860 488 208 D 0 0.0 0:00.00 hydra_pmi_proxy
> 1494 user 20 0 18860 492 212 D 0 0.0 0:00.00 hydra_pmi_proxy
>
More information about the Molpro-user mailing list