[molpro-user] Job termination during Pipek-Mezey Localisation
Rika Kobayashi
Rika.Kobayashi at anu.edu.au
Wed Dec 20 10:28:52 CET 2017
Hello,
Signal 9 usually tells us that the job was killed for exceeding its
memory limit, and the logs indeed show:
12/19/2017 20:04:20;0008;pbs_python;Job;2169493.r-man2;Cgroup memory limit exceeded: Killed process 24728 (molpro.exe) total-vm:17508688kB, anon-rss:9439868kB, file-rss:4816kB, shmem-rss:264kB
This points to a sudden memory spike that killed the job before PBS had a
chance to log it.
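
To read the sizes in that log line more easily, here is a minimal Python sketch that pulls out the per-process figures and converts them to GiB. The helper names (parse_oom_fields, kb_to_gib) are made up for illustration, and it assumes the kernel's "kB" means 1024 bytes.

```python
import re

# OOM-kill message as quoted in the PBS log above, joined onto one line.
LOG = ("Cgroup memory limit exceeded: Killed process 24728 (molpro.exe) "
       "total-vm:17508688kB, anon-rss:9439868kB, file-rss:4816kB, shmem-rss:264kB")

def parse_oom_fields(line):
    """Return {field_name: size_in_kB} for every 'name:<n>kB' pair in the line."""
    return {name: int(kb) for name, kb in re.findall(r"([\w-]+):(\d+)kB", line)}

def kb_to_gib(kb):
    """Convert kB to GiB, assuming 1 kB = 1024 bytes."""
    return kb / 1024**2

fields = parse_oom_fields(LOG)
for name, kb in fields.items():
    print(f"{name:>10}: {kb_to_gib(kb):6.2f} GiB")
```

For the killed molpro.exe process this gives roughly 9.0 GiB resident (anon-rss) and 16.7 GiB virtual (total-vm). These are the figures for that one process only, not for the whole job.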
Rika
On 19 December 2017 at 21:10, Seth Olsen <seth.olsen at uq.edu.au> wrote:
> Hi Molpro-User,
>
> I’ve been having a job fail during orbital localization. It is a CASSCF
> (8 electrons in 8 orbitals) job. The output ends abruptly:
> **********************************************************************************************************************************
>
>
> Program * Orbital Localization Authors: W. Meyer, H.-J. Werner
>
> Pipek-Mezey Localization
>
> Molecular orbitals read from record 2141.2 Type=MCSCF/NATURAL
> Density matrix read from record 2141.2 Type=MCSCF/CHARGE (state averaged)
>
> …but the standard output has a little more description
>
> --------------------------------------------------------------------------
> MPI_ABORT was invoked on rank 1 in communicator MPI COMMUNICATOR 4 DUP FROM 0
> with errorcode 15.
>
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> --------------------------------------------------------------------------
> 0:Terminate signal was sent, status=: 15
> (rank:0 hostname:r2398 pid:24716):ARMCI DASSERT fail. src/common/signaltrap.c:SigTermHandler():477 cond:0
> --------------------------------------------------------------------------
> mpirun noticed that process rank 7 with PID 24730 on node r2398 exited on
> signal 9 (Killed).
> --------------------------------------------------------------------------
>
> ======================================================================================
> Resource Usage on 2017-12-19 20:04:32:
> Job Id: 2169493.r-man2
> Project: tv58
> Exit Status: 0
> Service Units: 0.55
> NCPUs Requested: 8 NCPUs Used: 8
> CPU Time Used: 00:24:10
>
> Memory Requested: 48.0GB Memory Used: 31.96GB
> Walltime requested: 03:00:00 Walltime Used: 00:04:09
> JobFS requested: 150.0GB JobFS used: 941.47MB
> ======================================================================================
>
> Any ideas? Anyone seen this before?
>
> Many Thanks,
> Seth
> ===========================
> Seth Olsen, PhD.
> Honorary Fellow
> School of Mathematics & Physics
> The University of Queensland
> QLD 4072 Australia
> Ph: +61 7 3365 2816
> ===========================
> A PGP public key for this address has been uploaded to the key servers.