[molpro-user] restarting job on cluster node

Peter Ruprecht ruprech at jilau1.colorado.edu
Wed Dec 29 21:45:52 GMT 2010


Dear Molpro community,

(Disclaimer: I'm not much of a Molpro user, just the system 
administrator, so please take that into consideration if replying ;)

One of my users had a long-running calculation that was killed after 100 
days upon hitting our cluster's walltime limit.  Since the job was very 
close to finishing at that point, we wanted to be able to restart it 
where it left off.  So, the user resubmitted the identical input file to 
the same cluster node where the job had been running.  On this node, the 
.wfu file and various other temp files from the original job were still 
waiting in the local $TMPDIR.  Our understanding is that Molpro is smart 
enough to pick up again where the original calculation was killed.

However, it looks like the calculation has started over from the 
beginning.  Any ideas what we might have done wrong?  The input file and 
output file from the attempted restart are attached.  (Don't be confused
by the fact that the input file has "restart" in it; so did the original
job, which had itself been restarted, in a different way, from yet
another earlier job.)

If my explanation is not clear enough or you need more info, please let
me know.  Any help at all toward not losing 100 days of runtime will be
greatly appreciated!

Thanks,
Peter Ruprecht
JILA / University of Colorado
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mal_vtzf12_irc_full_restart.25495DEFANGED-com
Type: application/defanged
Size: 1476 bytes
Desc: not available
URL: <http://www.molpro.net/pipermail/molpro-user/attachments/20101229/8a2c60f6/attachment.bin>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: mal_vtzf12_irc_full_restart.out
URL: <http://www.molpro.net/pipermail/molpro-user/attachments/20101229/8a2c60f6/attachment.ksh>

