[molpro-user] molpro's job sudden death
Manhui Wang
wangm9 at cardiff.ac.uk
Wed Apr 21 09:30:50 BST 2010
Ron,
Your question is actually related to two issues:
(1) One cannot request more memory than the hard limit of the machine.
(2) A particular Molpro job needs a certain amount of memory in order to
run successfully. Please refer to the previous discussion:
http://www.molpro.net/pipermail/molpro-user/2010-March/003674.html
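The two constraints can be checked with simple arithmetic before submitting a job. Below is a minimal sketch using a hypothetical helper (`check_memory_request` is not part of Molpro). It assumes that a Molpro word is 8 bytes and that "M" in the memory directive means 10^6 words. The per-job requirement for issue (2) has to be estimated separately, for example from an earlier run, so it is passed in as an input.

```python
# Hypothetical helper, not part of Molpro.
# Assumptions: a Molpro word is 8 bytes and "M" in the memory directive
# means 10**6 words.
BYTES_PER_WORD = 8
WORDS_PER_MWORD = 10**6


def check_memory_request(mword_per_proc, nprocs, physical_bytes, required_mword):
    """Return a list of problems with a proposed memory directive (empty if none).

    mword_per_proc: value given in the memory directive, in MWord per process
    nprocs:         number of parallel processes
    physical_bytes: physical memory of the machine (swap ignored)
    required_mword: what this particular job needs per process; it must be
                    estimated separately, e.g. from an earlier run
    """
    problems = []
    # Issue (1): the directive is per process, so the total scales with nprocs.
    total_bytes = mword_per_proc * WORDS_PER_MWORD * BYTES_PER_WORD * nprocs
    if total_bytes > physical_bytes:
        problems.append(
            f"issue (1): {total_bytes / 1e9:.1f} GB requested in total, "
            f"only {physical_bytes / 1e9:.1f} GB of physical memory"
        )
    # Issue (2): less memory than the job itself needs to run.
    if mword_per_proc < required_mword:
        problems.append(
            f"issue (2): job needs about {required_mword} MWord per process, "
            f"only {mword_per_proc} MWord given"
        )
    return problems


physical = 74247436 * 1024  # MemTotal from /proc/meminfo, converted from KiB
needed = 100                # hypothetical per-process requirement, illustration only
for mword, nprocs in [(7500, 6), (1100, 8), (80, 8)]:
    print(mword, nprocs, check_memory_request(mword, nprocs, physical, needed) or "OK")
```

With these inputs, 7500 MWord on 6 processes trips issue (1), 1100 MWord on 8 processes passes, and 80 MWord on 8 processes trips only issue (2). The last result depends entirely on the made-up requirement of 100 MWord.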
Ronald Kasl wrote:
> * this is what we have
> root at currituck:~# cat /proc/meminfo
> MemTotal: 74247436 kB
>
> * would you please suggest what would be the maximum they can specify?
When running with 8 processes, it would be safe to specify about 1100
MWord in the memory directive of the Molpro input on your machine, if swap
memory is not taken into account (74247436 kB is about 76 GB, and
1100 MWord * 8 bytes * 8 processes is about 70 GB).
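The upper bound behind the estimate of about 1100 MWord can be reproduced as follows. This is a minimal sketch with hypothetical helper names, assuming 8-byte words, "M" = 10^6 words, and that the "kB" figure in /proc/meminfo is really KiB (1024 bytes). It ignores swap, the operating system and any other programs, which is why the value actually used should stay somewhat below the bound.

```python
# Hypothetical helpers for illustration; not part of Molpro.
BYTES_PER_WORD = 8        # Molpro works in 8-byte words
WORDS_PER_MWORD = 10**6   # assumption: "M" in the memory directive = 10**6 words


def parse_memtotal_kib(meminfo_text: str) -> int:
    """Return MemTotal from /proc/meminfo text (reported in KiB, labelled 'kB')."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    raise ValueError("MemTotal not found")


def max_mword_per_process(memtotal_kib: int, nprocs: int) -> float:
    """Upper bound on the memory directive (MWord per process) so that
    nprocs processes fit in physical RAM; swap and the OS are ignored."""
    total_bytes = memtotal_kib * 1024
    return total_bytes / (BYTES_PER_WORD * WORDS_PER_MWORD * nprocs)


meminfo = "MemTotal: 74247436 kB\n"
limit = max_mword_per_process(parse_memtotal_kib(meminfo), nprocs=8)
print(f"upper bound: {limit:.0f} MWord per process")  # prints: upper bound: 1188 MWord per process
```

On the machine itself, the sample string can be replaced by `open("/proc/meminfo").read()`. Specifying about 1100 MWord leaves roughly 7% headroom below this bound.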
> * they said that they tried 80 MWords, but it died as well
This might be related to the second issue I mentioned.
>
> * also please keep in mind that this is a regular Linux box (not a
> cluster); this is not a box with memory distributed across nodes
>
> thanks!!
> ron
>
>
Best wishes,
Manhui
>
>
>
>
> Manhui Wang wrote:
>> From the output:
>>
>> Nodes nprocs
>> currituck 6
>> .....
>> memory,7500,M
>>
>> it means it might request 7500 MWord * 8 bytes * 6 processes = 360000 MB = 360 GB of
>> memory in total. Could you check how much memory you have on the machine?
>>
>> Best wishes,
>> Manhui
>>
>>
>> Ronald Kasl wrote:
>>
>>> Thanks. The computational chemists told me that they are aware of
>>> that and that they tried different amounts of memory, but the output was
>>> the same.
>>>
>>> Is there any chance that you could patch the code so that it shows by how
>>> much the memory needs to be increased? This is what we get now. The
>>> guys said that changing one line in the code would fix it; they don't
>>> want to bother you with it, but I thought that you may want to check on
>>> that.
>>>
>>> ** this is what it shows in the output file (see the attachment for more)
>>> ......
>>>
>>> For full I/O caching in triples, increase memory by********* words
>>> to****** Mword
>>>
>>> **
>>>
>>> Thanks,
>>> Ron
>>>
>>>
>>>
>>> Manhui Wang wrote:
>>>
>>>> Please be aware that the memory directive in the input is in words (not
>>>> bytes) per process.
>>>> For example, the line in your input
>>>> memory,500,M
>>>> means you might request 500 MWord of memory per process. When you run it
>>>> with 8 processes, it might request 500 MWord * 8 bytes * 8 processes =
>>>> 32000 MB = 32 GB of memory. If memory allocation in parallel exceeds the
>>>> total limit, please try reducing the memory or reducing the number of
>>>> processes.
>>>>
>>>> Best wishes,
>>>> Manhui
>>>>
>>>> psc wrote:
>>>>
>>>>
>>>>> Good morning, by any chance does anybody have any experience with
>>>>> sudden death of Molpro jobs? At our site this happens when it runs on
>>>>> 8 cores of a machine with 2 x 4 cores. It runs fine for a while, but then
>>>>> suddenly dies. Before the job dies, the machine still has enough
>>>>> memory and the disk is only 32% full. Do you have any clues about what
>>>>> is happening? How do you troubleshoot such problems with Molpro? The
>>>>> computational chemist tried to run the same job on 4 cores and the job
>>>>> ran just fine.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> This is the last portion of the output file:
>>>>>
>>>>> DF-MP2-F12 correlation energies:
>>>>> --------------------------------
>>>>> Approx.                             Singlet          Triplet            Ecorr      Total Energy
>>>>> DF-MP2                      -2.105468770835  -1.481892024291  -3.587360795125  -1241.433614075391
>>>>> DF-MP2-F12/3*C(DX,FIX)      -3.180235173011  -1.762556768679  -4.942791941690  -1242.789045221956
>>>>> DF-MP2-F12/3*C(FIX)         -3.079029486269  -1.791231138096  -4.870260624365  -1242.716513904631
>>>>> DF-MP2-F12/3C(FIX)          -3.076495219986  -1.793891105189  -4.870386325175  -1242.716639605441
>>>>>
>>>>> SCS-DF-MP2 energies (F_SING= 1.20000 F_TRIP= 0.62222 F_PARALLEL= 0.33333):
>>>>>
>>>>> ----------------------------------------------------------------------------
>>>>> SCS-DF-MP2 -3.448628673449 -1241.294881953715
>>>>> SCS-DF-MP2-F12/3*C(DX,FIX) -4.912984197013 -1242.759237477279
>>>>> SCS-DF-MP2-F12/3*C(FIX) -4.809379202782 -1242.655632483048
>>>>> SCS-DF-MP2-F12/3C(FIX) -4.807993173879 -1242.654246454144
>>>>>
>>>>> Symmetry transformation completed.
>>>>>
>>>>> Number of N-1 electron functions: 63
>>>>> Number of N-2 electron functions: 2016
>>>>> Number of singly external CSFs: 19467
>>>>> Number of doubly external CSFs: 189491778
>>>>> Total number of CSFs: 189511246
>>>>>
>>>>> Pair and operator lists are different
>>>>>
>>>>> Length of J-op integral file: 163.14 GB
>>>>> Length of K-op integral file: 113.78 GB
>>>>> Length of 3-ext integral record: 0.00 MB
>>>>>
>>>>> Memory could be reduced to2370.6 Mword without degradation in triples
>>>>>
>>>>>
>>>>> forrtl: error (69): process interrupted (SIGINT)
>>>>> forrtl: error (69): process interrupted (SIGINT)
>>>>> forrtl: error (69): process interrupted (SIGINT)
>>>>> forrtl: error (69): process interrupted (SIGINT)
>>>>> forrtl: error (69): process interrupted (SIGINT)
>>>>> forrtl: error (69): process interrupted (SIGINT)
>>>>> forrtl: error (69): process interrupted (SIGINT)
>>>>> Image PC Routine Line Source
>>>>> molprop_2009_1_Li 000000000262888F Unknown Unknown Unknown
>>>>> molprop_2009_1_Li 00000000025FFB96 Unknown Unknown Unknown
>>>>> molprop_2009_1_Li 000000000219B509 Unknown Unknown Unknown
>>>>> molprop_2009_1_Li 000000000219C545 Unknown Unknown Unknown
>>>>> molprop_2009_1_Li 000000000219F48D Unknown Unknown Unknown
>>>>> molprop_2009_1_Li 000000000171D1F7 Unknown Unknown Unknown
>>>>> molprop_2009_1_Li 00000000017184C5 Unknown Unknown Unknown
>>>>> molprop_2009_1_Li 00000000004BAD99 Unknown Unknown Unknown
>>>>> molprop_2009_1_Li 00000000004B5AE5 Unknown Unknown Unknown
>>>>> molprop_2009_1_Li 000000000043DD2C Unknown Unknown Unknown
>>>>> libc.so.6 00007F91CAF48ABD Unknown Unknown Unknown
>>>>> molprop_2009_1_Li 000000000043DC29 Unknown Unknown Unknown
>>>>> [0]0:Return code = 0, signaled with Killed
>>>>> [0]1:Return code = 1
>>>>> [0]2:Return code = 1
>>>>> [0]3:Return code = 1
>>>>> [0]4:Return code = 1
>>>>> [0]5:Return code = 1
>>>>> [0]6:Return code = 1
>>>>> [0]7:Return code = 1
>>>>>
>>>>> _______________________________________________
>>>>> Molpro-user mailing list
>>>>> Molpro-user at molpro.net
>>>>> http://www.molpro.net/mailman/listinfo/molpro-user
>>>>>
>>>>>
>>>>
>>>>
>>
>
--
-----------
Manhui Wang
School of Chemistry, Cardiff University,
Main Building, Park Place,
Cardiff CF10 3AT, UK
Telephone: +44 (0)29208 76637