[molpro-user] molpro's job sudden death

Manhui Wang wangm9 at cardiff.ac.uk
Wed Apr 21 09:30:50 BST 2010


Ron,
    Your question actually involves two issues:
(1) One cannot request more memory than the hard limit on the machine.
(2) A specific Molpro job needs a certain amount of memory in order to
run successfully. Please refer to the previous discussion:
    http://www.molpro.net/pipermail/molpro-user/2010-March/003674.html
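
Issue (1) can be checked with simple arithmetic before a job is submitted.
The following is only a minimal Python sketch, assuming 8 bytes per word,
that the M scale in the memory directive means 10^6 words, and that the kB
figures in /proc/meminfo are units of 1024 bytes. The function names are
hypothetical helpers, not part of Molpro.

```python
BYTES_PER_WORD = 8  # assumption: 64-bit words


def total_request_bytes(mword_per_process, nprocs):
    """Memory requested by all processes together, in bytes."""
    return mword_per_process * 1_000_000 * BYTES_PER_WORD * nprocs


def fits_in_ram(mword_per_process, nprocs, memtotal_kb):
    """True if the combined request does not exceed physical memory."""
    return total_request_bytes(mword_per_process, nprocs) <= memtotal_kb * 1024


if __name__ == "__main__":
    memtotal_kb = 74247436  # MemTotal reported for currituck in this thread
    # The two (MWord, nprocs) combinations discussed in this thread.
    for mword, nprocs in [(7500, 6), (500, 8)]:
        gb = total_request_bytes(mword, nprocs) / 1e9
        print(f"memory,{mword},M on {nprocs} processes: {gb:.0f} GB, "
              f"fits: {fits_in_ram(mword, nprocs, memtotal_kb)}")
```

This only compares the request with physical memory; it says nothing about
issue (2), i.e. whether the job has enough memory to run.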

Ronald Kasl wrote:
> * this is what we have
> root@currituck:~# cat /proc/meminfo
> MemTotal:       74247436 kB
> 
> * would you please suggest what would be the maximum they can specify ?
When running with 8 processes, it would be safe to specify about 1100
MWord in the memory directive of the Molpro input on your machine, if
swap is not taken into account (about 74 GB / 8 processes / 8 bytes per
word, with some headroom left for the system).
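
The estimate can be reproduced with a short Python sketch. It assumes 8 bytes
per word, that M means 10^6 words, and that /proc/meminfo reports kB in units
of 1024 bytes. The function name and the reserve_fraction parameter are
hypothetical, chosen only for illustration.

```python
BYTES_PER_WORD = 8  # assumption: 64-bit words


def max_mword_per_process(memtotal_kb, nprocs, reserve_fraction=0.0):
    """Largest per-process memory directive (in MWord) that fits in RAM.

    reserve_fraction is the share of physical memory left for the
    operating system and other programs (a hypothetical tuning knob).
    """
    usable_bytes = memtotal_kb * 1024 * (1.0 - reserve_fraction)
    words_per_process = usable_bytes / nprocs / BYTES_PER_WORD
    return words_per_process / 1_000_000


if __name__ == "__main__":
    memtotal_kb = 74247436  # MemTotal from /proc/meminfo quoted in this thread
    # Upper bound with no headroom, then with 7% kept back for the system.
    print(f"{max_mword_per_process(memtotal_kb, 8):.0f} MWord")
    print(f"{max_mword_per_process(memtotal_kb, 8, reserve_fraction=0.07):.0f} MWord")
```

With no headroom the bound comes out a little under 1200 MWord per process,
so a value of about 1100 leaves some room for the system.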
> * they said that they tried 80 MWords, but it died as well
This might be related to the second issue I mentioned.
> 
> * also please keep in mind that this is a regular Linux box (not a
> cluster); this is not a box with distributed memory across nodes
> 
> thanks!!
> ron
> 
> 

Best wishes,
Manhui
> 
> 
> 
> 
> Manhui Wang wrote:
>> From the output:
>>
>>  Nodes     nprocs
>>  currituck    6
>> .....
>>  memory,7500,M
>>
>> it means it might request 7500 MWord * 8 bytes * 6 processes = 360000 MB
>> = 360 GB of memory in total. Could you check how much memory you have on
>> the machine?
>>
>> Best wishes,
>> Manhui
>>
>>
>> Ronald Kasl wrote:
>>   
>>> Thanks, .. the computational chemists told me that they are aware of
>>> that and that they tried different amounts of memory, but the output was
>>> the same.
>>>
>>> ... is there any chance that you can patch the code so that it shows by how
>>> much the memory needs to be increased --- this is what we get now .. the
>>> guys said that changing one line in the code would fix it, they don't
>>> want to bother you with it, but I thought that you may want to check on
>>> that
>>>
>>> ** this is what it shows in the output file (see the attachment for more)
>>> ......
>>>
>>>  For full I/O caching in triples, increase memory by********* words to****** Mword
>>>
>>> **
>>>
>>> Thanks,
>>> Ron
>>>
>>>
>>>
>>> Manhui Wang wrote:
>>>     
>>>> Please be aware that the memory directive in the input is in Word (not
>>>> Byte) per process.
>>>> For example, the line in your input
>>>> memory,500,M
>>>> means you might request 500 MWord of memory per process. When you run it
>>>> with 8 processes, it might request 500 MWord * 8 bytes * 8 processes =
>>>> 32 GB of memory. If memory allocation in parallel exceeds the total
>>>> limit, please try reducing the memory or reducing the number of processes.
>>>>
>>>> Best wishes,
>>>> Manhui
>>>>
>>>> psc wrote:
>>>>   
>>>>       
>>>>> Good morning, by any chance does anybody have any experience with
>>>>> the sudden death of molpro? At our site this happens when it runs on 8
>>>>> cores on a machine with 2*4 cores. It runs fine for a while, but then
>>>>> suddenly dies ... before the job dies, the machine still has enough
>>>>> memory and the disk is only 32% filled.  Do you have any clues as to what
>>>>> is happening? How do you troubleshoot such problems with molpro? The
>>>>> computational chemist tried to run the same job on 4 cores and the job
>>>>> runs just fine.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> This is the last portion of the output file:
>>>>>
>>>>>  DF-MP2-F12 correlation energies:
>>>>>  --------------------------------
>>>>>  Approx.                     Singlet          Triplet          Ecorr            Total Energy
>>>>>  DF-MP2                      -2.105468770835  -1.481892024291  -3.587360795125  -1241.433614075391
>>>>>  DF-MP2-F12/3*C(DX,FIX)      -3.180235173011  -1.762556768679  -4.942791941690  -1242.789045221956
>>>>>  DF-MP2-F12/3*C(FIX)         -3.079029486269  -1.791231138096  -4.870260624365  -1242.716513904631
>>>>>  DF-MP2-F12/3C(FIX)          -3.076495219986  -1.793891105189  -4.870386325175  -1242.716639605441
>>>>>
>>>>>  SCS-DF-MP2 energies (F_SING= 1.20000  F_TRIP= 0.62222  F_PARALLEL= 0.33333):
>>>>>  ----------------------------------------------------------------------------
>>>>>
>>>>>  SCS-DF-MP2                            -3.448628673449  -1241.294881953715
>>>>>  SCS-DF-MP2-F12/3*C(DX,FIX)            -4.912984197013  -1242.759237477279
>>>>>  SCS-DF-MP2-F12/3*C(FIX)               -4.809379202782  -1242.655632483048
>>>>>  SCS-DF-MP2-F12/3C(FIX)                -4.807993173879  -1242.654246454144
>>>>>
>>>>>  Symmetry transformation completed.
>>>>>
>>>>>  Number of N-1 electron functions:              63
>>>>>  Number of N-2 electron functions:            2016
>>>>>  Number of singly external CSFs:             19467
>>>>>  Number of doubly external CSFs:         189491778
>>>>>  Total number of CSFs:                   189511246
>>>>>
>>>>>  Pair and operator lists are different
>>>>>
>>>>>  Length of J-op  integral file:             163.14 GB
>>>>>  Length of K-op  integral file:             113.78 GB
>>>>>  Length of 3-ext integral record:             0.00 MB
>>>>>
>>>>>  Memory could be reduced to2370.6 Mword without degradation in triples
>>>>>
>>>>>
>>>>> forrtl: error (69): process interrupted (SIGINT)
>>>>> forrtl: error (69): process interrupted (SIGINT)
>>>>> forrtl: error (69): process interrupted (SIGINT)
>>>>> forrtl: error (69): process interrupted (SIGINT)
>>>>> forrtl: error (69): process interrupted (SIGINT)
>>>>> forrtl: error (69): process interrupted (SIGINT)
>>>>> forrtl: error (69): process interrupted (SIGINT)
>>>>> Image              PC                Routine            Line        Source
>>>>> molprop_2009_1_Li  000000000262888F  Unknown               Unknown  Unknown
>>>>> molprop_2009_1_Li  00000000025FFB96  Unknown               Unknown  Unknown
>>>>> molprop_2009_1_Li  000000000219B509  Unknown               Unknown  Unknown
>>>>> molprop_2009_1_Li  000000000219C545  Unknown               Unknown  Unknown
>>>>> molprop_2009_1_Li  000000000219F48D  Unknown               Unknown  Unknown
>>>>> molprop_2009_1_Li  000000000171D1F7  Unknown               Unknown  Unknown
>>>>> molprop_2009_1_Li  00000000017184C5  Unknown               Unknown  Unknown
>>>>> molprop_2009_1_Li  00000000004BAD99  Unknown               Unknown  Unknown
>>>>> molprop_2009_1_Li  00000000004B5AE5  Unknown               Unknown  Unknown
>>>>> molprop_2009_1_Li  000000000043DD2C  Unknown               Unknown  Unknown
>>>>> libc.so.6          00007F91CAF48ABD  Unknown               Unknown  Unknown
>>>>> molprop_2009_1_Li  000000000043DC29  Unknown               Unknown  Unknown
>>>>> [0]0:Return code = 0, signaled with Killed
>>>>> [0]1:Return code = 1
>>>>> [0]2:Return code = 1
>>>>> [0]3:Return code = 1
>>>>> [0]4:Return code = 1
>>>>> [0]5:Return code = 1
>>>>> [0]6:Return code = 1
>>>>> [0]7:Return code = 1
>>>>>
>>>>> _______________________________________________
>>>>> Molpro-user mailing list
>>>>> Molpro-user at molpro.net
>>>>> http://www.molpro.net/mailman/listinfo/molpro-user
>>>>>     
>>>>>         
>>>>   
>>>>       
>>   
> 

-- 
-----------
Manhui  Wang
School of Chemistry, Cardiff University,
Main Building, Park Place,
Cardiff CF10 3AT, UK
Telephone: +44 (0)29208 76637


