[molpro-user] computational scaling of CCSD(T) calculations
Luc Vereecken
kineticluc at gmail.com
Wed Nov 9 13:03:28 GMT 2011
Hi all,
I have recently installed Molpro 2010.1 on my cluster, and I'm having
trouble grasping the computational scaling characteristics of CCSD(T)
calculations. I'm running the precompiled binaries obtained from the
website (Version 2010.1 linked 15 Sep 2011 12:01:52). I am not used to
doing this type of calculation in molpro, so this could be a newbie
mistake.
The test CCSD(T) calculations I'm trying involve the 1-butoxy radical
(C4H9O) with basis sets of increasing size: aug-cc-pVDZ, aug-cc-pVTZ
and aug-cc-pVQZ. The DZ calculation requires a minimum of a few
hundred MB, the TZ runs happily in a few GB, and the QZ needs a
minimum of about 18 GB for the triples to run. With anything less it
simply aborts, recommending an increase of the memory variable. I
assume an aug-cc-pV5Z basis set would require dozens of GB of main
memory, and aug-cc-pV6Z hundreds of GB.
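For reference, the QZ job boils down to an input along the following
lines (geometry omitted; the open-shell details and memory value are
just the obvious choices for this 41-electron doublet and the roughly
18 GB mentioned above, expressed in 8-byte megawords):

***,1-butoxy radical, CCSD(T)/aug-cc-pVQZ
memory,2250,m          ! per-process memory, ~2250 MW = ~18 GB
geometry={...}         ! 1-butoxy geometry omitted here
basis=aug-cc-pVQZ
{rhf;wf,41,1,1}        ! 41 electrons, doublet, no symmetry
{rccsd(t)}             ! the (T) step is what needs the ~18 GB

The smaller basis sets use the same input with the basis and memory
card adjusted, and the parallel runs are launched with the usual
molpro -n <nprocs> option of the prebuilt binaries.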
So, as far as I can tell, on e.g. a machine with 16 or 32 GB of
memory I can use all cores for the DZ calculations, only 3 cores for
the TZ calculations (as only three processes fit into the main
memory), and only 1 core for the QZ calculation (as only one 18 GB
process fits). The bigger the calculation, the fewer cores I can use,
even though I use the same total amount of memory on a node. The disk
use is very moderate, just under 250 GB. Continuing this trend means
that I will never be able to increase the basis set beyond a certain
limit, whereas I would expect bigger calculations to simply cause
more disk activity (and hence slower, less efficient calculations)
while still being able to use all available cores.
Trying to add more nodes to the calculation to get around the memory
restriction did not help: the minimum memory requirement per process
does not decrease when machines are added, so adding more machines
(and hence more total memory) does not allow me to do bigger
calculations. For example, running the QZ calculation over 6 machines
still requires 18 GB per process, despite now having 6 times more
total memory available. The disk use per machine decreases by about a
factor of 6, to 48 GB, which is already comparable to the minimum
18 GB per-process memory requirement. I haven't tried 10+ nodes, but
the above suggests that at some point the per-machine disk
requirement drops below the non-scaling per-process memory
requirement, which I find counterintuitive.
Am I missing something here? Is the size of a Molpro CCSD(T)
calculation predominantly limited by the amount of memory available
in a single machine, such that adding more per-machine cores,
per-machine disk, global cores, global disk and global memory does
not help? My main concern for now is being able to do certain types
of calculations at all, irrespective of their computational
efficiency or the wall time needed. Clusters tend to grow in the
number of machines, but much more slowly in the size of the
individual machines.
Cheers,
Luc Vereecken