[molpro-user] MOLPRO on dual-core AMD Opteron
Gershom (Jan M.L.) Martin
comartin at wicc.weizmann.ac.il
Wed Nov 16 11:53:43 GMT 2005
Greetings:
Following up on my earlier request, a kind soul here provided me with
root access to a demo machine with four Opteron 875s (2.2 GHz, dual
core) and 16 GB of 400 MHz DDR2 RAM. Red Hat Advanced Server 4 was
installed on it.
The benchmark I ran was HClO4 CCSD(T)/aug-cc-pV(T+d)Z single-point
energy (in Cs symmetry).
From the viewpoint of Linux, the cores appear as eight independent
CPUs, divided into four nodes. One can lock jobs onto individual
*nodes* by running the job as follows:
numactl --cpubind=0,1 molpro -n 4 -m 150000000 testjob-2cpus-4cores.com &
but there is no way (at least not that I could see) to bind processes
to specific *cores*. So, in order to compare N dual-core with 2N
single-core CPUs, I used a somewhat "dirty" trick: I wrote a simple
program that executes an endless loop of integer multiplies, saved it
as "block1core", and ran
numactl --cpubind=0 block1core &
numactl --cpubind=1 block1core &
numactl --cpubind=2 block1core &
numactl --cpubind=3 block1core &
which effectively leaves me with what amounts to exactly the same
machine but with four Opteron 848s (the single-core equivalent of the
875).
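For readers who want to reproduce the trick, here is a minimal sketch of such a blocker. The post does not say which language the original "block1core" was written in, so the Python below, the name spin, its max_iters parameter and the multiplier constant are all assumptions made for illustration. It only shows the idea: a loop that does nothing but integer multiplies and so keeps one core busy.

```python
import itertools

MASK32 = 0xFFFFFFFF
MULTIPLIER = 1664525  # arbitrary odd constant (assumption); odd keeps x nonzero


def spin(max_iters=None):
    """Repeatedly multiply an integer, keeping it within 32 bits.

    With max_iters=None the loop never ends, which is what a core
    blocker needs. A finite max_iters exists only so the loop can be
    tried out safely; it is not part of the original description.
    """
    x = 1
    steps = itertools.count() if max_iters is None else range(max_iters)
    for _ in steps:
        x = (x * MULTIPLIER) & MASK32
    return x

# To block a core, call spin() with no argument; it will not return.
```

Saved under a hypothetical name such as block1core.py with a final spin() call added, it would be started once per node with numactl --cpubind exactly as in the four commands above.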
The results (CPU times in seconds as reported in MOLPRO's output):
#CPUs  #cores  t[xform]  time[CCSD]   (a)   time[(T)]
  1      1      82.74       2399     2316     3132
  2      2      54.40       1251     1197     1509
  1      2      76.34       1308     1232     1593
  3      3      53.14        924      871     1048
  4      4      47.73        735      687      823
  2      4      55.43        728      672      804
  3      6      53.45        538      484      540
  4      8      46.43        478      432      461
(a) = CCSD minus transformation
There may be some slight measurement errors here, as well as some
fluctuation because of other processes running (although the machine
was basically empty otherwise), but the bottom line appears to be:
* N dual-core CPUs yield only slightly less performance than 2N
  equivalent single-core CPUs. Presumably this will deteriorate
  with larger and more memory-intensive jobs, but clearly the
  performance penalty from the shared memory access channel of the
  two cores isn't nearly as bad as I feared.
* (T) in this job size range parallelizes nearly perfectly with the
  number of cores (not just CPUs) up to about 6 of them.
* Even CCSD in this job size range still parallelizes with about 80%
  efficiency over 6 cores (if the transformation step is taken out of
  the total figures) and with 2/3 efficiency over 8 cores.
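The efficiency figures quoted above can be rechecked from the table with a few lines of Python. This is only a sketch: the timings are copied from the table, while the definition of efficiency used here (single-core time divided by the product of the number of cores and the multi-core time) and all names in the code are assumptions, since the post does not spell out the formula.

```python
# (CPUs, cores) -> (t[xform], time[CCSD], (a), time[(T)]), CPU seconds from the table
TIMINGS = {
    (1, 1): (82.74, 2399, 2316, 3132),
    (2, 2): (54.40, 1251, 1197, 1509),
    (1, 2): (76.34, 1308, 1232, 1593),
    (3, 3): (53.14, 924, 871, 1048),
    (4, 4): (47.73, 735, 687, 823),
    (2, 4): (55.43, 728, 672, 804),
    (3, 6): (53.45, 538, 484, 540),
    (4, 8): (46.43, 478, 432, 461),
}

# Column indices into the tuples above
XFORM, CCSD_TOTAL, CCSD_NO_XFORM, TRIPLES = range(4)


def efficiency(key, column):
    """Parallel efficiency relative to the 1-CPU, 1-core run (assumed definition)."""
    reference = TIMINGS[(1, 1)][column]
    cores = key[1]
    return reference / (cores * TIMINGS[key][column])


def dual_core_penalty(n_dual, column):
    """Time on n_dual dual-core CPUs divided by time on 2*n_dual single-core CPUs."""
    cores = 2 * n_dual
    return TIMINGS[(n_dual, cores)][column] / TIMINGS[(cores, cores)][column]


if __name__ == "__main__":
    for key in sorted(TIMINGS, key=lambda k: (k[1], k[0])):
        print(
            f"{key[0]} CPUs, {key[1]} cores: "
            f"CCSD(a) {efficiency(key, CCSD_NO_XFORM):.0%}, "
            f"(T) {efficiency(key, TRIPLES):.0%}"
        )
    for n in (1, 2):
        print(f"{n} dual-core vs {2 * n} single-core, (T): "
              f"{dual_core_penalty(n, TRIPLES):.3f}")
```

A dual_core_penalty value near 1.0 means the shared memory channel costs little for that step; values above 1.0 mean the dual-core configuration was slower.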
Any comments, observations, welcome :-)
Best regards,
JMLM
------------------------------------------------------------------------
Gershom (Jan M.L.) Martin / Baroness Thatcher Professor of Chemistry
Member, Lise Meitner-Minerva Center for Computational Quantum Chemistry
and Helen and Martin Kimmel Center for Molecular Design
Dept. of Organic Chemistry / Weizmann Institute of Science
Kimmelman Bldg., Room 252 / 76100 Rechovot, Israel
Email: comartin at wicc.weizmann.ac.il
Phone: +972 8 9342533 (office), +972 54 4631676 (mobile)
FAX: +972 8 9344142 (dept.), +972 8 9342621 (direct to computer)
Web: http://theochem.weizmann.ac.il
------------------------------------------------------------------------