====== Post-processing of output and databases ======
+
+
===== Output =====
+
+
Molpro produces an output file in XML format, with mark-up for all important results. This file, which has a ''.xml'' suffix, conforms strictly to a [[https://www.molpro.net/schema/molpro-output.xsd|well-defined schema]]; the schema definition file can be found in the main Molpro source or installation tree in the directory ''%%lib/schema%%''. The principal elements of the marked-up output are
+
+
* **jobstep** The results from one job step.
+
* **molecule** A container for data on a single molecule.
+
* **cml:molecule** Molecular geometry in the Chemical Markup Language (CML) format.
+
* **property** A computed property, for example an energy or dipole moment.
+
* **table** The output of Molpro’s ''TABLE'' command in XHTML format.
+
* **basisSet** A self-contained description of the orbital basis set.
+
* **orbitals** A set of orbitals.
+
* **vibrations** Harmonic normal vibrational modes.
+
* **variables** Molpro’s internal variables.
+
* **platform** Information about the computing system on which a job was run.
+
+
Not all of these elements are produced by default in the regular job transcript ''.xml'' file; some of them are produced only when the ''%%PUT,XML%%'' command is used to write a separate dump file.
+
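As a minimal sketch of how the marked-up elements can be read, the following Python fragment collects the ''property'' elements found in each ''jobstep''. It uses only the standard library rather than lxml. The namespace URI, the root element name ''molpro'', the ''job'' wrapper and the attribute names ''command'', ''name'' and ''value'' are assumptions made for illustration, and the sample values are placeholders rather than computed results; consult the schema for the authoritative structure.

<code python>
import xml.etree.ElementTree as ET

# Assumed namespace URI for the molpro-output schema.
NS = {"mo": "http://www.molpro.net/schema/molpro-output"}

# Hypothetical, heavily trimmed output file; all values are placeholders.
SAMPLE = """<molpro xmlns="http://www.molpro.net/schema/molpro-output">
  <job>
    <jobstep command="RHF-SCF">
      <property name="Energy" method="RHF" value="-1.0"/>
      <property name="Dipole moment" method="RHF" value="0.0 0.0 0.5"/>
    </jobstep>
  </job>
</molpro>
"""


def list_properties(xml_text):
    """Return (jobstep command, property name, numeric values) tuples."""
    root = ET.fromstring(xml_text)
    results = []
    # Visit every jobstep element, wherever it sits in the tree.
    for step in root.iter("{%s}jobstep" % NS["mo"]):
        # Only properties that are direct children of this jobstep.
        for prop in step.findall("mo:property", NS):
            values = [float(v) for v in prop.get("value").split()]
            results.append((step.get("command"), prop.get("name"), values))
    return results


for command, name, values in list_properties(SAMPLE):
    print(command, name, values)
</code>

The same approach applies to the other elements listed above; only the element name in the search changes.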
+
''molpro-output'' is understood by several post-processing programs, including [[https://jmol.org|Jmol]], [[https://gitlab.com/molpro/sjef|sjef]] and gmolpro.
+
+
The following gives an example of extracting and manipulating the details of the wavefunction using Python.
+
<code python>
primitivesc[icomp,ia,ip] = value * xyz[0]**k * xyz[1]**l * xyz[2]**m/math.sqrt(normfac[icomp+lqbase])
+
+
# transformation to spherical harmonics
+
if molecule.xpath('//molpro-output:orbitals',namespaces=namespaces)[0].get('angular')=='spherical': # Molpro 2012.1 does not produce this, only cartesian. The spherical code here is unfinished, but is not presently needed
+
ncomp=2*lquant+1
+
else:
+
ncomp=ncompc
+
if ncomp < ncompc and lquant >= len(sphtran):
+
raise Exception("Spherical functions not yet coded")
print('... nearest to result ',closest,' which differs by ',closeness)
+
+
+
print('End of molecule '+molecule.get('id'))
+
</code>
+
+
+
===== Databases =====
+
+
A facility is provided to store and interrogate sets of molecules, together with information about how they are to be combined in balanced chemical equations. This collection of information is referred to as a //database//, and can be generated completely manually, or partially by running appropriate Molpro calculations. Analysis of the database can give a summary of the energy changes associated with each described reaction, and two or more similar databases can be compared reaction by reaction, to give a statistical analysis of the differences between them.
+
+
All of the files associated with the database facility can be found in the directory ''database'' in the main Molpro source or installation tree.
+
+
==== Description and specification of databases ====
+
+
A database is an XML file conforming to the ''molpro-database'' schema, and consists of one or more occurrences of each of the following two principal elements.
+
+
* **molecule** Information about a single molecular species in the ''molpro-output'' XML format. This will usually be the result of ''%%PUT,XML%%'' in a Molpro calculation, but can also be constructed directly from an external data source. The important quantities that are used are the geometry and energy, together with metadata such as the method and basis set, and other quantities such as spin and symmetry that might be useful for constructing a new Molpro job for the molecule.
+
* **reaction** A list of ''species'' specifications, each of which points uniquely to one of the ''molecule'' nodes, together with information on how the species appears stoichiometrically in the reaction, and whether it is a special point such as a transition state. ''species'' specifications can also be given without either of these tags, allowing additional geometries, for example along a reaction coordinate or potential surface cut, to be included.
+
+
Normally, the ''molecule'' nodes will be in separate self-contained files that are then referenced in the main database file through the syntax of [[http://www.w3.org/TR/xinclude|XInclude]]. There are three reasons for this. Firstly, these files can be produced directly by a Molpro calculation, with the rest of the database being constructed by hand. Secondly, the molecule files can be replaced in the future, for example by running all the molecule calculations again using a different method; in that case, the rest of the database, i.e. the reaction specifications, does not need to change. This supports the possibility of having several databases that have the same structure (the specification of reactions) but different numerical data, and that can therefore be compared numerically. Thirdly, several databases can coexist in the same directory, and share some of the same molecule files. An example of this is a supplementary database that consists of a subset of the reactions contained in the main database.
+
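The XInclude arrangement described above can be sketched with the Python standard library as follows. This is an illustration only: the root element name ''database'', the ''id'' attributes and the file names are hypothetical, namespaces other than XInclude are omitted for brevity, and the database scripts themselves may resolve inclusions differently (for example with lxml).

<code python>
import os
import tempfile
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude


def load_database(master_path):
    """Parse a master file and expand its xi:include elements in place."""
    base_dir = os.path.dirname(os.path.abspath(master_path))

    def loader(href, parse, encoding=None):
        # Resolve included files relative to the master file's directory.
        path = os.path.join(base_dir, href)
        if parse == "xml":
            return ET.parse(path).getroot()
        with open(path, encoding=encoding or "utf-8") as handle:
            return handle.read()

    root = ET.parse(master_path).getroot()
    ElementInclude.include(root, loader=loader)
    return root


def demo():
    """Build a tiny hypothetical database on disk and load it."""
    with tempfile.TemporaryDirectory() as top:
        for name in ("h2", "h2o"):
            with open(os.path.join(top, name + ".xml"), "w") as handle:
                handle.write('<molecule id="%s"/>' % name)
        master = os.path.join(top, "reactions.xml")
        with open(master, "w") as handle:
            handle.write(
                '<database xmlns:xi="http://www.w3.org/2001/XInclude">'
                '<xi:include href="h2.xml"/>'
                '<xi:include href="h2o.xml"/>'
                '<reaction/>'
                '</database>'
            )
        root = load_database(master)
        return [molecule.get("id") for molecule in root.findall("molecule")]


print(demo())
</code>

Because the reaction specifications live only in the master file, swapping the molecule files for ones computed with another method leaves the master file untouched.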
+
The following is an example of a complete database of four reactions involving the species O, H2, H2O, H2O2 and CH2O. Note that the association between the ''species'' and the ''%%molpro-output:molecule%%'' nodes is achieved through the use of [[http://www.iupac.org/home/publications/e-resources/inchi.html|InChI]] tags, which ''%%PUT,XML%%'' will produce provided that [[http://openbabel.org|OpenBabel]] is installed on the system. An alternative is through syntax such as
+
+
<code>
+
{PUT,XML,file.xml; index,73}
+
</code>
+
and the use of ''%%<species index="73">%%'' in the database file. Note that sometimes different species have the same InChI, and so the use of ''index'' is necessary to resolve ambiguities.
+
+
<code xml database/sets/examples/reactions/reactions.xml>
For full specification of the possible structure of a database, see the schema file
+
+
==== Interrogation and manipulation of databases ====
+
+
The directory ''%%database/utilities%%'' contains several Python scripts that manipulate databases. For convenience, they can be run through the script
so long as Python (version 3 preferred) is installed on the system. You need the [[http://lxml.de|lxml]] and [[http://docs.python-requests.org|requests]] packages installed in your Python installation:
+
+
<code>
+
pip install lxml requests
+
</code>
+
(or ''pip3'' if you are using Python 3).
+
+
The script ''validate'' checks whether a database conforms to the schema, for example
+
+
<code>
+
cd Molpro # assuming below that we are in Molpro source tree, but works from anywhere
+
bin/molpro --database validate \
+
database/sets/examples/reactions/reactions.xml
+
</code>
+
=== Computation of new database data ===
+
+
The script ''clone'' takes an existing database, for which the file name should be provided as an argument, and generates a set of Molpro jobs that will run the same method on each of the molecules, with the end result that a new database is created. If the database master file is the only one in the directory that declares itself to belong to the ''molpro-database'' schema, then you can just give the directory name as the argument to this and other scripts instead. In addition, if the database master file has the suffix ''.xml'', the suffix does not need to be specified. For the above example, this could be
+
+
<code>
+
cd Molpro # assuming below that we are in Molpro source tree, but works from anywhere
+
bin/molpro --database clone \
+
database/sets/examples/reactions
+
</code>
+
This will create a new directory ''reactions.d'' with the following contents.
+
+
<code>
+
original/ reactions.xml runall/
+
procedures.molpro run/
+
+
reactions.d/original:
+
co.xml h2co.xml h2o.xml o.xml
+
h2.xml h2cots.xml h2o2.xml o2.xml
+
+
reactions.d/run:
+
co.molpro h2co.molpro h2o.molpro o.molpro
+
h2.molpro h2cots.molpro h2o2.molpro o2.molpro
+
+
reactions.d/runall:
+
reactions.molpro
+
</code>
+
The file ''procedures.molpro'' contains a procedure that will be run on every molecule, and it should be edited to use the desired methods. The calculations can then be run, either via each of the individual Molpro input files in ''%%run/%%'', or via the single input file ''reactions.molpro'' in the directory ''runall''. Once these jobs have completed, the directory contains a complete database with the original reaction scheme but new data.
+
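The rule given earlier in this subsection for naming a database (a file, a file name without its ''.xml'' suffix, or a directory containing exactly one master file) can be sketched as below. This is not the code used by the scripts: the function names are hypothetical, and testing whether a file "declares itself to belong to the ''molpro-database'' schema" by looking for that string in the root element's namespace is an assumption, as is the namespace URI used in the demonstration.

<code python>
import os
import tempfile
import xml.etree.ElementTree as ET

# Assumed namespace URI for the molpro-database schema.
DATABASE_NS = "http://www.molpro.net/schema/molpro-database"


def declares_database(path):
    """True if the root element's qualified tag mentions molpro-database."""
    try:
        root = ET.parse(path).getroot()
    except ET.ParseError:
        return False
    return "molpro-database" in root.tag


def resolve_database(arg):
    """Map a command-line argument to the path of a database master file."""
    if os.path.isdir(arg):
        names = sorted(n for n in os.listdir(arg) if n.endswith(".xml"))
        paths = [os.path.join(arg, n) for n in names]
        masters = [p for p in paths if os.path.isfile(p) and declares_database(p)]
        if len(masters) != 1:
            raise ValueError("expected exactly one master file in " + arg)
        return masters[0]
    if os.path.isfile(arg):
        return arg
    if os.path.isfile(arg + ".xml"):
        return arg + ".xml"  # the suffix may be omitted
    raise FileNotFoundError(arg)


def demo():
    """Resolve a hypothetical database by directory and by suffix-less name."""
    with tempfile.TemporaryDirectory() as top:
        with open(os.path.join(top, "reactions.xml"), "w") as handle:
            handle.write('<database xmlns="%s"/>' % DATABASE_NS)
        with open(os.path.join(top, "h2.xml"), "w") as handle:
            handle.write('<molecule xmlns="http://www.molpro.net/schema/molpro-output"/>')
        by_directory = resolve_database(top)
        by_stem = resolve_database(os.path.join(top, "reactions"))
        return os.path.basename(by_directory), os.path.basename(by_stem)


print(demo())
</code>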
+
=== Analysis of databases ===
+
+
<code>
+
cd Molpro
+
bin/molpro --database analyse \
+
database/sets/examples/reactions/reactions.xml
+
</code>
+
will analyse the database, and report the energy change for each described reaction. If two or more databases are given as arguments, the analysis will be done on each, and also on the differences between the first database and each subsequent one, including a statistical summary. For the example given above, one might say
+
+
<code>
+
cd Molpro/reactions.d
+
../bin/molpro --database analyse original .
+
</code>
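The arithmetic behind such an analysis can be sketched as follows: the energy change of a reaction is the stoichiometry-weighted sum of the species energies, and two databases with the same reaction scheme are compared reaction by reaction. The species names, energies and the particular statistics shown are illustrative assumptions, not output of ''analyse'' and not computed results.

<code python>
from statistics import mean


def reaction_energy(stoichiometry, energies):
    """Sum of count * energy; reactants carry negative counts."""
    return sum(count * energies[name] for name, count in stoichiometry.items())


def compare(reactions, reference, other):
    """Per-reaction differences (other minus reference) and a summary."""
    differences = {
        label: reaction_energy(st, other) - reaction_energy(st, reference)
        for label, st in reactions.items()
    }
    values = list(differences.values())
    summary = {
        "mean": mean(values),
        "mean_abs": mean(abs(v) for v in values),
        "max_abs": max(abs(v) for v in values),
    }
    return differences, summary


# Hypothetical reaction scheme shared by both databases.
reactions = {
    "A + B -> AB": {"A": -1, "B": -1, "AB": 1},
    "2 A -> B": {"A": -2, "B": 1},
}
# Made-up energies standing in for two methods applied to the same species.
reference = {"A": -1.0, "B": -2.0, "AB": -3.5}
other = {"A": -1.0, "B": -2.0, "AB": -3.25}

differences, summary = compare(reactions, reference, other)
print(differences)
print(summary)
</code>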
+
''analyse'' has a number of options that are described by running
+
+
<code>
+
molpro --database analyse --help
+
</code>
+
==== Library of databases ====
+
+
The directory ''%%database/sets%%'' contains several standard databases. Within each one is a description of its origin, contents and purpose. The scripts described above can take these databases as inputs, for example ''%%database/sets/examples/reactions/reactions.xml%%''; as a shortcut, one can instead simply use ''%%examples/reactions%%'', which will find the system database irrespective of the current working directory.