Differences

This shows you the differences between two versions of the page.

--- post-processing_of_output_and_databases [2020/08/11 10:56] – change path for ease of extraction for regression testing may
+++ post-processing_of_output_and_databases [2024/07/12 08:37] (current) – external edit 127.0.0.1
@@ Line 1: / Line 1: @@
+====== Post-processing of output and databases ======
+===== Output =====
+Molpro produces an output file that is in XML format with appropriate mark-up for all important results. This file, which has a ''.xml'' suffix, conforms strictly to a [[https://www.molpro.net/schema/molpro-output.xsd|well-defined schema]], and the the schema definition file can be found in the main Molpro source or installation tree in the directory ''%%lib/schema%%''. The principal elements of marked-up output are
+  * **jobstep** The results from one job step.
+  * **molecule** A container for data on a single molecule.
+  * **cml:molecule** Molecular geometry in the Chemical Markup Language (CML) format.
+  * **property** A computed property, for example an energy or dipole moment.
+  * **table** The output of Molpro’s ''TABLE'' command in XHTML format.
+  * **basisSet** A self-contained description of the orbital basis set.
+  * **orbitals** A set of orbitals.
+  * **vibrations** Harmonic normal vibrational modes.
+  * **variables** Molpro’s internal variables.
+  * **platform** Information about the computing system on which a job was run.
+Not all of these elements are produced by default in the regular job transcript ''.xml'' file; some of them can result from using the ''%%PUT,XML%%'' to make a separate dump file.
+''molpro-output'' is understood by several post-processing programs, including [[https://jmol.org|Jmol]], [[https://gitlab.com/molpro/sjef|sjef]] and gmolpro.
+The following gives an example of extracting and manipulating the details of the wavefunction using Python.
+<code - examples/basis-from-xml.inp>
+geometry={o;h,o,r;h,o,r,h,theta}
+r=1 angstrom, theta=104 degree
+basis,cc-pvqz
+rhf
+{casscf;closed,2;expec,delta,o;expec,delta,h}
+put,xml,h2ocasscf.xml
+if (.not.iswindows) then
+system,'./evaluate-charge-density.py',' h2ocasscf.xml',' > basis-from-xml.log'
+endif
+</code>
+<code - examples/evaluate-charge-density.py>
+#!/usr/bin/env python
+import sys
+import math
+import re
+from lxml import etree
+from numpy import *
+import numpy as np
+tree = etree.parse(sys.argv[1])
+document = tree.getroot()
+namespaces={
+    'molpro-output': 'http://www.molpro.net/schema/molpro-output',
+    'xsd': 'http://www.w3.org/1999/XMLSchema',
+    'cml': 'http://www.xml-cml.org/schema',
+    'stm': 'http://www.xml-cml.org/schema',
+    'xhtml': 'http://www.w3.org/1999/xhtml',
+    'xlink': 'http://www.w3.org/1999/xlink'}
+def evaluateBasis(molecule,points): # evaluate basis set at a number of points
+    cartesianAngularQuantumNumbers=array(
+     [[0, 1,0,0, 2,0,0,1,1,0, 3,0,0,2,2,1,0,1,0,1, 4,0,0,3,3,1,0,1,0,2,2,0,2,1,1, 5,4,4,3,3,3,2,2,2,2,1,1,1,1,1,0,0,0,0,0,0, 6,5,5,4,4,4,3,3,3,3,2,2,2,2,2,1,1,1,1,1,1,0,0,0,0,0,0,0],
+      [0, 0,1,0, 0,2,0,1,0,1, 0,3,0,1,0,2,2,0,1,1, 0,4,0,1,0,3,3,0,1,2,0,2,1,2,1, 0,1,0,2,1,0,3,2,1,0,4,3,2,1,0,5,4,3,2,1,0, 0,1,0,2,1,0,3,2,1,0,4,3,2,1,0,5,4,3,2,1,0,6,5,4,3,2,1,0],
+      [0, 0,0,1, 0,0,2,0,1,1, 0,0,3,0,1,0,1,2,2,1, 0,0,4,0,1,0,1,3,3,0,2,2,1,1,2, 0,0,1,0,1,2,0,1,2,3,0,1,2,3,4,0,1,2,3,4,5, 0,0,1,0,1,2,0,1,2,3,0,1,2,3,4,0,1,2,3,4,5,0,1,2,3,4,5,6]])
+    normfac=array([1, 1,1,1, 3,3,3,1,1,1, 15,15,15,3,3,3,3,3,3,1, 105,105,105,15,15,15,15,15,15,9,9,9,1,1,1, 945,105,105,45,15,45,45,9,9,45,105,15,9,15,105,945,105,45,45,105,945])
+    sphtran=[]
+    sphtran.append(array([[1]]))
+    sphtran.append(array([[1,0,0],[0,1,0],[0,0,1]]))
+    sphtran.append(array([[-0.5,-0.5,1,0,0,0],[1,-1,0,0,0,0],[0,0,0,0,1,0],[0,0,0,1,0,0],[0,0,0,0,0,1]]))
+    for answer in molecule.xpath('//molpro-output:variables/molpro-output:variable[@name="_ANGSTROM"]/molpro-output:value',namespaces=namespaces):
+        Angstrom=np.float64(answer.text)
+    atoms = molecule.xpath('//cml:atomArray/cml:atom',namespaces=namespaces)
+    basisSets = molecule.xpath('molpro-output:basisSet[@id="ORBITAL"]',namespaces=namespaces)
+    if len(basisSets) != 1:
+        raise Exception('something is wrong: there should be just one orbital basisSet')
+    basisSet = basisSets[0]
+    print('Basis length =',basisSet.get('length'),' angular =',basisSet.get('angular'),' type =',basisSet.get('type'),' groups =',basisSet.get('groups'))
+    basisGroups = basisSet.xpath('molpro-output:basisGroup',namespaces=namespaces)
+    #    print
+        #print(str(len(basisGroups))+' basisGroups:')
+        #for basisGroup in basisGroups:
+        #    print(basisGroup.get('id')+' '+basisGroup.get('minL')+' '+basisGroup.get('maxL')+' '+basisGroup.get('primitives')+' '+basisGroup.get('angular')+' '+basisGroup.get('contractions'))
+        #print
+   # now walk through basis set and evaluate it at some points
+    basisAtPoints=empty((0,len(points)),dtype=np.float64)
+    nbasis=0
+    iatom=0
+    for atom in atoms:
+        nuclearCoordinates=[np.float64(atom.get('x3'))*Angstrom,np.float64(atom.get('y3'))*Angstrom,np.float64(atom.get('z3'))*Angstrom]
+        query = 'molpro-output:association/molpro-output:atoms[@xlink:href[contains(.,"@id=\'' + atom.get('id') + '\'")]]/..'
+        basisGroupAssociation = basisSet.xpath(query,namespaces=namespaces)
+        if len(basisGroupAssociation) != 1:
+            raise Exception('something is wrong: there should be a unique association of an atom with a basis set')
+        bases=basisGroupAssociation[0].xpath('molpro-output:bases',namespaces=namespaces)
+        if len(bases) != 1:
+            raise Exception('something is wrong: there should be a bases node in association')
+        basesString=bases[0].get('{'+namespaces['xlink']+'}href')
+        basesString=basesString[basesString.find('basisGroup['):]
+        basesString=basesString[basesString.find('[')+1:].lstrip()
+        basesString=basesString[:basesString.find(']')].rstrip()
+        basesString=basesString.replace('or','').replace('\n','').replace("'",'')
+        list = basesString.split("@id=");
+        for item in list:
+            item=item.lstrip().rstrip()
+            if item.isalnum() :
+                basisGroup=basisSet.xpath('molpro-output:basisGroup[@id="'+item+'"]',namespaces=namespaces)[0]
+                lquant =int(basisGroup.get('minL'))
+                if lquant > 5:
+                    raise Exception("Sorry, I was too lazy to write this for i basis functions and higher")
+                if lquant != int(basisGroup.get('maxL')):
+                    raise Exception("This program cannot handle multiple-angular momentum sets")
+                #print(basisGroup.get('id')+' '+basisGroup.get('minL')+' '+basisGroup.get('maxL')+' '+basisGroup.get('primitives')+' '+basisGroup.get('angular')+' '+basisGroup.get('contractions'))
+                alpha=np.float64(re.sub(' +',' ',basisGroup.xpath('molpro-output:basisExponents',namespaces=namespaces)[0].text.replace('\n','').lstrip().rstrip()).split(" "))
+                #first evaluate all the primitives for this shell on the grid
+                ncompc=int(((lquant+2)*(lquant+1))/2) # number of cartesian components
+                primitivesc = empty((ncompc,len(alpha),len(points)))
+                lqbase=int((lquant*(lquant+1)*(lquant+2))/6)
+                for ip in arange(len(points)):
+                    xyz = np.subtract(points[ip],nuclearCoordinates)
+                    r2 = np.dot(xyz,xyz)
+                    for ia in arange(len(alpha)):
+                        alph = alpha[ia]
+                        norm = sqrt(sqrt(2*alph/math.pi)**3*(4*alph)**lquant)
+                        value = norm * math.exp(-alph*r2)
+                        for icomp in range(ncompc):
+                            k=cartesianAngularQuantumNumbers[0,icomp+lqbase]
+                            l=cartesianAngularQuantumNumbers[1,icomp+lqbase]
+                            m=cartesianAngularQuantumNumbers[2,icomp+lqbase]
+                            primitivesc[icomp,ia,ip] = value * xyz[0]**k * xyz[1]**l * xyz[2]**m/math.sqrt(normfac[icomp+lqbase])
+                # transformation to spherical harmonics
+                if molecule.xpath('//molpro-output:orbitals',namespaces=namespaces)[0].get('angular')=='spherical': # Molpro 2012.1 does not produced this, but cartesian only. The spherical code here is not finished, but not presently needed
+                    ncomp=2*lquant+1
+                else:
+                    ncomp=ncompc
+                if ncomp < ncompc and lquant >= len(sphtran):
+                    raise Exception("Spherical functions not yet coded")
+                elif ncomp < ncompc:
+                    primitives = empty((ncomp,len(alpha),len(points)))
+                    print("primitivesc",primitivesc)
+                    print("sphtran[lquant]",sphtran[lquant])
+                    for ip in arange(len(points)):
+                        #print sphtran[lquant]
+                        #print primitivesc[:,:,ip]
+                        #print sphtran[lquant].shape
+                        #print primitivesc[:,:,ip].shape
+                        primitives[:,:,ip] = dot(sphtran[lquant],primitivesc[:,:,ip])
+                    #print "primitives",primitives
+                else:
+                    primitives = primitivesc
+                # next loop over contractions
+                for basisContraction in basisGroup.xpath('molpro-output:basisContraction',namespaces=namespaces):
+                    cc=np.float64(re.sub(' +',' ',basisContraction.text.replace('\n','').lstrip().rstrip()).split(" "))
+                    # loop over angular components in the contraction
+                    basisAtPoints=np.append(basisAtPoints,empty((ncomp,len(points)),dtype=np.float64),axis=0)
+                    for m in range(ncomp):
+                        basisAtPoints[nbasis][:]=dot(cc,primitives[m,:,:])
+                        nbasis+=1 # completed evaluating this basis function
+        iatom+=1
+    #print "basisAtPoints"
+    #print basisAtPoints
+    #print outer(basisAtPoints[:,1],basisAtPoints[:,1])
+    return basisAtPoints # end basis evaluation
+# the main program
+for answer in document.xpath('//molpro-output:variables/molpro-output:variable[@name="_ANGSTROM"]/molpro-output:value',namespaces=namespaces):
+    Angstrom=np.float64(answer.text)
+molecules = document.xpath('/*/molpro-output:molecule',namespaces=namespaces)
+print(str(len(molecules))+' molecules:')
+for molecule in molecules:
+    print('Molecule ' + molecule.get('id'))
+    metadata = molecule.xpath('stm:metadataList/stm:metadata',namespaces=namespaces)
+    print(str(len(metadata))+' metadata items:')
+    for md in metadata:
+        print(md.get('name') + ' = ' + md.get('content'))
+    atoms = molecule.xpath('//cml:atomArray/cml:atom',namespaces=namespaces)
+    points=zeros((len(atoms),3),dtype=np.float64)
+    print(str(len(atoms))+' atoms:')
+    ids=empty(len((atoms)),dtype='a4')
+    elementTypes=empty((len(atoms)),dtype='a4')
+    iatom=0
+    for atom in atoms:
+        print(atom.get('id')+' '+atom.get('elementType')+' '+atom.get('x3')+' '+atom.get('y3')+' '+atom.get('z3'))
+        ids[iatom]=atom.get('id')
+        elementTypes[iatom]=atom.get('elementType')
+        points[iatom,:]=[np.float64(atom.get('x3'))*Angstrom,np.float64(atom.get('y3'))*Angstrom,np.float64(atom.get('z3'))*Angstrom]
+        iatom+=1
+    basisAtPoints = evaluateBasis(molecule,points) # evaluate the basis set at the nuclei
+    results=zeros(len(points),dtype=np.float64)
+    orbitalSets = molecule.xpath('molpro-output:orbitals',namespaces=namespaces)
+    if len(orbitalSets) != 1:
+        raise Exception('something is wrong: there should be just one orbital set')
+    orbitalSet = orbitalSets[0]
+    for orbital in orbitalSet.xpath('molpro-output:orbital',namespaces=namespaces):
+        occ = np.float64(orbital.get('occupation'))
+        if occ > 0:
+            #        print 'orbital occupation=',occ
+            mos=array(re.sub(' +',' ',orbital.text.lstrip().rstrip().replace('\n','')).split(" "))
+            mopoint=zeros(len(points),dtype=np.float64)
+            for ic in arange(len(basisAtPoints)):
+                for ipoint in arange(len(points)):
+                    mopoint[ipoint]+=np.float64(mos[ic])*basisAtPoints[ic][ipoint]
+            for ipoint in range(len(points)):
+                results[ipoint]+=mopoint[ipoint]**2*occ
+    print("Calculated densities at the nuclei:")
+    print(results)
+    for answer in molecule.xpath('molpro-output:variables/molpro-output:variable[contains(@name,"DELTA")]/molpro-output:value',namespaces=namespaces):
+        name=answer.xpath('..',namespaces=namespaces)[0].get('name').replace('_','')
+        print('From Molpro:',name,'=',answer.text)
+        closeness=10000000000
+        for ipoint in range(len(points)):
+            distance=math.sqrt((results[ipoint]-np.float64(answer.text))**2)
+            if distance < closeness:
+                closeness=distance
+                closest=ipoint
+        print('... nearest to result ',closest,' which differs by ',closeness)
+    print('End of molecule '+molecule.get('id'))
+</code>
+===== Databases =====
+A facility is provided to store and interrogate sets of molecules, together with information about how they are to be combined in balanced chemical equations. This collection of information is referred to as a //database//, and can be generated completely manually, or partially by running appropriate Molpro calculations. Analysis of the database can give a summary of the energy changes associated with each described reaction, and two or more similar databases can be compared reaction by reaction, to give a statistical analysis of the differences between them.
+All of the files associated with the database facility can be found in the directory ''database'' in the main Molpro source or installation tree.
+==== Description and specification of databases ====
+A database is an XML file conforming to the ''molpro-database'' schema, and consists one or more occurrences of each of the following two principal elements.
+  * **molecule** Information about a single molecular species in the ''molpro-output'' XML format. This will usually be the result of ''%%PUT,XML%%'' in a Molpro calculation, but can also be constructed directly from an external data source. The important quantities that are used are the geometry and energy, together with metadata such as the method and basis set, and other quantities such as spin and symmetry that might be useful for constructing a new Molpro job for the molecule.
+  * **reaction** A list of ''species'' specifications that point uniquely to one of the ''molecule'' nodes, together with information on how the species appears stoichimetrically in the reaction, and whether it is a special point such as a transition state. ''species'' specifications can also be given without either of these tags, allowing additional geometries, for example along a reaction coordinate or potential surface cut, to be included.
+Normally, the ''molecule'' nodes will be in separate self-contained files that are then referenced in the main database file through the syntax of [[http://www.w3.org/TR/xinclude|XInclude]]. There are three reasons for this. Firstly, these files can be produced directly by a Molpro calculation, with the rest of the database being constructed by hand. Secondly, they allow the possibility that the molecule files be replaced in the future by, for example, running all the molecule calculations again using a different method; in that case, the rest of the database, i.e. the reaction specifications, does not need to change. This supports the possibility of having several databases that have the same structure – specification of reactions – but different numerical data, and therefore being capable of numerical comparison. Thirdly, several databases can coexist in the same directory, and share some of the same molecule files. An example of this is a supplementary database that consists of a subset of the reactions contained in the main database.
+The following is an example of a complete database of four reactions involving the species  $\rm O, H_2, H_2O, H_2O_2$  and  $\rm CH_2O$ . Note that the association between the ''species'' and the ''%%molpro-output:molecule%%'' nodes is achieved through the use of [[http://www.iupac.org/home/publications/e-resources/inchi.html|InChI]] tags, which ''%%PUT,XML%%'' will produce provided that [[http://openbabel.org|OpenBabel]] is installed on the system. An alternative is through syntax such as
+<code>
+{PUT,XML,file.xml; index,73}
+</code>
+and the use of ''%%<species index="73">%%'' in the database file. Note that sometimes different species have the same InChI, and so the use of ''index'' is necessary to resolve ambiguities.
+<code xml database/sets/examples/reactions/reactions.xml>
+<?xml version="1.0"?>
+<database xmlns="http://www.molpro.net/schema/molpro-database"
+	  xmlns:xi="http://www.w3.org/2001/XInclude">
+     <xi:include href="h2o.xml" />
+     <xi:include href="h2o2.xml"/>
+     <xi:include href="h2.xml"/>
+     <xi:include href="o2.xml"/>
+     <xi:include href="co.xml"/>
+     <xi:include href="h2co.xml"/>
+     <xi:include href="h2cots.xml"/>
+     <xi:include href="o.xml"/>
+     <reaction title="insertion of O into H2">
+       <species InChI="InChI=1S/H2O/h1H2" count="1"/>
+       <species InChI="InChI=1S/O" count="-1"/>
+       <species InChI="InChI=1S/2H2/h2*1H" count="-1"/>
+     </reaction>
+     <reaction title="H2O formation">
+       <species InChI="InChI=1S/H2O/h1H2" count="1"/>
+       <species InChI="InChI=1S/O2/c1-2" count="-0.5"/>
+       <species InChI="InChI=1S/2H2/h2*1H" count="-1"/>
+     </reaction>
+     <reaction title="H2O2 formation">
+       <species InChI="InChI=1S/H2O2/c1-2/h1-2H" count="1"/>
+       <species InChI="InChI=1S/O2/c1-2" count="-1"/>
+       <species InChI="InChI=1S/2H2/h2*1H" count="-1"/>
+     </reaction>
+     <reaction title="Formaldehyde dissociation">
+       <species InChI="InChI=1S/CH2O/c1-2/h1H2" count="-1"/>
+       <species InChI="InChI=1S/CO/c1-2" count="1"/>
+       <species InChI="InChI=1S/2H2/h2*1H" count="1"/>
+       <species InChI="InChI=1S/CHO.H/c1-2;/h1H;"
+		 transitionState="true"/>
+     </reaction>
+</database>
+</code>
+For full specification of the possible structure of a database, see the schema file
+==== Interrogation and manipulation of databases ====
+The directory ''%%database/utilities%%'' contains several Python scripts that manipulate databases. For convenience, they can be run through the script
+''%%molpro –database%%'' //script-name// //arguments …//
+so long as Python (version 3 preferred) is installed on the system. You need the [[http://lxml.de|lxml]] and [[http://docs.python-requests.org|requests]] package included in your Python installation:
+<code>
+pip install lxml requests
+</code>
+(or ''pip3'' if you are using Python 3).
+The script ''validate'' checks whether a database conforms to the schema, for example
+<code>
+cd Molpro  # assuming below that we are in Molpro source tree, but works from anywhere
+bin/molpro --database validate \
+ database/sets/examples/reactions/reactions.xml
+</code>
+=== Computation of new database data ===
+The script ''clone'' takes an existing database, for which the file name should be provided as an argument, and generates a set of Molpro jobs that will run the same method on each of the molecules, with the end result that a new database is created. If the database master file is the only one in the directory that declares itself to belong to the ''molpro-database'' schema, then you can just give the directory name as the argument to this and other scripts instead. In addition, if the database master file has the suffix ''.xml'', the suffix does not need to be specified. For the above example, this could be
+<code>
+cd Molpro  # assuming below that we are in Molpro source tree, but works from anywhere
+bin/molpro --database clone \
+ database/sets/examples/reactions
+</code>
+This will create a new directory ''reactions.d'' with the following contents.
+<code>
+original/          reactions.xml      runall/
+procedures.molpro  run/
+reactions.d/original:
+co.xml      h2co.xml    h2o.xml     o.xml
+h2.xml      h2cots.xml  h2o2.xml    o2.xml
+reactions.d/run:
+co.molpro      h2co.molpro    h2o.molpro     o.molpro
+h2.molpro      h2cots.molpro  h2o2.molpro    o2.molpro
+reactions.d/runall:
+reactions.molpro
+</code>
+The file ''procedures.molpro'' contains a procedure that will be run on every molecule, and it should be edited to use the desired methods. Then the calculations can be run, either via each of the individual Molpro input files in ''%%run/%%'', or the single input file ''reactions.molpro'' in the directory ''runall''. Once these jobs have completed, then the directory contains a complete database with the original reaction scheme but new data.
+=== Analysis of databases ===
+<code>
+cd Molpro
+bin/molpro --database analyse \
+ database/sets/examples/reactions/reactions.xml
+</code>
+will analyse the database, and report the energy change for each described reaction. If two or more databases are given as arguments, the analysis will be done on each, and also on the difference between the first database and the second and any subsequent, including a statistical summary. For the example given above, one might say
+<code>
+cd Molpro/reactions.d
+../bin/molpro --database analyse original .
+</code>
+''analyse'' has a number of options that are described by running
+<code>
+molpro --database analyse --help
+</code>
+==== Library of databases ====
+The directory ''%%database/sets%%'' contains several standard databases. Within each one is a description of its origin, contents and purpose. The scripts described above can take these databases as inputs, for example ''%%database/sets/examples/reactions/reactions.xml%%''; as a shortcut, one could simply instead use ''%%examples/reactions%%'' which will find the system database irrespective of the current working directory.