Of the form prefix:suffix where prefix and suffix
are purely alphanumeric (with _ and -) and prefix
is optional. This is similar to XML IDs (and we promote
this as good practice for atomIDs). Other punctuation and
whitespace is forbidden, so IDs from (say) PDB files are
not satisfactory.
The prefix is intended to form a pseudo-namespace so that
atom IDs in different molecules may have identical suffixes.
It is also useful if the prefix is the ID for the molecule
(though this clearly has its limitation). Atom IDs should not
be typed as XML IDs since they may not validate.
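The prefix:suffix rule above can be checked mechanically. The following is a minimal sketch only: the regular expression is a reading of the prose (letters, digits, _ and -, with an optional prefix), not the pattern from the schema, and the function names `is_valid_atom_id` and `split_atom_id` are hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumed pattern: optional "prefix:" then a suffix, each built from
# ASCII letters, digits, "_" and "-". This is inferred from the prose.
ATOM_ID = re.compile(r"(?:[A-Za-z0-9_-]+:)?[A-Za-z0-9_-]+")


def is_valid_atom_id(value: str) -> bool:
    """Return True if the whole string matches the assumed ID pattern."""
    return ATOM_ID.fullmatch(value) is not None


def split_atom_id(value: str) -> Tuple[Optional[str], str]:
    """Split an ID into (prefix, suffix); prefix is None when absent."""
    if not is_valid_atom_id(value):
        raise ValueError(f"not a valid atom ID: {value!r}")
    prefix, sep, suffix = value.rpartition(":")
    return (prefix if sep else None, suffix)
```

Under these assumptions `split_atom_id("mol3:a1")` gives `("mol3", "a1")`, while a PDB-style name such as `"C1'"` is rejected because of the apostrophe.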
The atomRefs
cannot be schema- or schematron-validated. Instances of this type will
be used in array-style representation of bonds and atomParitys.
It can also be used for arrays of atomIDTypes such as in complex stereochemistry,
geometrical definitions, atom groupings, etc.
The references cannot (yet)
be schema- or schematron-validated. Instances of this type will
be used in array-style representation of electron counts, etc.
It can also be used for arrays of bondIDTypes such as in complex stereochemistry,
geometrical definitions, bond groupings, etc.
A reference to a bond may be made by atoms (e.g. for multicentre or pi-bonds), electrons (for annotating reactions or describing electronic properties) or possibly other bonds (no examples yet). The semantics are relatively flexible.
Defined by 6 real numbers
(x1 y1 z1 x2 y2 z2). By default these are Cartesian coordinates (with units
specified elsewhere - responsibility of schema creator.) If there is a means
of specifying oblique axes (e.g. crystallographic cell) the box may be a
parallelepiped. The components are grouped in threes and separated by a semicolon
to avoid problems of guessing the convention.
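As an illustration of the grouping, here is a small sketch of a parser for such a box. It assumes the three numbers in each group are separated by whitespace and that exactly one semicolon separates the two groups; the text does not spell out the within-group separator, and the name `parse_box3` is hypothetical.

```python
from typing import Tuple

Point3 = Tuple[float, float, float]


def parse_box3(text: str) -> Tuple[Point3, Point3]:
    """Parse 'x1 y1 z1; x2 y2 z2' into two coordinate triples."""
    groups = text.split(";")
    if len(groups) != 2:
        raise ValueError("expected two groups separated by one ';'")
    corners = []
    for group in groups:
        parts = group.split()  # assumption: whitespace inside a group
        if len(parts) != 3:
            raise ValueError("each group must hold exactly three numbers")
        corners.append((float(parts[0]), float(parts[1]), float(parts[2])))
    return corners[0], corners[1]
```

A string without the semicolon, such as `"0 0 0 1 2 3"`, is rejected rather than guessed at, which is the point of the explicit grouping.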
An x/y coordinate pair consisting of
two real numbers, separated by whitespace or a comma.
In arrays and matrices, it may be useful to set a separate delimiter.
An x/y/z coordinate triple consisting of three real
numbers, separated by whitespace or commas. In arrays and matrices, it may be
useful to set a separate delimiter.
An array of coordinateComponents for a single coordinate.
An array of coordinateComponents for a single coordinate
where these all refer to an X-coordinate (NOT x,y,z). Instances of this type will be
used in array-style representation of 2-D or 3-D coordinates. Currently no machine
validation. Currently not used in STMML, but re-used by CML (see example).
value comes from list: {'xsd:string'|'xsd:boolean'|'xsd:float'|'xsd:double'|'xsd:decimal'|'xsd:duration'|'xsd:dateTime'|'xsd:time'|'xsd:date'|'xsd:gYearMonth'|'xsd:gYear'|'xsd:gMonthDay'|'xsd:gDay'|'xsd:gMonth'|'xsd:hexBinary'|'xsd:base64Binary'|'xsd:anyURI'|'xsd:QName'|'xsd:NOTATION'|'xsd:normalizedString'|'xsd:token'|'xsd:language'|'xsd:IDREFS'|'xsd:ENTITIES'|'xsd:NMTOKEN'|'xsd:NMTOKENS'|'xsd:Name'|'xsd:NCName'|'xsd:ID'|'xsd:IDREF'|'xsd:ENTITY'|'xsd:integer'|'xsd:nonPositiveInteger'|'xsd:negativeInteger'|'xsd:long'|'xsd:int'|'xsd:short'|'xsd:byte'|'xsd:nonNegativeInteger'|'xsd:unsignedLong'|'xsd:unsignedInt'|'xsd:unsignedShort'|'xsd:unsignedByte'|'xsd:positiveInteger'|'dataTypeType'|'namespaceRefType'|'unitsType'}
dataTypeType represents an enumeration of allowed dataTypes
(at present identical with those in XML-Schemas (Part2 - datatypes)).
This means that implementers should be able to use standard XMLSchema-based
tools for validation without major implementation problems.
It will often be used as an attribute on scalar, array or matrix elements.
Note: the attribute xsi:type might be used to enforce the type-checking but I haven't worked this through yet.
A single non-whitespace character to separate components in arrays.
Some STMML elements (such as array) have
content representing concatenated values. The default separator is
whitespace (which can be normalised) and this should be used whenever
possible. However in some cases the values are empty, or contain whitespace or other
problematic punctuation, and a delimiter is required.
Note that the content string MUST start and end with the delimiter so
there is no ambiguity as to what the components are. Only printable
characters from the ASCII character set should be used, and character
entities should be avoided.
When delimiters are used to separate precise whitespace this should always
consist of spaces and not the other allowed whitespace characters
(newline, tabs, etc.). If the latter are important it is probably best to redesign
the application.
At present there is a controlled pattern of characters selected so as not to collide with common usage in XML documents.
The values in the array are
"A", "B12", "" (empty string) and "D and E"
note the spaces
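The delimiter rules above can be sketched as a small splitting routine. This is an illustrative sketch, not part of STMML: the function name `split_array` and the choice of "|" as delimiter are assumptions, as is the stripping of whitespace outside the outermost delimiters.

```python
from typing import List, Optional


def split_array(content: str, delimiter: Optional[str] = None) -> List[str]:
    """Split array content into its component strings."""
    if delimiter is None:
        # Default case: any run of whitespace separates the values.
        return content.split()
    if len(delimiter) != 1 or delimiter.isspace():
        raise ValueError("delimiter must be one non-whitespace character")
    stripped = content.strip()  # assumption: outer whitespace is ignored
    if len(stripped) < 2 or not (
        stripped.startswith(delimiter) and stripped.endswith(delimiter)
    ):
        raise ValueError("content must start and end with the delimiter")
    # Drop the leading and trailing delimiter, then split on the rest.
    return stripped[1:-1].split(delimiter)
```

With these assumptions, `split_array("|A|B12||D and E|", "|")` returns `["A", "B12", "", "D and E"]`: four values framed by five delimiters, with the empty string and the embedded spaces preserved.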
value comes from list: {'mass'|'length'|'time'|'current'|'amount'|'luminosity'|'temperature'|'dimensionless'|'angle'}
Documentation
Allowed values for dimension Types in quantities.
These are the 7 types prescribed by the SI system, together
with the "dimensionless" type. We intend to be somewhat unconventional
and explore enhanced values of "dimensionless", such as "angle".
This may be heretical, but we find the present system impossible to implement
in many cases.
Used for constructing entries in a dictionary of units
Specifies whether the rows or columns of the (square) matrix
correspond to the eigenvectors. For example, in molecular orbitals
the vectors are normally represented as columns, and each column
would correspond to a different eigenvalue
value comes from list: {'Ac'|'Al'|'Ag'|'Am'|'Ar'|'As'|'At'|'Au'|'B'|'Ba'|'Bh'|'Bi'|'Be'|'Bk'|'Br'|'C'|'Ca'|'Cd'|'Ce'|'Cf'|'Cl'|'Cm'|'Co'|'Cr'|'Cs'|'Cu'|'Db'|'Dy'|'Er'|'Es'|'Eu'|'F'|'Fe'|'Fm'|'Fr'|'Ga'|'Gd'|'Ge'|'H'|'He'|'Hf'|'Hg'|'Ho'|'Hs'|'I'|'In'|'Ir'|'K'|'Kr'|'La'|'Li'|'Lr'|'Lu'|'Md'|'Mg'|'Mn'|'Mo'|'Mt'|'N'|'Na'|'Nb'|'Nd'|'Ne'|'Ni'|'No'|'Np'|'O'|'Os'|'P'|'Pa'|'Pb'|'Pd'|'Pm'|'Po'|'Pr'|'Pt'|'Pu'|'Ra'|'Rb'|'Re'|'Rf'|'Rh'|'Rn'|'Ru'|'S'|'Sb'|'Sc'|'Se'|'Sg'|'Si'|'Sm'|'Sn'|'Sr'|'Ta'|'Tb'|'Tc'|'Te'|'Th'|'Ti'|'Tl'|'Tm'|'U'|'Uun'|'Uuu'|'Uub'|'Uut'|'Uuq'|'Uup'|'Uuh'|'Uus'|'Uuo'|'V'|'W'|'Xe'|'Y'|'Yb'|'Zn'|'Zr'|'Dummy'|'Du'|'R'}
Locally defined type:
Base XSD Type: string
pattern = [A-Za-z]+:[A-Za-z][A-Za-z0-9\-]+
Documentation
Allowed elementType values.
The periodic table (up to
element number 118). In addition the following strings are allowed:
Du. ("dummy") This does not correspond to a "real" atom and can
support a point in space or within a chemical graph.
R. ("R-group") This indicates that an atom or group of atoms could be attached at this point.
An observed or calculated estimate of the error in the value of a numeric quantity. It should be ignored for dataTypes such as URL, date or string. The statistical basis of the errorValueType is not defined - it could be a range, an estimated standard deviation, an observed standard error, etc. This information can be added through _errorBasisType_.
An estimate of the error in the value of a quantity.
This MUST adhere to a whitespaced syntax so that it is trivially
machine-parsable. Each element is followed by its count (which may be decimal),
and the string is optionally ended by a formal charge (of form d or -d, i.e. no '+').
NO brackets or other nesting is allowed.
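Because the syntax is flat and whitespace-separated, a parser fits in a few lines. The sketch below is an assumption-laden illustration: it treats an odd trailing token as the formal charge, requires it to be an integer, and sums repeated elements; the name `parse_formula` is hypothetical.

```python
from typing import Dict, Tuple


def parse_formula(text: str) -> Tuple[Dict[str, float], int]:
    """Parse 'El count El count ... [charge]' into (counts, charge)."""
    tokens = text.split()
    charge = 0
    if len(tokens) % 2 == 1:
        last = tokens.pop()
        if last.startswith("+"):
            raise ValueError("formal charge must be of form d or -d")
        charge = int(last)  # assumption: the charge is an integer
    counts: Dict[str, float] = {}
    for symbol, count in zip(tokens[0::2], tokens[1::2]):
        if not symbol.isalpha():
            raise ValueError(f"not an element symbol: {symbol!r}")
        # Counts may be decimal; repeated symbols are summed.
        counts[symbol] = counts.get(symbol, 0.0) + float(count)
    return counts, charge
```

For example, `parse_formula("S 1 O 4 -2")` yields `({"S": 1.0, "O": 4.0}, -2)`. Bracketed input such as `"(CH3)2"` fails on the symbol check, consistent with the ban on nesting.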
A polymeric chain may be described by linking the head of one repeat
unit to the tail or head of another. The head attribute indicates the atom
id (normally on an atom of elementType="R") which acts as the head.
The total number of hydrogen atoms bonded to an atom or contained in a molecule, whether explicitly included as atoms or not. It is an error to have a hydrogenCount less than the explicit hydrogen count. There is no default value and no assumptions about hydrogenCount can be made if it is not given. If hydrogenCount is given on every atom, then the values can be summed to give the total hydrogenCount for the (sub)molecule. Because of this, hydrogenCount should not be used where hydrogen atoms bridge 2 or more atoms.
The total number of hydrogen atoms bonded to an object.
This is not formally of type ID (an XML NAME which must start with a letter and contain only letters, digits and .-_:). It is recommended that IDs start with a letter, and contain no punctuation or whitespace. The generate-id() function in XSLT will generate semantically void unique IDs.
It is difficult to ensure uniqueness when documents are merged. We suggest
namespacing IDs, perhaps using the containing elements as the base.
Thus mol3:a1 could be a useful unique ID.
However this is still experimental.
value comes from list: {'merge'|'replace'|'delete'}
Documentation
Inheritance mechanism.
A reference to an existing element can be used to supplement values such as coordinates. The inheritance attribute determines whether the values are supplemented, overwritten or deleted. In the example:
<molecule id="m1" view="initial">
<atomArray>
<atom id="a1" x3="0.1"/>
</atomArray>
</molecule>
<!-- this adds more information -->
<molecule ref="m1" view="initial" inherit="supplement">
<atomArray>
<atom id="a1" hydrogenCount="1"/>
</atomArray>
</molecule>
<!-- this will overwrite the previous values -->
<molecule ref="m1" inherit="overwrite" view="final"
id="m2">
<atomArray>
<atom id="a1" x3="0.1"/>
</atomArray>
</molecule>
<!-- this will delete the previous values -->
<molecule ref="m1" inherit="delete" view="restart">
<atomArray>
<atom id="a1" hydrogenCount=""/>
</atomArray>
</molecule>
The first molecule/@ref adds complementary information, the second
changes the values. Software is allowed to generate two independent copies of the molecule and reference them by different IDs (m1 and m2).
This mechanism is necessary to manage the implied inheritance of partial information during minimisations and dynamics. It requires careful software implementation.
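To make the three behaviours concrete, here is a rough sketch operating on plain attribute dictionaries. The exact semantics are not fixed by the text, so the following is assumed: "merge" only adds attributes that are not yet present, "replace" overwrites values, and "delete" removes the named attributes. The function name `apply_inherit` is hypothetical, and the `id` used for matching atoms is expected to be left out of both dictionaries.

```python
from typing import Dict


def apply_inherit(
    target: Dict[str, str], incoming: Dict[str, str], mode: str
) -> Dict[str, str]:
    """Return a new attribute dict after applying one inheritance step."""
    result = dict(target)  # never mutate the referenced element's data
    if mode == "merge":
        for name, value in incoming.items():
            result.setdefault(name, value)  # keep existing values
    elif mode == "replace":
        result.update(incoming)  # incoming values win
    elif mode == "delete":
        for name in incoming:
            result.pop(name, None)  # values in `incoming` are ignored
    else:
        raise ValueError(f"unknown inheritance mode: {mode!r}")
    return result
```

Returning a copy rather than editing in place mirrors the remark that software may keep independent copies of the molecule under different IDs.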
In core CML this represents a single number; either the
combined proton/neutron count or a more accurate estimate of the
nuclear mass. This is admittedly fuzzy, and requires a more complex
object (which can manage conventions, lists of isotopic masses, etc.)
See isotope.
The default is "natural abundance" - whatever that can be interpreted
as.
Delta values (i.e. deviations from the most abundant isotopic mass)
are never allowed.
Defined by 6 real numbers, conventionally an arbitrary
point on the line and a vector3. There is no significance to the point
(i.e. it is not the "end of the line") and there are an infinite number of
ways of representing the line. DANGER. Line3 now uses the point3 and vector3 attributes
and the line3Type may be OBSOLETED.
value comes from list: {'rectangular'|'square'|'squareSymmetric'|'squareSymmetricLT'|'squareSymmetricUT'|'squareAntisymmetric'|'squareAntisymmetricLT'|'squareAntisymmetricUT'|'diagonal'|'upperTriangular'|'upperTriangularUT'|'lowerTriangular'|'lowerTriangularLT'|'unit'|'unitary'|'rowEigenvectors'|'rotation22'|'rotationTranslation32'|'homogeneous33'|'rotation33'|'rotationTranslation43'|'homogeneous44'}
Many are square matrices. By default all elements must be included. For symmetric, antisymmetric and diagonal matrices some compression is possible by not reporting the identical or forced zero elements. These have their own subtypes, usually with UT or LT appended. Use these with caution as there is a chance of confusion and you cannot rely on standard software to read these.
The matrix type fixes the order and semantics of the elements in the XML element but does not mandate any local syntax. Thus an application may insert newline characters after each row or use a <row> element.
The maximum INCLUSIVE value of a sortable quantity such as
numeric, date or string. It should be ignored for dataTypes such as URL.
The min and
max attributes can be used to give a range for the quantity.
The statistical basis of this range is not defined. The value of max
is usually an observed
quantity (or calculated from observations). To restrict a value, the
maxExclusive type in a dictionary should be used.
The type of the maximum is the same as the quantity to which it refers - numeric,
date and string are currently allowed.
value comes from list: {'dc:coverage'|'dc:description'|'dc:identifier'|'dc:format'|'dc:relation'|'dc:rights'|'dc:subject'|'dc:title'|'dc:type'|'dc:contributor'|'dc:creator'|'dc:publisher'|'dc:source'|'dc:language'|'dc:date'|'cmlm:safety'|'cmlm:insilico'|'cmlm:structure'|'cmlm:reaction'|'cmlm:identifier'|'other'}
Metadata consists of name-value pairs (value is in the "content" attribute). The names are from a semi-restricted vocabulary, mainly Dublin Core. The content is unrestricted. The order of metadata has no implied semantics at present. Users can create their own metadata names using the namespaced prefix syntax (e.g. foo:institution). Ideally these names should be defined in an STMML dictionary.
2003-03-05: Added UNION to manage non-controlled name.
The minimum INCLUSIVE value of a sortable quantity such as
numeric, date or string. It should be ignored for dataTypes such as URL.
The min and
max attributes can be used to give a range for the quantity.
The statistical basis of this range is not defined. The value of min
is usually an observed
quantity (or calculated from observations). To restrict a value, the
minExclusive type in a dictionary should be used.
The type of the minimum is the same as the quantity to which it refers - numeric,
date and string are currently allowed.
Of the form prefix:suffix where prefix and suffix
are purely alphanumeric (with _ and -) and prefix
is optional. This is similar to XML IDs (and we promote
this as good practice for moleculeIDs). Other punctuation and
whitespace is forbidden, so IDs from (say) PDB files are
not satisfactory.
The prefix is intended to form a pseudo-namespace so that
molecule IDs in different molecules may have identical suffixes.
It is also useful if the prefix is the ID for the molecule
(though this clearly has its limitation). Molecule IDs should not
be typed as XML IDs since they may not validate.
Typical applications are the annotation of
peaks in chromatograms and mapping reactions. The context of the
id resolution is the childOrSibling concept.
A string referencing a dictionary, units, convention or other metadata.
The purpose is to allow authors to extend the vocabulary through
their own namespaces without altering the schema.
The prefix is mandatory. This convention is only used within
CML and related languages; it is NOT a generic URI.
Note that we also provide positiveNumber to avoid inclusive zero. The maximum number is 1.0E+999 since 'unbounded' is more difficult to implement. This is greater than Eddington's estimate of the number of particles in the universe so it should work for most people.
This is purely conventional and used
for bond/electron counting. There is no default value.
The emptyString attribute can be used to indicate a bond of
unknown or unspecified type. The interpretation of this is outside
the scope of CML-based algorithms. It may be accompanied by a convention
attribute on the bond which links to a dictionary.
Example: <bond convention="ccdc:9" atomRefs2="a1 a2"/> could
represent a delocalised bond in the CCDC convention.
Defined by 4 real numbers, conventionally a vector3
normal to the plane and a signed scalar representing the distance to the origin.
The vector must not be of zero length (and need not be normalized).
The first three numbers are the vector, followed by the distance.
By default the reactions in a reactionStepList are assumed to take place in sequence (e.g. one or more products of reaction n are used in reaction n+1 or later). However there are cases where it is known that reactions take place in parallel (e.g. if there is no overlap of molecular identities). Alternatively there are points at which there are two or more competing reactions which may depend on conditions or concentrations. A small semi-controlled vocabulary is suggested.
The semantics of these are not fully explored, but we suggest that consecutive and simultaneous should be the first to be supported.
This is provided for machine-understanding of the topology or logic of the reaction steps and components (i.e. not for a general classification for which label is more appropriate.)
Semantics are semi-controlled. Some terms are appropriate to multistep reactions, and can be used with or without explicit steps.
A reference to an existing element in the document.
The target of the ref attribute must exist. The test for validity will normally
occur in the element's _appinfo_. Any DOM Node created from this element will
normally be a reference to another Node, so that if the target node is modified
the dereferenced content is also modified. At present there are no deep copy
semantics hardcoded into the schema.
The semantics of a reference are normally identical to
an idType (e.g. "a123b"). However there are some cases where compound references
are required, such as "a123b:pq456". It is likely that this will be superseded
by RDF or XPointer, but as long as we have non-unique IDs this is a problem.
value comes from list: {'parent'|'partitiveParent'|'child'|'partitiveChild'|'related'|'synonym'|'quasi-synonym'|'antonym'|'homonym'|'see'|'seeAlso'|'abbreviation'|'acronym'}
The attribute contains an index, its start value
(normally 1) and its end value as in "i 3 10" which would make 8 repeats
of the object. In selected attribute values the string _i_ acts as a macro and
would be replaced by the value of i. EXPERIMENTAL. It can also have variables
as the values.
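The macro substitution can be illustrated with a short sketch. It handles only literal integer bounds (variables as values are not covered), assumes the end value is inclusive as the "8 repeats" count implies, and uses a hypothetical function name `expand_repeat`.

```python
from typing import List


def expand_repeat(repeat: str, template: str) -> List[str]:
    """Expand a repeat spec such as 'i 3 10' over a template string."""
    parts = repeat.split()
    if len(parts) != 3:
        raise ValueError("expected 'index start end'")
    name, start, end = parts[0], int(parts[1]), int(parts[2])
    if end < start:
        raise ValueError("end value must not be less than start value")
    macro = f"_{name}_"  # e.g. the string _i_ for index i
    # The range is inclusive at both ends: 3..10 gives 8 copies.
    return [template.replace(macro, str(v)) for v in range(start, end + 1)]
```

For instance, `expand_repeat("i 3 10", "a_i_")` produces the eight strings `"a3"` through `"a10"`.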
The state(s) of matter appropriate to a substance or property. It follows a partially controlled vocabulary. It can be extended through namespace codes to dictionaries.
This is purely conventional. There is no default value.
The emptyString attribute can be used to indicate a bond of
unknown or unspecified type. The interpretation of this is outside
the scope of CML-based algorithms. It may be accompanied by a convention
attribute which links to a dictionary.
An array of strings, separated by whitespace. If the strings have embedded whitespace or may be empty (zero-length), a non-whitespace single-character delimiter must be used. At present there is no machine validation.
A polymeric chain may be described by linking the tail of one repeat
unit to the head or tail of another. The tail attribute indicates the atom
id (normally on an atom of elementType="R") which acts as the tail.
These will be linked to dictionaries of
units with conversion information, using namespaced references
(e.g. si:m). Distinguish carefully from _unitType_
which is an element describing a type of a unit in a
_unitList_.
(Distinguish from a chemical element as in elementTypeType).
Currently used for assigning XMLElement types to references (e.g. to='a1' toType='atom').
Semantics are not controlled and in principle elements outside the CML tagSet
could be used. Implementers cannot assume that namespace prefixes can be resolved
and default usage is probably the local name.
An action which might occur in scientific data or narrative.
An action which might occur in scientific data or narrative. The definition is deliberately vague, intending to collect examples of possible usage. Thus an action could be addition of materials, measurement, application of heat or radiation. The content model is unrestricted. _action_ itself is normally a child of _actionList_.
The start, end and duration attributes should be interpreted as
XSD dateTimes and XSD durations. This allows precise recording of time of day, etc, or duration after start of actionList. A convention="xsd" attribute should be used to enforce XSD.
a numerical value, with a units attribute linked to a dictionary.
a human-readable string (unlikely to be machine processable)
startCondition and endCondition values are not constrained, which allows XSL-like test attribute values. The semantics of the conditions are yet to be defined and at present are simply human readable.
The order of the action elements in the document may, but will not always, define
the order that they actually occur in.
A delay can be shown by an action with no content. Repeated actions or
actionLists are indicated through the count attribute.
At present a child of _entry_ which represents an alternative string that refers to the concept. There is a partial controlled vocabulary in _alternativeType_ with values such as :
A documentation container similar to annotation in XML Schema.
A documentation container similar to annotation in XML Schema. At present this is experimental and designed to be used for dictionaries, units, etc. One approach is to convert these into XML Schemas when the documentation and appinfo children will emerge in their correct position in the derived schema.
It is possible that this may develop as a useful tool for annotating components
of complex objects such as molecules.
A container for machine processable documentation for an entry. This is likely to be platform and/or language specific. It is possible that XSLT, RDF or XBL will emerge as generic languages. See _annotation_ and _documentation_ for further information.
An example in XSLT where an element _foo_ calls a bespoke template
Arguments can be typed and have explicit
or free values. They can also carry out substitutions in the parent element
and its children (substitute, still experimental) and delete themselves after
this.
2006-02-14: PMR. Added atomType as child
2006-05-21: PMR. Added substitute and delete
attributes
A homogeneous 1-dimensional array of similar objects.
These can be encoded as strings (i.e. XSD-like datatypes) and are concatenated as string content. The size of the array should always be >= 1. The default delimiter is whitespace. The _normalize-space()_ function of XSLT could be used to normalize all whitespace to single spaces and this should not affect the value of the array elements. To extract the elements __java.lang.StringTokenizer__ could be used. If the elements themselves contain whitespace then a different delimiter must be used and is identified through the delimiter attribute. This method is mandatory if it is required to represent empty strings. If a delimiter is used it MUST start and end the array - leading and trailing whitespace is ignored. Thus size+1 occurrences of the delimiter character are required. If non-normalized whitespace is to be encoded (e.g. newlines, tabs, etc) you are recommended to translate it character-wise to XML character entities.
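The encoding direction can be sketched as follows. This is only an illustration under stated assumptions: the function name `join_array` is hypothetical, and the sketch refuses values that contain the delimiter itself instead of attempting any escaping.

```python
from typing import List, Optional


def join_array(values: List[str], delimiter: Optional[str] = None) -> str:
    """Serialise string values as array content."""
    if not values:
        raise ValueError("the size of the array should be >= 1")
    if delimiter is None:
        # Whitespace separation cannot carry empty or spaced values.
        if any(v == "" or any(c.isspace() for c in v) for v in values):
            raise ValueError("a delimiter is required for these values")
        return " ".join(values)
    if len(delimiter) != 1 or delimiter.isspace():
        raise ValueError("delimiter must be one non-whitespace character")
    if any(delimiter in v for v in values):
        raise ValueError("a value contains the delimiter character")
    # The delimiter starts and ends the content: size+1 occurrences.
    return delimiter + delimiter.join(values) + delimiter
```

Encoding `["A", "B12", "", "D and E"]` with `"|"` gives `"|A|B12||D and E|"`, which contains five delimiter characters for four values.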
Note that normal Schema validation tools cannot validate the elements
of array (they are defined as string). However if the string is
split, a temporary schema
can be constructed from the type and used for validation. Also the type
can be contained in a dictionary and software could decide to retrieve this
and use it for validation.
When the elements of the array are not simple scalars
(e.g. scalars with a value and an error), the
scalars should be used as the elements. Although this is
verbose, it is simple to understand. If there is a demand for
more compact representations, it will be possible to define the
syntax in a later version.
The size attribute is not mandatory but provides a useful validity
check.
A major use of arrayList is to contain data within rectangular tables. However there is no
absolute requirement and the table can have any shape. The shape attribute should be used
to assert rectangularity.
Atoms can only be chosen from the periodic table and superatoms such as "Phe" or "Tyr" are not allowed. The elementType of an atom is identified by that attribute. There are two additional elementTypes: "Du", for an object which does not have an identifiable nucleus but is useful in calculations and definitions (such as a centroid), and "R", which describes a generic fragment. Although atoms have an elementType, they do not, by default, support arbitrary atomTypes, for which the <atomType> element should be used.
2006-01-12: PMR. Added vector3 child to support
accelerations, velocities, dipole, etc.
A child of _molecule_ and contains _atom_ information. There are two strategies:
Create individual _atom_ elements under _atomArray_ (in any order). This gives the greatest flexibility but is the most verbose.
Create *Array attributes (e.g. of _elementTypeArrayType_) under _atomArray_. This requires all arrays to be of identical lengths with explicit values for all atoms in every array. This is NOT suitable for complexType atom children such as _atomParity_. It also cannot be checked as easily by schema- and schematron validation. The _atomIDArray_ attribute is mandatory. It is allowed (though not yet recommended) to add _*Array_ children such as _floatArray_.
The attributes are directly related to the scalar attributes under _atom_ which should be consulted for more info.
Example - these are exactly equivalent representations
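The relationship between the two strategies can be sketched as a conversion from array-style attributes to one record per atom. The attribute names used in the usage note (`atomID`, `elementType`) are illustrative assumptions, as is the function name `expand_arrays`; the only rule taken from the text is that all arrays must have identical lengths.

```python
from typing import Dict, List


def expand_arrays(array_attrs: Dict[str, str]) -> List[Dict[str, str]]:
    """Turn whitespace-separated array attributes into per-atom records."""
    columns = {name: text.split() for name, text in array_attrs.items()}
    lengths = {len(col) for col in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all arrays must be of identical lengths")
    size = lengths.pop() if lengths else 0
    # Row i collects the i-th value of every array attribute.
    return [{name: col[i] for name, col in columns.items()} for i in range(size)]
```

Given `{"atomID": "a1 a2", "elementType": "C O"}` this returns one dictionary for atom a1 (element C) and one for a2 (element O), which is the information the individual _atom_ elements would carry.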
An atomicBasisFunction which can be linked to atoms,
eigenvalues/vectors etc. Normally contained within _basisSet_.
Normally these are atom-centered functions, but they can also serve as
"ghost" functions which are centered on points. These can be dummy atoms so
that the atomRef mechanism can still be used.
This information is required to interpret the eigenvector components
and map them onto the atom list. However this mapping is normally implicit
in the program and so it may be necessary to generate basisSet
information for some programs before XML technology can be automatically used
to link the components of the CCML document.
It follows the convention of the MIF format,
and uses 4 distinct atoms to define the chirality. These can be any
atoms (though they are normally bonded to the current atom). There is
no default order and the order is defined by the atoms in the atomRefs4
attribute. If there are only 3 ligands, the current atom should be
included in the 4 atomRefs.
The value of the parity is a signed number. (It can only be
zero if two or more atoms are coincident or the configuration is
planar). The sign is the sign of the chiral volume created by the
four atoms (a1, a2, a3, a4):
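A signed volume of this kind can be computed as a scalar triple product. The sketch below is an assumption, not the definition from the MIF format: it takes the volume as (a2 - a1) . ((a3 - a1) x (a4 - a1)), so the sign convention and any scale factor may differ from the one intended here. The function names are hypothetical.

```python
from typing import Sequence


def chiral_volume(
    a1: Sequence[float],
    a2: Sequence[float],
    a3: Sequence[float],
    a4: Sequence[float],
) -> float:
    """Signed volume spanned by a2, a3 and a4 relative to a1."""
    u = [a2[i] - a1[i] for i in range(3)]
    v = [a3[i] - a1[i] for i in range(3)]
    w = [a4[i] - a1[i] for i in range(3)]
    cross = (
        v[1] * w[2] - v[2] * w[1],
        v[2] * w[0] - v[0] * w[2],
        v[0] * w[1] - v[1] * w[0],
    )
    return u[0] * cross[0] + u[1] * cross[1] + u[2] * cross[2]


def parity_sign(a1, a2, a3, a4) -> int:
    """Return +1, -1, or 0 (planar or coincident atoms)."""
    volume = chiral_volume(a1, a2, a3, a4)
    return (volume > 0) - (volume < 0)
```

Swapping any two of the four atoms flips the sign, which is why the order given in the atomRefs4 attribute matters; four coplanar points give zero.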
An atomSet consists of a number of unique references to atoms through their ids. atomSets need not be related to molecules (which are generally created by aggregation of explicit atoms). Two or more atomSets may reference the same atom, and atomSets may be empty.
atomSets have many potential uses such as:
identifying functional groups
results of substructure matching
identifying atoms with particular roles in a calculation
The atomSet may be referenced from elsewhere in the document and you are encouraged to use locally unique id attributes on atomSets.
atomTypes are used in a wide variety of ways in computational chemistry.
They are normally labels added to existing atoms (or dummy atoms)
in the molecule and have a number of defined properties.
These properties are usually in addition to those deducible from the
elementType of the atom. AtomTypes usually depend on the chemical or
geometrical environment of the atom and are frequently assigned by
algorithms with chemical perception. However they are also frequently
set or "tweaked" by humans initiating a program run.
AtomTypes on an atom have no formal relation to its elementType,
which only describes the number of protons in the nucleus. It is not unknown
(though potentially misleading) to use an "incompatible" atomType to
alter the computational properties of an atom (e.g. pretend this K+
is a Ca++ to increase its effective charge). atomTypes
will also be required to describe pseudoAtoms such as "halogen"
(generic) or "methyl group" (unified atom). Atoms in computations
can therefore have an atomType child with a "ref"
attribute.
An atomType contains numeric or other quantities associated with
it (charges, masses, use in force-fields, etc.) and also description
of any perception algorithms (chemical and/or geometrical) which could
be used to compute or constrain it. This is still experimental.
atomTypes are referred to by their mandatory name
attribute. An atom refers to one or more atomTypes through
atomType/@ref children.
_bond_ is a child of _bondArray_ and contains bond information. Bond must refer to at least two atoms (normally using _atomRefs2_) but may also refer to more for multicentre bonds. Bond is often EMPTY but may contain _electron_, _length_ or _bondStereo_ elements.
_bondArray_ is a child of _molecule_ and contains _bond_ information. There are two strategies:
Create individual bond elements under bondArray
(in any order). This gives the greatest flexibility but is the most verbose.
Create *Array attributes (e.g. of orderArrayType) under
bondArray. This requires all arrays to be of identical lengths with explicit values for all bonds in every array. This is NOT suitable for complexType bond children such as _bondStereo_, nor can IDs be added to bonds. It also cannot be checked as easily by schema- and schematron validation. The _atomRef1Array_ and _atomRef2Array_ attributes are then mandatory. It is allowed (though not yet recommended) to add _*Array_ children such as _floatArray_.
The attributes are directly related to the scalar attributes under _bond_ which should be consulted for more info.
Example - these are exactly equivalent representations
A bondSet consists of a number of unique references to bonds through their ids. bondSets need not be related to molecules (which are generally created by aggregation of explicit bonds). Two or more bondSets may reference the same bond, and bondSets may be empty.
bondSets have many potential uses such as:
identifying functional groups
results of substructure matching
identifying bonds with particular roles in a calculation
The bondSet may be referenced from elsewhere in the document and you are encouraged to use locally unique id attributes on bondSets.
A container supporting cis trans wedge hatch and other stereochemistry.
An explicit list of atomRefs must be given, or it must be a child of bond. There are no implicit conventions such as E/Z. This will be extended to other types of stereochemistry.
At present the following are supported:
No atomRefs attribute. Deprecated, but probably unavoidable.
This must be a child of bond where it picks up the two atomRefs
in the atomRefs2 attribute. Possible values are C/T (which only makes sense
if there is exactly one ligand at each end of the bond) and W/H. The latter
should be replaced by atomParity wherever possible. Note that W/H makes
no sense without 2D atom coordinates.
atomRefs4 attribute. The 4 atoms represent a cis or trans configuration.
This may or may not be a child of bond; if so the second and third atomRefs
should be identical with the two atomRefs in the bond. This structure can be used
to guide processors in processing stereochemistry and is recommended, since there is
general agreement on the semantics. The semantics of bondStereo not related to
bonds are less clear (e.g. cumulenes, substituted ring nuclei, etc.). It is
currently an error to have more than one bondStereo referring to the same ordered
4-atom list.
atomRefs attribute. There are other stereochemical conventions such as cis/trans
for metal complexes which require a variable number of reference atoms. This allows
users to create their own - at present we do not see CML creating exhaustive tables.
For example cis/trans square-planar complexes might require 4 (or 5) atoms for their
definition, octahedral 6 or 7, etc. In principle this is very powerful and could
supplement or replace the use of cis-, mer-, etc.
The atomRefs and atomRefs4 attributes cannot be used
simultaneously.
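The atomRefs4 rule for a bondStereo that is a child of bond can be sketched in Python as follows. This is a minimal check under stated assumptions: the function name is hypothetical, and accepting the two bond atoms in either order is an assumption of this sketch rather than something the text specifies.

```python
def atom_refs4_matches_bond(bond_atom_refs2, stereo_atom_refs4):
    """Return True if the 2nd and 3rd stereo atoms are the two bond atoms.

    Assumption: the bond atoms may appear in either order.
    """
    bond_atoms = bond_atom_refs2.split()
    stereo_atoms = stereo_atom_refs4.split()
    if len(bond_atoms) != 2:
        raise ValueError("atomRefs2 must list exactly two atoms")
    if len(stereo_atoms) != 4:
        raise ValueError("atomRefs4 must list exactly four atoms")

    middle = stereo_atoms[1:3]
    return middle == bond_atoms or middle == bond_atoms[::-1]


ok = atom_refs4_matches_bond("a2 a3", "a1 a2 a3 a4")
```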
Bond types are used to describe the behaviour
of bonds in forcefields, functional groups, reactions and many other
domains. They are not as well formalised as atomTypes and we provide
less semantic support. BondTypes are referred to by their mandatory
_name_ attribute.
Often the root of the CML (sub)document.
Has no explicit function but can serve to hold the dictionary and
namespace and version information, and is a useful tag to alert
CML processors
and search/XMLQuery tools that there is chemistry in the document.
Can contain any content, but usually a list of molecules and other
CML components. The fileId attribute can be used to preserve the origin
of the information, though metadata should also be used. Can be nested.
A container for one or more experimental conditions.
This can contain several conditions. These include
(but are not limited to) intensive physical properties (temperature, pressure, etc.),
apparatus (test-tube, rotary evaporator, etc.).
Actions can be represented elsewhere by actionList and solvents or other
substances by substanceList.
Required if fractional coordinates are provided for
a molecule. Originally there were precisely SIX child scalars to represent
the cell lengths and angles in that order. There are no default values; the
spacegroup is also included. This is now deprecated and replaced by cellParameter.
The definition should be a short nounal phrase defining the subject of the entry. Definitions should not include commentary, implementations, equations or formulae (unless the subject is one of these) or examples.
The definition can be in any markup language, but normally XHTML will be used,
perhaps with links to other XML namespaces such as CML for chemistry.
This can occur in objects which require textual comment such as entry.
Entries should have at least one separate definition.
description is then used for most of the other information, including
examples. The class attribute has an uncontrolled vocabulary and
can be used to clarify the purposes of the description
elements.
A dictionary is a container for _entry_ elements.
Dictionaries can also contain unit-related information.
The dictRef attribute on a dictionary element sets a
namespace-like prefix allowing the dictionary to be referenced
from within the document. In general dictionaries are referenced
from an element using the __dictRef__ attribute.
2005-12-15. PMR. added namespace
and dictionaryPrefix.
This will be primarily used within the definition of units.
Two dimensions are of the same type if their 'name' attributes are (case-sensitive)
identical. Dimensions of the same type can be algebraically combined using the 'power' attributes.
Normally dimensions will be aggregated and cancelled algebraically, but the 'preserve'
attribute can be used to prevent this. Thus a velocity gradient over length can be
defined as:
A container similar to documentation in XML Schema. This is NOT part of the textual content of an entry but is designed to support the transformation of dictionary entries into schemas for validation. This is experimental and should only be used for dictionaries, units, etc. One approach is to convert these into XML Schemas when the documentation and appinfo children will emerge in their correct position in the derived schema.
Do NOT confuse documentation with the description or the definition which are part of the content
of the dictionary.
It will probably only be used when there is significant appinfo
in the entry or where the entry defines an XSD-like datatype of an element in the document.
Since there is very little use of electrons in current chemical information this is a fluid concept. I expect it to be used for electron counting, input and output of theochem operations, descriptions of orbitals, spin states, oxidation states, etc. Electrons can be associated with atoms, bonds and combinations of these. At present there are no hardcoded semantics. However, _atomRef_ and similar attributes can be used to associate electrons with atoms or bonds.
The original design for validation with attribute-based constraints is ponderous and fragile. In future constraints will be added through appinfo in annotation. We shall develop this further in the near future.
An enumeration of string values. Used where a dictionary entry constrains the possible values in a document instance. The dataTypes (if any) must all be identical and are defined by the dataType of the containing element.
It is
defined by atomArrays each with a list of elementTypes and their
counts (or default=1). All other information in the atomArray
is ignored. formula elements are nestable so that aggregates (e.g. hydrates,
salts, etc.) can be described. CML does not require that formula information
is consistent with (say) crystallographic information; this allows for
experimental variance.
An alternative briefer representation is also available through the
concise attribute. This must include whitespace around all elements and
their counts, which must be explicit.
2005-10-16. The semantics are now the following. A formula must have one or both:
A concise attribute
A single atomArray child, using array format.
It must also have a formalCharge attribute if atomArray is used and the charge is non-zero.
The concise, formalCharge and atomArray information must always be consistent and software should
throw an error if not.
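The consistency requirement between concise and atomArray can be sketched as below. This is only an illustration: the function names are hypothetical, the atomArray is modelled as two whitespace-separated strings (element types and counts), and formalCharge is deliberately not handled here.

```python
def parse_concise(concise):
    """Parse e.g. 'C 2 H 6 O 1' into {'C': 2.0, 'H': 6.0, 'O': 1.0}.

    Assumption: strictly alternating element/count tokens, no charge token.
    """
    tokens = concise.split()
    if len(tokens) % 2 != 0:
        raise ValueError("every element needs an explicit count")
    counts = {}
    for symbol, count in zip(tokens[0::2], tokens[1::2]):
        counts[symbol] = counts.get(symbol, 0.0) + float(count)
    return counts


def parse_atom_array(element_types, counts=None):
    """Parse array-format element types and counts (count defaults to 1)."""
    symbols = element_types.split()
    values = ([float(c) for c in counts.split()]
              if counts is not None else [1.0] * len(symbols))
    if len(values) != len(symbols):
        raise ValueError("element and count arrays differ in length")
    result = {}
    for symbol, value in zip(symbols, values):
        result[symbol] = result.get(symbol, 0.0) + value
    return result


def is_consistent(concise, element_types, counts=None):
    """True if both representations describe the same composition."""
    return parse_concise(concise) == parse_atom_array(element_types, counts)


ethanol_ok = is_consistent("C 2 H 6 O 1", "C H O", "2 6 1")
```

Software following the rule in the text would raise an error where this sketch returns False.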
Until now there was no way of holding inline formula other than concise (although JUMBO5.0 is
capable of reading them). We now extend formula.xsd to incorporate this through the attribute
"inline" which requires the use of the "convention" attribute. The contents of inline are
purely textual. It can be used with or without atomArray or concise but there is no
guarantee that it can be interpreted as a meaningful chemical formula or that there is consistency.
In some cases a document supplies several formula representations (e.g. the IUCr's CIF). In this
case a molecule (or crystal) element might contain several formula children. The semantics of which
to use are application dependent.
fragment is a container for a molecule, potentially to be joined
to other fragments. In addition there may be fragmentLists which represent branches
from the molecule. There may also be a join child which is normally only found
if there is a @countExpression.
Supports compound identifiers such as IChI. At present uses the V0.9 IChI XML representation verbatim but will almost certainly change with future IChIs.
The inclusion of elements from other namespaces causes problems with validation. The content model is deliberately LAX but the actual elements in IChI will fail the validation as they are not declared in CML.
For simple scalar values the value attribute can be used with empty content. Where an identifier has several components a series of label elements can be used.
EXPERIMENTAL. join will normally use atomRefs2 to identify 2 R atoms
(i.e. elementType="R") that should be joined. The atoms to which the R atoms
are attached are then joined by a new bond and the R groups are then deleted. It is currently
an error if these atoms already have a connecting bond.
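The join operation described above can be sketched on a toy graph model. The data structures (a dict of atom id to elementType and a set of two-member frozensets for bonds) and the function name are assumptions made for illustration, as is the requirement that each R atom has exactly one neighbour.

```python
def join_r_atoms(atoms, bonds, r1, r2):
    """Join the neighbours of two R atoms and delete the R atoms.

    atoms: dict mapping atom id -> elementType
    bonds: set of frozenset({id1, id2})
    Returns new (atoms, bonds); the inputs are left unmodified.
    """
    for r in (r1, r2):
        if atoms.get(r) != "R":
            raise ValueError(f"{r} is not an R atom")

    def sole_neighbour(r):
        neighbours = [next(iter(b - {r})) for b in bonds if r in b]
        if len(neighbours) != 1:
            raise ValueError(f"{r} must have exactly one neighbour")
        return neighbours[0]

    n1, n2 = sole_neighbour(r1), sole_neighbour(r2)
    if n1 == n2:
        raise ValueError("both R atoms are attached to the same atom")
    new_bond = frozenset((n1, n2))
    if new_bond in bonds:
        raise ValueError("atoms already have a connecting bond")

    # Drop the R atoms and their bonds, then add the new bond.
    new_atoms = {k: v for k, v in atoms.items() if k not in (r1, r2)}
    new_bonds = {b for b in bonds if r1 not in b and r2 not in b}
    new_bonds.add(new_bond)
    return new_atoms, new_bonds


atoms = {"a1": "C", "r1": "R", "a2": "O", "r2": "R"}
bonds = {frozenset(("a1", "r1")), frozenset(("a2", "r2"))}
joined_atoms, joined_bonds = join_r_atoms(atoms, bonds, "r1", "r2")
```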
A label can be used to identify or distinguish elements, add keywords or classifications and similar processes. It is usually interpretable by domain-aware humans (e.g. C3'-endo, but not a34561). It is usually either built in a semantically rich fashion (e.g. C2'-alpha-H) or belongs to a controlled vocabulary. It is possibly accessed by software in a domain-specific manner. It differs from description which is free text. The distinction between titles, names and labels is fuzzy, but we think this is worth making. Labels may be necessary to identify objects within programs, while names are more likely to be reserved for database searches. Titles are likely to be freer text and not recommended for precise object retrieval.
Labels should not contain whitespace. Punctuation marks are often necessary, but should not be gratuitously used. Punctuation clashing with XML character entities should be avoided; if this is not possible it should be escaped.
Lattice is a general approach to describing periodic systems.
It can have variable dimensionality or periodicity, and could be finite.
_lattice_ is more general than _crystal_ in cmlCore which is used primarily for reporting
crystallographic experiments. A lattice can be described by latticeVectors, cell axes
and angles, or metric tensors, etc. (only axes/angles are allowed under crystal). The dimensionality is enforced through a _system_ parent element.
A number of lattice vectors equal to the dimensionality. Note that some vectors may give rise to periodicity while others do not. Thus a surface can be described by two vectors in the plane of the surface and one perpendicular to them.
A lattice can be represented by 1-3 linearly
independent latticeVectors. If the dimensionality is less than 3, latticeVectors are the
preferred method. Similarly, if the axes show a mixture of periodicity and non-periodicity
latticeVectors can support this. The number of periodic vectors must correspond with
the periodicity attribute on a system element.
The vector must not be zero and units must be given. (Zero vectors must not be
used to reduce dimensionality).
A lattice vector defaults to periodic.
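The constraints on latticeVectors (non-zero, linearly independent, periodic count matching the periodicity) can be checked along the following lines. This is a sketch under assumptions: vectors are plain 3-component tuples, units are ignored, and the function names and messages are hypothetical.

```python
def rank(vectors, tol=1e-9):
    """Rank of up to three 3-component vectors by Gaussian elimination."""
    rows = [[float(c) for c in v] for v in vectors]
    found = 0
    for col in range(3):
        candidates = range(found, len(rows))
        pivot = max(candidates, key=lambda r: abs(rows[r][col]), default=None)
        if pivot is None or abs(rows[pivot][col]) < tol:
            continue
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for r in range(found + 1, len(rows)):
            factor = rows[r][col] / rows[found][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[found])]
        found += 1
    return found


def lattice_problems(vectors, periodic, periodicity, tol=1e-9):
    """Return a list of violated constraints (empty if none)."""
    problems = []
    if not 1 <= len(vectors) <= 3:
        problems.append("expected 1-3 lattice vectors")
    if len(periodic) != len(vectors):
        problems.append("one periodic flag is needed per vector")
    if any(all(abs(c) < tol for c in v) for v in vectors):
        problems.append("zero vector")
    if rank(vectors, tol) != len(vectors):
        problems.append("vectors are linearly dependent")
    if sum(bool(flag) for flag in periodic) != periodicity:
        problems.append("periodic vector count does not match periodicity")
    return problems


# A surface: two periodic in-plane vectors and one aperiodic normal.
surface = lattice_problems(
    [(3, 0, 0), (1, 3, 0), (0, 0, 1)], [True, True, False], periodicity=2)
```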
Any or all of the axes may be periodic or aperiodic. An example
could be a surface where 2 periodic axes (not necessarily orthogonal) are used to describe
the coordinates in the surface, perhaps representing lattice vectors of a 3D crystal or
2D layer. The third vector is orthogonal and represents coordinates normal to the surface.
In this case only the direction, not the magnitude of the vector is important.
This is either an experimental measurement or used to build up internal coordinates (as in a z-matrix) (only one allowed). We expect to move length as a child of _molecule_ and remove it from here.
Semantics are similar to XLink, but simpler and only a subset is implemented.
This is intended to make the instances easy to create and read, and software
relatively easy to implement. The architecture is:
A single element (link) used for all linking purposes.
The link types are determined by the type attribute and can be:
locator. This points to a single target and must carry either a ref or href attribute.
locator links are usually children of an extended link.
arc. This is a 1:1 link with both ends (from and to) defined.
extended. This is usually a parent of several locator links and serves
to create a grouping of link ends (i.e. a list of references in documents).
Many-many links can be built up from arcs linking extended elements
All links can have optional role attributes. The semantics of this are not defined;
you are encouraged to use a URI as described in the XLink specification.
There are two address spaces:
The href attribute on locators behaves in the same way as href in
HTML and is of type xsd:anyURI. Its primary use is to use XPointer to reference
elements outside the document.
The ref attribute on locators and the from and to
attributes on arcs refer to IDs (without the '#' syntax).
Note: several other specific linking mechanisms are defined elsewhere in STM. relatedEntry should be used in dictionaries, and dictRef
should be used to link to dictionaries. There are no required uses of link in STMML
but we have used it to map atoms, electrons and bonds in reactions in CML.
Relation to XLink.
At present (2002) we are not aware of generic XLink
processors from which we would benefit, so the complete implementation brings little
extra value.
Among the simplifications from XLink are:
type supports only extended, locator and arc
label is not supported and ids are used as targets of links.
show and actuate are not supported.
xlink:title is not supported (all STM elements can have a title
attribute).
xlink:role supports any string (i.e. does not have to be a namespaced resource).
This mechanism can, of course, still be used and we shall promote it where STM
benefits from it
The to and from attributes point to IDs rather than labels
The xlink namespace is not used
It is not intended to create independent linkbases, although some collections of
links may have this property and stand outside the documents they link to
A generic container with no implied semantics. It just contains things and can have attributes which bind conventions to it. It could often act as the root element in an STM document.
Usage is now standardized with map as the container and link as the individual links. The links are often effectively typed pointers to other parts of the document. The type can be set for all links by the 'fromType' and 'toType' attributes, either in the map, which then applies to all links by default, or in individual links, when it overrides the map setting. Since ids may not be unique within a document the refs can be given context with the 'fromRef' and 'toRef' attributes in the map element. If more than one context is used it may be better to use multiple maps. The role of map, and its relationship to RDF is still being developed.
Currently (2005) map has primarily been used to map atoms between reactants and products, but we also expect shortly to extend it to peak assignments and several other areas. A map consists of a number of links, which can be directional, relating two elements through their ids. Reference is through the mandatory 'to' and 'from' attributes which must point to existing id attributes on elements. The type of the dereferenced element can be specified in 'toType' and 'fromType' which, while redundant, is an aid to software and acts as a check on referential type integrity.
In principle any element can be linked to any other, with 1:1, 1:n, and n:m topology. We expect maps to be used for precise chemical concepts such as reactions, peak assignments, electron management, molecular superpositions, etc. and that these are supported by bespoke code. For other links, especially with complex topology, users should consider whether RDF may be more appropriate.
In some cases partial mapping is known (e.g. one set of atoms maps to another set), but the precise links are unknown. (This is not the same as n:m mapping where n*m precise links would be expected). In some cases there may be objects such as atomSets or peakGroups which could be linked to support this. Alternatively the 'fromSet' and 'toSet' attributes can be used to hold a list of ids. Thus from='a1 a2' to='b3 b4' might imply that there were two precise links (either {a1=>b3, a2=>b4} or {a1=>b4, a2=>b3}). This is most likely to be used in intermediate documents where more precise semantics can be added later. The ids must all refer to elements of the same type. Note that a 'to' link referencing a single atomSet (toType='atomSet') is not the same as a 'toSet' of toType='atom' with multiple atomIds. The first would require an 'atomSet' element in the document; the second would not. The precise semantics such as the order of ids are application-dependent. If the order is known in both the toSet and fromSet then individual links should be used rather than adding the burden of deconstruction on the implementer.
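The partial-mapping case can be made concrete by enumerating the precise link sets that a pair of id lists might stand for. The sketch below assumes both lists have the same length and that the intended mapping is one-to-one; the function name is hypothetical, and real applications may attach different semantics to the order of ids.

```python
from itertools import permutations


def candidate_link_sets(from_set, to_set):
    """List every one-to-one assignment between two id lists.

    Assumption: equal-length lists and a one-to-one mapping.
    """
    from_ids = from_set.split()
    to_ids = to_set.split()
    if len(from_ids) != len(to_ids):
        raise ValueError("this sketch only handles equal-length sets")
    return [list(zip(from_ids, perm)) for perm in permutations(to_ids)]


candidates = candidate_link_sets("a1 a2", "b3 b4")
```

For the two-atom case in the text this yields exactly the two alternatives given there. The number of candidates grows factorially with the set size, which is one reason to write individual links once the order is known.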
2005-06-18: added typing and role and updated docs.
By default matrix represents
a rectangular matrix of any quantities
representable as XSD or STMML dataTypes. It consists of
rows*columns elements, where columns is the
fastest-moving index. Assuming the elements are counted from 1 they are
ordered V[1,1],V[1,2],...V[1,columns],V[2,1],V[2,2],...V[2,columns],
...V[rows,1],V[rows,2],...V[rows,columns]
By default whitespace is used to separate matrix elements; see
array for details. There are NO characters or markup
delimiting the end of rows; authors must be careful! The columns
and rows attributes have no default values; a row vector requires
a rows attribute of 1.
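The element ordering can be illustrated with a short Python sketch that splits whitespace-separated content into rows, with the column index moving fastest. The function name and the use of `float` as the default conversion are assumptions for illustration.

```python
def parse_matrix(text, rows, columns, convert=float):
    """Split whitespace-separated content into a list of rows.

    Elements are ordered V[1,1], V[1,2], ... V[rows,columns]; nothing in
    the text marks the end of a row, so the count is checked explicitly.
    """
    values = [convert(token) for token in text.split()]
    if len(values) != rows * columns:
        raise ValueError(
            f"expected {rows * columns} elements, found {len(values)}")
    return [values[r * columns:(r + 1) * columns] for r in range(rows)]


matrix = parse_matrix("1 2 3 4 5 6", rows=2, columns=3)
```

Here `matrix[0][2]` corresponds to V[1,3] in the 1-based notation used above.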
matrix also supports many types of square matrix, but at present we
require all elements to be given, even if the matrix is symmetric, antisymmetric
or banded diagonal. The matrixType attribute allows software to
validate and process the type of matrix.
In some cases this may be a simple textual description or reference within a controlled vocabulary. In others it may describe the complete progress of the reaction, including topological or cartesian movement of atoms, bonds and electrons and annotation with varying quantities (e.g. energies).
For named reaction mechanisms ("Diels-Alder", "ping-pong", "Claisen rearrangement", etc.) the name element should be used. For classification (e.g. "hydrolysis"), the label may be more appropriate.
In more detailed cases the mechanism refers to components of the reaction element. Thus bond23 might be cleaved while bond19 is transformed (mapped) to bond99. The mechanismComponent can be used to refer to components and add annotation. This is still experimental.
IUPAC Compendium of Chemical Terminology 2nd Edition (1997) describes a mechanism as:
A detailed description of the process leading from the reactants to the
products of a reaction, including a characterization as complete as possible
of the composition, structure, energy and other properties of reaction
intermediates, products and transition states. An acceptable mechanism of
a specified reaction (and there may be a number of such alternative mechanisms
not excluded by the evidence) must be consistent with the reaction
stoichiometry, the rate law and with all other available experimental data,
such as the stereochemical course of the reaction. Inferences concerning
the electronic motions which dynamically interconvert successive species
along the reaction path (as represented by curved arrows, for example) are
often included in the description of a mechanism.
It should be noted that for many reactions all this information is not
available and the suggested mechanism is based on incomplete experimental
data. It is not appropriate to use the term mechanism to describe a
statement of the probable sequence in a set of stepwise reactions. That
should be referred to as a reaction sequence, and not a mechanism.
CMLReact provides reactionScheme and annotations to describe the reaction sequence and both it and mechanism could co-occur within a reactionScheme container.
An information component within a reaction mechanism.
Information components can represent both physical constituents of the reaction or abstract concepts (types of bond cleavage, thermodynamics, etc.). There are several ways that components of the reaction can be annotated and/or quantified. One approach will be to refer to specific bonds and atoms through their ids and use mechanismComponent to describe their role, properties, etc. Another is to use mechanismComponent to identify types of bond formed/broken without reference to actual atoms and bonds (initially through the name element). Yet another will be to include information on the reaction profile.
A general container for metadata, including at least
Dublin Core (DC) and CML-specific metadata
In its simple form each element provides a name and content in a similar
fashion to the meta element in HTML. metadata may have simpleContent
(i.e. a string for adding further information - this is not controlled).
MetadataLists can have local roles (e.g. a bibliographic reference could be a single metadataList with, say, 3-6 components). The role attribute is used in an uncontrolled manner for this. MetadataLists can also be nested, but metadata and metadataList children should not occur on the same level of the hierarchy.
Many programs are based on discrete modules which produce chunks of output. There are also conceptual chunks such as initialisation, calculation and summary/final which often have finer submodules such as cycle, iteration, snapshot, etc. There is no controlled vocabulary but a typical structure is shown in the example. One of the challenges of CCML is to find commonality between different programs and to use agreed abstractions for the modules.
molecule is a container for atoms, bonds and submolecules along
with properties such as crystal and non-builtin properties. It should either
contain molecule or *Array for atoms and bonds. A molecule
can be empty (e.g. we just know its name, id, etc.)
"Molecule" need not represent a chemically meaningful molecule. It
can contain atoms with bonds (as in the solid state) and it could
simply carry a name (e.g. "taxol") without formal representation
of the structure. It can contain "sub molecules", which are often
discrete subcomponents (e.g. guest-host).
Molecule can contain a <list> element to contain data
related to the molecule.
Within this can be string/float/integer and other nested lists
Normally molecule will not contain fragment or fragmentList
Revised content model to allow any order of lengths, angles, torsions 2003-01-01.
Added role attribute 2003-03-19.
2006-05-21. PMR changed content model to (A|B|C...)*
moleculeList can contain several molecules.
These may be related in many ways and there are no controlled
semantics. However it should not be used for a molecule
consisting of descendant molecules for which molecule
should be used.
A moleculeList can contain nested moleculeLists.
name is used for chemical names (formal and trivial) for molecules and also for identifiers such as CAS registry and RTECS. It can also be used for labelling atoms. It should be used in preference to the title attribute because it is repeatable and can be linked to a dictionary.
Constraining patterns can be described in the dictionary and used to validate names.
An object which might occur in scientific data or narrative.
Deliberately vague. Thus an instrument might be built from sub component objects, or a program could be composed of smaller modules (objects). object could be used to encapsulate graphical primitives (e.g. in reaction schemes, drawings of apparatus, etc.). Unrestricted content model.
A container for any events that need to be recorded, whether planned or not. They can include notes, measurements, conditions that may be referenced elsewhere, etc. There are no controlled semantics.
Experimental. An operator acts on one or more arguments (at present the number is fixed by the type). The formulation is reverse Polish so the result (with its dataType) is put on a stack for further use.
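The stack-based evaluation implied by a reverse Polish formulation can be sketched as follows. The four arithmetic operators and the token format are assumptions chosen for illustration and are not the operator types defined by the schema; dataType handling is omitted.

```python
import operator

# Hypothetical operator table; each entry takes a fixed number (two) of
# arguments, mirroring the fixed arity mentioned in the text.
BINARY_OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def evaluate_rpn(tokens):
    """Evaluate reverse Polish tokens, leaving one result on the stack."""
    stack = []
    for token in tokens:
        if token in BINARY_OPERATORS:
            if len(stack) < 2:
                raise ValueError(f"operator {token!r} needs two arguments")
            right = stack.pop()
            left = stack.pop()
            # The result goes back on the stack for further use.
            stack.append(BINARY_OPERATORS[token](left, right))
        else:
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("expression did not reduce to a single value")
    return stack[0]


result = evaluate_rpn("3 4 + 2 *".split())
```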
A parameter is a broad concept and can describe numeric quantities, objects,
keywords, etc. The distinction between keywords and parameters is often fuzzy.
("MINIM" might mean "minimize", while "MINIM=3" might require three iterations
to be run.) It may help to think of control keywords as boolean parameters.
Numeric parameters can describe values in molecules, forcefields or other
objects. Often the parameters will be refined or otherwise varied during the
calculation. Some parameters may be fixed at particular values or relaxed at different
stages in the calculation. Parameters can have errors, gradients and other indications
of uncertainty.
String/character parameters are often abbreviated in program input, and this
is supported through the regex and ignoreCase attributes.
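How a regex and an ignoreCase flag might be used to accept abbreviated keywords is sketched below. The pattern is a made-up example, the function name is hypothetical, and Python's `re` dialect is used here on the assumption that it is close enough for illustration; the regex dialect actually intended by the schema may differ.

```python
import re


def matches_parameter(text, regex, ignore_case=False):
    """True if the whole input string matches the parameter's pattern."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.fullmatch(regex, text, flags) is not None


# Hypothetical pattern accepting "MINIM" or the full word "MINIMIZE".
pattern = r"MINIM(IZE)?"
accepted = matches_parameter("minim", pattern, ignore_case=True)
```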
Parameters will usually be defined separately from the objects and use the
ref attribute to reference them.
Parameters can be used to describe additional constraints. This will probably
require the development of a microlanguage and until then may use program-specific
mechanisms. A common approach will be to use an array of values (or objects) to
represent different input values for (parts of) the calculation. Thus a conformational
change could be specified by an array of several torsion angles.
A parameter will frequently have a dictRef pointing to a dictionary
which may have more information about how the parameter is to be used or the values
it can take.
The allowable content of parameters may be shown by a "template"
in the appinfo; this is still experimental.
A particle has many of the characteristics of an atom
but without an atomic nucleus. It does not have an elementType and cannot be
involved in bonding, etc. It has coordinates, may carry charge and might have a
mass. It represents some aspect of a computational model and should not be used
for purely geometrical concepts such as centroid. Examples of particles are
"shells" (e.g. in GULP) which are linked to atoms for modelling polarizability
or lonepairs and approximations to multipoles. Properties such as charge, mass
should be scalar/array/matrix children.
Distinguish between peakList (primarily a navigational container) and peakGroup where the peaks (or groups) have some close relation not shared by all peaks. All descendants must use consistent units.
2005-11-22. added atomRefs, bondRefs and moleculeRefs and deprecated
atom, bond, molecule children
Distinguish between peakList (primarily a navigational container) and peakGroup where the peaks (or groups) have some close relation not shared by all peaks. All peaks and peakGroups should use the same units.
Primarily to record couplings and other fine
structure. At present we have tested this on HNMR spectra, C13 NMR and
simple IR. We believe that other types of spectroscopy (ESR, NQR, etc) can be
represented to some extent, but there may be systems beyond the current
expressive power.
For molecules without symmetry we believe that most of the important
types of NMR coupling can be represented. Thus an atom which gives rise to
two couplings can have two child peakStructures, and this is shown
in example1.
Where a peak is due to symmetry-related atoms there are
different couplings to symmetrical atoms. Thus in an AA'BB' system there
can be two couplings to the A atoms and we need nested peakStructures to
represent these. In this case the order of the atoms in the peak@atomRefs
maps to the order of the grandchildren. See example2.
<cml>
  <!-- AA'BB' where there are 2 Ha and 2 Hb with two couplings
       J1 Ha ... Hb and Ha' ... Hb'
       J2 Ha ... Hb' and Ha' ... Hb
  -->
  <molecule id="m1">
    <atomArray>
      <atom id="a1" elementType="H">
        <label value="Ha"/>
      </atom>
      <atom id="a2" elementType="H">
        <label value="Ha'"/>
      </atom>
      <atom id="a3" elementType="H">
        <label value="Hb"/>
      </atom>
      <atom id="a4" elementType="H">
        <label value="Hb'"/>
      </atom>
    </atomArray>
  </molecule>
  <spectrum id="spectrum2" title="test peaks">
    <peakList>
      <!-- the ORDER of a1 and a2 is linked to the ORDER of the
           grandchildren elements, i.e. a1 couples to atoms in ps11 and ps21
           while a2 relates to atoms in ps12 and ps22
      -->
      <peak id="p1" title="Ha" atomRefs="a1, a2"
            peakShape="sharp" xUnits="unit:ppm" xValue="6.0">
        <peakStructure id="ps1" type="coupling" peakMultiplicity="doublet"
                       value="10" units="unit:hertz">
          <peakStructure id="ps11" atomRefs="a3"/>
          <peakStructure id="ps12" atomRefs="a4"/>
        </peakStructure>
        <peakStructure id="ps2" type="coupling" peakMultiplicity="doublet"
                       value="2" units="unit:hertz">
          <peakStructure id="ps21" atomRefs="a4"/>
          <peakStructure id="ps22" atomRefs="a3"/>
        </peakStructure>
      </peak>
    </peakList>
  </spectrum>
</cml>
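To show how the ordering rule might be applied by software, the Python sketch below reads a cut-down copy of the peak from the example and pairs each atom in peak@atomRefs with the grandchild at the same position under each coupling. The function names are hypothetical, namespaces are omitted, and splitting atomRefs on commas as well as whitespace is an assumption made to accept the form used in the example.

```python
import re
import xml.etree.ElementTree as ET

PEAK_XML = """
<peak id="p1" atomRefs="a1, a2">
  <peakStructure id="ps1" type="coupling" value="10">
    <peakStructure id="ps11" atomRefs="a3"/>
    <peakStructure id="ps12" atomRefs="a4"/>
  </peakStructure>
  <peakStructure id="ps2" type="coupling" value="2">
    <peakStructure id="ps21" atomRefs="a4"/>
    <peakStructure id="ps22" atomRefs="a3"/>
  </peakStructure>
</peak>
"""


def split_refs(text):
    """Split a reference list on commas and/or whitespace."""
    return [token for token in re.split(r"[,\s]+", text.strip()) if token]


def couplings_by_atom(peak):
    """Map each peak atom to its (coupling value, partner atoms) pairs."""
    atoms = split_refs(peak.get("atomRefs", ""))
    result = {atom: [] for atom in atoms}
    for coupling in peak.findall("peakStructure"):
        partners = coupling.findall("peakStructure")
        if len(partners) != len(atoms):
            raise ValueError("one grandchild is expected per peak atom")
        # The i-th atom of the peak pairs with the i-th grandchild.
        for atom, partner in zip(atoms, partners):
            result[atom].append(
                (float(coupling.get("value")),
                 split_refs(partner.get("atomRefs", ""))))
    return result


couplings = couplings_by_atom(ET.fromstring(PEAK_XML))
```

With this reading, a1 (Ha) has the 10 Hz coupling to a3 (Hb) and the 2 Hz coupling to a4 (Hb'), and a2 (Ha') has the reverse assignment, matching J1 and J2 in the comment at the top of the example.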
This represents the actual function for the potential (i.e. with explicit values) rather than the functional form, which will normally be referenced from this.
This has generic arguments and parameters rather than explicit ones. It is essentially a mathematical function, expressed currently in reverse Polish notation.
property can contain one or more children, usually scalar,
array or matrix. The dictRef attribute is
required, even if there is a single scalar child with the same dictRef. The
property may have a different dictRef from the child, thus providing an extension
mechanism.
Properties may have a state attribute to distinguish the state of
matter.
reactant describes a reactant species which takes part in a reaction. Catalysts and supports are not normally classified as reactants, but this is subjective. Enzymes (or parts of enzymes) may well be reactants, as could be substances which underwent chemical change but were restored to their original state. reactant is a powerful concept as it can support stoichiometry (atom and molecule counting), mapping (for mechanisms), etc. Solvents are best contained within substanceList.
reaction is a container for reactants, products, conditions, properties and possibly other information relating to the reaction, often within a reactionList. Partial semantics exist:
name the name(s) of the reaction
reactantList (normally only one) the grouped reactants
spectatorList substances with well-defined chemistry which are involved in the reaction but do not change. Examples are side groups in proteins, cofactors, etc. The division between spectator and substance is subjective.
substance or substanceList substances present in the reaction but not classified as reactants. Examples might be enzymes, catalysts, solvents, supports, workup, etc.
condition conditions of the reaction. These may be text strings, but ideally will have clearer semantics such as scalars for temperature, etc.
productList the grouped products. This allows for parallel reactions or other semantics.
property properties (often physical) associated with the reaction. Examples might be heat of formation, kinetics or equilibrium constant.
Reaction normally refers to an overall reaction or a step within a reactionList. For a complex "reaction", such as in enzymes or chain reactions, it may be best to use reactionScheme to hold the overall reaction and a reactionList of the individual reaction steps.
A container for one or more reactions or reactionSchemes with no interrelations.
A reactionList aggregates reactions and reactionSchemes but implies no semantics. The most common uses are to create small collections of reactions (e.g. databases or publications).
A container for two or more related reactions and their relationships.
Where reactions are closely related (and often formally dependent on each other) they should be contained within the reactionStepList of a reactionScheme. The semantics which have informed this design include:
Steps within an organic synthesis.
Two or more individual (primitive) steps providing the detailed mechanism for an overall reaction.
Coupled or sequential reactions within biochemical pathways.
This design is general because "reaction" is used in several ways. A biochemical pathway (e.g. oxidation of glucose to CO2 and water) involves many coupled enzyme reactions proceeding both in parallel and in sequence. Each of these steps ("reactions" in their own right) is itself complex and can include several mechanistic steps which are themselves reactions with products, reactants, etc. reactionScheme can therefore include reactionStepLists (with more reactionScheme children) which provide a more detailed view of the individual components.
A child of reactionStepList and a container for reaction or reactionScheme.
reactionStep is always contained within reactionStepList and is designed to manage "sub-reactions" which have close relationships. These will often involve reactions which, taken together, describe a higher level reaction or reaction type. Examples are:
biochemical pathways
synthetic reaction schemes
multi-step reactions
parallel and/or coupled reactions
A reactionStep normally contains a single reaction or reactionScheme. It can have attributes such as yield and ratio which can be used by the parent reactionStepList.
A container for one or more related reactionSteps.
reactionStepList is always contained within reactionScheme and is designed to manage "sub-reactions" which have close relationships. These will often involve reactions which, taken together, describe a higher level reaction or reaction type. Examples are:
biochemical pathways
synthetic reaction schemes
multi-step reactions
parallel and/or coupled reactions
A reactionStepList contains reactionSteps, each of which contains reactions and/or reactionSchemes (e.g. where part of the process is known in greater detail). It may not directly contain child reactionStepLists.
The child reactionSteps can have attributes such as yield and ratio which describe the relationship of the component steps.
Guidance on use:
reactionScheme describes a complex of reactions with metadata, one (or more) overall reactions and a reactionStepList with the overall component reactions.
reactionStepList aggregates and structures the individual subreactions.
reactionList is a container for reactions and reactionSchemes with no semantics (e.g. a book or database of selected reactions).
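The containment rules described above (reactionStepList only inside reactionScheme, reactionStep only inside reactionStepList, no reactionStepList directly inside another) can be checked mechanically. The following is a minimal sketch, not part of the schema: the function name containment_errors and the sample documents are hypothetical, and namespaces are omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Required parent for each element, as stated in the element descriptions.
REQUIRED_PARENT = {
    "reactionStepList": "reactionScheme",
    "reactionStep": "reactionStepList",
}

def containment_errors(root):
    """Return messages for elements that sit under the wrong parent."""
    errors = []
    # The root has no parent, so flag it if it needs one.
    if root.tag in REQUIRED_PARENT:
        errors.append(f"{root.tag} must be inside {REQUIRED_PARENT[root.tag]}")
    for parent in root.iter():
        for child in parent:
            needed = REQUIRED_PARENT.get(child.tag)
            if needed is not None and parent.tag != needed:
                errors.append(f"{child.tag} found in {parent.tag}, expected {needed}")
    return errors

# A scheme with an overall reaction and two steps, one expanded in more detail.
GOOD = """
<reactionList>
  <reactionScheme id="pathway">
    <reaction id="overall"/>
    <reactionStepList>
      <reactionStep><reaction id="step1"/></reactionStep>
      <reactionStep>
        <reactionScheme id="step2detail">
          <reactionStepList>
            <reactionStep><reaction id="step2a"/></reactionStep>
          </reactionStepList>
        </reactionScheme>
      </reactionStep>
    </reactionStepList>
  </reactionScheme>
</reactionList>
"""

# A reactionStepList nested directly inside another one is not allowed.
BAD = """
<reactionScheme>
  <reactionStepList>
    <reactionStepList>
      <reactionStep><reaction id="x"/></reactionStep>
    </reactionStepList>
  </reactionStepList>
</reactionScheme>
"""

good_errors = containment_errors(ET.fromstring(GOOD))
bad_errors = containment_errors(ET.fromstring(BAD))
```

In the second document the only violation reported is the nested reactionStepList; to add more detail to a step, wrap it in a reactionStep containing a reactionScheme, as in the first document.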
This describes the set(s) of bonds and atoms involved in the reaction. The semantics are flexible, but a common usage would be to create atomSet(s) and bondSet(s) mapping to groups which undergo changes.
Under development. A subdivision of the system to which special
protocols or properties may be attached. Typical regions could be defined by the
presence of atoms belonging to an atomSet or geometrical boundaries.
A region element will not always contain other elements,
but may have references from other elements. It may create a protocol, e.g. atoms
within a region might be replaced by a continuum model or be subject to a field.
Semantics yet to be determined.
Regions can be created by the unions of two or more regions. This allows a region
to be built from a series of (say) spheres or boxes filling space.
An entry related in some way to a dictionary entry.
The range of relationships is not restricted but should include parents, aggregation, seeAlso and so on. DataCategories from ISO12620 can be referenced through the namespaced mechanism.
The sample should contain information on what things were in the sample and their roles. It can include molecule, substance and substanceList. Typical roles include solvent, mulling agents, salt disks, molecular supports, etc. but should not cover apparatus or conditions.
scalar holds scalar data under a single
generic container. The semantics are usually resolved by
linking to a dictionary.
scalar defaults to a scalar string but
has attributes which affect the type.
scalar does not necessarily reflect a physical object (for which
object should be used). It may reflect a property of an object
such as temperature, size, etc.
Note that normal Schema validation tools cannot validate the data type
of scalar (it is defined as string), but that a temporary schema
can be constructed from the type and used for validation. Also the type
can be contained in a dictionary and software could decide to retrieve this
and use it for validation.
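As an illustration of type-driven validation outside the schema, the sketch below parses the content of a scalar according to a declared type. It is a hedged example only: the attribute name dataType, the type names in PARSERS and the function parse_scalar are assumptions made for this illustration, and software could equally fetch the type from a dictionary.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from a declared type name to a Python parser.
PARSERS = {
    "xsd:string": str,
    "xsd:integer": int,
    "xsd:double": float,
}

def parse_scalar(element, type_attr="dataType"):
    """Parse the text of a scalar; scalars with no declared type are strings."""
    declared = element.get(type_attr, "xsd:string")
    parser = PARSERS.get(declared)
    if parser is None:
        raise ValueError(f"unsupported type: {declared}")
    # int() and float() raise ValueError when the content does not match.
    return parser(element.text or "")

temperature = parse_scalar(ET.fromstring('<scalar dataType="xsd:double">298.15</scalar>'))
label = parse_scalar(ET.fromstring("<scalar>warm</scalar>"))
```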
Objects are often present during a reaction which are not formally involved in bond breaking/formation and which are not modified during the reaction. They may be catalysts, but may also be objects which in some way constrain or help the reaction to take place (surfaces, micelles, groups in enzyme active sites, etc.). In some cases molecules present in a reaction mixture may act as spectators in steps in which they are not transformed.
The spectrum construct can hold metadataList, sample (which can contain molecule), conditionList (mainly for physical/chemical conditions, not instrumental), spectrumData for the actual data and instrumental settings/procedure and peakList for the assigned peaks. This approach puts the spectrum as the primary object of interest. It could also be possible to make spectrum a child of molecule (although a reference using ref might be preferable).
This is primarily to record the data in an interchangeable format, together with machine and manufacturer settings, and can include other MLs in this area (AniML, SpectroML, etc.). We recommend ASCII representations of data and this is the only format that CMLSpect implementers have to support, but we also allow for the carriage of JCAMP and other data (in ML wrappers such as AniML). All numeric data should carry units and dictionary references if possible to allow for semantic interoperability.
substance represents a chemical substance which is deliberately very general. It can represent things that may or may not be molecules, can and cannot be stored in bottles and may or may not be microscopic. Solutions and mixtures can be described by _substanceList_s of substances. The type attribute can be used to give qualitative information characterising the substance ("granular", "90%", etc.) and _role_ to describe the role in process ("desiccant", "support", etc.). There is currently no controlled vocabulary. Note that reaction is likely to have more precise semantics. The amount of a substance is controlled by the optional _amount_ child.
Deliberately very general - see substance. substanceList is designed to manage solutions, mixtures, etc. and there is a small enumerated controlled vocabulary, but this can be extended through dictionaries.
substanceList can have an amount child. This can indicate the amount of a solution or mixture; this example describes 100 ml of 0.1M NaOH(aq). Although apparently long-winded, it is precise and fully machine-interpretable.
symmetry provides a label and/or symmetry operations for molecules
or crystals. Point and spacegroups can be specified by strings, though these are not
enumerated, because of variability in syntax (spaces, case-sensitivity, etc.),
potential high symmetries (e.g. TMV disk is D17) and
non-standard spacegroup settings. Provision is made for explicit symmetry operations
through <matrix> child elements.
By default the axes of symmetry are defined by the symbol - thus C2v requires
z to be the unique axis, while P21/c requires b/y. Spacegroups imply the semantics
defined in International Tables for Crystallography, (Int Union for Cryst., Munksgaard).
Point groups are also defined therein.
The element may also be used to give a label for the symmetry species (irreducible
representation) such as "A1u" for a vibration or orbital.
The matrices should be 3x3 for point group operators and 3x4 for spacegroup operators.
The use of crystallographic notation ("x,1/2+y,-z") is not supported - this would
be <matrix>1 0 0 0.0 0 1 0 0.5 0 0 -1 0.0</matrix>.
The default convention for point group symmetry is Schoenflies and for
spacegroups is "H-M". Other conventions (e.g. "Hall") must be specified through
the convention attribute.
This element implies that the Cartesians or fractional coordinates in a molecule
are oriented appropriately. In some cases it may be useful to specify the symmetry of
an arbitrarily oriented molecule and the <molecule> element has the attribute
symmetryOriented for this purpose.
It may be better to use transform3 elements to hold the symmetry, as they have a fixed shape and
better-defined mathematical operators.
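Since crystallographic notation is not accepted directly, software that holds operators in that form has to expand them into 3x4 matrices first. The following is a minimal sketch of such a conversion, not a CML facility: the function name operator_to_matrix is hypothetical, and only axis coefficients of +1 or -1 plus a numeric translation are handled.

```python
import re
from fractions import Fraction

AXES = {"x": 0, "y": 1, "z": 2}
# One optionally signed term: an axis letter, a fraction (1/2) or a decimal (0.5).
TERM = re.compile(r"([+-]?)(?:([xyz])|(\d+/\d+|\d+(?:\.\d+)?))")

def operator_to_matrix(op):
    """Convert e.g. 'x,1/2+y,-z' to a 3x4 matrix given as a list of rows."""
    parts = op.replace(" ", "").lower().split(",")
    if len(parts) != 3:
        raise ValueError("expected three comma-separated components")
    rows = []
    for part in parts:
        row = [0.0, 0.0, 0.0, 0.0]  # three rotation terms, then the translation
        pos = 0
        while pos < len(part):
            m = TERM.match(part, pos)
            # Every term after the first must carry an explicit sign,
            # which rejects forms such as '2x' that this sketch cannot handle.
            if m is None or (pos > 0 and not m.group(1)):
                raise ValueError(f"cannot parse {part!r}")
            sign = -1.0 if m.group(1) == "-" else 1.0
            if m.group(2):
                row[AXES[m.group(2)]] += sign
            else:
                row[3] += sign * float(Fraction(m.group(3)))
            pos = m.end()
        rows.append(row)
    return rows

def matrix_text(rows):
    """Flatten the rows into whitespace-separated text for a matrix element."""
    return " ".join(f"{value:g}" for row in rows for value in row)

rows = operator_to_matrix("x,1/2+y,-z")
text = matrix_text(rows)
```

The three rows are written out in order, so the twelve numbers read as the rotation part of each row followed by its translation.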
By default table represents a rectangular table of any simple quantities
representable as XSD or CML dataTypes. There are three layouts, columnwise, rowwise and
without markup. In all cases it is essential that the columns, whether explicit or
otherwise, are homogeneous within the column. Also the metadata for each column must
be given explicitly.
<ul>
<li> columns:
There is a single arrayList child containing (homogeneous) child
elements (array or list), each of
size rows. This is the "normal" orientation of data tables
but the table display could be transposed by XSLT transformation if required.
Access is to columns, and thence to the data within them. DataTyping, delimiters,
etc are delegated to the arrays or lists, which must all be of the same size.
</li>
<li> rows: with explicit <tt>trow</tt>s. The metadata is carried in a <tt>theader</tt>
element of size <tt>cols</tt>. Within each trow the data are contained in tcells.</li>
<li> content: The metadata is carried in a <tt>theader</tt>
element of size <tt>cols</tt>. The data are contained in a single <tt>tableContent</tt>
with columns moving fastest. Within the content the data are whitespace (or delimiter) separated.</li>
</ul>
For verification it is recommended that tables carry rows and columns attributes.
The type of the tables should also be carried in a <tt>tableType</tt> attribute.
Validity constraints (XPath expression in table context)
type: column based
  @tableType: columnBased
  @rows: recommended
  actual rowCount: ./arrayList/@size or arrayList/*[self::array or self::list]/@size
  @columns: optional
  actual columnCount: ./arrayList/@size or count(arrayList/*[self::array or self::list])
  tableHeader: forbidden
  arrayList: required
  tableRowList: forbidden
  tableContent: forbidden
type: row based
  @tableType: rowBased
  @rows: recommended
  actual rowCount: ./tableRowList/@size or count(tableRowList/tableRow)
  @columns: recommended
  actual columnCount: count(tableHeader/tableHeaderCell) or count(tableRowList/tableRow/tableCell)
tableCell
is a data container of the table and only occurs as a child of tableRow.
Normally it contains
simpleContent, but may also contain a single child element (which could itself have
complex or mixed content). However tableCell should NOT directly contain
multiple children of any sort or mixed content. (It is declared as mixed
content here to allow either text or element content, but not both.)
The metadata for tableCells must be declared in a tableHeader/tableHeaderCell
system.
This occurs only as the simpleContent of a tableContent element.
It contains table/@rows * table/@columns items arranged rowwise
(i.e. columns is fastest
moving). Metadata for columns must be defined in tableHeader.
The items of the
table are ASCII strings. They can be separated by whitespace
or by a defined single character delimiter as in array. The
data must be rectangular and each implicit column must have consistent semantics.
It can be used to hold CSV-like data (indeed CSV data can be entered directly as
long as there are no quoted commas, in which case a different delimiter or
the safer tableRowList should be used). Unlike tableRowList or arrayList (both of which can hold ASCII
strings or XML elements), tableContent can only hold strings.
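The row-wise layout of tableContent means the text can be cut into rows once the rows and columns counts are known. The following is a minimal sketch under that reading; the function names parse_table_content and column are hypothetical, and quoted delimiters are deliberately not handled.

```python
def parse_table_content(text, rows, columns, delimiter=None):
    """Split tableContent text into a list of rows, checking it is rectangular."""
    # With no delimiter given, split on runs of whitespace.
    items = text.split(delimiter) if delimiter else text.split()
    items = [item.strip() for item in items]
    if len(items) != rows * columns:
        raise ValueError(f"expected {rows * columns} items, found {len(items)}")
    # Columns move fastest, so each consecutive run of `columns` items is one row.
    return [items[r * columns:(r + 1) * columns] for r in range(rows)]

def column(table, index):
    """Return one implicit column; its items should share the same semantics."""
    return [row[index] for row in table]

table = parse_table_content("1 a 2 b 3 c", rows=3, columns=2)
piped = parse_table_content("1|a|2|b\n", rows=2, columns=2, delimiter="|")
first_column = column(table, 0)
```

All items stay as strings here; converting each column to its declared type would use the metadata carried in tableHeader.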
Used for rowBased or contentBased tables, where it is mandatory.
Contains the metadata as tableHeaderCells which should match the (implicit) columns
in number and semantic type. It is forbidden for arrayList tables as each
array/list contains the metadata.
Only used in rowBased or contentBased tables,
and then as a direct child of tableHeader.
There must be as many tableHeaderCells as there are
implicit columns in tableRowList or tableContent. These cells carry the metadata
and/or semantics for each column. These are similar to the attributes in array
but without the list of minValue, errors etc. However they can (and should)
carry all the units metadata.
A direct child of tableRowList containing tableCells.
At present all tableRows in a tableRowList must have the same count of tableCells
and their semantics must correspond to the tableHeader in the table. No cells can be omitted
and there is no spanning of cells. There is no need for a size attribute as the count is simply
count(tableCell).
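The constraints on row-based tables can be checked with a few counts. The sketch below is illustrative only: the function name row_based_errors and the sample tables are hypothetical, and namespaces are omitted.

```python
import xml.etree.ElementTree as ET

def row_based_errors(table):
    """Check that every tableRow matches the header and the declared row count."""
    errors = []
    expected = len(table.findall("tableHeader/tableHeaderCell"))
    rows = table.findall("tableRowList/tableRow")
    for number, row in enumerate(rows, start=1):
        found = len(row.findall("tableCell"))
        if found != expected:
            errors.append(f"row {number}: {found} cells, header declares {expected}")
    # The rows attribute is only recommended, so check it only when present.
    declared = table.get("rows")
    if declared is not None and int(declared) != len(rows):
        errors.append(f"rows={declared} but {len(rows)} tableRow elements found")
    return errors

GOOD = """
<table rows="2" columns="2">
  <tableHeader><tableHeaderCell/><tableHeaderCell/></tableHeader>
  <tableRowList>
    <tableRow><tableCell>1</tableCell><tableCell>a</tableCell></tableRow>
    <tableRow><tableCell>2</tableCell><tableCell>b</tableCell></tableRow>
  </tableRowList>
</table>
"""

# The second row is one cell short and the rows attribute is wrong.
BAD = """
<table rows="3" columns="2">
  <tableHeader><tableHeaderCell/><tableHeaderCell/></tableHeader>
  <tableRowList>
    <tableRow><tableCell>1</tableCell><tableCell>a</tableCell></tableRow>
    <tableRow><tableCell>2</tableCell></tableRow>
  </tableRowList>
</table>
"""

good_errors = row_based_errors(ET.fromstring(GOOD))
bad_errors = row_based_errors(ET.fromstring(BAD))
```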
tcell may either carry the header information
for a column OR be a data container of the table. Normally it contains
simpleContent, but may also contain a single child element. It should
NOT contain multiple children of any sort.
A scientific unit. Units are of the following types:
SI Units. These may be one of the seven fundamental types
(e.g. meter) or may be derived (e.g. joule). An SI unit is
identifiable because it has no parentSI attribute and will have
a unitType attribute. 2005-12-17 - this may be obsolete; PMR
nonSI Units. These will normally have a parent SI unit
(e.g. calorie has joule as an SI parent).
Constructed units. These use a syntax of the form:
This defines a new unit (g.s-1) which is composed from two
existing units (units:g and siUnits:s). The
conversion to SI is computed from the two child units and may be
added as a 'multiplierToSI' attribute. Only siUnits or units with
'multiplierToSI' can be used as child units; 'constantToSI' cannot
be used yet. If the new unit points to a unitType then the dimension
can be checked. Thus if the published dimension of massPerTime does not
agree with mass.time-1 an error is throwable.
Alternatively a new unitType can be added as a child.
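The computation for a constructed unit can be sketched as follows: the multiplier is the product of the child multipliers raised to their powers, and the dimension is the sum of the child dimensions weighted by those powers. This is an illustration under stated assumptions, not the schema's own procedure: the lookup tables, the value 0.001 for gram to kilogram, and the names compose and check_dimension are hypothetical.

```python
# Hypothetical lookup tables; real software would read these from unit dictionaries.
MULTIPLIER_TO_SI = {"units:g": 0.001, "siUnits:s": 1.0}
DIMENSION = {"units:g": {"mass": 1}, "siUnits:s": {"time": 1}}
PUBLISHED = {"massPerTime": {"mass": 1, "time": -1}}

def compose(children):
    """children is a list of (unit id, power) pairs; returns (multiplier, dimension)."""
    multiplier = 1.0
    dimension = {}
    for unit_id, power in children:
        multiplier *= MULTIPLIER_TO_SI[unit_id] ** power
        for base, exponent in DIMENSION[unit_id].items():
            dimension[base] = dimension.get(base, 0) + exponent * power
    # Drop base quantities that cancel out.
    dimension = {base: e for base, e in dimension.items() if e != 0}
    return multiplier, dimension

def check_dimension(unit_type, dimension):
    """Raise if the computed dimension disagrees with the published unitType."""
    if PUBLISHED[unit_type] != dimension:
        raise ValueError(f"{unit_type}: computed {dimension}, published {PUBLISHED[unit_type]}")

# g.s-1 built from gram (power 1) and second (power -1).
multiplier, dimension = compose([("units:g", 1), ("siUnits:s", -1)])
check_dimension("massPerTime", dimension)
```

Units that need a constantToSI offset are left out on purpose, since offsets do not compose by simple multiplication.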
The relationship of a unit to its SI parent is potentially complex and
inconsistencies may arise. The following are available:
parentSI. This points to the ID of a parent SI unit. If this ID is the
same as the current unit the implication is that this is an SI unit.
isSI. a boolean indicating whether the current unit is SI.
multiplierToSI and constantToSI. If these are 1.0 and 0.0 (or missing)
the implication is that this unit is SI. However this is fragile as units can
be defined without these attributes and a unit could coincidentally have
no numeric differences but not be an SI unit.
2003:04-09 Description or parentSI attribute enhanced.
2006:03-21 Added metadata and metadataList to content.
Usually forms the complete units dictionary
(along with metadata). Note: this used to hold both units and unitTypes
(though in separate files). This was unwieldy and unitTypeList has been
created to hold unitTypes. Implementers are recommended to change
any unitList/unitType to unitTypeList/unitType.
2005-12-15. PMR. added namespace
and dictionaryPrefix.
2005-12-17. PMR. added siNamespace .
2006-01-28. PMR. deprecated use for holding unitType.
Mandatory for SI Units, optional for nonSI units since they should be able to obtain this from their parent. For complex derived units without parents it may be useful.
Used within a unitList.
Distinguish carefully from unitsType
which is primarily used for attributes describing the units that elements
carry.
2006-02-06: PMR. Added preserve and symbol attributes.
Usually forms the complete unitTypes dictionary
(along with metadata). Note: unitTypes used to be held under unitList, but
this was complicated to implement and unitTypeList makes a clean separation.
A container for all information relating to the x-axis (including scales, offsets, etc.) and the data themselves (in an array). Note: AniML uses "xValues" so avoid confusion with this.