# KIM Properties Framework

## 1 Overview

The OpenKIM system includes a collection of Tests, Models, Predictions and Reference Data. A Test is a computer program that couples with a Model (i.e. an interatomic potential) to generate one or more Predictions, each of which is associated with a specific material property. In turn, every material property is associated with a Property Definition that is created by a developer and includes a formal definition stored in a standardized format (see Section 2.2). A Prediction is thus a realization of a Property Definition (referred to as a Property Instance) for a specific case. Similarly, an item of Reference Data is a Property Instance obtained from an experiment or a first principles calculation. Thus, every Property Instance is either a Prediction or an item of Reference Data.

A developer interested in contributing a new Test or Reference Data must first determine whether a suitable Property Definition already exists in OpenKIM by searching the properties page on https://openkim.org. If so, they proceed to use the appropriate definition when writing their Test or uploading the Reference Data. Otherwise, in consultation with the KIM Editor and developers of similar properties, they determine whether any of the existing Property Definitions can be adapted to their need or a new Property Definition is warranted. In the event that a Property Definition is adapted due to corrections or new requirements, all existing Tests and Reference Data associated with it must be revised with a corresponding version update.

## 2 KIM Properties

### 2.1 KIM Property Information

Every property in KIM will be associated with the following information:

1. The name and contact information of the original "contributor" of the property and the "maintainer" who is currently maintaining it.

2. A Property Definition file in EDN format (Section 2.2) which defines the property and from which a template can be generated for Test developers.

3. A file containing documentation about the property, including a detailed explanation of the property and the variables that appear in the Property Definition.

4. A Wiki-style web page which provides supplemental documentation of the property and enables community involvement and discussion in its maintenance and evolution. This should include:

   1. A list of interested users involved in the definition and maintenance of the property.

   2. A list of tags specified by the developers that characterize the property. (The openkim.org system will provide recommendations for tags and synonyms based on the property documentation.) Existing nomenclature will be adopted when possible, such as that of the IUPAC (International Union of Pure and Applied Chemistry) available at https://iupac.org/.

5. Two validators to verify the correctness of a Property Instance. (1) A “Definition Validator” ensures that the Property Instance is valid EDN and conforms to the Property Definition. (2) A “Physics Validator”, provided by the property developer, ensures that the Property Instance contains physically acceptable values.

### 2.2 KIM Property Definition File Format

A Property Definition is stored in a subset of the EDN format as described below. In the following discussion, a map is an unordered set of key-value pairs akin to Perl’s hash, Python’s dictionary, and Java’s Hashtable. A key is a string. Key names can only include lower-case alphanumeric characters and dashes. The names are arbitrary and set by the developer to reflect the meaning of the key. A value is a string, boolean, or a vector of integers and ":" strings. A ; character encountered outside of a string indicates the start of a comment. The ; and all subsequent characters to the next newline are ignored.

Note about strings: In strings, a literal backslash must be escaped by doubling it. Otherwise the backslash and the character that follows it are interpreted as an escape sequence (such as a tab \t or a newline \n). For example, if the LaTeX notation \cos\theta is included in a string, it should be written as \\cos\\theta.
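As a minimal sketch of the escaping rule above, the following Python helper doubles each backslash before wrapping the text in quotes. The name `to_edn_string` is hypothetical and not part of any KIM tool. Escaping of double quotes is an added assumption based on general EDN string syntax, not something stated in this document.

```python
def to_edn_string(text: str) -> str:
    """Return `text` as an EDN string literal with backslashes doubled."""
    # Double the backslashes first so that the backslashes introduced for
    # quote escaping below are not doubled a second time.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


# The LaTeX fragment \cos\theta from the note above:
print(to_edn_string(r"\cos\theta"))  # prints "\\cos\\theta"
```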

A Property Definition must contain the following required key-value pairs:

property-id
A string containing the unique ID of the property. The Property ID conforms to the Tag URI Scheme as described in RFC 4151 [3]:
tag:<email-address>,<date>:property/<property-name>

The fields appearing within <...> stand in for text as defined below.

<property-name> is restricted to lowercase alphanumeric characters and the dash character (“-”).

<date> is the date of establishment of the property in “yyyy-mm-dd” format.

<email-address> is the e-mail address of the property contributor. A contributor has several options available. They may use their own e-mail address, their openkim.org username followed by “@noreply.openkim.org”, an openkim.org organization name followed by “@noreply.openkim.org”, an openkim.org user or organization’s UUID followed by “@noreply.openkim.org”, or in agreement with the KIM Editor they may use “staff@noreply.openkim.org”. Any letters in the range A-Z must be written in lowercase, and the address cannot contain a plus (“+”) character; apart from these two restrictions, any valid e-mail address is accepted. Several examples follow:

(d) A UUID of “53584e2a-3caf-446f-ba11-b843d3d24a3a” corresponding to a user or organization in openkim.org: <email-address> = “53584e2a-3caf-446f-ba11-b843d3d24a3a@noreply.openkim.org”
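The rules above for the three fields of a property-id can be sketched as a small Python check. This is an illustration only: the function name `parse_property_id` and the regular expression are hypothetical, the e-mail pattern is deliberately loose (it only requires an "@" and rejects commas and whitespace), and the date in the usage example is made up.

```python
import re
from datetime import date

# Hypothetical pattern that encodes only the rules stated in this section.
_PROPERTY_ID = re.compile(
    r"tag:(?P<email>[^,\s]+@[^,\s]+),"
    r"(?P<date>\d{4}-\d{2}-\d{2}):"
    r"property/(?P<name>[a-z0-9-]+)"
)


def parse_property_id(property_id: str) -> dict:
    """Split a property-id into its fields, raising ValueError if malformed."""
    match = _PROPERTY_ID.fullmatch(property_id)
    if match is None:
        raise ValueError(f"not a valid property-id: {property_id!r}")
    email = match["email"]
    if re.search(r"[A-Z]", email) or "+" in email:
        raise ValueError("e-mail address must be lowercase and must not contain '+'")
    # date.fromisoformat also rejects impossible dates such as 2014-13-40.
    established = date.fromisoformat(match["date"])
    return {"email": email, "date": established, "name": match["name"]}


# Usage with the UUID-based address from example (d) and an illustrative date.
fields = parse_property_id(
    "tag:53584e2a-3caf-446f-ba11-b843d3d24a3a@noreply.openkim.org,"
    "2014-04-15:property/cohesive-energy-relation-cubic-crystal"
)
print(fields["name"])  # prints cohesive-energy-relation-cubic-crystal
```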


property-title
A string containing a one-line title for the property. The title will be used in citations of the property. The title should not include an ending period.
property-description
A string containing a brief description of the property.

The required fields listed above are followed by an unordered set of key-map pairs. Each key is associated with a map which must contain the following standard key-value pairs:

type
A string defining the variable type that can be set to one of the following: "string", "float", "int", "bool", or "file".
has-unit
A boolean that indicates whether the variable value is physically-dimensioned and therefore has a physical unit or not. It can be set to either true or false.
extent
An EDN vector specifying whether the variable is a scalar or an array (EDN vector) of a specified extent. It can be set to either an empty vector [] to represent a scalar, or the extent of the array with known dimensions specified and unknown dimensions indicated by a string containing a colon character, ":". For example, [":"], [3,3], [":",2,":"]. It is recommended to store arrays in a by-row ordering for improved readability, e.g., store the coordinates of 10 atoms as [10,3] as opposed to [3,10].
required
A boolean that indicates whether the variable must be reported in every Property Instance of the property or not. It can be set to either true or false.
description
A string which provides an explanation of what the variable is intended to represent.
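To make the five standard keys concrete, here is a minimal Python sketch that checks one key-map pair. It assumes the EDN has already been parsed into Python dictionaries, lists, strings and booleans. The function name `check_variable_spec`, the key `atom-coordinates` and its description are hypothetical, and requiring extent entries to be positive is an assumption (the text only says integers).

```python
import re

ALLOWED_TYPES = {"string", "float", "int", "bool", "file"}
KEY_NAME = re.compile(r"[a-z0-9-]+")


def check_variable_spec(key: str, spec: dict) -> list:
    """Return a list of problems found in one key-map pair (empty if none)."""
    problems = []
    if not KEY_NAME.fullmatch(key):
        problems.append(f"{key!r}: only lowercase alphanumerics and dashes are allowed")
    if spec.get("type") not in ALLOWED_TYPES:
        problems.append(f"{key}: 'type' must be one of {sorted(ALLOWED_TYPES)}")
    for flag in ("has-unit", "required"):
        if not isinstance(spec.get(flag), bool):
            problems.append(f"{key}: '{flag}' must be true or false")
    extent = spec.get("extent")
    # An extent is [] for a scalar, or a vector of known sizes and ":" wildcards.
    if not isinstance(extent, list) or not all(
        dim == ":" or (isinstance(dim, int) and not isinstance(dim, bool) and dim > 0)
        for dim in extent
    ):
        problems.append(f"{key}: 'extent' must be [] or a vector of integers and \":\"")
    if not isinstance(spec.get("description"), str):
        problems.append(f"{key}: 'description' must be a string")
    return problems


spec = {
    "type": "float",
    "has-unit": True,
    "extent": [":", 3],  # unknown number of rows, three columns (by-row ordering)
    "required": False,
    "description": "Hypothetical per-atom coordinates",
}
print(check_variable_spec("atom-coordinates", spec))  # prints []
```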

Below is an example of a Property Definition for the cohesive energy relation of a cubic crystal.

## 3 KIM Property Instances

Property Instances are either Predictions or items of Reference Data and must conform to the specification in the associated Property Definition. A Property Instance is stored in a subset of the EDN format as described in Section 2.2. Multiple Property Instances in a file may optionally be contained within an array represented by a start bracket ([) at the beginning, and an end bracket (]) at the end of the file:

[
{
⋮
Property Instance 1
⋮
}
{
⋮
Property Instance 2
⋮
}
]


If the brackets are not present, the Property Instances are assumed to be in an array. Multiple Property Instances can only be separated by whitespace or comments (lines beginning with a “;”).

Each Property Instance must contain the following required key-value pairs:

property-id
A string containing the Tag URI scheme for the property as described in Section 2.2.
instance-id
A positive integer identifying the instance. When a file contains multiple Property Instances, each instance-id value must be unique within that file. For a Prediction, the instance numbering must be deterministic: repeating the calculations with the same input must produce the same numbering.
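The two constraints on instance-id (positive integer, no repeats within a file) can be sketched as follows. The sketch assumes the Property Instances have already been parsed into a list of Python dictionaries; the function name `check_instance_ids` and the placeholder property-id are hypothetical.

```python
def check_instance_ids(instances: list) -> None:
    """Raise ValueError unless every instance-id is a positive, unique integer."""
    seen = set()
    for instance in instances:
        instance_id = instance.get("instance-id")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(instance_id, bool) or not isinstance(instance_id, int) or instance_id < 1:
            raise ValueError(f"instance-id must be a positive integer, got {instance_id!r}")
        if instance_id in seen:
            raise ValueError(f"instance-id {instance_id} is repeated")
        seen.add(instance_id)


# Two instances of the same (placeholder) property with distinct ids pass the check.
check_instance_ids([
    {"property-id": "tag:...", "instance-id": 1},
    {"property-id": "tag:...", "instance-id": 2},
])
```

One simple way for a Test to keep the numbering reproducible is to assign ids in the order in which it iterates over a sorted list of its inputs.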

The required fields listed above are followed by an unordered set of key-map pairs for keys included in the Property Definition. Required keys (as indicated in the Property Definition) must be included. Each key is associated with a map containing one or more of the following key-value pairs (required keys are indicated by a star):

source-value*
A string, float, integer, boolean, or file name string (depending on the specification in the Property Definition) providing the contents (value) of the variable. This variable will either be a scalar or an array of specified extent as defined in the Property Definition. Note that file names should be given relative to the Test Result, Verification Result, or Error parent directory rather than as absolute paths.
source-unit*
A string defining the physical units of the variable in notation conforming to the GNU units command. (This key is only required if the corresponding has-unit key in the Property Definition has value true.)
si-value
For numerical values, a machine-generated translation of the source-value to SI units. A Test should not provide this information.
si-unit
For numerical values, the standard SI unit corresponding to source-unit. A Test should not provide this information.
source-std-uncert-value
A float set to the numerical standard uncertainty value u. (u represents one standard deviation.)
source-expand-uncert-value
A float set to the expanded uncertainty value U defined as the “interval about the result of a measurement that may be expected to encompass a large fraction of the distribution of values that could reasonably be attributed to the measurand” [1].
coverage-factor
A float set to the coverage factor k. The coverage factor k is a numerical factor which is the multiplier of the standard uncertainty in order to obtain an expanded uncertainty (i.e. U = ku).
source-asym-std-uncert-neg
A float set to the variable u− associated with a standard uncertainty that is asymmetric about the key value y, with a range [y − u−, y + u+].
source-asym-std-uncert-pos
A float set to the variable u+ associated with a standard uncertainty that is asymmetric about the key value y, with a range [y − u−, y + u+].
source-asym-expand-uncert-neg
A float set to the variable U− associated with an expanded uncertainty that is asymmetric about the key value y, with a range [y − U−, y + U+].
source-asym-expand-uncert-pos
A float set to the variable U+ associated with an expanded uncertainty that is asymmetric about the key value y, with a range [y − U−, y + U+].
uncert-lev-of-confid
A float set to the level of confidence L associated with the expanded uncertainty U. The level of confidence is expressed as a percentage.
digits
An integer set to the number of reported digits.

All keys beginning with “source” are expressed in the physical units given by source-unit (if applicable). The keys associated with uncertainty and precision conform to the ISO “Guide to the Expression of Uncertainty in Measurement” and the ThermoML standard notation [2].
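The relations among the uncertainty keys can be illustrated with a short Python sketch. The function names and the numerical values are made up for illustration; only the formulas U = ku and the asymmetric range [y − u−, y + u+] come from the key descriptions above.

```python
def expanded_uncertainty(std_uncert: float, coverage_factor: float) -> float:
    """Return the expanded uncertainty U = k * u."""
    return coverage_factor * std_uncert


def asymmetric_interval(value: float, uncert_neg: float, uncert_pos: float) -> tuple:
    """Return the range (y - u_minus, y + u_plus) about the value y."""
    return (value - uncert_neg, value + uncert_pos)


# source-std-uncert-value u = 0.25 with coverage-factor k = 2
print(expanded_uncertainty(0.25, 2.0))       # prints 0.5

# source-value y = 4.0 with source-asym-std-uncert-neg 0.5 and -pos 0.25
print(asymmetric_interval(4.0, 0.5, 0.25))   # prints (3.5, 4.25)
```

The same interval function applies to the expanded pair U− and U+, since the two asymmetric ranges have the same form.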

If the source-value key is a scalar, the values of the uncertainty and digits keys must be scalars. If the source-value key’s value is an array (EDN vector), the values of the uncertainty and digits keys must be either arrays of the same extent, or scalars in which case they are taken to apply equally to all values in the source-value array.
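The extent rule in the preceding paragraph can be sketched in Python as follows, assuming that EDN vectors have been parsed into (possibly nested) Python lists. All three function names are hypothetical. `matches_extent` additionally shows how a concrete shape could be compared with an `extent` from the Property Definition, treating ":" as a wildcard.

```python
def shape_of(value) -> list:
    """Return the extent of a nested list; a scalar has extent []."""
    if not isinstance(value, list):
        return []
    inner = [shape_of(item) for item in value]
    if any(shape != inner[0] for shape in inner):
        raise ValueError("array is ragged")
    return [len(value)] + (inner[0] if inner else [])


def matches_extent(shape: list, extent: list) -> bool:
    """Compare a concrete shape with a definition extent (":" matches any size)."""
    return len(shape) == len(extent) and all(
        want == ":" or want == have for have, want in zip(shape, extent)
    )


def check_companion(source_value, companion) -> bool:
    """An uncertainty or digits value must be a scalar or match source-value's extent."""
    companion_shape = shape_of(companion)
    return companion_shape == [] or companion_shape == shape_of(source_value)


coords = [[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]]
print(shape_of(coords))                                # prints [2, 3]
print(matches_extent(shape_of(coords), [":", 3]))      # prints True
print(check_companion(coords, 0.01))                   # prints True (scalar applies to all)
print(check_companion(1.0, [0.01]))                    # prints False (scalar needs scalar)
```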

Below is a fictitious example of a Property Instance corresponding to the cohesive-energy-relation-cubic-crystal Property Definition given in Section 2.2.

Note that, although the value of the short-name key is a scalar in this case, it is still enclosed in brackets because it was defined as an array in the Property Definition.

Metadata related to a Property Instance will be stored separately in an auxiliary document. For a Prediction, this can include details of the Test calculations. For Reference Data this can include the source citation, the origin of the data (experimental or first principles), and the type of first principles calculation performed and associated parameters needed to define the calculation.

[1] R. D. Chirico, M. Frenkel, V. V. Diky, K. N. Marsh, and R. C. Wilhoit. ThermoML – An XML-Based Approach for Storage and Exchange of Experimental and Critically Evaluated Thermophysical and Thermochemical Property Data. 2. Uncertainties. J. Chem. Eng. Data, 48:1344–1359, 2003.

[2] M. Frenkel, R. D. Chirico, V. Diky, Q. Dong, K. N. Marsh, J. D. Dymond, W. A. Wakeham, S. E. Stein, E. Koenigsberger, and A. R. H. Goodwin. XML-Based IUPAC Standard for Experimental, Predicted, and Critically Evaluated Thermodynamic Property Data Storage and Capture (ThermoML). Pure Appl. Chem., 78(3):541–612, 2006.

[3] T. Kindberg and S. Hawke. The 'tag' URI Scheme. RFC 4151, 2005. http://www.ietf.org/rfc/rfc4151.txt.