A Practical Guide to Raw MongoDB Queries

All computation (Test and Verification Check) results produced by the KIM Pipeline, as well as any reference data archived in OpenKIM, are available through a public MongoDB database. For simple crystals and specific Test results, a simplified query interface is available as a web endpoint, a Python package, and LAMMPS commands. This functionality is described here. We are working to expand simplified queries to arbitrary crystals.

This page serves as a guide to querying the database directly using MongoDB syntax. At this time, it is the only way to query for properties of complex crystals under the Crystal Genome (XtalG) framework, and will remain the most flexible and powerful querying mechanism even as simplified queries are expanded.

Raw queries are accessible through a web interface at https://query.openkim.org/raw, which is the best place to start experimenting. Whenever a query is performed through the web interface, the bottom of the page shows how to perform the same query with other tools and languages, such as Curl, GET, Python, d3, and jQuery. Throughout, the MongoDB documentation will be an invaluable resource for learning about search operators and, for more advanced usage, aggregation pipeline operations.

Let us consider the task of constructing a zero-temperature thermodynamic convex hull for a binary system for the potential EAM_Dynamo_ZopeMishin_2003_TiAl__MO_117656786760_006. A Jupyter notebook and Python module that use these queries to construct a convex hull diagram are included in the OpenKIM Binder demo, where you can construct the convex hull and view the source code.

All aforementioned types of data (Test Results, Verification Results, and Reference Data) are contained in the data database, which must be set in the corresponding field in the web form. This database also contains Errors. The type of data is found in the meta.type key, and the possible values are "tr", "vr", "rd", and "er". Other than errors, all entries in data consist of a KIM Property Instance with additional metadata.
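As a small illustration of the meta.type key, the sketch below builds a filter document for each kind of entry. The pairing of the four codes with the data types follows the order in which they are listed above, and the helper name type_filter is hypothetical rather than part of any OpenKIM tool.

```python
import json

# Codes stored in meta.type, paired with the data types named in the text.
ENTRY_TYPES = {
    "tr": "Test Result",
    "vr": "Verification Result",
    "rd": "Reference Data",
    "er": "Error",
}


def type_filter(code):
    """Return a MongoDB filter document selecting one kind of entry."""
    if code not in ENTRY_TYPES:
        raise ValueError(f"unknown meta.type value: {code!r}")
    return {"meta.type": code}


# JSON text that could be pasted into the query field of the web form.
print(json.dumps(type_filter("tr")))
```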

To construct a convex hull, we will need the potential energy of all computed structures containing Ti and/or Al, and no other elements. A list of all properties in OpenKIM is found at https://openkim.org/properties. By inspecting this list, we can see that binding-energy-crystal is the property reporting the potential energy of arbitrary crystals. In addition to the energies, we will need the stoichiometry of each structure. This can be obtained by postprocessing the prototype-label key, which contains the AFLOW Prototype Label of the crystal. The first underscore-separated field of the prototype label is the stoichiometry (e.g. "A2B" for Al2Ti – the species are always alphabetized). For the energy itself, the binding-potential-energy-per-formula key is the most convenient for constructing the convex hull.
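The postprocessing of the prototype label can be sketched as follows. This is a minimal illustration, not the code used in the Binder demo: the function name parse_stoichiometry is hypothetical, the label passed to it is only an illustrative input, and the sketch assumes that the letters A, B, ... map onto the alphabetically sorted species.

```python
import re


def parse_stoichiometry(prototype_label, species):
    """Map each species to its count in one formula unit.

    Only the first underscore-separated field of the label is used,
    e.g. "A2B" paired with ["Al", "Ti"] gives {"Al": 2, "Ti": 1}.
    """
    stoich = prototype_label.split("_")[0]
    # Each capital letter is one species; a missing number means 1.
    counts = [int(n) if n else 1 for _, n in re.findall(r"([A-Z])(\d*)", stoich)]
    if len(counts) != len(species):
        raise ValueError("species list does not match the prototype label")
    # Assumption: letters follow the alphabetical order of the species.
    return dict(zip(sorted(species), counts))


# Illustrative label; only its first field matters here.
print(parse_stoichiometry("A3B_cP4_221_c_a", ["Ti", "Al"]))
```

The atoms per formula unit, needed to convert an energy per formula into an energy per atom, is then the sum of the returned counts.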

To see how the database entries are structured, it's helpful to start by looking at a single entry for the property we are interested in. To do so, we query for Test Results containing the property we are interested in, and use the "limit" field to avoid querying for a large number of entries. If we did not know how to specify the property, we could have queried for a single Test Result with no other restrictions to see that the property must be specified using its full ID in the property-id key, but we will skip that step.

Example query
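For readers scripting this first exploratory query, a minimal sketch of assembling it in Python follows. The form-field names ("query", "fields", "database", "limit") are assumptions meant to mirror the web form, and PROPERTY_ID is a placeholder for the full property ID; the exact call to use is the one the web interface prints at the bottom of the page. The sketch only builds the request data and does not contact the server.

```python
import json

# Placeholder: replace with the full ID of binding-energy-crystal, as it
# appears in the property-id key of a Test Result.
PROPERTY_ID = "<full ID of binding-energy-crystal>"


def build_request(query, fields=None, limit=0, database="data"):
    """Package a raw query as form data (field names are assumptions)."""
    return {
        "query": json.dumps(query),
        "fields": json.dumps(fields or {}),
        "database": database,
        "limit": str(limit),
    }


# One Test Result for the property of interest, to inspect its structure.
first_look = build_request(
    {"meta.type": "tr", "property-id": PROPERTY_ID},
    limit=1,
)
print(first_look["query"])
```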

From the query result, we can see that we need the source-value subkey of each material property key (we could also have learned this from a careful reading of the Properties Framework documentation). By examining the metadata, we can see that the model name is contained in the meta.subject.extended-id key. For this model, we do not need to filter by species because the model only supports Ti and Al, but for other models we may, so we also add a filter on the stoichiometric-species property key. It uses a rather unintuitive double-negative combination of mongo operators to match entries containing one or more of the desired species, but no others: "stoichiometric-species.source-value":{"$not":{"$elemMatch":{"$nin":["Al","Ti"]}}}. Read from the inside out, $nin matches a species that is neither Al nor Ti, $elemMatch matches a species list containing at least one such species, and $not rejects those lists. See the MongoDB documentation for more info. We also restrict the returned keys to those we need for constructing the convex hull (stoichiometric-species is needed to distinguish pure Al from pure Ti). Setting the "limit" field to 0 to query for all results, the final query looks like this:

Example query
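The pieces of the final query can be written out as Python dictionaries, as in the sketch below. PROPERTY_ID is again a placeholder for the full property ID, and the projection syntax (key mapped to 1) is standard MongoDB that we assume the "fields" input accepts. The helper passes_species_filter is hypothetical: it mimics the double-negative species filter in plain Python so its behavior can be checked locally.

```python
import json

MODEL = "EAM_Dynamo_ZopeMishin_2003_TiAl__MO_117656786760_006"
PROPERTY_ID = "<full ID of binding-energy-crystal>"  # placeholder
ALLOWED = ["Al", "Ti"]

query = {
    "meta.type": "tr",
    "property-id": PROPERTY_ID,
    "meta.subject.extended-id": MODEL,
    # Reject any species list containing an element outside ALLOWED.
    "stoichiometric-species.source-value": {
        "$not": {"$elemMatch": {"$nin": ALLOWED}}
    },
}

# Return only the keys needed for the convex hull.
fields = {
    "prototype-label.source-value": 1,
    "stoichiometric-species.source-value": 1,
    "binding-potential-energy-per-formula.source-value": 1,
}


def passes_species_filter(species, allowed=ALLOWED):
    """Plain-Python equivalent of the $not/$elemMatch/$nin filter."""
    has_foreign = any(s not in allowed for s in species)  # $elemMatch + $nin
    return not has_foreign  # $not


print(json.dumps(query))
print(passes_species_filter(["Al"]), passes_species_filter(["Al", "Ni"]))
```

In MongoDB, a $not condition also matches documents that lack the field altogether, so the property-id criterion is what keeps unrelated entries out of the results.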

This information is sufficient to construct a convex hull. We could also build a convex hull from reference data by removing the model name criterion and searching for items of type "rd".
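To make the last step concrete, here is a minimal sketch of turning such results into a convex hull. It is not the module from the Binder demo: all function names are hypothetical, the records are assumed to have already been flattened into small dictionaries, the energies are made-up numbers rather than OpenKIM results, and the prototype labels are illustrative. Formation energies per atom are taken relative to the lowest-energy pure Al and pure Ti entries, and the lower hull is found with Andrew's monotone chain.

```python
import re


def composition(prototype_label, species):
    """Return {species: count per formula unit} from the label's first field."""
    stoich = prototype_label.split("_")[0]
    counts = [int(n) if n else 1 for _, n in re.findall(r"([A-Z])(\d*)", stoich)]
    return dict(zip(sorted(species), counts))  # assumes alphabetical order


def formation_points(records, a="Al", b="Ti"):
    """Return sorted (fraction of b, formation energy per atom) points."""
    lowest = {}  # lowest energy per atom at each composition
    for rec in records:
        n = composition(rec["prototype"], rec["species"])
        total = sum(n.values())
        x = n.get(b, 0) / total
        e = rec["energy_per_formula"] / total
        lowest[x] = min(e, lowest.get(x, e))
    e_a, e_b = lowest[0.0], lowest[1.0]  # pure-element references
    return sorted((x, e - (1 - x) * e_a - x * e_b) for x, e in lowest.items())


def lower_hull(points):
    """Lower convex hull of 2D points (Andrew's monotone chain)."""
    hull = []
    for px, py in sorted(points):
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # Pop the last point unless it makes a counterclockwise turn.
            if (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1) <= 0:
                hull.pop()
            else:
                break
        hull.append((px, py))
    return hull


# Made-up energies for illustration only.
records = [
    {"species": ["Al"], "prototype": "A_cF4_225_a", "energy_per_formula": -3.0},
    {"species": ["Ti"], "prototype": "A_hP2_194_c", "energy_per_formula": -5.0},
    {"species": ["Al", "Ti"], "prototype": "AB_tP2_123_a_d", "energy_per_formula": -9.0},
    {"species": ["Al", "Ti"], "prototype": "A3B_cP4_221_c_a", "energy_per_formula": -16.0},
    {"species": ["Al", "Ti"], "prototype": "AB3_hP8_194_c_h", "energy_per_formula": -17.0},
]
hull = lower_hull(formation_points(records))
print(hull)
```

Structures whose points are absent from the returned hull lie above it and are therefore not stable at zero temperature for the given data.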

We recommend the approach outlined here – starting with basic general queries and incrementally getting more specific – to develop raw queries for your own applications, especially for querying XtalG results. Notably, the prototype-label and stoichiometric-species keys are present in every XtalG Test Result.