Deep Citation

Deep Citation is a machine learning (ML) based framework developed by OpenKIM that finds all articles that cite the primary source(s) of an interatomic potential (IP) and then attempts to determine whether each citing article actually used the IP in computations or is only providing it as a background reference. This information can be useful to researchers who are considering the IP for a given application, since it allows them to see how it was used in the past.

[Figure: Deep Citation workflow]

Deep Citation makes its determination by using natural language processing (NLP) to analyze the citation contexts in the citing article and infer their intent. For example, a citing context like: "The Tersoff bond-order potential was employed to describe interatomic forces [X]…" would translate to a "USED" determination, whereas something like "A variety of potentials have been developed for silicon systems, such as the Tersoff potential [Y], …" would translate to "NOT USED". When the full text of the citing article is not available, Deep Citation instead uses information from the title and abstract of the IP primary source(s) and the citing article. See the schematic diagram of the Deep Citation workflow above.
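
To make the idea of a "citation context" concrete, the sketch below pulls out the sentences that contain a given citation marker and labels them with a toy keyword heuristic. This is not how Deep Citation works internally: the real system uses a trained NLP model, and the cue phrases, function names, and sample text here are all assumptions chosen for illustration.

```python
import re

# Toy cue phrases (an assumption). Deep Citation itself uses a trained
# NLP model, not keyword matching.
USED_CUES = ("was employed", "were performed using", "we used", "we employ")


def citation_contexts(text, marker):
    """Return the sentences in `text` that contain the citation marker."""
    # Naive sentence split: break on whitespace that follows ., ! or ?
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return [s for s in sentences if marker in s]


def toy_label(context):
    """Label a citation context as USED if it contains any cue phrase."""
    lowered = context.lower()
    return "USED" if any(cue in lowered for cue in USED_CUES) else "NOT USED"


text = (
    "The Tersoff bond-order potential was employed to describe "
    "interatomic forces [X]. A variety of potentials have been developed "
    "for silicon systems, such as the Tersoff potential [Y]."
)

used_contexts = citation_contexts(text, "[X]")
background_contexts = citation_contexts(text, "[Y]")
print(toy_label(used_contexts[0]), "/", toy_label(background_contexts[0]))
```

A keyword list like this breaks down quickly on real prose, which is one reason a learned model is a better fit for the task.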

[Figure: Tersoff T2 usage word cloud]

The usage information is presented on each IP’s model page on openkim.org. A bar chart shows the number of articles that cited the IP per year. Each bar is divided into green (articles that USED the IP) and blue (articles that did NOT USE the IP). In addition, a word cloud is generated from the abstracts of the IP primary source(s) and of the citing articles that were determined to have USED the IP. It gives users a quick sense of the types of physical phenomena to which the IP has been applied (e.g., see the word cloud for the Tersoff T2 potential for silicon on the right).
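
The per-year split behind such a bar chart amounts to a simple tally. The following is a minimal sketch, assuming each citing article is reduced to a (year, determination) pair; the record layout, the function name `tally_by_year`, and the sample data are hypothetical and do not reflect how openkim.org stores or plots its data.

```python
from collections import Counter

# Hypothetical records: (publication year, Deep Citation determination).
citing_articles = [
    (2019, "USED"),
    (2019, "NOT USED"),
    (2020, "USED"),
    (2020, "USED"),
    (2021, "NOT USED"),
]


def tally_by_year(articles):
    """Return {year: {"USED": n, "NOT USED": m}}, with years in ascending order."""
    counts = Counter(articles)  # counts each (year, determination) pair
    years = sorted({year for year, _ in articles})
    return {
        year: {
            "USED": counts[(year, "USED")],          # green segment
            "NOT USED": counts[(year, "NOT USED")],  # blue segment
        }
        for year in years
    }


bars = tally_by_year(citing_articles)
for year, split in bars.items():
    print(year, split)
```

Each dictionary entry corresponds to one bar, and the two counts give the heights of its green and blue segments.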

[Figure: Sampling of articles citing the Tersoff T2 potential]

A list of all articles that have cited the IP is provided, along with the Deep Citation determination of whether the IP was USED (marked by a USED icon) or NOT USED (marked by a NOT USED icon) in the article. These determinations are accompanied by an indication of their level of certitude (see figure on right):

  • "definite" indicates that the determination is based on human labeling (i.e., a human reviewing the paper determined whether or not it used the IP).
  • "high confidence" and "low confidence" ratings are given when the determination is made by the Deep Citation ML algorithm and indicate the level of confidence in the result.

Users are encouraged to correct erroneous Deep Citation determinations by clicking the speech icon 💬 next to a citing article and providing updated information. This feedback is integrated into the next Deep Citation learning cycle, which occurs on a regular basis.

[Figure: Deep Citation search bar]

Users can filter the list of articles to only show those determined to have used the IP, or show all articles. Typing text in the search box and clicking "Search" will highlight any articles that contain that text in their title or abstract. The displayed list can be sorted by:

  • usage: organizes articles from the most likely to have used the IP (USED (definite)) down to the most likely not to have used it (NOT USED (definite)). This is the default.
  • publication date: from newest to oldest, or vice versa
  • popularity: from most cited to least, or vice versa
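
The three sort orders can be pictured as different sort keys over the same list. The sketch below is an illustration only: the `CitingArticle` fields, the function names, and the sample data are hypothetical. The placement of the "high confidence" and "low confidence" tiers between the two "definite" ends is also an assumption, since the description above only names the two endpoints of the usage ordering.

```python
from dataclasses import dataclass

# Rank for the default "usage" sort: lower means more likely to have used the IP.
# The ordering of the middle tiers is an assumption.
USAGE_RANK = {
    ("USED", "definite"): 0,
    ("USED", "high confidence"): 1,
    ("USED", "low confidence"): 2,
    ("NOT USED", "low confidence"): 3,
    ("NOT USED", "high confidence"): 4,
    ("NOT USED", "definite"): 5,
}


@dataclass
class CitingArticle:  # hypothetical record layout
    title: str
    year: int
    citation_count: int
    determination: str  # "USED" or "NOT USED"
    certitude: str      # "definite", "high confidence", or "low confidence"


def sort_by_usage(articles):
    return sorted(articles, key=lambda a: USAGE_RANK[(a.determination, a.certitude)])


def sort_by_date(articles, newest_first=True):
    return sorted(articles, key=lambda a: a.year, reverse=newest_first)


def sort_by_popularity(articles, most_cited_first=True):
    return sorted(articles, key=lambda a: a.citation_count, reverse=most_cited_first)


articles = [
    CitingArticle("A", 2015, 40, "NOT USED", "definite"),
    CitingArticle("B", 2021, 5, "USED", "low confidence"),
    CitingArticle("C", 2018, 120, "USED", "definite"),
    CitingArticle("D", 2020, 12, "NOT USED", "high confidence"),
]

by_usage = [a.title for a in sort_by_usage(articles)]
by_date = [a.title for a in sort_by_date(articles)]
by_popularity = [a.title for a in sort_by_popularity(articles)]
print(by_usage, by_date, by_popularity)
```

Because `sorted` is stable, articles that tie on the chosen key keep their original relative order.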

Users are invited to contact OpenKIM with comments about Deep Citation or requests for new features at support@openkim.org.


OpenKIM acknowledges the support of the Allen Institute for AI through the Semantic Scholar project and scientific publishers including ACS, AIP, IOP, RSC, and Taylor & Francis for providing citation information and full text of articles, which are used to train the Deep Citation ML algorithm.