Natural Language Processing for Materials Science
Researchers have long extracted knowledge from the literature to design materials. However, the avalanche of publications makes this practice increasingly difficult to sustain.
Natural language processing (NLP) is efficient at extracting information from corpora. Still, it cannot discover materials that are not present in the corpora, which hinders its broader application to exploring novel materials, such as high-entropy alloys (HEAs).
Here we introduce a concept of “context similarity” for selecting chemical elements for HEAs, based on NLP models that analyze the abstracts of 6.4 million papers.
The method captures how similar chemical elements are in the contexts in which scientists use them. It overcomes NLP's inability to propose materials absent from the corpora, and it identifies the Cantor and Senkov HEAs.
We demonstrate its screening capability for six- and seven-component lightweight HEAs by finding nearly 500 promising alloys out of 2.6 million candidates. The method thus offers an approach to developing ultrahigh-entropy alloys and multicomponent materials.
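To make the idea of ranking element combinations by context similarity concrete, here is a minimal sketch. It is not the speaker's implementation: the three-dimensional vectors in `EMBEDDINGS` are made-up placeholders for embeddings that an NLP model would learn from abstracts, and scoring an alloy by the mean pairwise cosine similarity of its elements (`alloy_score`) is an assumption chosen only for illustration, as are the helper names `cosine_similarity` and `screen`.

```python
from itertools import combinations
import math

# Hypothetical toy embeddings; a real model would learn these from text.
EMBEDDINGS = {
    "Al": (0.9, 0.1, 0.2),
    "Ti": (0.8, 0.3, 0.1),
    "Mg": (0.7, 0.2, 0.4),
    "Li": (0.1, 0.9, 0.3),
    "Si": (0.6, 0.4, 0.2),
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def alloy_score(elements, embeddings=EMBEDDINGS):
    """Mean pairwise similarity of the elements (an assumed scoring rule)."""
    pairs = list(combinations(elements, 2))
    total = sum(cosine_similarity(embeddings[a], embeddings[b]) for a, b in pairs)
    return total / len(pairs)


def screen(pool, n_components, top_k, embeddings=EMBEDDINGS):
    """Enumerate every n-component combination and keep the top_k by score."""
    candidates = combinations(sorted(pool), n_components)
    ranked = sorted(candidates, key=lambda c: alloy_score(c, embeddings), reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    for alloy in screen(EMBEDDINGS, n_components=3, top_k=3):
        print("-".join(alloy), round(alloy_score(alloy), 3))
```

Enumerating combinations exhaustively is only practical for small pools; the number of candidates grows combinatorially with the number of elements and components, which is why a cheap similarity score is attractive as a first filter.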
Zongrui Pei
Senior Staff at New York University
Dr. Pei is a senior staff member at the supercomputing center of New York University, focusing on various computational materials topics, particularly those related to first-principles calculations and machine learning (including natural language processing).