Let's feel like a materials researcher with Python [Introduction to pymatgen]

Hello.

In the previous article, "Let's feel like a material researcher with machine learning", I introduced material research through the theme of machine learning.

This time, partly for my own study, I will introduce pymatgen (Python Materials Genomics), an open-source Python library. I hope it gives you a taste of material research.

Assumed reader

  • Programmers who are a little interested in physics and materials
  • People who work on materials and want to make their research more efficient
  • People who work on materials and want to try their hand at programming
  • People interested in materials informatics

As an aside, I created a Facebook group called Materials Informatics Young People's Association. There are only a few members so far, but if you are interested, please feel free to contact me (via Facebook or the email address posted on my Qiita account). I would like to share information about materials informatics and hold study sessions.

Then, let's begin.

Table of contents

  • What is material research in the first place? (quoted from the previous article)
  • What is pymatgen
  • Basic function
  • Cooperation with other tools
  • Install pymatgen
  • Let's actually move it

If you want to feel like a materials researcher right away, you can jump to the installation section. The content is based on the official pymatgen documentation, so if you are comfortable reading English, reading that documentation will cover the same ground.

What is material research in the first place?

This section is quoted from the previous article. If you have already read it, please skip ahead to "What is pymatgen".

First of all, you may be wondering what material research actually is. As you know, there are many kinds of materials, such as ceramics, polymers, and metals.

Take the iPhone, for example.

Inside it are hundreds of small ceramic capacitors.

To make a high-performance capacitor that Apple will choose, you have to solve difficult problems such as:

  • What elements should be combined?
  • What process should be used to make it?

Here is an example of how one might go about it.

  1. For now, let's tweak a material made by the great researchers of the past.
  2. Unexpectedly... maybe a longer mixing time works better (process optimization).
  3. It's done! Measure and observe various things (measurement).
  4. I see, this physical phenomenon is what gives the high performance (theory, analysis).
  5. Theory and analysis revealed a trend, so next time let's try making it with these elements... (search).

It goes something like that. This is just one example, but in short, material research means:

Making great materials by making full use of process optimization, measurement, theory, analysis, exploration, and so on.

That is a rough summary, but it gives the general picture.

What is pymatgen

Simply put, it's a useful tool for analyzing materials.

First, take a look at the diagram below to get a rough idea of material analysis. (Figure from Wikipedia.)

As you probably learned in high school, matter is composed of atoms, and it takes a wide variety of structures. The purpose of material analysis is to use physics to answer the question: which atoms, arranged in what structure, give what properties?

Now to the main subject. pymatgen is an open-source Python library with the following advantages:

  1. Many useful tools for visualizing materials and working with analysis data
  2. Easy integration with the tools currently used for material analysis

Please note that it is not analysis software with a GUI.

Its specific features, as listed in the official documentation, are:

  1. Flexible Python classes for representing the types, positions, and structures of the elements contained in materials
  2. Support for the various data formats often used in material research
  3. A rich set of analysis tools for generating phase diagrams and potential-pH diagrams, and for analyzing chemical reactions and diffusion
  4. Electronic structure analysis such as band structure and density of states
  5. Integration with the Materials Project REST API

From here, let's feel like a material researcher by actually using pymatgen.

Basic function

Representation of atoms and structures by class

Reference: http://pymatgen.org/pymatgen.core.html#module-pymatgen.core

First, let's look at the modules pymatgen provides for representing atoms and structures.

pymatgen.core.periodic_table module

This is the periodic table that you all know. Here I introduce the Element class and the Specie class of this module. The Element class inherits from Enum and defines the atoms of the periodic table.

class Element(Enum):
    def __init__(self, symbol):
        self.symbol = "%s" % symbol
        d = _pt_data[symbol]
        ...

Since periodic_table.json in the pymatgen library is loaded into _pt_data when the module is imported, an Element object gives you access to all of that data:

>>> from pymatgen import Element
>>> fe = Element("Fe")
>>> fe.data
{'Superconduction temperature': 'no data K', 'Molar volume': '7.09 cm<sup>3</sup>', 'Ionic radii hs': {'2': 0.92, '3': 0.785}, 'Melting point': '1811 K', 'Atomic radius': 1.4, 'Mineral hardness': '4.0', 'Electrical resistivity': '10 10<sup>-8</sup> &Omega; m', 'Vickers hardness': '608 MN m<sup>-2</sup>', 'Brinell hardness': '490 MN m<sup>-2</sup>', 'Youngs modulus': '211 GPa', 'Ionic radii': {'2': 0.92, '3': 0.785}, 'Atomic no': 26, 'Mendeleev no': 61, 'Thermal conductivity': '80 W m<sup>-1</sup> K<sup>-1</sup>', 'Reflectivity': '65 %', 'Liquid range': '1323 K', 'Ionic radii ls': {'2': 0.75, '6': 0.39, '3': 0.69, '4': 0.725}, 'Rigidity modulus': '82 GPa', 'X': 1.83, 'Critical temperature': 'no data K', 'Poissons ratio': '0.29', 'Oxidation states': [-2, -1, 1, 2, 3, 4, 5, 6], 'Van der waals radius': 'no data', 'Velocity of sound': '4910 m s<sup>-1</sup>', 'Coefficient of linear thermal expansion': '11.8 x10<sup>-6</sup>K<sup>-1</sup>', 'Bulk modulus': '170 GPa', 'Common oxidation states': [2, 3], 'Name': 'Iron', 'Atomic mass': 55.845, 'Electronic structure': '[Ar].3d<sup>6</sup>.4s<sup>2</sup>', 'Density of solid': '7874 kg m<sup>-3</sup>', 'Refractive index': 'no data', 'Atomic radius calculated': 1.56, 'Boiling point': '3134 K'}

This gives you an object that carries a wide range of information, such as the ionic radii, melting point, resistivity, mass, and electronic structure of the atom. To access an individual value, read the corresponding attribute as follows. The full attribute list is in the source and the documentation.

ionic_radii_fe = fe.ionic_radii
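To make the Enum-plus-data-table idea concrete, here is a minimal sketch using only the standard library. MiniElement and the truncated _PT_DATA table are hypothetical stand-ins that I made up for illustration; they are not pymatgen's actual classes or data file.

```python
from enum import Enum

# Hypothetical, heavily truncated stand-in for pymatgen's periodic_table.json.
_PT_DATA = {
    "Fe": {"Atomic no": 26, "Atomic mass": 55.845, "Name": "Iron"},
    "O": {"Atomic no": 8, "Atomic mass": 15.9994, "Name": "Oxygen"},
}


class MiniElement(Enum):
    """Illustrative Enum whose members look up their data on creation."""

    Fe = "Fe"
    O = "O"

    def __init__(self, symbol):
        # Enum calls __init__ once per member, passing the member's value.
        self.symbol = symbol
        self.data = _PT_DATA[symbol]

    @property
    def atomic_mass(self):
        # Expose one entry of the data table as a plain attribute.
        return self.data["Atomic mass"]


fe = MiniElement("Fe")  # looking up by value returns the existing member
print(fe.data["Name"], fe.atomic_mass)
```

Because every member is created once when the class is defined, asking for "Fe" twice returns the very same object, which is one reason an Enum suits a fixed set such as the periodic table.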

Next, let's take a look at the Specie class. In the Specie class, atoms can be represented by considering the oxidation number.

class Specie(MSONable):

    supported_properties = ("spin",)

    def __init__(self, symbol, oxidation_state, properties=None):
        self._el = Element(symbol)
        self._oxi_state = oxidation_state
        self._properties = properties if properties else {}
        for k in self._properties.keys():
            if k not in Specie.supported_properties:
                raise ValueError("{} is not a supported property".format(k))

A Specie is an element with an oxidation state and optional properties attached; you can think of it as an extended version of Element. A Specie object is meant to hold idealized oxidation states and properties. The oxidation state and spin state an element actually takes in a crystal structure, for example values obtained from a simulation, should be stored on the Site object described later.

pymatgen.core.composition

http://pymatgen.org/pymatgen.core.composition.html#module-pymatgen.core.composition

This module represents the composition of substances such as H2O and NaCl. Here is the Composition class, the most commonly used class in it.
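The property check in the constructor can be tried out with a small stand-alone sketch. MiniSpecie is a hypothetical simplification written for this article, not pymatgen's Specie, and the "magmom" key is just an example of a property that is not on the supported list.

```python
class MiniSpecie:
    """Simplified, hypothetical stand-in for a species with an oxidation state."""

    supported_properties = ("spin",)

    def __init__(self, symbol, oxidation_state, properties=None):
        self.symbol = symbol
        self.oxi_state = oxidation_state
        self.properties = dict(properties or {})
        # Reject any property that is not explicitly supported.
        for key in self.properties:
            if key not in self.supported_properties:
                raise ValueError("{} is not a supported property".format(key))


# An idealized Fe2+ species with a spin property is accepted.
fe2 = MiniSpecie("Fe", 2, {"spin": 5})
print(fe2.symbol, fe2.oxi_state, fe2.properties)

# An unsupported key is rejected.
try:
    MiniSpecie("Fe", 2, {"magmom": 4.5})
except ValueError as err:
    print("rejected:", err)
```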

class Composition(collections.Hashable, collections.Mapping, MSONable):
    def __init__(self, *args, **kwargs):
        self.allow_negative = kwargs.pop('allow_negative', False)
        # it's much faster to recognize a composition and use the elmap than
        # to pass the composition to dict()
        if len(args) == 1 and isinstance(args[0], Composition):
            elmap = args[0]
        elif len(args) == 1 and isinstance(args[0], six.string_types):
            elmap = self._parse_formula(args[0])
        else:
            elmap = dict(*args, **kwargs)
        elamt = {}
        self._natoms = 0
        for k, v in elmap.items():
        ...

The constructor is hard to follow at a glance and hard to explain, so let's simply see how to use it.

>>> from pymatgen import Composition, Element
>>> # Easy to define from a string such as NaCl or H2O
>>> comp = Composition("LiFePO4")
>>> # Total number of atoms
>>> comp.num_atoms
7.0
>>> # Amount of each element
>>> comp.formula
'Li1 Fe1 P1 O4'
>>> # Atomic fraction (atoms of the element / total number of atoms)
>>> comp.get_atomic_fraction(Element("Li"))
0.14285714285714285

It is easy to define, and you can create a convenient composition object. There are many other features, so please refer to the Documentation.
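If you are curious what happens behind such a formula string, the following is a rough sketch of the idea in plain Python. The functions parse_formula and atomic_fraction are hypothetical helpers of my own, and I assume a flat formula without parentheses or charges, which is much simpler than what pymatgen's real parser handles.

```python
import re


def parse_formula(formula):
    """Return a {symbol: amount} map for a flat formula such as 'LiFePO4'.

    Assumption: no parentheses and no charges.
    """
    amounts = {}
    # A symbol is one capital letter plus an optional lowercase letter,
    # optionally followed by an integer or decimal amount.
    for symbol, amount in re.findall(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?", formula):
        amounts[symbol] = amounts.get(symbol, 0.0) + (float(amount) if amount else 1.0)
    return amounts


def atomic_fraction(amounts, symbol):
    """Atoms of one element divided by the total number of atoms."""
    return amounts.get(symbol, 0.0) / sum(amounts.values())


comp = parse_formula("LiFePO4")
print(comp)                         # amount of each element
print(sum(comp.values()))           # total number of atoms
print(atomic_fraction(comp, "Li"))  # share of Li atoms
```

The atomic fraction is simply 1 Li atom out of 7 atoms in total, which is where the 0.1428... in the pymatgen session comes from.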

pymatgen.core.lattice

http://pymatgen.org/pymatgen.core.lattice.html#module-pymatgen.core.lattice

Lattice means a crystal lattice; many of you will remember the unit cell from high school. (Figure from Wikipedia.)

Any lattice point can be written as

$$\mathbf{R} = n_1 \mathbf{a}_1 + n_2 \mathbf{a}_2 + n_3 \mathbf{a}_3$$

where n_1, n_2, n_3 are integers and a_1, a_2, a_3 are the basis vectors of the lattice. The three basis vectors can be written as the rows of a nested 3x3 array, for example:

[[10,0,0], [20,10,0], [0,0,30]]

In pymatgen, the Lattice class makes this even more convenient.

class Lattice(MSONable):
    
    def __init__(self, matrix):
        m = np.array(matrix, dtype=np.float64).reshape((3, 3))
        lengths = np.sqrt(np.sum(m ** 2, axis=1))
        angles = np.zeros(3)

        for i in range(3):
            j = (i + 1) % 3
            k = (i + 2) % 3
            angles[i] = abs_cap(dot(m[j], m[k]) / (lengths[j] * lengths[k]))

        self._angles = np.arccos(angles) * 180. / pi
        self._lengths = lengths
        self._matrix = m
        self._inv_matrix = None
        self._metric_tensor = None
        self._diags = None
        self._lll_matrix_mappings = {}
        self._lll_inverse = None
        self.is_orthogonal = all([abs(a - 90) < 1e-5 for a in self._angles])
        ...

The matrix argument accepts any of the following formats, or a numpy array.

# Nested list
[[1, 0, 0], [0, 1, 0], [0, 0, 1]]
# Flat list
[1, 0, 0, 0, 1, 0, 0, 0, 1]
# Tuple
(1, 0, 0, 0, 1, 0, 0, 0, 1)

Each of the above describes a simple cubic lattice (a cube).

>>> from pymatgen import Lattice
>>> l = Lattice([1,0,0,0,1,0,0,0,1])
>>> l._angles
array([ 90.,  90.,  90.])
>>> l.is_orthogonal
True
>>> l._lengths
array([ 1.,  1.,  1.])
>>> l._matrix
array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])

In this way, the Lattice object gives you access to angles, lengths, and so on. The underscore-prefixed attributes used above are internal; the public properties angles, abc, and matrix expose the same values. I recommend reading the documentation, as the class has other useful features. http://pymatgen.org/pymatgen.core.lattice.html

pymatgen.core.structure

http://pymatgen.org/pymatgen.core.structure.html

This module provides the classes that represent crystal structures. Here we look at the IStructure class, which provides the most basic functionality.
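The length and angle calculation in the Lattice constructor can be reproduced with the standard library alone. The sketch below follows the same index pattern as the constructor excerpt (the angle at index i is between vectors j and k), but lengths_and_angles is a hypothetical helper of my own, not a pymatgen function.

```python
import math


def lengths_and_angles(matrix):
    """Return ([a, b, c], [alpha, beta, gamma]) for three row vectors.

    Angles are in degrees; alpha is the angle between the 2nd and 3rd vectors,
    beta between the 3rd and 1st, gamma between the 1st and 2nd.
    """
    lengths = [math.sqrt(sum(x * x for x in row)) for row in matrix]
    angles = []
    for i in range(3):
        j, k = (i + 1) % 3, (i + 2) % 3
        dot = sum(p * q for p, q in zip(matrix[j], matrix[k]))
        cos = dot / (lengths[j] * lengths[k])
        cos = max(-1.0, min(1.0, cos))  # guard against rounding just outside [-1, 1]
        angles.append(math.degrees(math.acos(cos)))
    return lengths, angles


# The non-orthogonal example matrix from the text.
lengths, angles = lengths_and_angles([[10, 0, 0], [20, 10, 0], [0, 0, 30]])
print(lengths)
print(angles)
```

For this matrix the first two vectors are not perpendicular, so gamma comes out well below 90 degrees while alpha and beta stay at 90.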

class IStructure(SiteCollection, MSONable):

    def __init__(self, lattice, species, coords, validate_proximity=False,
                 to_unit_cell=False, coords_are_cartesian=False,
                 site_properties=None):
    ...

It takes several arguments, so let's look at the main ones in turn.

  • lattice

The lattice of the structure, given either as a pymatgen.core.lattice.Lattice object or as a 3x3 array.

  • species

The types of the atoms. Several formats are supported, as follows.

# List of element or species symbols
["Li", "Fe2+", "P", ...]
# Atomic numbers
(3, 56, ...)
# List including occupancies
[{"Fe" : 0.5, "Mn":0.5}, ...]

  • coords

The coordinates of each atom. They are interpreted as fractional coordinates unless coords_are_cartesian is set to True.

# For NaCl
coords = [[0, 0, 0], [0.5, 0.5, 0.5]]

With these in mind, a structure can be defined like this:

>>> from pymatgen import Lattice, IStructure
>>> # CsCl structure
>>> a = 4.209  # lattice parameter in Å
>>> latt = Lattice.cubic(a)
>>> structure = IStructure(latt, ["Cs", "Cl"], [[0, 0, 0], [0.5, 0.5, 0.5]])
>>> structure.density
3.7492744897576538
>>> structure.distance_matrix
array([[ 0.        ,  3.64510092],
       [ 3.64510092,  0.        ]])
>>> # get_distance is a method: call it with the indices of two sites
>>> round(structure.get_distance(0, 1), 4)
3.6451
>>> structure
Structure Summary
Lattice
    abc : 4.2089999999999996 4.2089999999999996 4.2089999999999996
 angles : 90.0 90.0 90.0
 volume : 74.565301328999979
      A : 4.2089999999999996 0.0 0.0
      B : 0.0 4.2089999999999996 0.0
      C : 0.0 0.0 4.2089999999999996
PeriodicSite: Cs (0.0000, 0.0000, 0.0000) [0.0000, 0.0000, 0.0000]
PeriodicSite: Cl (2.1045, 2.1045, 2.1045) [0.5000, 0.5000, 0.5000]

In this way, you will be able to access distances, positional relationships, densities, etc. in the structure. Next, let's see what kind of analysis function is available.
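To see where the density and distance values for CsCl come from, here is a hand calculation in plain Python. It is only a sketch for a cubic cell: the atomic masses and the unit conversion constant are values I filled in myself, and periodic_distance is a hypothetical helper, not a pymatgen function.

```python
import math

AMU_TO_GRAM = 1.66053906660e-24  # grams per atomic mass unit
ANGSTROM3_TO_CM3 = 1e-24         # cubic centimetres per cubic angstrom

a = 4.209  # cubic lattice parameter in angstrom, as in the text
masses = {"Cs": 132.9054519, "Cl": 35.453}  # atomic masses in amu (assumed values)
frac_coords = {"Cs": (0.0, 0.0, 0.0), "Cl": (0.5, 0.5, 0.5)}

# Density = mass of the atoms in the cell / volume of the cell, in g/cm^3.
volume = a ** 3
density = sum(masses.values()) * AMU_TO_GRAM / (volume * ANGSTROM3_TO_CM3)


def periodic_distance(f1, f2, edge):
    """Shortest distance between two fractional points in a cubic cell."""
    total = 0.0
    for x1, x2 in zip(f1, f2):
        d = x1 - x2
        d -= round(d)  # wrap the difference into [-0.5, 0.5] (nearest periodic image)
        total += (d * edge) ** 2
    return math.sqrt(total)


cs_cl = periodic_distance(frac_coords["Cs"], frac_coords["Cl"], a)
print(round(volume, 3), round(density, 3), round(cs_cl, 4))
```

The Cs-Cl distance is half of the cube's body diagonal, a * sqrt(3) / 2, which agrees with the off-diagonal entries of distance_matrix above.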

List of modules for analysis

Here is a brief overview of the modules available. Later I will pick a few of them and actually run them. The right side of the image below shows the analysis functions pymatgen provides.

(Figure: overview of pymatgen's analysis functions and tool integrations.)

  • Phase diagram output
  • Reaction calculation
  • Electronic structure analysis and visualization
  • Analysis of application properties such as battery characteristics
  • Structure visualization

and so on.

For example, electronic structure analysis includes band structure analysis. (Figure: band structure plot.)

You can also output phase diagrams. (Figure: phase diagram.)

If there is demand, and if I feel like it, I will write articles that walk through these functions in detail from the source, so please leave a comment if you have any requests.

Cooperation with other tools

Simulations that need a large amount of computation are generally run with other software, but using pymatgen for the analysis and visualization that combine their data makes the work more efficient. The left side of the image shows the commonly used data formats and tools that pymatgen links with. (Figure: the same overview image as above.)

  • VASP input and output files can be imported
  • CIF files, used in Materials Studio and elsewhere, can also be handled
  • The Open Babel formats are supported
  • It can be linked with the Materials Project REST API

and so on. If you use these files or software for first-principles calculations, please consider adopting pymatgen.

I described how to use the Materials Project REST API in the previous article, so please refer to it. The Materials Project publishes a large amount of data that you can collect freely through this API, which makes it essential if you want to do machine learning.

That was a long introduction, but let's all use pymatgen!

Install pymatgen

The basic flow of installation is:

  1. Prepare an environment where Python can be used (the 3 series is recommended)
  2. Install conda
  3. Install pymatgen with conda

For the steps up to installing conda, please refer to the article "Python environment construction for those who aim to become data scientists 2016".

Once you have got that far, run:

conda install --channel matsci pymatgen

Then check that it works, and the installation is complete.

>>> import pymatgen
>>> pymatgen.__version__
'4.5.4'
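If you would rather check from a script than from the interactive prompt, a small standard-library sketch like the following reports whether the packages can be found. The helper is_installed is my own hypothetical name, and the list of packages to check is just my guess at what you will want for the examples in this article.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the named top-level package can be found for import."""
    return importlib.util.find_spec(package_name) is not None


# Packages assumed to be needed for the examples in this article.
for name in ("pymatgen", "numpy", "matplotlib"):
    print(name, "found" if is_installed(name) else "missing")
```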

Now that you're ready, let's try it out!

Let's actually move it

Band structure analysis

The band structure describes how the energy of electrons depends on their wave vector in the periodic structure of a crystal (the dispersion relation). In a band structure plot, the vertical axis is energy, while the horizontal axis runs through points in reciprocal lattice space, which is quite hard to grasp. If you are not interested in the details, it is enough to understand that the plot shows how the electron energies are distributed.

With pymatgen, you can visualize a calculated band structure and process it as data. First, import the required libraries.

# Module for using the REST API of the Materials Project
from pymatgen.matproj.rest import MPRester
# For plotting band structures (%matplotlib is a Jupyter/IPython magic)
%matplotlib inline
from pymatgen.electronic_structure.plotter import BSPlotter

Next, get a calculated band structure. In real research you would convert the files produced by your own analysis software (VASP etc.) into pymatgen objects, but this time we download an already calculated object from the Materials Project database. To use the Materials Project you need to register and obtain an API key, so please get one from the official page.

# Specify your API key
a = MPRester("My API key")
# Specify the id of the desired material and fetch it over HTTP. Here: CuAlO2 (mp-3748)
bs = a.get_bandstructure_by_material_id("mp-3748")

This completes the acquisition of the band structure! We obtained a BandStructure object directly from the Materials Project. http://pymatgen.org/_modules/pymatgen/electronic_structure/bandstructure.html#BandStructure

By the way, on the Materials Project website you can view information about the material like this. (Figure: Materials Project page for the material.)

We will process this information in Python.

>>> bs.is_metal()
False

So it is not a metal.

>>> bs.get_band_gap()
{'energy': 1.7978000000000005, 'direct': False, 'transition': '(0.591,0.409,0.000)-\\Gamma'}

The band gap is about 1.798 eV, and 'direct': False tells us that it is an indirect gap.
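The difference between the (possibly indirect) band gap and the direct band gap is easy to see with toy numbers. In the sketch below, the k-point labels and energies are hypothetical values that I invented; they are not taken from CuAlO2 or any other real material, and the code does not use pymatgen.

```python
# Hypothetical toy data: energies (eV) of the highest valence band and the
# lowest conduction band at four k-points. Not from any real material.
kpoints = ["Gamma", "X", "M", "R"]
valence = [-0.4, 0.0, -0.2, -0.6]
conduction = [1.8, 3.1, 2.5, 2.9]

# Valence band maximum (VBM) and conduction band minimum (CBM).
vbm_index = max(range(len(kpoints)), key=lambda i: valence[i])
cbm_index = min(range(len(kpoints)), key=lambda i: conduction[i])

# Band gap: CBM minus VBM, wherever in k-space they occur.
band_gap = conduction[cbm_index] - valence[vbm_index]
# The gap is direct only if VBM and CBM sit at the same k-point.
is_direct = vbm_index == cbm_index
# Direct gap: smallest vertical separation at a single k-point.
direct_gap = min(c - v for c, v in zip(conduction, valence))

print("VBM at", kpoints[vbm_index], "CBM at", kpoints[cbm_index])
print("band gap:", band_gap, "direct:", is_direct)
print("direct gap:", direct_gap)
```

In this toy case the VBM and CBM sit at different k-points, so the gap is indirect and the direct gap is larger than the band gap, which is the same kind of relationship the two pymatgen calls report.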

>>> bs.get_direct_band_gap()
18.0201

The direct band gap is reported as 18.0201 eV. Now let's plot the band diagram.

%matplotlib inline
from pymatgen.electronic_structure.plotter import BSPlotter
plotter = BSPlotter(bs)
plotter.get_plot().show()

(Figure: band structure plot.)

This is data from the Materials Project, but you can plot the simulation results for the materials you normally study in the same way.

I would be happy if this gives you a sense that pymatgen objects are useful for comparison with other materials, machine learning, materials exploration, and so on.

There is still much more we can do, but I will stop here for now. (That was long...) I would like to hold study sessions on pymatgen, the Materials Project, and VASP, or meetings of the Materials Informatics Young People's Association, so if you are interested, please come and join us.

I will write about materials informatics again next time. Thank you for reading to the end.
