[PYTHON] Smartly embed atomic properties in SDF files

Introduction

If it is necessary to retain the properties of each atom instead of the entire molecule, it is not possible in the SMILES format, so it is embedded in the SDF and stored. However, since SDF basically defines records in molecular units, embedding atomic properties has to be a brute force method, such as including the atomic index in the property name.

I searched variously to see if there was a good way to store the properties of atoms, and found that the functions of RDKit described in https://www.rdkit.org/docs/RDKit_Book.html#atom-properties-and-sdf-files. I found that it can be done smoothly by using, so make a note of it.

environment

What do you mean

SDMolSupplier since RDKit 2019.03.1 recognizes some molecular properties as an atomic property list and can convert them to atomic properties.

That is, properties whose names start with atom.prop, atom.iprop, atom.dprop, or atom.bprop are converted to string, int, double, or bool atomic properties, respectively.

Let's try

The details are as per the URL, but seeing is believing. Let's try it.

The following are the "PartialSigmaCharge", "PiElectronegativity", "AtomHybridizationVSEPR", "AtomDegree", "AtomHybridization", "EffectiveAtomPolarizability", "SigmaElectronegativity", "AtomValance", "StabilizationPlusCharge" and their atomic values calculated by CDK. It is a python program that embeds in SDF as an atomic property list, saves and loads it, and loads it as an atomic property with the GetDoubleProp method of the Atom class (CDK program is omitted). There were also Double type and int type properties this time, but since it was troublesome, I saved them all as Dobule type (atom.dprop).

from rdkit import Chem
from rdkit.Chem import rdmolfiles
from rdkit.Chem import rdmolops
from rdkit.Chem.rdchem import Mol

def execute_javacdk_atom_feature(mol, atom_properties):

    mol_file = "tmp.mol"
    with open(mol_file, "w") as f:
        txt = Chem.MolToMolBlock(mol)
        f.writelines(txt)
    
    #Run the CDK program
    import subprocess
    cmd = "java -classpath lib\cdk-2.2.jar;lib AtomicCircleDescriptorCalculator {0} -1".format(mol_file)

    proc = subprocess.run(cmd, shell=True)
    if proc.returncode != 0:
        print("error")
        return

    import pandas as pd
    df = pd.read_csv("result.csv")
    df = df[atom_properties]

    #While reading the CDK result, set the atom property to each atom with the SetDoubProp function.
    for row in df.iterrows():
        atom = mol.GetAtomWithIdx(row[0])
        for atom_property in atom_properties:
            atom.SetDoubleProp(atom_property, row[1][atom_property])


    #Save atomic property list as molecular property
    for atom_property in atom_properties:
        Chem.CreateAtomDoublePropertyList(mol, atom_property)

    return mol


def main():

    import argparse
    from rdkit import Chem

    #Atomic properties you want to fit in SDF
    atom_properties = ["longestMaxTopDistInMolecule", "highestMaxTopDistInMatrixRow", "diffSPAN3", "relSPAN4", "PartialSigmaCharge", "PiElectronegativity",
    "AtomHybridizationVSEPR", "AtomDegree", "AtomHybridization", "EffectiveAtomPolarizability", "SigmaElectronegativity", "AtomValance",
    "StabilizationPlusCharge"]

    parser = argparse.ArgumentParser()
    parser.add_argument("-input", type=str, required=True)
    parser.add_argument("-output", type=str, required=True)

    args = parser.parse_args()

    #For reading sdf
    sdf_sup = Chem.SDMolSupplier(args.input)
    #For writing sdf
    sdf_wtr = Chem.SDWriter(args.output)

    #Embed the calculation result of CDK in the atom in the molecule while reading sdf and output
    for i, mol in enumerate(sdf_sup):
        #Embed the calculation result of CDK in the atom in the molecule
        execute_javacdk_atom_feature(mol, atom_properties)

        #Output molecules to SDF file
        sdf_wtr.write(mol)

    sdf_wtr.close()

    #Read the output SDF file and try to output the atomic properties
    sdf_sup_output = Chem.SDMolSupplier(args.output)
    for i, mol in enumerate(sdf_sup_output):
        print(mol.GetProp("_Name"))
        for atom in mol.GetAtoms():
            print(atom.GetIdx())
            for atom_property in atom_properties:
                print("{0}:{1}".format(atom_property, atom.GetDoubleProp(atom_property)))


if __name__ == "__main__":
    main()

Execution result

Atomic properties can be read like this.

13_cis_retinoic_acid
0
longestMaxTopDistInMolecule:13.0
highestMaxTopDistInMatrixRow:13.0
diffSPAN3:0.0
relSPAN4:1.0
PartialSigmaCharge:-0.14444603089353975
PiElectronegativity:0.0
AtomHybridizationVSEPR:3.0
AtomDegree:1.0
AtomHybridization:3.0
EffectiveAtomPolarizability:4.399859863281246
SigmaElectronegativity:12.344630439100552
AtomValance:6.0
StabilizationPlusCharge:0.0
1
longestMaxTopDistInMolecule:13.0
highestMaxTopDistInMatrixRow:13.0
diffSPAN3:0.0
relSPAN4:1.0
PartialSigmaCharge:-0.2572541465287616
PiElectronegativity:8.434574556331611
AtomHybridizationVSEPR:2.0
AtomDegree:1.0
AtomHybridization:2.0
EffectiveAtomPolarizability:4.007609863281251
SigmaElectronegativity:13.562163443445453
AtomValance:6.0
StabilizationPlusCharge:0.0

How it is stored in SDF

How it is saved in SDF is like this. The values for all atoms are stored in the property separated by spaces.

13_cis_retinoic_acid
     RDKit          3D

 22 22  0  0  1  0  0  0  0  0999 V2000
   10.6160    5.4820   -0.2300 O   0  0  0  0  0  0  0  0  0  0  0  0
    9.9350    3.3080   -0.3420 O   0  0  0  0  0  0  0  0  0  0  0  0
    2.1650   -3.7930   -0.4060 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.9360   -5.2870   -0.2160 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.1970   -6.0950   -0.4140 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.4270   -3.2930    0.2730 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.2930   -5.6270    0.5190 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.3720   -4.1470    0.7160 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.9180   -3.1060    0.1200 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.2620   -3.5100   -1.8950 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.4900   -1.8530    0.4430 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.5940   -3.7230    1.4610 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.5660   -1.0700    0.3160 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.5800    0.3580    0.4930 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.3000    1.0310    0.8560 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.7270    1.0260    0.3240 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.8760    2.4440    0.4680 C   0  0  0  0  0  0  0  0  0  0  0  0
    7.0420    3.0690    0.2850 C   0  0  0  0  0  0  0  0  0  0  0  0
    7.2550    4.4870    0.4140 C   0  0  0  0  0  0  0  0  0  0  0  0
    6.0830    5.3300    0.7850 C   0  0  0  0  0  0  0  0  0  0  0  0
    8.4270    5.0940    0.2220 C   0  0  0  0  0  0  0  0  0  0  0  0
    9.7040    4.4870   -0.1440 C   0  0  0  0  0  0  0  0  0  0  0  0
  1 22  1  0
  2 22  2  0
  3  4  1  0
  3  6  1  0
  3  9  1  0
  3 10  1  0
  4  5  1  0
  5  7  1  0
  6  8  2  0
  6 11  1  0
  7  8  1  0
  8 12  1  0
 11 13  2  0
 13 14  1  0
 14 15  1  0
 14 16  2  0
 16 17  1  0
 17 18  2  0
 18 19  1  0
 19 20  1  0
 19 21  2  0
 21 22  1  0
M  END
>  <ID>  (1) 
13_cis_retinoic_acid

>  <atom.dprop.longestMaxTopDistInMolecule>  (1) 
13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13

>  <atom.dprop.highestMaxTopDistInMatrixRow>  (1) 
13 13 11 12 13 10 12 11 12 12 9 12 8 7 8 7 8 9 10 11 11 12

>  <atom.dprop.diffSPAN3>  (1) 
0 0 2 1 0 3 1 2 1 1 4 1 5 6 5 6 5 4 3 2 2 1

>  <atom.dprop.relSPAN4>  (1) 
1 1 0.84615384615384615 0.92307692307692324 1 0.76923076923076927 0.92307692307692324 0.84615384615384615 0.92307692307692324 0.92307692307692324 0.69230769230769229 0.92307692307692324
0.61538461538461542 0.53846153846153844 0.61538461538461542 0.53846153846153844 0.61538461538461542 0.69230769230769229 0.76923076923076927 0.84615384615384615 0.84615384615384615
0.92307692307692324

>  <atom.dprop.PartialSigmaCharge>  (1) 
-0.14444603089353975 -0.25725414652876161 0.018323058588212145 0.0044130542545423954 0.0047171590351277118 -0.030785271462194001 0.021723335595041502 -0.047931809716993255
0.0044017486353659045 0.0044017486353659045 -0.0053179497186989309 0.026077408846979573 -0.0046644578993843356 -0.021862432534977119 0.031157982917592638 -0.0043178495737188505
-0.00066866837339847867 -0.0041286011669637492 -0.016275053354415626 0.031355763573928032 0.067212842574592244 0.32386816856629769

>  <atom.dprop.PiElectronegativity>  (1) 
0 8.4345745563316115 0 0 0 5.7377113992064643 0 5.6043578182067719 0 0 5.9378835230737863 0 5.943053057762822 5.807558163952943 0 5.9457956219007961 5.9746983272066947 5.9472932607203139
5.8514526889811611 0 6.5217619200580135 8.7517623225150878

>  <atom.dprop.AtomHybridizationVSEPR>  (1) 
3 2 3 3 3 2 3 2 3 3 2 3 2 2 3 2 2 2 2 3 2 2

>  <atom.dprop.AtomDegree>  (1) 
1 1 4 2 2 3 2 3 1 1 2 1 2 3 1 2 2 2 3 1 2 3

>  <atom.dprop.AtomHybridization>  (1) 
3 2 3 3 3 2 3 2 3 3 2 3 2 2 3 2 2 2 2 3 2 2

>  <atom.dprop.EffectiveAtomPolarizability>  (1) 
4.3998598632812458 4.0076098632812514 11.022853027343745 8.7883015136718754 7.9534945068359395 11.401456054687504 8.4101140136718744 9.9484780273437483 7.4461765136718752
7.4461765136718752 9.9611621093749996 7.0679890136718742 9.4839492187500003 9.7312109375000002 6.959355468750001 8.9463281250000009 8.6171093749999947 8.5909453125000006 8.8427539062500067
6.515126953124998 7.6181894531249963 6.1732197265625013

>  <atom.dprop.SigmaElectronegativity>  (1) 
12.344630439100552 13.562163443445453 8.1489517017565554 8.0200518228093163 8.0227611391275193 8.5055780121018785 8.1802713269458405 8.3472735246209044 8.0199577205445411
8.0199577205445411 8.7413311783943062 8.2201852554190094 8.7471753452915539 8.586988358849446 8.2666333630758384 8.7502658294135927 8.7840132211143533 8.7518543450176072 8.6368154001064532
8.268288364628825 9.416522812299613 11.9692325165343

>  <atom.dprop.AtomValance>  (1) 
6 6 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4

>  <atom.dprop.StabilizationPlusCharge>  (1) 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

The elegant.

in conclusion

When doing Graph Convoluation with this, I wanted to embed the atomic property defined by myself in the node attribute, but it seems that it can be implemented quite smartly by using this method.

Recommended Posts

Smartly embed atomic properties in SDF files
Embed wav files in Jupyter