Chemistry in Python

Introduction

The other day, I analyzed the data with Python for the first time in a while. I forgot how to use pandas.

At that time, I suddenly thought about it. "By the way, I haven't heard much about people who have used Python in sync with our department."

I've heard some R users here and there ... (It's easy, and there are many packages, so it's natural.)

I will introduce that Python can also do chemistry.

RDkit It is a package that is very active in the field of Cheminformatics. Draw compounds and read structural data to see graph structures. What you can do with RDkit → https://www.rdkit.org

drawing

First, let's insert the structure appropriately.

from rdkit import Chem
from rdkit import IPythonConsole #Structure drawing settings
from rdkit.Chem import Draw

Here we use the notation SMILES to enter the structure. SMILES is an abbreviation for Simplified Molecular Input Line Entry System, which is a notation for converting structural information into character strings. See here for details (http://opensmiles.org)

For example, would you like to draw acetic acid?

m1=Chem.MolFromSmiles("CC(=O)O") #Method to read the character string written with smile
m1

Then, the structural formula of acetic acid can be drawn, which is easy.

![sakusan.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/223549/f98e3e56-1ec6-b0d1-f923-94e04656bcf5.png)

Similarly, aromatic compounds such as phenolphthalein can be drawn.

m2=Chem.MolFromSmiles("O=C1OC(c2ccccc12)(c3ccc(O)cc3)c4ccc(O)cc4")
m2
![phef.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/223549/6c6bb959-d51c-93d8-4fb0-a4c737fca0f6.png)

Get from the data

For example, when you want to create a dataset and analyze it. If you introduce it as a database, ●ChEMBL(https://www.ebi.ac.uk/chembl/) ●PubChem(https://pubchem.ncbi.nlm.nih.gov) I think this area is interesting.

I like the NIH site, so this time I'll use PubChem. If you search for "aspirin", which is famous for its antipyretic effect, the search screen below will appear. search.png

Save the 2D sructure sdf file from the Download button. sdf.png

d1=Chem.SDMolSupplier("Structure2D_CID_2244.sdf")#Read sdf file
Draw.MolsToGridImage(d1)
![aspirin.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/223549/df817553-1d91-15cd-5404-c5a43abcb131.png) Aspirin has been summoned properly.

Structure search

I am not satisfied with just outputting, and I will play with the output structure in various ways.

search

See where the partial structures of the two chemicals match.

#Structural consistency between acetic acid and formic acid
c1 = Chem.MolFromSmiles('CC(=O)O') #Acetic acid
c2 = Chem.MolFromSmarts('O=CO') #Formic acid

c1.GetSubstructMatches(c2) #Checking the part where the structure matches with acetic acid and formic acid
((2,1,3)) This number is the atom index for formic acid and the corresponding acetic acid.
Delete
from rdkit.Chem import AllChem #Module for making molecular modifications

delete = AllChem.DeleteSubstructs(c1,c2)#Removed intersection with formic acid from acetic acid
Chem.MolToSmiles(delete)
Only'C'remained
Replacement
add = Chem.MolFromSmiles('CC(O)C(=O)O') #Substructure to replace(Lactic acid)
exc = AllChem.ReplaceSubstructs(c1,c2,add) #Replace the intersection of acetic acid and formic acid with lactic acid
Chem.MolToSmiles(exc[0])
I don't know if it exists, but it is possible to synthesize unknown compounds such as'C.CC (O) C (= O) O'.

Similarity index

There is a method called Fingerprint that describes the chemical features needed to show molecular similarity.

#Similarity calculation using fingerprints of acetic acid, formic acid and lactic acid

from rdkit import DataStructs
from rdkit.Chem.Fingerprints import FingerprintMols

data=[Chem.MolFromSmiles('CC(=O)O'),Chem.MolFromSmiles('O=CO'),Chem.MolFromSmiles('CC(O)C(=O)O')]#Insert 3 molecules for which you want to check the similarity

judge = [FingerprintMols.FingerprintMol(x) for x in data]
DataStructs.FingerprintSimilarity(judge[1],judge[2])#Similarity between formic acid and lactic acid
It was about 0.19. It doesn't look very similar. Tanimoto and Dice are famous as indicators for measuring similarity. In DataStructs.FingerprintSimilarity, the default is Tanimoto coefficient, so if you want to use another index
DataStructs.FingerprintSimilarity(fps[1],fps[2], metric=DataStructs.DiceSimilarity)#Change of similarity index
0.3125 This is an indicator by Dice.
Similarity map

Draw this similarity on the map. For example, draw a similarity map of benzoic acid-based lactose.

from rdkit.Chem.Draw import SimilarityMaps #Similarity map module

mol = Chem.MolFromSmiles('O=C(O)c1ccccc1')#benzoic acid
refmol=Chem.MolFromSmiles('O[C@H]1[C@H](O)[C@H](OC(O)[C@@H]1O)CO[C@@H]2O[C@H](CO)[C@H](O)[C@H](O)[C@H]2O')#Lactose

fp = SimilarityMaps.GetTTFingerprint(refmol, fpType='normal')#topological torsions finger print
fig, maxweight = SimilarityMaps.GetSimilarityMapForFingerprint(mol, refmol, SimilarityMaps.GetMorganFingerprint, metric=DataStructs.TanimotoSimilarity)#Mapping by Tanimoto coefficient
![類似度マップ.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/223549/a024e3e9-2f0c-a09b-b576-37946d62abb9.png)
Visualization by electric charge

Now let's visualize the charge of benzoic acid.

AllChem.ComputeGasteigerCharges(mol)
contribs = [float(mol.GetAtomWithIdx(i).GetProp('_GasteigerCharge')) for i in range(mol.GetNumAtoms())] #Partial charge visualization using the Gasteiger method
fig = SimilarityMaps.GetSimilarityMapFromWeights(mol, contribs, colorMap='jet', contourLines=10)
I was able to visualize the charge action with a color map. ![Unknown.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/223549/98305d01-a1a1-dc55-79d1-b2df5da7c547.png)

in conclusion

RDkit is convenient because it can perform everything from drawing to structural analysis and editing. I would like to try again if there is an interesting package in Python.

Recommended Posts