Python hand play (calculating every Mordred descriptor)

What is this article?

A cheminformatics article. I used Mordred (1.0.0), a descriptor calculation library, and finally found inf as a calculation result.

Background

Lately I've been working with descriptor calculation results quite a bit. Apparently a result can sometimes come out as "inf", but I had never actually seen one. I casually checked about 1,000 compounds and couldn't find any. So this time I decided to get serious and calculate tens of thousands of compounds until one turned up.

Approach

Run the Mordred calculation as usual, and add another except clause every time a new kind of error occurs. The idea is that the script will then keep running for a long time without stopping.

Environment

The following environment was created with Anaconda on Windows 10 Pro (x64).

# Name                    Version                   Build  Channel
python                    3.6.8                h9f7ef89_7
rdkit                     2017.09.2.0      py36he334aed_1    rdkit
mordred                   1.0.0                    py36_0    mordred-descriptor

I don't think anything other than the packages above affects the result.

Result

Here is the code.

from rdkit import Chem
from mordred import Calculator, descriptors
from mordred import error as err
from datetime import datetime

descs = Calculator(descriptors, ignore_3D=False).descriptors

# ------------------------------------------------------
#  functions
# ------------------------------------------------------


# get compounds
def get_mols(file):
    return Chem.SDMolSupplier(file)


# write text
def output_text(filename, mode, values):
    with open(filename + '.csv', mode) as f:
        f.write(','.join(values) + '\n')


# calculation
def calculate_desc(calc, mol):
    value = None
    try:
        value = calc(mol)
    except ZeroDivisionError as e:
        value = 'errZero'
    except IndexError as e:
        value = 'errIndex'
    except ValueError as e:
        value = 'errValue'
    except NameError as e:
        value = 'errNone'
    except err.Missing3DCoordinate as e:
        value = 'err3D'
    except err.MultipleFragments as e:
        value = 'errMulti'
    return str(value)


# print log: timestamp and the index of the compound being processed
def printlog(value):
    print(str(datetime.now()) + ',' + str(value))

# ------------------------------------------------------
#  main
# ------------------------------------------------------


# get compounds
filename = 'CHEMBL503873'
mols = get_mols(filename + '.sdf')

# get calculators (the first 1824 descriptors) and their names for the header row
headers = ['Name']
calcs = list(descs[:1824])
headers.extend(str(calc) for calc in calcs)

# output
output_text(filename, 'w', headers)
printlog(0)
for i, mol in enumerate(mols):
    values = list()
    if mol is not None:
        values.append(mol.GetProp('_Name'))
        for calc in calcs:
            values.append(calculate_desc(calc, mol))
        output_text(filename, 'a', values)
        if i % 100 == 0:
            printlog(i)

The file name is already hard-coded in the script above, but this is the compound I found.

https://www.ebi.ac.uk/chembl/compound_report_card/CHEMBL503873/

CHEMBL503873 C70H108O24

CO[C@H]1C[C@H](COC[C@@H]2[C@@H](C)O[C@H](C[C@@H]2OC)O[C@H]3CC[C@]4(C)[C@H]5C[C@@H](OC(=O)\C=C\c6ccccc6)[C@]7(C)[C@@](O)(CC[C@]7(O)[C@]5(O)CC=C4C3)C(=O)C)O[C@@H](C)[C@H]1COC[C@H]8C[C@H](OC)[C@H](COC[C@H]9C[C@@H](OC)[C@@H](O[C@H]%10O[C@@H](CO)[C@H](O)[C@@H](O)[C@@H]%10O)[C@H](C)O9)[C@@H](C)O8

Just to be sure, I confirmed it with the script from the following earlier article.

■ Python hand play (calculate descriptors in units of mordred) https://qiita.com/siinai/items/026aad1f05c9f6d51199

(py36) D:\py>python 71-01.py
GRAVH
-------------------------------------

inf

Yup. I certainly found inf.
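Rather than eyeballing the CSV, the search for inf can also be scripted. The following is only a minimal sketch, assuming the CSV layout written by the script above (a header row starting with Name, then one row per compound). The function name find_inf_cells is my own and is not part of Mordred or RDKit.

```python
import csv
import math


def find_inf_cells(csv_path):
    """Return (compound name, descriptor name, raw cell) for every infinite value."""
    hits = []
    with open(csv_path, newline='') as f:
        reader = csv.reader(f)
        headers = next(reader, None)  # first row: 'Name' followed by descriptor names
        if headers is None:
            return hits  # empty file, nothing to report
        for row in reader:
            if not row:
                continue  # skip blank lines
            name = row[0]
            for header, cell in zip(headers[1:], row[1:]):
                try:
                    number = float(cell)
                except ValueError:
                    continue  # error labels such as 'errZero' are not numbers
                if math.isinf(number):  # catches both 'inf' and '-inf'
                    hits.append((name, header, cell))
    return hits
```

Calling find_inf_cells('CHEMBL503873.csv') on the file written above would list each compound and descriptor pair whose value is infinite. Note that the main script joins values with plain commas, so a compound name containing a comma would shift the columns.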

Impressions

This is the first slightly serious article I've posted in a long time. Getting hands-on really is fun after all. It just takes time, though. Even though I'm using a CPU with 4 cores / 8 threads, the CPU load stays at around 30%. Well, of course it does. It would be nice to properly set up multithreading and split the work by computer or by compound. I want to try that later.
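As a first step toward dividing the work by compound, the compound list could be cut into roughly equal chunks, one per worker process or machine. This is only a sketch of the splitting step, not the parallel run itself, and split_into_chunks is a hypothetical helper name of my own.

```python
def split_into_chunks(items, n_chunks):
    """Split items into n_chunks contiguous lists whose sizes differ by at most one."""
    if n_chunks < 1:
        raise ValueError('n_chunks must be at least 1')
    items = list(items)
    base, extra = divmod(len(items), n_chunks)
    chunks = []
    start = 0
    for k in range(n_chunks):
        # the first `extra` chunks each take one additional item
        size = base + (1 if k < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks
```

For example, split_into_chunks(range(10), 4) gives chunks of sizes 3, 3, 2 and 2. Each chunk of compound indices could then be handed to its own process or computer, each writing its own CSV.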

Postscript

For the time being, I'll paste the calculation results here.

Well... I feel like this calculation alone kept the CPU working hard at 100% load for more than a day... or maybe not... Either way, it's tough. I want a GPU, but I also want a better CPU. One with 16 cores.

CHEMBL10786 CHEMBL263256 CHEMBL503873 CHEMBL501567 CHEMBL500702 CHEMBL501093 CHEMBL501094 CHEMBL505931 CHEMBL444732 CHEMBL444155 CHEMBL445174 CHEMBL445253 CHEMBL444510 CHEMBL501306 CHEMBL502034 CHEMBL499522 CHEMBL500203 CHEMBL498862 CHEMBL503717 CHEMBL503722 CHEMBL504025 CHEMBL504038 CHEMBL502642 CHEMBL500358 CHEMBL500619 CHEMBL500622 CHEMBL500058 CHEMBL500182 CHEMBL500184 CHEMBL504187 CHEMBL525749 CHEMBL525930 CHEMBL526006 CHEMBL526343 CHEMBL526355 CHEMBL526373 CHEMBL499978 CHEMBL499980 CHEMBL500099 CHEMBL500244 CHEMBL508221 CHEMBL500219 CHEMBL500223 CHEMBL506996 CHEMBL507128 CHEMBL525750 CHEMBL503778 CHEMBL503489 CHEMBL503495 CHEMBL507216 CHEMBL502664 CHEMBL502666 CHEMBL503666 CHEMBL503894 CHEMBL525940 CHEMBL525945 CHEMBL526501 CHEMBL500441 CHEMBL500451 CHEMBL502457 CHEMBL525219 CHEMBL525221 CHEMBL527042 CHEMBL525450 CHEMBL526129 CHEMBL526130 CHEMBL508387 CHEMBL508391 CHEMBL498956 CHEMBL503974 CHEMBL503979 CHEMBL507601 CHEMBL504097 CHEMBL524833 CHEMBL525962 CHEMBL525424 CHEMBL525951 CHEMBL526360 CHEMBL525216 CHEMBL525217 CHEMBL509192 CHEMBL501147 CHEMBL501266 CHEMBL503261 CHEMBL526689 CHEMBL526690 CHEMBL498967 CHEMBL501641 CHEMBL500002 CHEMBL500011 CHEMBL524521 CHEMBL506061 CHEMBL504078 CHEMBL508019 CHEMBL500187 CHEMBL500103 CHEMBL445002 CHEMBL525762 CHEMBL525763 CHEMBL525398 CHEMBL525399 CHEMBL526113 CHEMBL526115 CHEMBL526119 CHEMBL526121 CHEMBL526181 CHEMBL502415 CHEMBL502420 CHEMBL502978 CHEMBL505143 CHEMBL501291 CHEMBL502603 CHEMBL503695 CHEMBL504000 CHEMBL504159 CHEMBL526190 CHEMBL526301 CHEMBL501788 CHEMBL506306 CHEMBL500524 CHEMBL499537 CHEMBL501823 CHEMBL504080 CHEMBL504417 CHEMBL507534 CHEMBL502988 CHEMBL500373 CHEMBL500375 CHEMBL505276 CHEMBL500264 CHEMBL526336 CHEMBL525083 CHEMBL525086 CHEMBL525089 CHEMBL503245 CHEMBL503306 CHEMBL501970 CHEMBL503617 CHEMBL503852 CHEMBL503858 CHEMBL502077 CHEMBL501569 CHEMBL504902 CHEMBL526516 CHEMBL526681 CHEMBL526682 CHEMBL525441 CHEMBL501317 CHEMBL501323 CHEMBL502678 CHEMBL503342 CHEMBL507824 CHEMBL499931 
CHEMBL499957 CHEMBL500483 CHEMBL500788 CHEMBL525771 CHEMBL503047 CHEMBL503286 CHEMBL504214 CHEMBL504401 CHEMBL525073 CHEMBL525624 CHEMBL526743 CHEMBL526874 CHEMBL526876 CHEMBL524358 CHEMBL524487 CHEMBL524488 CHEMBL527050 CHEMBL524494 CHEMBL524498 CHEMBL525068 CHEMBL525069 CHEMBL525407 CHEMBL525409 CHEMBL527084 CHEMBL591794 CHEMBL592148 CHEMBL592149 CHEMBL1208990 CHEMBL524531 CHEMBL524539 CHEMBL593680 CHEMBL589995 CHEMBL589997 CHEMBL525394 CHEMBL526678 CHEMBL526890 CHEMBL525224 CHEMBL525386 CHEMBL526131 CHEMBL596000 CHEMBL526544 CHEMBL526545 CHEMBL527072 CHEMBL527074 CHEMBL525419 CHEMBL525991 CHEMBL530121 CHEMBL526741 CHEMBL595999 CHEMBL526703 CHEMBL526853 CHEMBL526916 CHEMBL526922 CHEMBL525076 CHEMBL524356 CHEMBL524357 CHEMBL525237 CHEMBL525242 CHEMBL525402 CHEMBL530345 CHEMBL605624 CHEMBL608706 CHEMBL605628 CHEMBL595776 CHEMBL591446 CHEMBL607837 CHEMBL1097890 CHEMBL589278 CHEMBL589762 CHEMBL602303 CHEMBL605828 CHEMBL609471 CHEMBL604989 CHEMBL608415 CHEMBL1097888 CHEMBL1213233 CHEMBL611968 CHEMBL1099238 CHEMBL132931 CHEMBL135376 CHEMBL136703 CHEMBL194552 CHEMBL207341 CHEMBL214100 CHEMBL216830
