[PYTHON] How easy is it to synthesize a drug on the market?

Introduction

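A useful index for this is the **SA score** (synthetic accessibility score), which can be calculated with RDKit [^1]. It ranges from 1 (easy to synthesize) to 10 (hard to synthesize).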

Results first:

- Mean: 3.5
- Median: 3.1

sascore_hist.png

These values may serve as a guide when using the SA score to narrow down candidate compounds.

Preprocessing

- Prescription drugs manufactured and sold in Japan as of 2020-04-07
  - Drugs with a D number assigned in KEGG DRUG (2826 entries, with duplicates)
- Desalt, because I want to look at the ease of synthesis of the main fragment
- I borrowed @yamasakih's desalt.py [^2]
- Calculate the SA score

[^2]: Commentary; The story of creating a compound database that can be accessed from Jupyter Notebook for drug discovery raid battle 2018 with Docker-compose of razi - Qiita
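
The desalting and scoring steps can be sketched roughly as follows. This is not the borrowed desalt.py: as an assumption, "desalting" here simply keeps the fragment with the most heavy atoms. The helper names `largest_fragment` and `sa_score_from_smiles` and the sample SMILES are made up for illustration; `sascorer` is the SA score script shipped in RDKit's Contrib directory.

```python
import os
import sys

from rdkit import Chem
from rdkit.Chem import RDConfig

# sascorer.py lives in RDKit's Contrib directory rather than the main package.
sys.path.append(os.path.join(RDConfig.RDContribDir, "SA_Score"))
import sascorer  # noqa: E402


def largest_fragment(mol):
    """Return the fragment with the most heavy atoms (a simple stand-in for desalting)."""
    fragments = Chem.GetMolFrags(mol, asMols=True)
    return max(fragments, key=lambda frag: frag.GetNumHeavyAtoms())


def sa_score_from_smiles(smiles):
    """Desalt a SMILES string and return its SA score, or None if it cannot be parsed."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None or mol.GetNumAtoms() == 0:
        return None
    return sascorer.calculateScore(largest_fragment(mol))


# Hypothetical input: sodium acetate written as two disconnected fragments.
score = sa_score_from_smiles("CC(=O)[O-].[Na+]")
print(score)
```

Keeping only the largest fragment is a simplification; a real salt-stripping script may also neutralize charges or use a curated salt list.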

Calculation

Here I focus on compounds with two or more carbon atoms. Combination drugs, non-medicinal products, blood products, antibody drugs, crude drugs, and the like are excluded because they do not fit the purpose. This narrows the set to 1641 compounds. Compounds classified under more than one efficacy category appear multiple times, so removing those duplicates leaves 1436 compounds. The histogram shown above is for these 1436 compounds.

| mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|
| 3.472627 | 1.262086 | 1.054917 | 2.547814 | 3.14387 | 4.190424 | 9.129873 |
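
The filtering and de-duplication described above might look like this in pandas. It is a minimal sketch on a made-up table: the column names (`compound`, `efficacy_class`, `n_carbons`, `sa_score`) and all values are hypothetical, and the carbon counts are assumed to have been computed beforehand.

```python
import pandas as pd

# Hypothetical toy table: one row per (compound, efficacy class) pair.
# Identifiers, carbon counts, and scores are invented for illustration.
df = pd.DataFrame(
    {
        "compound": ["cpd_A", "cpd_A", "cpd_B", "cpd_C", "cpd_D"],
        "efficacy_class": ["nervous", "organ", "organ", "metabolic", "pathogen"],
        "n_carbons": [12, 12, 1, 20, 35],
        "sa_score": [2.4, 2.4, 1.1, 3.3, 5.6],
    }
)

# Keep compounds with two or more carbon atoms.
filtered = df[df["n_carbons"] >= 2]

# A compound listed under several efficacy classes appears once per class;
# keep a single row per compound for the overall distribution.
unique = filtered.drop_duplicates(subset="compound")

# describe() yields count, mean, std, min, quartiles, and max.
summary = unique["sa_score"].describe()
print(summary)
```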

Breaking it down by drug efficacy classification

The overall distribution is shown above. The data also include the drug efficacy classification, so let's see whether the categories differ. Here the calculation uses the 1641 compounds before duplicates are removed.
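
Per-category statistics of this shape can be produced with a pandas `groupby` followed by `describe()`. The sketch below uses invented rows and hypothetical column names; a compound that belongs to two classes is deliberately left in both.

```python
import pandas as pd

# Hypothetical rows; a compound may appear under more than one efficacy class.
df = pd.DataFrame(
    {
        "compound": ["cpd_A", "cpd_A", "cpd_B", "cpd_C", "cpd_D", "cpd_E"],
        "efficacy_class": ["nervous", "organ", "organ", "nervous", "pathogen", "pathogen"],
        "sa_score": [2.4, 2.4, 3.0, 2.8, 4.1, 5.6],
    }
)

# One row of summary statistics per efficacy class.
per_class = df.groupby("efficacy_class")["sa_score"].describe()
print(per_class)
```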

Drugs for the nervous system and sensory organs

This category is further divided into central nervous system drugs, peripheral nervous system drugs, sensory organ drugs, and others. Memantine and baclofen are examples.

| count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|
| 401 | 3.074675 | 1.066451 | 1.407299 | 2.368182 | 2.79175 | 3.496406 | 8.224301 |

1_神経系及び感覚器官用医薬品.png

Drugs for individual organ systems

This category is further divided into cardiovascular drugs, respiratory drugs, digestive drugs, hormonal drugs, urogenital and anal drugs, dermatological drugs, dental and oral drugs, and others. Examples include olmesartan and esomeprazole.

| count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|
| 567 | 3.436648 | 1.211883 | 1.176561 | 2.556412 | 3.073396 | 4.338626 | 9.129873 |

2_個々の器官系用医薬品.png

Metabolic drugs

This category is further divided into vitamins, nourishing tonics, blood and body fluid drugs, dialysis drugs, and others. Examples include prasugrel and canagliflozin.

| count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|
| 196 | 3.562633 | 1.263028 | 1.58004 | 2.774178 | 3.307104 | 4.199926 | 9.121023 |

3_代謝性医薬品.png

Drugs for tissue cell function

This category is further divided into cell-activating drugs, antitumor drugs, radiopharmaceuticals, allergy drugs, and others. Examples include irinotecan and cetirizine.

| count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|
| 191 | 3.608295 | 1.438333 | 1.694618 | 2.641493 | 3.066941 | 4.15542 | 7.705978 |

4_組織細胞機能用医薬品.png

Crude drugs and medicines based on Chinese prescription

Not applicable, since crude drugs were excluded.

Drugs against pathogenic organisms

This category is further divided into antibiotic preparations, chemotherapeutic agents, biologics, antiparasitic drugs, and others. Examples include laninamivir and rifampicin.

| count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|
| 201 | 4.159825 | 1.384663 | 1.762741 | 3.202318 | 3.992249 | 4.690629 | 8.214511 |

6_病原生物に対する医薬品.png

Drugs whose main purpose is not treatment

This category is further divided into dispensing drugs, diagnostic drugs, public health drugs, in-vitro diagnostic drugs, and others. Examples include adenosine and edrophonium.

| count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|
| 70 | 3.336593 | 1.027391 | 1.054917 | 2.544872 | 3.402247 | 3.976367 | 5.783386 |

7_治療を主目的としない医薬品.png

Narcotics

This category is further divided into alkaloid narcotics, non-alkaloid narcotics, and others.

| count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|
| 15 | 3.747792 | 1.351973 | 1.977279 | 2.541722 | 3.994829 | 5.00452 | 5.273602 |

8_麻薬.png

Is there a significant difference between the classifications?

Welch's t-test was applied to every pair of groups.

| p-value | Nerve / sensory organs | Each organ | Metabolism | Tissue cells | Pathogenic organisms | Non-treatment | Narcotics |
|---|---|---|---|---|---|---|---|
| Nerve / sensory organs | - | 0.000 | 0.000 | 0.000 | 0.000 | 0.053 | 0.076 |
| Each organ | 0.000 | - | 0.225 | 0.140 | 0.000 | 0.453 | 0.392 |
| Metabolism | 0.000 | 0.225 | - | 0.740 | 0.000 | 0.140 | 0.615 |
| Tissue cells | 0.000 | 0.140 | 0.740 | - | 0.000 | 0.093 | 0.707 |
| Pathogenic organisms | 0.000 | 0.000 | 0.000 | 0.000 | - | 0.000 | 0.272 |
| Non-treatment | 0.053 | 0.453 | 0.140 | 0.093 | 0.000 | - | 0.281 |
| Narcotics | 0.076 | 0.392 | 0.615 | 0.707 | 0.272 | 0.281 | - |
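
As a worked illustration of what Welch's test computes, the t statistic and the Welch-Satterthwaite degrees of freedom can be derived from the summary statistics alone. The function name `welch_t` is made up for this sketch, and it stops short of the p-value, which requires the t-distribution.

```python
import math


def welch_t(mean1, std1, n1, mean2, std2, n2):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    v1 = std1**2 / n1  # squared standard error of group 1
    v2 = std2**2 / n2  # squared standard error of group 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    dof = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, dof


# (mean, std, count) taken from the per-category statistics in this article.
nervous = (3.074675, 1.066451, 401)
pathogen = (4.159825, 1.384663, 201)

t, dof = welch_t(*nervous, *pathogen)
print(f"t = {t:.2f}, dof = {dof:.1f}")
```

With the raw score arrays at hand, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test and also returns the p-value.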

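On average, drugs for the nervous system and sensory organs have lower SA scores than the other categories and so appear easier to synthesize, although the differences from non-treatment drugs (p = 0.053) and narcotics (p = 0.076) do not reach the 0.05 level. Conversely, antibiotics, chemotherapeutic agents, and antiallergic drugs appear harder to synthesize; drugs against pathogenic organisms have the highest mean (4.16) and differ from every category except narcotics (p = 0.272).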

Impressions

- It was good practice with pandas
- I would also like to compare groups by target type [^3]
- Please leave a comment if you spot any mistakes

[^3]: KEGG BRITE: Target-based drug classification
