[PYTHON] Create a command to search for similar compounds from the target database with RDKit and check the processing time

Introduction

I wanted to know how long it takes to search a target database (just an SDF file) for compounds similar to a query compound using RDKit, so I wrote a command to do it.

Source

Similarity is usually calculated by generating a fingerprint for each compound and scoring pairs of fingerprints with the Tanimoto coefficient. A fingerprint encodes a chemical structure as a bit vector, and there are various ways to generate one. Here I used MACCS keys, a widely used fingerprint with a small number of bits.

import argparse

from rdkit import Chem, DataStructs
from rdkit.Chem import rdMolDescriptors


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("-query", type=str, required=True)
    parser.add_argument("-target_db", type=str, required=True)
    args = parser.parse_args()

    # Read the query compound from a MOL file
    query_mol = Chem.MolFromMolFile(args.query)
    if query_mol is None:
        parser.error(f"could not parse query MOL file: {args.query}")

    # Load the target database (SDF)
    target_sdf_sup = Chem.SDMolSupplier(args.target_db)

    # Fingerprint calculation (query)
    query_fp = rdMolDescriptors.GetMACCSKeysFingerprint(query_mol)

    # Fingerprint calculation (targets)
    # Records that RDKit cannot parse come back as None and are skipped;
    # the index of each record in the SDF is kept alongside its fingerprint.
    target_fps = [
        (i, rdMolDescriptors.GetMACCSKeysFingerprint(mol))
        for i, mol in enumerate(target_sdf_sup)
        if mol is not None
    ]

    # Tanimoto similarity between the query and every target
    for i, target_fp in target_fps:
        result = DataStructs.TanimotoSimilarity(query_fp, target_fp)
        print(i, result)


if __name__ == "__main__":
    main()
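
For reference, the Tanimoto coefficient used in the script is the number of bits set in both fingerprints divided by the number of bits set in either one. The following is a minimal pure-Python sketch of that definition, not RDKit's implementation; the function name tanimoto, the use of sets of on-bit indices, and the return value for two empty fingerprints are all assumptions made for illustration.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        # Convention chosen for this sketch only: two empty fingerprints score 0.0
        return 0.0
    return len(a & b) / union


# Made-up fingerprints: 3 bits in common, 6 bits set in total
fp_query = {1, 4, 7, 9}
fp_target = {1, 4, 8, 9, 12}
print(tanimoto(fp_query, fp_target))  # 3 / 6 = 0.5
```

A score of 1.0 means the two fingerprints have exactly the same bits set, and 0.0 means they share none.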

How to use

The usage is as follows (thanks, argparse).

usage: StructureSimilaritySearch.py [-h] -query QUERY -target_db TARGET_DB

optional arguments:
  -h, --help            show this help message and exit
  -query QUERY(mol)
  -target_db TARGET_DB(sdf)

Processing time

As usual, I searched against the 1024 compounds of the solubility training data that comes with RDKit, using an arbitrarily chosen query. The results came back in about 1 second. For databases on the order of 10,000 compounds, this approach looks practical as it is.
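
The timing above was observed by simply running the command. If you want a number from inside the program, one option is to wrap the comparison loop with time.perf_counter. The sketch below is an illustration under assumptions: the helpers timed and score_all are hypothetical, and the fingerprints are random synthetic bit sets standing in for real MACCS keys, so the elapsed time says nothing about RDKit itself.

```python
import random
import time


def timed(func, *args, **kwargs):
    """Run func and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def score_all(query, targets):
    # Stand-in for the fingerprint comparison loop: Tanimoto on sets of on-bits
    return [len(query & t) / len(query | t) for t in targets]


rng = random.Random(0)
# 10,000 synthetic fingerprints with 30 of 166 bits set (made-up data)
targets = [set(rng.sample(range(166), 30)) for _ in range(10_000)]
query = set(rng.sample(range(166), 30))

scores, elapsed = timed(score_all, query, targets)
print(f"{len(scores)} comparisons in {elapsed:.3f} s")
```

In the actual command, the same wrapper could be placed around the fingerprint generation and the similarity loop separately to see which one dominates.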

Reference

https://future-chem.com/rdkit-fingerprint/
