Using SymPy, a computer algebra library, I applied functions defined in an external file to a column of a pandas DataFrame and wrote the result to a CSV file.
The command-line interface looks like this: -train is the input data file, -function is the function definition file, and -output is the output file.
$ python command/calculate_function.py -h
usage: calculate_function.py [-h] -train TRAIN -function FUNCTION -output
                             OUTPUT

optional arguments:
  -h, --help          show this help message and exit
  -train TRAIN        input csv file.
  -function FUNCTION  input function file.
  -output OUTPUT      output csv file.
The function definition file looks like this.
exp,cos(x),NewExp
exp,exp(x),ExpExp
exp,sin(x),SinExp
The first column is the name of the column to calculate on, and the second column is the function expression. In the expression, x stands for the values of that column. The third column is the name of the column that stores the calculation result.
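To make the three-column layout described above concrete, here is a minimal sketch that parses the definition file from an in-memory string. The names FunctionSpec and parse_specs are my own illustrative choices and do not appear in the script.

```python
import csv
import io
from typing import NamedTuple


class FunctionSpec(NamedTuple):
    """One row of the function definition file (hypothetical helper type)."""
    source_column: str   # column whose values are substituted for x
    expression: str      # function expression written in terms of x
    result_column: str   # column that receives the calculated values


def parse_specs(text):
    """Parse function-definition text into a list of FunctionSpec rows."""
    return [FunctionSpec(*row) for row in csv.reader(io.StringIO(text)) if row]


# The same three rows as the definition file shown above
SAMPLE = "exp,cos(x),NewExp\nexp,exp(x),ExpExp\nexp,sin(x),SinExp\n"
specs = parse_specs(SAMPLE)
for spec in specs:
    print(spec.result_column, "=", spec.expression, "with x =", spec.source_column)
```

Because the file is read as CSV, an expression that itself contains a comma would have to be quoted so that it stays in a single field.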
The source code is shown below. exec is used to evaluate each function expression as Python source.
calculate_function.py
import argparse
import csv

import pandas as pd
from sympy import *


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("-train", type=str, required=True, help="input csv file.")
    parser.add_argument("-function", type=str, required=True, help="input function file.")
    parser.add_argument("-output", type=str, required=True, help="output csv file.")
    args = parser.parse_args()

    # Read the input data; the first column becomes the index
    df = pd.read_csv(args.train, index_col=0)

    x = Symbol("x")

    # Each row of the function file is: source column, expression, result column
    with open(args.function, "r") as file:
        for row in csv.reader(file):
            # Run the expression text as Python source. The namespace is passed
            # explicitly so that f can be read back after exec returns.
            namespace = dict(globals(), x=x)
            exec("f = " + row[1], namespace)
            # Turn the SymPy expression into a NumPy-backed function of x
            func = lambdify(x, namespace["f"], "numpy")
            df[row[2]] = func(df[row[0]])

    df.to_csv(args.output)


if __name__ == "__main__":
    main()
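The loop above relies on exec to turn the expression text into a SymPy object. As an alternative, SymPy's own sympify can parse the text directly. The following is only a sketch of that variant, not the script's actual method; the function name apply_functions and the two-row sample frame are assumptions made for illustration.

```python
import pandas as pd
from sympy import Symbol, lambdify, sympify


def apply_functions(df, specs):
    """Return a copy of df with one new column per (source, expression, result) spec."""
    x = Symbol("x")
    out = df.copy()
    for source, expression, result in specs:
        # sympify parses the text into a SymPy expression; x is supplied explicitly
        expr = sympify(expression, locals={"x": x})
        # lambdify converts the expression into a function that works on NumPy arrays
        func = lambdify(x, expr, "numpy")
        out[result] = func(out[source].to_numpy())
    return out


# Two rows taken from the input file in this post
df = pd.DataFrame({"exp": [3.54, -1.18]}, index=["CHEMBL596271", "CHEMBL1951080"])
result = apply_functions(df, [("exp", "cos(x)", "NewExp"), ("exp", "exp(x)", "ExpExp")])
print(result)
```

Note that sympify also evaluates the text internally, so, like exec, it should only be given definition files you trust.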
Input file
CMPD_CHEMBLID,exp,smiles
CHEMBL596271,3.54,Cn1c(CN2CCN(CC2)c3ccc(Cl)cc3)nc4ccccc14
CHEMBL1951080,-1.18,COc1cc(OC)c(cc1NC(=O)CSCC(=O)O)S(=O)(=O)N2C(C)CCc3ccccc23
CHEMBL1771,3.69,COC(=O)[C@@H](N1CCc2sccc2C1)c3ccccc3Cl
CHEMBL234951,3.37,OC[C@H](O)CN1C(=O)C(Cc2ccccc12)NC(=O)c3cc4cc(Cl)sc4[nH]3
CHEMBL565079,3.1,Cc1cccc(C[C@H](NC(=O)c2cc(nn2C)C(C)(C)C)C(=O)NCC#N)c1
CHEMBL317462,3.14,OC1(CN2CCC1CC2)C#Cc3ccc(cc3)c4ccccc4
The function file is the function definition file shown earlier.
Output result
CMPD_CHEMBLID,exp,smiles,NewExp,ExpExp,SinExp
CHEMBL596271,3.54,Cn1c(CN2CCN(CC2)c3ccc(Cl)cc3)nc4ccccc14,-0.9216800341052034,34.46691919085739,-0.3879509179417303
CHEMBL1951080,-1.18,COc1cc(OC)c(cc1NC(=O)CSCC(=O)O)S(=O)(=O)N2C(C)CCc3ccccc23,0.38092482436688185,0.30727873860113125,-0.9246060124080203
CHEMBL1771,3.69,COC(=O)[C@@H](N1CCc2sccc2C1)c3ccccc3Cl,-0.8533559001656995,40.044846957286715,-0.5213287903544065
CHEMBL234951,3.37,OC[C@H](O)CN1C(=O)C(Cc2ccccc12)NC(=O)c3cc4cc(Cl)sc4[nH]3,-0.9740282491988521,29.07852705779708,-0.22642652177388314
CHEMBL565079,3.1,Cc1cccc(C[C@H](NC(=O)c2cc(nn2C)C(C)(C)C)C(=O)NCC#N)c1,-0.9991351502732795,22.197951281441636,0.04158066243329049
CHEMBL317462,3.14,OC1(CN2CCC1CC2)C#Cc3ccc(cc3)c4ccccc4,-0.9999987317275395,23.103866858722185,0.0015926529164868282
The three calculated columns have been appended to the output as expected.
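One quick way to check these numbers is to recompute a few of them directly with NumPy and compare. The sketch below copies three rows of values from the output above (exp = 3.54, -1.18, and 3.14); the variable names are my own.

```python
import numpy as np

# exp values of CHEMBL596271, CHEMBL1951080 and CHEMBL317462
exp_values = np.array([3.54, -1.18, 3.14])

# Values copied from the NewExp, ExpExp and SinExp columns of the output
reported = {
    "NewExp": [-0.9216800341052034, 0.38092482436688185, -0.9999987317275395],
    "ExpExp": [34.46691919085739, 0.30727873860113125, 23.103866858722185],
    "SinExp": [-0.3879509179417303, -0.9246060124080203, 0.0015926529164868282],
}

# The NumPy function that each result column should correspond to
direct = {"NewExp": np.cos, "ExpExp": np.exp, "SinExp": np.sin}

matches = {
    name: bool(np.allclose(fn(exp_values), reported[name]))
    for name, fn in direct.items()
}
print(matches)
```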
SymPy also seems to handle more complicated expressions, such as conditional branching, so I would like to write about that another time.
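As a small preview, conditional branching can be expressed with SymPy's Piecewise and passed through lambdify in the same way. This is only a sketch: the rule itself and the column name ClippedExp are made up for illustration, and the expression is built in Python rather than read from the definition file.

```python
import pandas as pd
from sympy import Piecewise, Symbol, lambdify

x = Symbol("x")

# Hypothetical rule: 0 for negative values, otherwise the value itself
expr = Piecewise((0, x < 0), (x, True))
func = lambdify(x, expr, "numpy")

# Three exp values taken from the input file in this post
df = pd.DataFrame({"exp": [3.54, -1.18, 3.1]})
df["ClippedExp"] = func(df["exp"].to_numpy())
print(df)
```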