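This is a personal memo about a part where I got stuck while writing code. Beyond simply reading and writing data frames, it is an example of processing for when you want to create a new column B that contains 1 if column A has a circle mark and 2 if it does not. In the code below, the 'profile' column plays the role of column A. The check is whether it contains "Japan", and the new column is 'Japanese?', with the values "Japanese" and "not Japanese".
In this code, the data frame that is usually named "df" is called "dfCsv".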
dfex.py
import datetime
import pandas as pd
CSVFILE="Nanna.csv"
def main():
    print(str(datetime.datetime.now())+"\t"+"Start reading the target data.")
    #Convert from CSV file to data frame dfCsv.
    dfCsv = pd.read_csv(CSVFILE, encoding='cp932', header=0)
    print(str(datetime.datetime.now())+"\t"+CSVFILE+":Loading is complete.")
    
    
    #When you add a new column, you can do it like this.
    dfCsv=textSearch(dfCsv)  
    
    # Export the execution result to CSV.
    # to_csv overwrites result.csv by default (mode="w"), so no separate truncation step is needed.
    dfCsv.to_csv("result.csv")

# Add a new column to an existing data frame.
def textSearch(dfTmp):
    # Start with an empty list and append one value per row of the 'profile' column,
    # so the list ends up with as many elements as the data frame has rows.
    profList = []
    for profTxt in dfTmp['profile']:
        profList.append(profTxt)
    retList = []
    for prof in profList:
        # str() guards against missing values (NaN) and non-string cells.
        if "Japan" in str(prof):
            ret = "Japanese"
        else:
            ret = "not Japanese"
        retList.append(ret)
    # Attach the list built in this function to the passed data frame as a new column.
    # Note: this modifies dfTmp itself, not a copy.
    dfTmp['Japanese?'] = retList
    return dfTmp

if __name__ == "__main__":
    main()
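As an alternative to the two loops in textSearch, pandas can perform the same check on the whole column at once. The following is a minimal sketch of that swapped-in, vectorized technique (Series.str.contains plus numpy.where). The name text_search_vectorized and the sample rows are hypothetical and not part of the original code. astype(str) plays the role of str(prof), and na=False treats missing values as "no match".

```python
import numpy as np
import pandas as pd


def text_search_vectorized(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical vectorized variant of textSearch (not the author's code).

    Adds a 'Japanese?' column: "Japanese" when the 'profile' text contains
    "Japan", otherwise "not Japanese". Like textSearch, it writes into the
    data frame it receives and returns that same object.
    """
    # regex=False: treat "Japan" as a plain substring, not a regular expression.
    # na=False: missing profiles count as "does not contain Japan".
    has_japan = df["profile"].astype(str).str.contains("Japan", regex=False, na=False)
    df["Japanese?"] = np.where(has_japan, "Japanese", "not Japanese")
    return df


if __name__ == "__main__":
    # Small in-memory sample instead of Nanna.csv (assumed data).
    sample = pd.DataFrame({"profile": ["Born in Japan", "Lives in France", None]})
    print(text_search_vectorized(sample))
```

The loop version is easier to extend with more complicated per-row logic, while the vectorized version is shorter and usually faster on large CSV files.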
This is the key part of this memo:
    #When you add a new column, you can do it like this.
    dfCsv=textSearch(dfCsv)  
This does not mean "there is a ready-made function called textSearch, so just call it!" textSearch is defined in this program itself. By passing the data frame to a subroutine and processing it this way, you can add a new column to the data frame that stores the processing results.
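The same "pass the data frame to a subroutine" pattern covers the column A / column B task from the introduction. Here is a minimal sketch. It assumes the circle is stored as the character "○" somewhere in column A's text, and the function name add_column_b is hypothetical. Unlike textSearch, which writes into the data frame it receives, this version returns a new data frame via DataFrame.assign, so the caller's original data frame is left unchanged.

```python
import pandas as pd


def add_column_b(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: B = 1 if column A contains "○", otherwise 2."""
    result = []  # one entry per row, same idea as retList in textSearch
    for value in df["A"]:
        # str() so that missing values or numbers do not raise an error.
        result.append(1 if "○" in str(value) else 2)
    # assign() returns a new DataFrame; the df passed in is not modified.
    return df.assign(B=result)


if __name__ == "__main__":
    original = pd.DataFrame({"A": ["○", "×", None]})
    updated = add_column_b(original)
    print(updated)
    print("B" in original.columns)  # the caller's frame has no column B
```

Whether to modify the passed data frame or return a new one is a design choice. Writing into it, as textSearch does, avoids a copy. Returning a new frame makes `dfCsv = add_column_b(dfCsv)` the only place where the data frame changes.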