This is a personal memo about a point where I got stuck while writing code. It goes one step beyond simply reading and writing data frames: it is an example of the processing to use when you want to create a new column B that contains 1 if column A contains a circle mark and 2 if it does not.
The data frame named "dfCsv" in this code is what is usually written as "df".
dfex.py
import csv
import codecs
import os, os.path
import datetime
import pandas as pd
import warnings

CSVFILE="Nanna.csv"

def main():
    print(str(datetime.datetime.now())+"\t"+"Start reading the target data.")
    #Convert the CSV file into the data frame dfCsv.
    dfCsv= pd.read_csv(CSVFILE,encoding='cp932', header=0)
    print(str(datetime.datetime.now())+"\t"+CSVFILE+":Loading is complete.")
    #When you add a new column, you can do it like this.
    dfCsv=textSearch(dfCsv)
    #Export the execution result to CSV.
    #First create (or empty) result.csv, then append the data frame to it.
    with open("result.csv",mode='w') as f:
        s = ""
        f.write(s)
    dfCsv.to_csv("result.csv",mode="a")

#Add a new column to an existing data frame.
def textSearch(dfTmp):
    #Declare an empty list.
    #Appending one value per row of the data frame gives a list with the same number of entries as the data frame has rows.
    profList=[]
    for profTxt in dfTmp['profile']:
        profList.append(profTxt)
    retList=[]
    for prof in profList:
        #str() lets the check work even when the cell is not a string (for example, an empty cell).
        if ("Japan" in str(prof)) :
            ret="Japanese"
        else:
            ret="not Japanese"
        retList.append(ret)
    #Join the list created by this subroutine to the passed data frame as a new column.
    dfTmp['Japanese?'] = retList
    return(dfTmp)

if __name__ == "__main__":
    main()
This is the key part of this memo.
#When you add a new column, you can do it like this.
dfCsv=textSearch(dfCsv)
This does not mean "just call some function named textSearch!". textSearch itself is defined in this program. If you pass the data frame to a subroutine and process it in this way, you can add a new column to the data frame that stores the processing results.
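As a note to myself, the same pattern can be applied to the column A / column B case mentioned at the beginning. The following is only a minimal sketch: the function name add_mark_flag, the column names "A" and "B", the "○" character used as the circle mark, and the sample data are all assumptions made for illustration, not part of the original program.

```python
import pandas as pd


def add_mark_flag(df_tmp, mark="○"):
    """Add column B: 1 if column A contains the mark, otherwise 2.

    The names add_mark_flag, "A", "B" and the default mark are
    hypothetical; adjust them to the real data.
    """
    ret_list = []
    for value in df_tmp["A"]:
        # str() lets the check work for empty or non-string cells too.
        if mark in str(value):
            ret_list.append(1)
        else:
            ret_list.append(2)
    # The list has one entry per row, so it can be assigned as a new column.
    df_tmp["B"] = ret_list
    return df_tmp


# Made-up sample data, only to show the call.
df = pd.DataFrame({"A": ["○", "×", None, "○ ok"]})
df = add_mark_flag(df)
print(df)
```

The loop is kept on purpose so that it mirrors textSearch. pandas also has string methods such as Series.str.contains that could probably do the same check without an explicit loop, but I have not compared the two here.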