[PYTHON] Memorandum (acquisition / conversion of "user-defined" time, cross tabulation)

Introduction

Since the first step of the concept design, "process the raw data by" integrating the work time if the values other than the employee name and the work time are the same "" has been implemented, we will summarize it. https://qiita.com/wellwell3176/items/7dba981b479c5933bf5f image.png Figure: Excerpt of raw data Since the employee name is unnecessary as data, I want to delete the employee name and add up the work time if all the information other than the employee name is the same.

Completed program

program


import pandas as pd
df=pd.read_excel('/content/drive/My Drive/Colab Notebooks/data2.xlsx')

df["Classification"]=df["Classification"]+df["business"] #後工程で必要なのでClassificationとbusinessは一括表示にする
df=df.rename(columns = {'Classification':'業務Classification'} #Also change the line headings

df["Working hours"] = pd.to_datetime(df["Working hours"],format="%H:%M:%S") 
 #Since the work time was entered as raw data in Excel user definition, convert from character string to datetime by specifying the format
df["Working hours"] = df["Working hours"].dt.minute
 #The numbers are[~~ minutes][~~ Time]Since there is a request to display with, set the datetime[Minutes]Convert to int type

df2=df.groupby(["theme","time","Country name","Business division"],as_index=False).sum()
 #Aggregate by other than employee name. df.Note that the result of groupby will not be saved unless it is assigned.
df2.to_excel('/content/drive/My Drive/Colab Notebooks/data5.xlsx')

image.png Figure: Program output result It is OK because the character string concatenation of the business division, the deletion of the employee name, and the aggregation of time are realized.

Errors and failures that occurred during the program "and their overview

** 1. Cannot convert string to time with to_datetime **

python


df["Working hours"] = pd.to_datetime(df["Working hours"]) 

-->TypeError: <class 'datetime.time> is not convertible to datetime
#There was a problem with the "display format" of Excel. User-defined "hh":mm ”is used and the default to_I couldn't read this on datetime
#Resolved by using the format option and specifying the description method of the corresponding column
df["Working hours"] = pd.to_datetime(df["Working hours"],format="%H:%M:%S") #This is OK

** 2. Working hours are not aggregated by group by **

Failed version


df["Working hours"] = pd.to_datetime(df["Working hours"],format="%H:%M:%S") 
 
df.groupby("Country name").mean()
ValueError:No axis named theme for object type<class 'pandas.core.frame.DataFrame'>

image.png It was output as shown in the above figure, and the aggregation by "working time" did not occur. The cause seems to be that the datetime format was not recognized as the aggregated value of the groupby function. Solved by changing datetime to int type like the finished product.

Successful version


df["Working hours"] = pd.to_datetime(df["Working hours"],format="%H:%M:%S") 
df["Working hours"] = df["Working hours"].dt.minute 

df.groupby("Country name").mean()

image.png With this, the work time can be captured in the form of ~~ [minutes] on the tabulation side.

3. AttributeError: 'Series' object has no attribute 'minute'

python


df["Working hours"] = df["Working hours"].minute #I get an error using this

--> AttributeError: 'Series' object has no attribute 'minute'
#.Since dt was missing, df["Working hours"]Could not process for a series of arrays

df["Working hours"] = df["Working hours"].dt.minute #This is the correct answer

4. ValueError: No axis named HOGE for object type class'pandas.core.frame.DataFrame'

python


df.groupby("Country name","theme").sum() #I get an error using this

-->ValueError:No axis named theme for object type<class 'pandas.core.frame.DataFrame>
#A typo when using multiple indexes in the groupby function.[]Was not enough.

df.groupby(["theme","Country name"]).sum() #This is OK

Referenced pages / sites

The column name cannot be changed by rename () https://teratail.com/questions/291634 How to use Pandas groupby https://qiita.com/propella/items/a9a32b878c77222630ae How to convert dates using Series.dt () in Pandas https://qiita.com/Takemura-T/items/79b16313e45576bb6492

Recommended Posts

Memorandum (acquisition / conversion of "user-defined" time, cross tabulation)
Conversion of time data in 25 o'clock notation
Acquisition of time series data (daily) of stock prices
measurement of time