Since the first step of the concept design, "process the raw data by" integrating the work time if the values other than the employee name and the work time are the same "" has been implemented, we will summarize it. https://qiita.com/wellwell3176/items/7dba981b479c5933bf5f Figure: Excerpt of raw data Since the employee name is unnecessary as data, I want to delete the employee name and add up the work time if all the information other than the employee name is the same.
program
import pandas as pd
df=pd.read_excel('/content/drive/My Drive/Colab Notebooks/data2.xlsx')
df["Classification"]=df["Classification"]+df["business"] #後工程で必要なのでClassificationとbusinessは一括表示にする
df=df.rename(columns = {'Classification':'業務Classification'} #Also change the line headings
df["Working hours"] = pd.to_datetime(df["Working hours"],format="%H:%M:%S")
#Since the work time was entered as raw data in Excel user definition, convert from character string to datetime by specifying the format
df["Working hours"] = df["Working hours"].dt.minute
#The numbers are[~~ minutes][~~ Time]Since there is a request to display with, set the datetime[Minutes]Convert to int type
df2=df.groupby(["theme","time","Country name","Business division"],as_index=False).sum()
#Aggregate by other than employee name. df.Note that the result of groupby will not be saved unless it is assigned.
df2.to_excel('/content/drive/My Drive/Colab Notebooks/data5.xlsx')
Figure: Program output result It is OK because the character string concatenation of the business division, the deletion of the employee name, and the aggregation of time are realized.
** 1. Cannot convert string to time with to_datetime **
python
df["Working hours"] = pd.to_datetime(df["Working hours"])
-->TypeError: <class 'datetime.time> is not convertible to datetime
#There was a problem with the "display format" of Excel. User-defined "hh":mm ”is used and the default to_I couldn't read this on datetime
#Resolved by using the format option and specifying the description method of the corresponding column
df["Working hours"] = pd.to_datetime(df["Working hours"],format="%H:%M:%S") #This is OK
** 2. Working hours are not aggregated by group by **
Failed version
df["Working hours"] = pd.to_datetime(df["Working hours"],format="%H:%M:%S")
df.groupby("Country name").mean()
ValueError:No axis named theme for object type<class 'pandas.core.frame.DataFrame'>
It was output as shown in the above figure, and the aggregation by "working time" did not occur. The cause seems to be that the datetime format was not recognized as the aggregated value of the groupby function. Solved by changing datetime to int type like the finished product.
Successful version
df["Working hours"] = pd.to_datetime(df["Working hours"],format="%H:%M:%S")
df["Working hours"] = df["Working hours"].dt.minute
df.groupby("Country name").mean()
With this, the work time can be captured in the form of ~~ [minutes] on the tabulation side.
3. AttributeError: 'Series' object has no attribute 'minute'
python
df["Working hours"] = df["Working hours"].minute #I get an error using this
--> AttributeError: 'Series' object has no attribute 'minute'
#.Since dt was missing, df["Working hours"]Could not process for a series of arrays
df["Working hours"] = df["Working hours"].dt.minute #This is the correct answer
4. ValueError: No axis named HOGE for object type class'pandas.core.frame.DataFrame'
python
df.groupby("Country name","theme").sum() #I get an error using this
-->ValueError:No axis named theme for object type<class 'pandas.core.frame.DataFrame>
#A typo when using multiple indexes in the groupby function.[]Was not enough.
df.groupby(["theme","Country name"]).sum() #This is OK
The column name cannot be changed by rename () https://teratail.com/questions/291634 How to use Pandas groupby https://qiita.com/propella/items/a9a32b878c77222630ae How to convert dates using Series.dt () in Pandas https://qiita.com/Takemura-T/items/79b16313e45576bb6492