[PYTHON] Data Science 100 Knock ~ Battle for less than beginners part4

This is a record of my struggle through the Data Science 100 Knocks as someone who doesn't even qualify as a fledgling data scientist. Whether I can make it to the finish line is a mystery. ~~If the series vanishes partway through, please assume I just haven't posted it to Qiita.~~

100 knock articles 100 Knock Guide

**This article contains spoilers, so be careful if you plan to work through the problems yourself.**

The reason I'm writing all this here is to pad out about one page so the spoilers stay off-screen ()

I got bored partway through, so I went poking around inside the Docker container.

If you think "This is hard to read!" or "This way of writing is dangerous!", please let me know. ~~I'll use it as nourishment while my heart takes the damage.~~

This time covers problems 23 to 28. [Last time] covered 19-22. [First time with table of contents]

23rd

mine23.py


df = df_receipt
# Sum amount and quantity for each store
df = df.groupby('store_cd').agg({'amount': 'sum', 'quantity': 'sum'}).reset_index()
df.head(10)

Right, a new way of writing things shows up out of nowhere. According to the reference page, `agg` is used to aggregate data.

sum, min, and max are the same as in Excel, so I get those, but for the average I keep typing `ave` by reflex instead of `mean`... As for `std`, I haven't touched it since my exam-taking days, but I suppose I'll be relying on it from now on...

'min': minimum value
'max': maximum value
'mean': mean
'median': median
'std': standard deviation
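
As a rough sketch of how these function names plug into `agg`, here is the same call shape on a tiny made-up table. `df_toy` and its values are my own assumption for illustration; only the column names are borrowed from `df_receipt`.

```python
import pandas as pd

# Made-up stand-in for df_receipt (the values are hypothetical)
df_toy = pd.DataFrame({
    'store_cd': ['S1', 'S1', 'S1', 'S2', 'S2'],
    'amount':   [100, 200, 600, 50, 150],
})

# One function name per column gives one flat result column
total = df_toy.groupby('store_cd').agg({'amount': 'sum'}).reset_index()
print(total)

# The other function names drop into the same position
for func in ['min', 'max', 'mean', 'median', 'std']:
    result = df_toy.groupby('store_cd').agg({'amount': func})
    print(func, result['amount'].to_dict())
```

Note that the name for the average is `'mean'`; there is no `'ave'`.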

Also, this one is hard to understand at first glance:

df.groupby('A').agg({'B': ['min', 'max'], 'C': 'sum'})

This computes the minimum and maximum of "B" (plus the sum of "C") for each "A". However, writing it this way creates a hierarchy in the column labels. ~~The reference book says it's convenient, but this is very annoying~~
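
To make the hierarchy visible, here is a minimal sketch on made-up data. The table `df_toy` and its values are hypothetical; the column names A, B, and C follow the snippet above.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the snippet above
df_toy = pd.DataFrame({
    'A': ['x', 'x', 'y'],
    'B': [1, 5, 3],
    'C': [10, 20, 30],
})

out = df_toy.groupby('A').agg({'B': ['min', 'max'], 'C': 'sum'})

# Every column label is now a (column, function) tuple
print(out.columns.tolist())

# A single column is addressed with the full tuple
print(out[('B', 'max')])
```

Once any value in the dict is a list, pandas labels every result column with a two-level name, including "C", which was given a plain string.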

Digression

yodan.py


df = df_receipt
df = df.groupby('customer_id').agg({'sales_ymd': ['max', 'min']})

df['sales_ymd']  # the 'sales_ymd' level disappears; the 'max' and 'min' columns are selected

df['sales_ymd'][['max']]  # only the 'max' column is selected

Having to do this every time you want to refer to a column in the hierarchy is very annoying. I struggled with it throughout 23-27, so I'm posting it here for the time being.
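
For what it's worth, here is a small sketch of two ways to make this less painful, on made-up data. `df_toy`, its values, and the names `newest` and `oldest` are my own assumptions. The second approach, named aggregation, is a pandas feature and not what this article or the model answers use.

```python
import pandas as pd

# Hypothetical toy data standing in for df_receipt
df_toy = pd.DataFrame({
    'customer_id': ['C1', 'C1', 'C2'],
    'sales_ymd':   [20180101, 20190505, 20170707],
})

nested = df_toy.groupby('customer_id').agg({'sales_ymd': ['max', 'min']})

# A (top, sub) tuple reaches one column in a single step
newest = nested[('sales_ymd', 'max')]
print(newest)

# Named aggregation: new_name=(source column, function) yields flat columns
flat = df_toy.groupby('customer_id').agg(
    newest=('sales_ymd', 'max'),
    oldest=('sales_ymd', 'min'),
)
print(flat.columns.tolist())
```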

24th and 25th

mine24.py


df = df_receipt
df.groupby('customer_id').agg({'sales_ymd': 'max'}).reset_index().head(10)

This is the max version of 23 (or rather, the digression above is a rewrite of this).

mine25.py


'''Model answer'''
df_receipt.groupby('customer_id').agg({'sales_ymd':'min'}).head(10)

This one is the model answer as-is. Model answers have been scarce in these posts lately because there were many simple things I couldn't work out without peeking at them.

26th

P-026: For the receipt detail data frame (df_receipt), find the newest sales date (sales_ymd) and the oldest sales date for each customer ID (customer_id), and display 10 rows where the two dates differ.

mine26.py


df = df_receipt
df = df.groupby('customer_id').agg({'sales_ymd': ['max', 'min']}).reset_index()
# Keep only customers whose newest and oldest sales dates differ
df = df[df['sales_ymd']['max'] != df['sales_ymd']['min']]
df.head(10)

'''Model answer'''
df_tmp = df_receipt.groupby('customer_id').agg({'sales_ymd':['max','min']}).reset_index()
df_tmp.columns = ["_".join(pair) for pair in df_tmp.columns]
df_tmp.query('sales_ymd_max != sales_ymd_min').head(10)

What is going on in the second line of this model answer... I do understand that it removes the hierarchy.

The header of my own output looks like this:

customer_id sales_ymd
            max       min

while the model answer's looks like this:

customer_id_ sales_ymd_max sales_ymd_min

I can see that it comes out cleaner this way ~~but in the end the hierarchy is just in the way~~
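
As best I can tell, the second line of the model answer can be unpacked like this. The sketch runs on a made-up table (`df_toy` and its values are hypothetical), with the rest copied from the model answer.

```python
import pandas as pd

# Hypothetical toy data standing in for df_receipt
df_toy = pd.DataFrame({
    'customer_id': ['C1', 'C1', 'C2'],
    'sales_ymd':   [20180101, 20190505, 20170707],
})

df_tmp = df_toy.groupby('customer_id').agg({'sales_ymd': ['max', 'min']}).reset_index()

# Before flattening, every label is a tuple; customer_id has an empty second part
print(df_tmp.columns.tolist())

# "_".join glues each tuple into one string,
# so ('customer_id', '') becomes 'customer_id_'
df_tmp.columns = ["_".join(pair) for pair in df_tmp.columns]
print(df_tmp.columns.tolist())

# With flat names, query can compare the two columns directly
print(df_tmp.query('sales_ymd_max != sales_ymd_min'))
```

The trailing underscore in `customer_id_` comes from joining the column name with that empty second part.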

27th and 28th

mine27.py


df = df_receipt
df = df.groupby('store_cd').agg({'amount': ['mean']}).reset_index()
# Overwrite the hierarchical labels with flat names
df.columns = ['store_id', 'amount_mean']
df = df.sort_values('amount_mean', ascending=False)
df.head(5)

'''Model answer'''
df_receipt.groupby('store_cd').agg({'amount':'mean'}).reset_index().sort_values('amount', ascending=False).head(5)

mine28.py


df = df_receipt
df = df.groupby('store_cd').agg({'amount': ['median']}).reset_index()
# Overwrite the hierarchical labels with flat names
df.columns = ['store_id', 'amount_median']
df = df.sort_values('amount_median', ascending=False)
df.head(5)

'''Model answer'''
df_receipt.groupby('store_cd').agg({'amount':'median'}).reset_index().sort_values('amount', ascending=False).head(5)

27 and 28 add sorting. ~~It's a secret that I wrote `ave`~~ Everything up to this point was covered on the reference site, so it went fairly smoothly. The problem is next time.
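
One difference between my answers and the model answers in 27 and 28 is that mine wrap the function name in a list while the model answers pass a plain string. A minimal sketch on made-up data (`df_toy` and its values are hypothetical) suggests why the model answers can sort on `'amount'` without renaming anything:

```python
import pandas as pd

# Hypothetical toy data standing in for df_receipt
df_toy = pd.DataFrame({
    'store_cd': ['S1', 'S1', 'S2', 'S3'],
    'amount':   [100, 300, 500, 50],
})

# A plain string keeps the column name 'amount', so it can be sorted on directly
by_string = (
    df_toy.groupby('store_cd')
    .agg({'amount': 'mean'})
    .reset_index()
    .sort_values('amount', ascending=False)
)
print(by_string)

# A one-element list produces the label ('amount', 'mean') instead
by_list = df_toy.groupby('store_cd').agg({'amount': ['mean']}).reset_index()
print(by_list.columns.tolist())
```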

That's it for this time.

Next time, mathematical violence assaults the author, who flunked Math IIB! ~~Mock exams? I earned my points on the programming questions!~~
