[Python] 100 knocks on data science (structured data processing) 026 Explanation

A YouTube video commentary is also available.

problem

P-026: For the receipt detail data frame (df_receipt), find the newest sales date (sales_ymd) and the oldest sales date for each customer ID (customer_id), and display 10 rows where the two dates differ.

answer

code


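# Newest (max) and oldest (min) sales date per customer
df_sales_ymd = df_receipt.groupby('customer_id').agg({'sales_ymd':['max','min']}).reset_index()

# Replace the two-level column labels produced by agg with flat names
df_sales_ymd.columns = ['customer_id','sales_ymd_max','sales_ymd_min']

# Keep only customers whose newest and oldest dates differ, and show 10 rows
df_sales_ymd.query('sales_ymd_max != sales_ymd_min').head(10)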

output

       customer_id  sales_ymd_max  sales_ymd_min
1   CS001114000005       20190731       20180503
2   CS001115000010       20190405       20171228
3   CS001205000004       20190625       20170914
4   CS001205000006       20190224       20180207
13  CS001214000009       20190902       20170306
14  CS001214000017       20191006       20180828
16  CS001214000048       20190929       20171109
17  CS001214000052       20190617       20180208
20  CS001215000005       20181021       20170206
21  CS001215000040       20171022       20170214

Commentary

- 'groupby' is used on a Pandas DataFrame / Series when you want to collect rows that share the same value (number or character string) and apply a common operation (total, average, etc.) to each group.
- '.agg({'sales_ymd': ['max','min']})' computes, for each group, the maximum value (= newest sales date) and the minimum value (= oldest sales date) of 'sales_ymd'.
- '.reset_index()' turns the group key (customer_id), which 'groupby' moved into the index, back into an ordinary column and reassigns the index to serial numbers starting from 0.
- The 2nd and 3rd lines use techniques covered in earlier problems: '.columns' assigns the column names and '.query' specifies the row condition. Because rows where the two dates are equal are dropped, the index in the output skips some numbers.
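To make the steps above concrete, here is a minimal sketch on a small made-up table. The name df_toy and its values are assumptions for illustration only; they are not part of the 100-knocks dataset.

```python
import pandas as pd

# Hypothetical toy data standing in for df_receipt (values are made up).
df_toy = pd.DataFrame({
    "customer_id": ["A", "A", "B", "C", "C"],
    "sales_ymd": [20190101, 20180315, 20170720, 20190505, 20190505],
})

# Newest (max) and oldest (min) sales date per customer.
df_agg = df_toy.groupby("customer_id").agg({"sales_ymd": ["max", "min"]}).reset_index()

# Flatten the two-level column labels into plain names.
df_agg.columns = ["customer_id", "sales_ymd_max", "sales_ymd_min"]

# Keep only customers whose newest and oldest dates differ.
df_diff = df_agg.query("sales_ymd_max != sales_ymd_min")
print(df_diff)
```

Only customer A remains here: B has a single purchase and C bought twice on the same day, so their newest and oldest dates are equal and the query drops them.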

code


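# Alternative to hard-coding the names: run this instead of the '.columns = [...]' line in the answer.
# After reset_index() the labels are pairs such as ('customer_id', '') and ('sales_ymd', 'max');
# rstrip removes the trailing '_' that the empty second element would otherwise leave.
df_sales_ymd.columns = ["_".join(pair).rstrip("_") for pair in df_sales_ymd.columns]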
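Another option, not used in the original answer, is pandas' named aggregation (available since pandas 0.25), which sets the output column names inside 'agg' so that no two-level labels appear at all. The sketch below assumes a small made-up table called df_toy.

```python
import pandas as pd

# Hypothetical toy data standing in for df_receipt (values are made up).
df_toy = pd.DataFrame({
    "customer_id": ["A", "A", "B"],
    "sales_ymd": [20190101, 20180315, 20170720],
})

# Named aggregation: keyword = (source column, aggregation function).
df_named = (
    df_toy.groupby("customer_id")
    .agg(
        sales_ymd_max=("sales_ymd", "max"),
        sales_ymd_min=("sales_ymd", "min"),
    )
    .reset_index()
)

print(list(df_named.columns))
```

The resulting columns are already flat, so the separate renaming step is unnecessary.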
