[Python] 100 knocks on data science (structured data processing) 029 Explanation

A video commentary is also available on YouTube.

problem

P-029: Find the mode of the product code (product_cd) for each store code (store_cd) for the receipt details data frame (df_receipt).

answer

code


df_receipt.groupby('store_cd').product_cd.apply(lambda x: x.mode()).reset_index() \
.set_index(['store_cd','level_1','product_cd'])

output

store_cd level_1 product_cd
S12007 0 P060303001
S12013 0 P060303001
S12014 0 P060303001
S12029 0 P060303001
S12030 0 P060303001
S13001 0 P060303001
S13002 0 P060303001
S13003 0 P071401001
S13004 0 P060303001
S13005 0 P040503001
S13008 0 P060303001
S13009 0 P060303001
S13015 0 P071401001
S13016 0 P071102001
S13017 0 P060101002
S13018 0 P071401001
S13019 0 P071401001
S13020 0 P071401001
S13031 0 P060303001
S13032 0 P060303001
S13035 0 P040503001
S13037 0 P060303001
S13038 0 P060303001
S13039 0 P071401001
S13041 0 P071401001
S13043 0 P060303001
S13044 0 P060303001
S13051 0 P050102001
       1 P071003001
       2 P080804001
S13052 0 P050101001
S14006 0 P060303001
S14010 0 P060303001
S14011 0 P060101001
S14012 0 P060303001
S14021 0 P060101001
S14022 0 P060303001
S14023 0 P071401001
S14024 0 P060303001
S14025 0 P060303001
S14026 0 P071401001
S14027 0 P060303001
S14028 0 P060303001
S14033 0 P071401001
S14034 0 P060303001
S14036 0 P040503001
       1 P060101001
S14040 0 P060303001
S14042 0 P050101001
S14045 0 P060303001
S14046 0 P060303001
S14047 0 P060303001
S14048 0 P050101001
S14049 0 P060303001
S14050 0 P060303001

Commentary

- Applies to pandas DataFrame / Series.
- 'groupby' collects the rows that share the same value (or character string) so that a common operation, such as a total or an average, can be applied to each group.
- '.apply(lambda x: ...)' is a method that applies a function to the column specified immediately before it, once per group. 'lambda x:' declares an unnamed (anonymous) function whose argument x receives the values of each group. (For more information on the apply method, click [here](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html); for more information on the mode method, click [here](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.mode.html).)
- The automatically generated 'level_1' column numbers the modes from 0 within each store_cd, so it goes beyond 0 only when a store_cd has more than one mode of product_cd.
- '.reset_index()' turns the index produced by 'groupby' (store_cd and the per-store numbering) into ordinary columns and assigns a new serial index starting from 0.
- '.set_index()' is optional. It is used here to move 'store_cd', 'level_1' and 'product_cd' into a MultiIndex (an index displayed across multiple levels).
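The points above can be checked on a small example. The following is a minimal sketch, assuming a hypothetical toy frame `df_toy` that stands in for df_receipt (the store and product codes are made up). It shows where 'level_1' comes from when a store has tied modes.

```python
import pandas as pd

# Hypothetical toy data standing in for df_receipt (values are made up)
df_toy = pd.DataFrame({
    "store_cd": ["S1", "S1", "S1", "S2", "S2", "S2", "S2"],
    "product_cd": ["P1", "P1", "P2", "P3", "P3", "P4", "P4"],
})

# Series.mode() returns every value tied for the highest count,
# so store S2 (P3 and P4 appear twice each) yields two rows.
modes = df_toy.groupby("store_cd").product_cd.apply(lambda x: x.mode())

# The result is indexed by (store_cd, position among that store's modes).
# reset_index() turns both levels into columns; the unnamed inner level
# becomes 'level_1'.
flat = modes.reset_index()
print(flat)

# Optional: move all three columns back into a MultiIndex for display
indexed = flat.set_index(["store_cd", "level_1", "product_cd"])
print(indexed.index.tolist())
```

Store S1 has a single mode, so its only 'level_1' value is 0, while S2 gets 0 and 1 for its two tied modes.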

code


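# GroupBy has no built-in 'mode' aggregation, so a lambda is passed instead;
# tolist() keeps every tied mode for a store in a single cell.
df_receipt.groupby('store_cd').agg({'product_cd': lambda x: x.mode().tolist()}).reset_index()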
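As far as I know, pandas GroupBy objects do not provide a mode aggregation of their own, so the string 'mode' may be rejected by agg depending on the pandas version. The following is a minimal sketch of an agg-based variant that passes a function instead. The toy frame `df_toy` and the helper `modes_as_list` are hypothetical names introduced for illustration only.

```python
import pandas as pd

# Hypothetical toy data standing in for df_receipt (values are made up)
df_toy = pd.DataFrame({
    "store_cd": ["S1", "S1", "S1", "S2", "S2", "S2", "S2"],
    "product_cd": ["P1", "P1", "P2", "P3", "P3", "P4", "P4"],
})


def modes_as_list(values: pd.Series) -> list:
    """Return all modes of one group as a plain Python list (hypothetical helper)."""
    return values.mode().tolist()


# agg expects one value per group, so the tied modes are packed into a list
per_store = (
    df_toy.groupby("store_cd")
    .agg({"product_cd": modes_as_list})
    .reset_index()
)
print(per_store)
```

Unlike the apply version, this gives exactly one row per store and no 'level_1' column, at the cost of storing lists in the product_cd column.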
