[Python] 100 knocks on data science (structured data processing) 030 Explanation

A YouTube video commentary is also available.

Problem

P-030: For the receipt detail data frame (df_receipt), calculate the sample variance of the sales amount (amount) for each store code (store_cd), and display the TOP5 in descending order.

Answer

Code


df_receipt.groupby('store_cd').amount.var(ddof=0).reset_index().sort_values('amount', ascending=False).head(5)

Output

    store_cd       amount
28    S13052  440088.7013
31    S14011  306314.5582
42    S14034  296920.081
5     S13001  295431.9933
12    S13015  295294.3611

Commentary

- This uses the pandas DataFrame / Series.
- 'groupby' is used when you want to gather rows that share the same value or character string and apply a common operation (total, average, etc.) to each group.
- **'.var' calculates the variance.** '.var(ddof=0)' divides by 'N - ddof' (here N - 0 = N), which is what this problem calls the sample variance. pandas defaults to ddof=1 (dividing by N - 1), so ddof=0 has to be passed explicitly.
- 'ddof' is an abbreviation for 'delta degrees of freedom'. See Wikipedia for more information.
- '.reset_index()' turns the index created by 'groupby' (here 'store_cd') back into an ordinary column and assigns serial index numbers starting from 0.
- '.sort_values('amount', ascending=False)' sorts the rows by 'amount' in descending order, and '.head(5)' keeps the first five rows.
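To make the effect of 'ddof' concrete, here is a minimal sketch on a small made-up data frame. 'df_toy' and its values are assumptions for illustration only and are not part of the 100 knocks data set.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for df_receipt (values are made up).
df_toy = pd.DataFrame({
    "store_cd": ["S001", "S001", "S001", "S002", "S002"],
    "amount": [100, 200, 300, 50, 150],
})

grouped = df_toy.groupby("store_cd").amount

# ddof=0: the sum of squared deviations is divided by N.
var_n = grouped.var(ddof=0)

# ddof=1 (the pandas default): the sum is divided by N - 1.
var_n_minus_1 = grouped.var()

print(var_n)
print(var_n_minus_1)

# NumPy's np.var defaults to ddof=0, so it agrees with var(ddof=0).
print(np.var([100, 200, 300]))
```

For S001 the squared deviations from the mean of 200 sum to 20000, so ddof=0 gives 20000 / 3 and ddof=1 gives 20000 / 2. The two settings give noticeably different values for small groups, which is why the answer passes ddof=0 explicitly.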

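Code


# Passing the string 'var' to agg would use the pandas default ddof=1; the lambda keeps ddof=0 so the result matches the answer above.
df_receipt.groupby('store_cd').agg({'amount': lambda s: s.var(ddof=0)}).reset_index().sort_values('amount', ascending=False).head(5)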
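The following sketch compares the method-call form with the 'agg' form on a small made-up frame, and then applies the same 'reset_index' / 'sort_values' / 'head' chain. 'df_toy' and its values are hypothetical and only serve to show how the pieces fit together. Note that 'agg' with the string 'var' relies on the pandas default ddof=1, so it does not return the same numbers as 'var(ddof=0)'.

```python
import pandas as pd

# Hypothetical toy data standing in for df_receipt (values are made up).
df_toy = pd.DataFrame({
    "store_cd": ["S001", "S001", "S002", "S002", "S003", "S003"],
    "amount": [100, 300, 50, 150, 10, 20],
})

# Method-call form: variance divided by N.
by_method = df_toy.groupby("store_cd").amount.var(ddof=0)

# agg with the string 'var': pandas default, divided by N - 1.
by_agg_default = df_toy.groupby("store_cd").agg({"amount": "var"})["amount"]

# agg with a lambda that passes ddof=0: same result as the method-call form.
by_agg_ddof0 = df_toy.groupby("store_cd").agg(
    {"amount": lambda s: s.var(ddof=0)}
)["amount"]

# store_cd moves from the index back to a column, then sort and keep the top 2.
top2 = by_method.reset_index().sort_values("amount", ascending=False).head(2)
print(top2)
```

With only two rows per store, the ddof=1 values are exactly twice the ddof=0 values, so the ranking is unchanged here but the reported amounts are not.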
