Basic summary of data manipulation in Python Pandas - Second half: Data aggregation

Introduction

This article summarizes the basics of data manipulation in Pandas, which is essential for data analysis in Python.

It covers everything from important syntax that is easy to forget to a few practical tips.

Recommended for people like this:
→ I want to try Pandas for the first time!
→ I use R and want to try doing the same things in Python.
→ I can't remember the Pandas syntax; it would be convenient if there were a list somewhere...
→ How much data handling can be done with Python in the first place?

If you want to know about data manipulation, please start from the first half.

◆ Basic summary of data manipulation with Python Pandas - First half: Data creation & operation http://qiita.com/hik0107/items/d991cc44c2d1778bb82e

Let's do some calculations

◆ Calculating statistics

Find statistics for each row or each column of a data frame.

math.py


 
# df_sample is assumed to be the DataFrame created in the first half
# Column-direction totals
df_sample["score1"].sum(axis=0)  # Sum of the score1 values
        # axis=0 sums vertically (down the rows). It is the default, so it can be omitted.

df_sample[["score1", "score2"]].sum(axis=0)  # Sum score1 and score2 separately. Two results are returned.


# Row-direction totals
df_sample[["score1", "score2"]].sum(axis=1)
        # Sum the score1 and score2 values within each row. One result is returned per row.
        # axis=1 sums horizontally (across the columns).
        # Pandas often uses axis to distinguish the row and column directions, so it is worth remembering.
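To see the difference between the two axis values on concrete numbers, here is a minimal sketch. The data is a hypothetical stand-in for df_sample (the real one comes from the first half); only the column names score1 and score2 are taken from the code above.

```python
import pandas as pd

# Hypothetical stand-in for df_sample; the values are made up for illustration.
df_sample = pd.DataFrame({
    "class": ["A", "A", "B", "B"],
    "day_no": [1, 2, 1, 2],
    "score1": [10, 20, 30, 40],
    "score2": [1, 2, 3, 4],
})

# axis=0: collapse the rows, leaving one total per column.
col_totals = df_sample[["score1", "score2"]].sum(axis=0)

# axis=1: collapse the columns, leaving one total per row.
row_totals = df_sample[["score1", "score2"]].sum(axis=1)

print(col_totals)  # score1 -> 100, score2 -> 10
print(row_totals)  # 11, 22, 33, 44 (one value per row)
```

A handy way to remember it: the axis you pass is the one that disappears from the result.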

◆ Pivoting: pivot-table-style cross tabulation and data structure conversion

pivot.py


 
df_sample.pivot_table(values="score1",   # Variable to aggregate
                      aggfunc="sum",     # How to aggregate
                      fill_value=0,      # Value to fill in when there is no corresponding data
                      index="class",     # Variable to keep in the row direction (named rows in very old pandas versions)
                      columns="day_no")  # Variable to expand in the column direction
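The effect of fill_value is easiest to see when one combination of keys is missing. The following is a small sketch with made-up data (a hypothetical stand-in for df_sample) in which class B has no row for day_no 2.

```python
import pandas as pd

# Hypothetical data: class "B" has two rows for day 1 and none for day 2.
df_sample = pd.DataFrame({
    "class": ["A", "A", "B", "B"],
    "day_no": [1, 2, 1, 1],
    "score1": [10, 20, 30, 40],
})

pivoted = df_sample.pivot_table(
    values="score1",   # column to aggregate
    index="class",     # keys kept as rows
    columns="day_no",  # keys spread out as columns
    aggfunc="sum",     # rows sharing the same keys are summed
    fill_value=0,      # used where a (class, day_no) pair has no data
)

print(pivoted)
# Class B, day 1 is 30 + 40 = 70; class B, day 2 has no data and becomes 0.
```

Without fill_value, the missing cell would be NaN instead of 0.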

◆ Group-by operations

groupby.py


# In Pandas, grouping (groupby) and the aggregation that follows are performed as separate steps.
# The result of the groupby method may look like a normal DataFrame, but it is actually a GroupBy object that holds the key information.
# This is similar to R's dplyr, where group_by() sets the key and summarise() aggregates according to that key.

df_sample_grouped = df_sample.groupby("day_no")  # Group by day_no.
df_sample_grouped[["score1", "score2"]].sum()
  # Sum over the grouped object.
  # If desired, you can specify which variables to sum.

# By default, the group-by key becomes the index of the result.
# Therefore it can no longer be treated as a column variable, as it could before the groupby.

df_sample_grouped = df_sample.groupby("day_no", as_index=False)
   # If as_index=False is specified, the key is kept as a column instead of becoming the index.
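The difference between the default behavior and as_index=False can be checked directly on the aggregated results. This is a minimal sketch using made-up data as a hypothetical stand-in for df_sample; the reset_index() call is an alternative not mentioned above, shown only for comparison.

```python
import pandas as pd

# Hypothetical stand-in for df_sample.
df_sample = pd.DataFrame({
    "day_no": [1, 2, 1, 2],
    "score1": [10, 20, 30, 40],
    "score2": [1, 2, 3, 4],
})

# Default: the key day_no becomes the index of the result.
by_index = df_sample.groupby("day_no")[["score1", "score2"]].sum()

# as_index=False: day_no stays an ordinary column.
by_column = df_sample.groupby("day_no", as_index=False)[["score1", "score2"]].sum()

# Alternative: aggregate first, then move the index back into a column.
by_reset = by_index.reset_index()

print(by_index)
print(by_column)
```

Keeping the key as a column is usually more convenient when the result is going to be filtered or merged on that key afterwards.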

Let's read and write data

◆ Data import and export

Create a DataFrame from a CSV file, or export a DataFrame to a CSV file.

file.py


 
import pandas as pd

# Import csv data
df_sample = pd.read_csv("path_of_data")

# Export csv data (to_csv is a method of the DataFrame, not of pd)
df_sample.to_csv("path_of_exported_file")
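As a quick check that export and import fit together, here is a small round-trip sketch. The file name sample.csv, the temporary directory, and the data are all hypothetical and chosen only for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data to write out.
df_sample = pd.DataFrame({
    "class": ["A", "B"],
    "score1": [10, 20],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "sample.csv")  # hypothetical file name

    # index=False keeps the row index out of the file, so it does not
    # come back as an extra unnamed column when the file is read again.
    df_sample.to_csv(csv_path, index=False)

    df_loaded = pd.read_csv(csv_path)

print(df_loaded)
```

If to_csv is called without index=False, the index is written as the first column, which is a common source of unexpected "Unnamed: 0" columns after re-reading.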
