Summary of Pandas methods used when extracting data [Python]

Introduction

When retrieving data in Python, I rely heavily on the pandas library.

However, beginners in data analysis

It will be a situation like that.

In this article, I summarize the methods that come up most often when extracting data.

Environment

What is Pandas?

It is a Python library for efficient data analysis.

Implementation

Load necessary data

This time, we will use the "iris" dataset, which seaborn can load with load_dataset (the data is downloaded on first use).

import seaborn as sns
iris = sns.load_dataset('iris')
iris.head()


Extract data by specifying rows and columns

You can pick out any part of the data by specifying row and column positions. Positions start at 0.

Data extraction by row position

# Row at position 3 (the fourth row, because positions start at 0)
iris.iloc[3]


# Rows at positions 0 to 2 (the end of the slice, 3, is excluded)
iris.iloc[:3]


# Value at row position 3, column position 0
iris.iloc[3, 0]


# Rows at positions 0 to 2 and columns at positions 2 to 3
iris.iloc[:3, 2:4]
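
The iloc patterns above can be tried on a small hand-built frame. This is only a sketch: the DataFrame df and its values are made up for illustration and stand in for iris. It shows that slices exclude their end position and that a list of positions can pick non-contiguous rows.

```python
import pandas as pd

# Small hand-built frame standing in for iris (values are illustrative only).
df = pd.DataFrame(
    {
        "sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0],
        "sepal_width": [3.5, 3.0, 3.2, 3.1, 3.6],
        "petal_length": [1.4, 1.4, 1.3, 1.5, 1.4],
    }
)

# A single integer returns one row as a Series.
row = df.iloc[3]

# Slices exclude the end position: [:3] means positions 0, 1 and 2.
first_three = df.iloc[:3]

# A list of positions picks non-contiguous rows; -1 counts from the end.
picked = df.iloc[[0, 2, -1]]

# Row slice and column slice together: rows 0 to 2, columns 1 to 2.
block = df.iloc[:3, 1:3]

print(row["sepal_length"])
print(first_three.shape)
print(list(picked.index))
print(list(block.columns))
```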


Extract data by specifying row labels and column names

iris.loc[[2,4,6],['petal_length', 'petal_width']]
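
With iris, row labels and row positions happen to be the same numbers, so the difference between loc and iloc is easy to miss. The sketch below uses a made-up frame whose index labels (10, 20, 30, 40) are hypothetical, chosen so that labels and positions differ. It also shows that label slices with loc include the end label, unlike position slices with iloc.

```python
import pandas as pd

# Hypothetical frame whose index labels differ from the row positions.
df = pd.DataFrame(
    {
        "petal_length": [1.4, 4.7, 6.0, 5.1],
        "petal_width": [0.2, 1.4, 2.5, 1.9],
    },
    index=[10, 20, 30, 40],
)

# loc selects by label, iloc selects by position; these pick the same cells.
by_label = df.loc[[20, 40], ["petal_width"]]
by_position = df.iloc[[1, 3], [1]]

# Label slices include the end label; position slices exclude the end position.
label_slice = df.loc[10:30]
position_slice = df.iloc[0:2]

print(by_label.equals(by_position))
print(len(label_slice), len(position_slice))
```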


Extract data under specific conditions

The method for extracting data by specifying conditions is as follows.

Data extraction based on exact match conditions

Count the rows whose species column is exactly setosa.

len(iris[iris['species'] == 'setosa'])
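
The comparison inside the brackets produces a boolean Series (a mask), and indexing with it keeps the rows where the mask is True. As a sketch on made-up data (the frame df is an assumption, not the real iris values), the same count can be obtained in a few equivalent ways:

```python
import pandas as pd

# Illustrative data only; not the real iris values.
df = pd.DataFrame(
    {
        "species": ["setosa", "setosa", "versicolor", "virginica", "setosa"],
        "petal_width": [0.2, 0.4, 1.3, 2.1, 0.6],
    }
)

# The comparison yields a boolean Series, one True/False per row.
mask = df["species"] == "setosa"

# Indexing with the mask keeps only the True rows.
setosa_rows = df[mask]
count_via_len = len(setosa_rows)

# True counts as 1, so summing the mask gives the same number.
count_via_sum = int(mask.sum())

# value_counts gives the count for every species at once.
counts = df["species"].value_counts()

print(count_via_len, count_via_sum, counts["setosa"])
```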

Data extraction using multiple conditions

To narrow the data down by several conditions at once, combine the conditions.

# Combine conditions with (...) & (...) for AND and (...) | (...) for OR
iris[(iris['species'] == 'setosa') & (iris['petal_width'] > 0.5)]
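
The parentheses around each condition are required because & and | bind more tightly than comparison operators such as == and >. The following sketch, again on a made-up frame df, shows AND, OR, negation with ~, and isin as a shorter alternative to chaining several equality tests with |.

```python
import pandas as pd

# Illustrative data only; not the real iris values.
df = pd.DataFrame(
    {
        "species": ["setosa", "setosa", "versicolor", "virginica", "setosa"],
        "petal_width": [0.2, 0.4, 1.3, 2.1, 0.6],
    }
)

# AND: both conditions must hold.
both = df[(df["species"] == "setosa") & (df["petal_width"] > 0.5)]

# OR: at least one condition must hold.
either = df[(df["species"] == "virginica") | (df["petal_width"] < 0.3)]

# NOT: ~ inverts a boolean mask.
not_setosa = df[~(df["species"] == "setosa")]

# isin tests membership in a list of values.
in_list = df[df["species"].isin(["versicolor", "virginica"])]

print(list(both.index))
print(list(either.index))
print(list(not_setosa.index))
print(list(in_list.index))
```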

Data extraction by partial match condition

Sometimes you want to extract rows that contain a given string, not only rows that match it exactly. In that case, use str.contains.

# Partial match search (extract only the rows whose species contains "se")
iris[iris.species.str.contains('se')]
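
Two details of str.contains are worth knowing: by default the pattern is treated as a regular expression, and missing values produce a missing result that cannot be used as a mask. The sketch below uses a made-up frame with one missing species to show the regex and na parameters, plus str.startswith for prefix matches.

```python
import pandas as pd

# Illustrative data with one missing value.
df = pd.DataFrame({"species": ["setosa", "versicolor", None, "virginica"]})

# regex=False treats the pattern as a plain substring;
# na=False makes missing values count as "no match".
literal = df[df["species"].str.contains("se", regex=False, na=False)]

# With the default regex=True, the pattern is a regular expression.
starts_with_v = df[df["species"].str.contains(r"^v", na=False)]

# case=False ignores upper/lower case.
ignore_case = df[df["species"].str.contains("SE", case=False, na=False)]

# startswith is a simpler choice for prefix matches.
prefix = df[df["species"].str.startswith("vi", na=False)]

print(list(literal.index))
print(list(starts_with_v.index))
print(list(ignore_case.index))
print(list(prefix.index))
```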

Aggregate data

To aggregate, first group the data with groupby, which returns a DataFrameGroupBy object. The aggregation methods are then called on that object.

iris_group = iris.groupby('species')
type(iris_group)

The output result is as follows.

pandas.core.groupby.generic.DataFrameGroupBy

Average value

iris_group.mean()

The result is the mean of each numeric column for each species.

In addition, the minimum value, maximum value, standard deviation, etc. can be calculated.
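
As a sketch of how several statistics can be computed in one call, agg accepts a list of function names. The frame df below is made up for illustration, and the output column names mean_length and n in the second call are my own choices.

```python
import pandas as pd

# Illustrative data only; not the real iris values.
df = pd.DataFrame(
    {
        "species": ["setosa", "setosa", "virginica", "virginica"],
        "petal_length": [1.4, 1.6, 5.0, 6.0],
    }
)

grouped = df.groupby("species")

# Several statistics for one column in a single call.
stats = grouped["petal_length"].agg(["min", "max", "mean", "std"])

# Named aggregation: output_name=(column, function).
named = grouped.agg(
    mean_length=("petal_length", "mean"),
    n=("petal_length", "count"),
)

print(stats)
print(named)
```

Individual methods such as grouped.min(), grouped.max(), and grouped.std() give the same numbers one statistic at a time.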

You can also group by multiple columns.

iris_group2 = iris.groupby(['species', 'petal_width'])
iris_group2.mean()
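
Grouping by a list of columns puts the keys into a two-level row index (a MultiIndex). If ordinary columns are easier to work with, reset_index or as_index=False turns the keys back into columns. A sketch on made-up data:

```python
import pandas as pd

# Illustrative data only; not the real iris values.
df = pd.DataFrame(
    {
        "species": ["setosa", "setosa", "setosa", "virginica"],
        "petal_width": [0.2, 0.2, 0.4, 2.1],
        "petal_length": [1.4, 1.5, 1.7, 6.0],
    }
)

# The group keys become a two-level index on the result.
multi = df.groupby(["species", "petal_width"]).mean()

# reset_index moves the keys back into ordinary columns.
flat_from_reset = multi.reset_index()

# as_index=False does the same in one step.
flat = df.groupby(["species", "petal_width"], as_index=False).mean()

print(multi.index.nlevels)
print(list(flat.columns))
print(len(flat))
```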


Combine data

Combine data with the same column structure

To combine data that have the same column structure, use the concat function. Older articles also mention the DataFrame append method, but it was deprecated in pandas 1.4 and removed in pandas 2.0.

This article therefore combines the data with pd.concat.

import pandas as pd
iris_master = pd.DataFrame([['0', 'setosa'], ['1', 'versicolor'], ['2', 'virginica']], columns=['id', 'name'])
iris_master


add_iris = pd.DataFrame([['3', 'hoge']], columns=['id', 'name'])
add_iris


pd.concat([iris_master, add_iris])
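
By default concat keeps each frame's original row labels, so the combined result here has the label 0 twice. If a fresh 0..n-1 index is preferred, ignore_index=True renumbers the rows. A sketch using the same two frames as above:

```python
import pandas as pd

iris_master = pd.DataFrame(
    [["0", "setosa"], ["1", "versicolor"], ["2", "virginica"]],
    columns=["id", "name"],
)
add_iris = pd.DataFrame([["3", "hoge"]], columns=["id", "name"])

# Default: original row labels are kept, so label 0 appears twice.
stacked = pd.concat([iris_master, add_iris])

# ignore_index=True assigns a fresh index to the result.
renumbered = pd.concat([iris_master, add_iris], ignore_index=True)

print(list(stacked.index))
print(list(renumbered.index))
```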


Combine data with different column configurations

Use the merge function to combine data whose column structures differ. (The join method can also do this, but it joins on the index, so the key column of the other DataFrame has to be set as its index first. That is a little cumbersome, so I think merge is the better first choice.)

When joining, specify the key columns; rows whose key values match are joined.

pd.merge(iris_group2.mean(), iris_master, left_on='species', right_on='name')
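
merge performs an inner join by default, so rows without a matching key are silently dropped. The sketch below uses two made-up frames (means and master, with a deliberately unmatched "unknown" species) to show how="left", which keeps every row of the left frame, and indicator=True, which adds a _merge column recording where each row came from.

```python
import pandas as pd

# Illustrative frames; "unknown" has no counterpart in master.
means = pd.DataFrame(
    {
        "species": ["setosa", "versicolor", "unknown"],
        "petal_length": [1.46, 4.26, 3.0],
    }
)
master = pd.DataFrame(
    {"id": ["0", "1", "2"], "name": ["setosa", "versicolor", "virginica"]}
)

# Default inner join: only keys present in both frames survive.
inner = pd.merge(means, master, left_on="species", right_on="name")

# Left join: keep every row of means; unmatched rows get missing values.
left = pd.merge(
    means, master, left_on="species", right_on="name",
    how="left", indicator=True,
)

print(len(inner))
print(list(left["_merge"]))
```

Checking the _merge column after a join is a quick way to notice keys that failed to match.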


Finally

In the future, we plan to enhance the following contents.

Reference information

The above contents are summarized based on the following sites.

It is explained in more detail here, so if you have any questions, please refer to it.
