[PYTHON] Pandas memo

Whenever I come across Pandas-related topics while learning Python, I will add them here from time to time.

Pandas: a library that provides functions to support data analysis.


import

python


import pandas as pd

Data capture

Read CSV [read_csv]

python


csv_test_1 = pd.read_csv('hoge.csv')

Read Excel [read_excel]

python


excel_data = pd.read_excel('hoge.xlsx')
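As a minimal sketch that runs without any file on disk, the example below feeds read_csv an in-memory text buffer instead of a path. The column names and values are made up for illustration; with a real file you would pass its path as shown above.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice this would live in a file such as hoge.csv
csv_text = "id,item_name,price\n1,apple,100\n2,orange,80\n"

# read_csv accepts a file path or any file-like object
csv_sample = pd.read_csv(io.StringIO(csv_text))

print(csv_sample.shape)   # (number of rows, number of columns)
print(csv_sample.dtypes)  # column types inferred from the text
```

The first line of the text is used as the header row by default.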

Data join (union)

Vertical combination of data [concat]

python


csv_test_2 = pd.read_csv('hoge_2.csv')
csv_test = pd.concat([csv_test_1 , csv_test_2], ignore_index=True)
csv_test.head()
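To see what ignore_index=True changes, here is a small sketch with two hand-made DataFrames standing in for the CSV files (the names part_1 and part_2 and their contents are assumptions for illustration).

```python
import pandas as pd

# Hypothetical frames standing in for the two CSV files
part_1 = pd.DataFrame({"id": [1, 2], "price": [100, 80]})
part_2 = pd.DataFrame({"id": [3], "price": [120]})

# Without ignore_index, each frame keeps its own row labels
kept = pd.concat([part_1, part_2])

# With ignore_index=True, the rows are relabeled 0..n-1
renumbered = pd.concat([part_1, part_2], ignore_index=True)

print(list(kept.index))
print(list(renumbered.index))
```

Relabeling is usually what you want when stacking files of the same layout, because duplicate row labels make later label-based selection ambiguous.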

Merge data with LEFT JOIN [merge]

- When the join column has the same name in both tables, specify it as the condition with `on="id"`.

joined_table = pd.merge(table_1, table_2, on="join column", how="join method")

python


join_data = pd.merge(a_data, b_data[["id", "date", "customer"]], on="id", how="left")
join_data.head()

- When the join columns have different names in the two tables, specify them with `left_on="customer_name", right_on="Customer name"`.

python


pd.merge(a_data, b_data, left_on="customer_name", right_on="Customer name", how="left")
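The two cases can be tried side by side with small hand-made tables. This is only a sketch: the tables orders, details, and customers and their contents are hypothetical.

```python
import pandas as pd

# Hypothetical tables; names and values are illustrative only
orders = pd.DataFrame({"id": [1, 2, 3], "customer_name": ["Sato", "Suzuki", "Tanaka"]})
details = pd.DataFrame({"id": [1, 2], "price": [100, 80]})
customers = pd.DataFrame({"Customer name": ["Sato", "Suzuki"], "area": ["Tokyo", "Osaka"]})

# Same column name on both sides: use on=
by_id = pd.merge(orders, details, on="id", how="left")

# Different column names: use left_on= and right_on=
by_name = pd.merge(
    orders, customers, left_on="customer_name", right_on="Customer name", how="left"
)

# A left join keeps every row of the left table; rows with no match get NaN
print(by_id)
print(by_name)
```

Note that with left_on / right_on both key columns remain in the result, so one of them can be dropped afterwards if it is not needed.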

Data confirmation

Get unique values [pd.unique(data)]

python


pd.unique(test_data.item_name)
len(pd.unique(test_data.item_name))  # Number of unique values
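A small runnable sketch with made-up data is shown here. It also uses Series.nunique(), which is not mentioned above but returns the count directly.

```python
import pandas as pd

# Hypothetical data for illustration
test_data = pd.DataFrame({"item_name": ["apple", "orange", "apple", "grape"]})

# Unique values, in order of first appearance
unique_items = pd.unique(test_data["item_name"])
print(unique_items)
print(len(unique_items))

# nunique() counts distinct values (missing values are excluded by default)
print(test_data["item_name"].nunique())
```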

Date manipulation

Convert the values in column a to datetime type [to_datetime()]

python


test_data["a"] = pd.to_datetime(test_data["a"])

Extract date components [dt]
Format dates as strings [dt.strftime("%Y%m")]

python


time_data["payment_month"] = time_data["payment_date"].dt.strftime("%Y%m")
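Putting the conversion and the formatting together, a minimal sketch might look like this. The sample dates and the extra payment_year column are assumptions added for illustration.

```python
import pandas as pd

# Hypothetical data: dates stored as strings
time_data = pd.DataFrame(
    {"payment_date": ["2019-02-01 01:36:57", "2019-03-15 10:00:00"]}
)

# Convert the strings to datetime values
time_data["payment_date"] = pd.to_datetime(time_data["payment_date"])

# The .dt accessor exposes date parts and string formatting
time_data["payment_year"] = time_data["payment_date"].dt.year
time_data["payment_month"] = time_data["payment_date"].dt.strftime("%Y%m")

print(time_data)
```

The .dt accessor only works on datetime columns, so the to_datetime conversion has to come first.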

Pivot table

Create a pivot table [pd.pivot_table]

python


pd.pivot_table(test_data, index='item_name', columns='payment_month', values=['price', 'quantity'], aggfunc='sum')

pivot_table overview
- index: specify the rows
- columns: specify the columns
- values: specify the values to be aggregated
- aggfunc: specify the aggregation method
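To make the four arguments concrete, the following sketch builds a tiny table by hand (all values are made up) and pivots it the same way as the call above.

```python
import pandas as pd

# Hypothetical sales data for illustration
test_data = pd.DataFrame({
    "item_name": ["apple", "apple", "orange", "orange"],
    "payment_month": ["201902", "201903", "201902", "201902"],
    "price": [100, 100, 80, 80],
    "quantity": [1, 2, 3, 1],
})

pivot = pd.pivot_table(
    test_data,
    index="item_name",             # one row per item
    columns="payment_month",       # one column group per month
    values=["price", "quantity"],  # columns to aggregate
    aggfunc="sum",                 # how to aggregate them
)

print(pivot)
```

Combinations that never occur in the data (here, orange in 201903) appear as NaN in the result.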


The following is not strictly Pandas content, so I will organize it later.

Data display

Display [print]

python


print(len(test_data))  # Display the number of rows

Display the first 5 rows of data [head]

python


csv_test_1.head()

Specify the data column and display the first 5 rows [head]

python


csv_test_1["Column name"].head()

Manipulating data

Extract data with the .loc indexer [.loc[condition, column to retrieve]]

python


res = test_data.loc[flg_is_null, "item_name"]
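The line above assumes that flg_is_null already exists. One plausible way to build it, sketched here with made-up data and a hypothetical item_price column, is a boolean Series produced by isnull().

```python
import pandas as pd

# Hypothetical data: one price is missing
test_data = pd.DataFrame({
    "item_name": ["apple", "orange", "grape"],
    "item_price": [100, None, 300],
})

# Boolean Series: True where item_price is missing
flg_is_null = test_data["item_price"].isnull()

# Keep the rows where the mask is True, and only the item_name column
res = test_data.loc[flg_is_null, "item_name"]

print(res)
```

The result keeps the original row labels, which makes it easy to trace the rows back to the source table.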

Creating a data column

Store the product of columns a and b in a new column named new.

python


test_data["new"] = test_data["a"] * test_data["b"]
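As a quick check of how this behaves, the sketch below uses a small made-up table. The multiplication is applied row by row.

```python
import pandas as pd

# Hypothetical data for illustration
test_data = pd.DataFrame({"a": [100, 80, 120], "b": [2, 1, 3]})

# Element-wise product: each row of "new" is a * b for that row
test_data["new"] = test_data["a"] * test_data["b"]

print(test_data)
```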

Data calculation

Sum column a [column.sum()]

python


test_data["a"].sum()

Aggregate by group [groupby("column")["column"].sum()]

python


test_data.groupby("create_date")["price"].sum()

Aggregate by group with multiple keys and columns [groupby([columns])[[columns]].sum()]

python


test_data.groupby(["create_date", "item_name"])[["price", "quantity"]].sum()
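The sketch below runs both kinds of aggregation on a small made-up table. It selects the columns before calling sum(), which gives the same totals as above while skipping columns that are not needed.

```python
import pandas as pd

# Hypothetical data for illustration
test_data = pd.DataFrame({
    "create_date": ["2019-02", "2019-02", "2019-03"],
    "item_name": ["apple", "orange", "apple"],
    "price": [100, 80, 100],
    "quantity": [1, 2, 3],
})

# One key, one column: the result is a Series indexed by create_date
price_by_date = test_data.groupby("create_date")["price"].sum()

# Several keys and columns: the result is a DataFrame with a MultiIndex
totals = test_data.groupby(["create_date", "item_name"])[["price", "quantity"]].sum()

print(price_by_date)
print(totals)
```

If you prefer the grouping keys as ordinary columns rather than an index, calling reset_index() on the result is one option.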

Data comparison

Compare the total of column a with the total of column b; the result is True or False

python


test_data["a"].sum() == test_data["b"].sum()

Count missing values per column: isnull() returns True / False for each cell, and sum() totals the True values in each column

python


test_data.isnull().sum()

Check for missing values: returns True / False for each column to indicate whether it contains any missing value

python


test_data.isnull().any(axis=0)
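Both checks can be compared on a small made-up table with a few missing cells. The column names here are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data with missing values
test_data = pd.DataFrame({
    "item_name": ["apple", None, "grape"],
    "price": [100, 80, None],
    "quantity": [1, 2, 3],
})

# Number of missing values in each column
null_counts = test_data.isnull().sum()

# True for each column that contains at least one missing value
has_null = test_data.isnull().any(axis=0)

print(null_counts)
print(has_null)
```

The count is useful for judging how much data is affected, while any() is a quick yes/no check per column.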

Output various statistics [describe()]

python


test_data.describe()

Maximum and minimum values of the specified column [max(), min()]

python


test_data["create_date"].min()
test_data["create_date"].max()

Check data types [dtypes]

python


test_data.dtypes

- describe() displays the following statistics: number of non-missing values (count), mean (mean), standard deviation (std), minimum (min), quartiles (25%, 75%), median (50%), and maximum (max).
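A short sketch with made-up data shows the rows that describe() returns and how they relate to dtypes. By default, only numeric columns are summarized.

```python
import pandas as pd

# Hypothetical data: one numeric column and one text column
test_data = pd.DataFrame({
    "price": [100, 200, 300, 400],
    "item_name": ["a", "b", "c", "d"],
})

# Summary statistics; the text column is left out by default
stats = test_data.describe()

print(stats)
print(test_data.dtypes)
```

Checking dtypes first is a quick way to spot numeric columns that were read as text and would therefore be missing from the describe() output.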


Work memo ・ Data cleansing

- Data processing: Pandas
- Visualization: Matplotlib
- Machine learning: scikit-learn
