[PYTHON] pandas memo

As a learning note, I've put together basic pandas code here for my own reference, in place of a cheat sheet.

1. Import pandas

import pandas as pd

2. How to create a DataFrame

There are two methods: creating one from a dictionary, and reading one from a CSV file.

① How to create from a dictionary

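import pandas as pd

# Each key becomes a column name and each list becomes that column's values
# (named data rather than dict so the built-in dict is not shadowed)
data = {"name": ["Hokkaido", "Tokyo", "Aichi", "Osaka"],
        "capital": ["Sapporo", "Shinjuku", "Nagoya", "Osaka"],
        "area": [83424, 2191, 5172, 1905],
        "population": [5286, 13822, 7537, 8813]}
prefecture = pd.DataFrame(data)
prefecture  # in Jupyter, evaluating the name displays the DataFrame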
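A quick way to check what the dictionary constructor produced: each key becomes a column name, each list becomes that column's values, and because no index is given the rows are labeled 0 to 3. Here is a minimal sketch using the same data:

```python
import pandas as pd

# Same data as above; each key becomes a column, each list a column's values
data = {
    "name": ["Hokkaido", "Tokyo", "Aichi", "Osaka"],
    "capital": ["Sapporo", "Shinjuku", "Nagoya", "Osaka"],
    "area": [83424, 2191, 5172, 1905],
    "population": [5286, 13822, 7537, 8813],
}
prefecture = pd.DataFrame(data)

print(prefecture.shape)          # (4, 4): 4 rows, 4 columns
print(list(prefecture.columns))  # column order follows the dict keys
print(list(prefecture.index))    # default integer row labels: [0, 1, 2, 3]
```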

② How to read from a CSV file

import pandas as pd

# Use the read_csv function
prefecture = pd.read_csv("path/to/prefecture.csv")

However, by default the row labels stored in the CSV file are read as an ordinary data column. To avoid this, tell pandas that the first column contains the row labels, as follows.

import pandas as pd

# index_col=0 means the column at position 0 is used as the row labels
prefecture = pd.read_csv("path/to/prefecture.csv", index_col=0)
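You can see the difference `index_col=0` makes without a real file by reading the same CSV text twice from an in-memory buffer. In this sketch the CSV contents are a hypothetical stand-in for prefecture.csv. It assumes the row labels were written to an unnamed first column, which is what `DataFrame.to_csv()` does by default.

```python
import io

import pandas as pd

# Hypothetical contents of prefecture.csv: the first, unnamed column holds the row labels
csv_text = """,name,capital,area,population
Hokkaido,Hokkaido,Sapporo,83424,5286
Tokyo,Tokyo,Shinjuku,2191,13822
Aichi,Aichi,Nagoya,5172,7537
Osaka,Osaka,Osaka,1905,8813
"""

# Without index_col, the label column is read as an ordinary column named "Unnamed: 0"
without_index = pd.read_csv(io.StringIO(csv_text))
print(list(without_index.columns))

# With index_col=0, the first column becomes the row labels instead
with_index = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(with_index.columns))
print(list(with_index.index))
```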

3. Setting row labels

# Assign one label per row (this replaces the default labels 0, 1, 2, 3)
prefecture.index = ["Hokkaido", "Tokyo", "Aichi", "Osaka"]
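Once the labels are assigned, they can be used for label-based lookup. The list you assign must contain exactly one label per row. This is a minimal sketch with a trimmed-down, two-column version of the DataFrame:

```python
import pandas as pd

prefecture = pd.DataFrame({
    "name": ["Hokkaido", "Tokyo", "Aichi", "Osaka"],
    "area": [83424, 2191, 5172, 1905],
})

# Replace the default 0..3 labels with one label per row
prefecture.index = ["Hokkaido", "Tokyo", "Aichi", "Osaka"]
print(prefecture.loc["Tokyo", "area"])  # 2191

# A list whose length differs from the number of rows raises ValueError
try:
    prefecture.index = ["Hokkaido", "Tokyo", "Aichi"]
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print(err)
```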

4. Selecting data from a DataFrame

There are two methods: using square brackets ("[]"), and using accessors such as loc and iloc.

① Method using []

# Select only the name column
prefecture["name"]

However, this returns a pandas Series, not a DataFrame. To get the result as a DataFrame, double the square brackets (that is, pass a list of column names), as shown below.

# Doubling the square brackets keeps the result a DataFrame
prefecture[["name"]]
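The difference between single and double brackets can be confirmed by checking the type of each result. This is a small sketch with a two-column version of the data:

```python
import pandas as pd

prefecture = pd.DataFrame({
    "name": ["Hokkaido", "Tokyo", "Aichi", "Osaka"],
    "capital": ["Sapporo", "Shinjuku", "Nagoya", "Osaka"],
})

single = prefecture["name"]    # a single label   -> Series
double = prefecture[["name"]]  # a list of labels -> DataFrame

print(type(single).__name__)  # Series
print(type(double).__name__)  # DataFrame
print(double.shape)           # (4, 1)
```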

You can also retrieve multiple columns as follows:

# Select multiple columns
prefecture[["name", "capital"]]
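When several columns are requested, the list also controls the column order of the result. In the sketch below, the try block shows what pandas does when a name in the list is not a column of the DataFrame:

```python
import pandas as pd

prefecture = pd.DataFrame({
    "name": ["Hokkaido", "Tokyo", "Aichi", "Osaka"],
    "capital": ["Sapporo", "Shinjuku", "Nagoya", "Osaka"],
    "area": [83424, 2191, 5172, 1905],
})

# Columns appear in the order given in the list, not the original order
subset = prefecture[["capital", "name"]]
print(list(subset.columns))  # ['capital', 'name']

# A column name that does not exist raises KeyError
try:
    prefecture[["name", "population"]]
except KeyError as err:
    print("KeyError:", err)
```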

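Use a slice to retrieve rows. An integer slice selects rows by position, and, as with Python lists, the end position is excluded, so the following returns the rows at positions 1 and 2.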

prefecture[1:3]
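With the row labels from section 3 in place, you can see which rows the slice actually returns. This is a minimal sketch with only the area column:

```python
import pandas as pd

prefecture = pd.DataFrame(
    {"area": [83424, 2191, 5172, 1905]},
    index=["Hokkaido", "Tokyo", "Aichi", "Osaka"],
)

# An integer slice in [] selects rows by position; the end position (3) is excluded
rows = prefecture[1:3]
print(list(rows.index))  # ['Tokyo', 'Aichi']
```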

② Method using loc and iloc

loc selects data by label, and iloc selects data by integer position. Writing DataFrame_name.loc["row label"], as shown below, selects the data in that row.

prefecture.loc["Tokyo"]

However, this also returns a Series rather than a DataFrame. To get the row as a DataFrame, use double brackets.

prefecture.loc[["Tokyo"]]
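The same Series/DataFrame distinction applies to rows. With a single label, loc returns a Series whose index is the column names. This is a small sketch:

```python
import pandas as pd

prefecture = pd.DataFrame(
    {
        "name": ["Hokkaido", "Tokyo", "Aichi", "Osaka"],
        "area": [83424, 2191, 5172, 1905],
    },
    index=["Hokkaido", "Tokyo", "Aichi", "Osaka"],
)

row_series = prefecture.loc["Tokyo"]   # single label   -> Series indexed by column names
row_frame = prefecture.loc[["Tokyo"]]  # list of labels -> one-row DataFrame

print(list(row_series.index))  # ['name', 'area']
print(row_frame.shape)         # (1, 2)
```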

You can select multiple rows as follows.

prefecture.loc[["Tokyo","Aichi"]]

You can also specify columns as a second argument, which selects only the data at the intersection of the specified rows and columns.

prefecture.loc[["Tokyo","Aichi"],["name","capital"]]
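The form is loc[rows, columns]. Passing lists for both returns a DataFrame. Passing a single label in both positions returns one value. This is a minimal sketch:

```python
import pandas as pd

prefecture = pd.DataFrame(
    {
        "name": ["Hokkaido", "Tokyo", "Aichi", "Osaka"],
        "capital": ["Sapporo", "Shinjuku", "Nagoya", "Osaka"],
        "area": [83424, 2191, 5172, 1905],
    },
    index=["Hokkaido", "Tokyo", "Aichi", "Osaka"],
)

# loc[rows, columns]: only the cells where the listed rows and columns intersect
cells = prefecture.loc[["Tokyo", "Aichi"], ["name", "capital"]]
print(cells)

# Two single labels (no lists) return one scalar value
print(prefecture.loc["Aichi", "capital"])  # Nagoya
```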

As with list slicing, a bare ":" means "select all".

# Use ":" to select all rows
prefecture.loc[:, ["name", "capital"]]
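Here is a sketch of ":" in the row position, with one caveat worth knowing. Unlike a list slice or an iloc slice, a label slice in loc includes its end label:

```python
import pandas as pd

prefecture = pd.DataFrame(
    {
        "name": ["Hokkaido", "Tokyo", "Aichi", "Osaka"],
        "capital": ["Sapporo", "Shinjuku", "Nagoya", "Osaka"],
        "area": [83424, 2191, 5172, 1905],
    },
    index=["Hokkaido", "Tokyo", "Aichi", "Osaka"],
)

# ":" in the row position keeps every row
all_rows = prefecture.loc[:, ["name", "capital"]]
print(all_rows.shape)  # (4, 2)

# A label slice in loc includes the end label ("Aichi")
tokyo_to_aichi = prefecture.loc["Tokyo":"Aichi", :]
print(list(tokyo_to_aichi.index))  # ['Tokyo', 'Aichi']
```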

iloc works like loc, but takes integer positions instead of row and column labels.

# Assumes the rows are ordered Hokkaido, Tokyo, Aichi, Osaka
# and that name and capital are the first two columns

# The following two return exactly the same result.
prefecture.loc[["Tokyo"]]
prefecture.iloc[[1]]

# The following two return exactly the same result.
prefecture.loc[["Tokyo", "Aichi"]]
prefecture.iloc[[1, 2]]

# The following two return exactly the same result.
prefecture.loc[["Tokyo", "Aichi"], ["name", "capital"]]
prefecture.iloc[[1, 2], [0, 1]]

# The following two return exactly the same result.
prefecture.loc[:, ["name", "capital"]]
prefecture.iloc[:, [0, 1]]
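You can verify these equivalences directly by comparing each pair with DataFrame.equals. This sketch rebuilds the DataFrame from section 2 with the labels from section 3:

```python
import pandas as pd

prefecture = pd.DataFrame(
    {
        "name": ["Hokkaido", "Tokyo", "Aichi", "Osaka"],
        "capital": ["Sapporo", "Shinjuku", "Nagoya", "Osaka"],
        "area": [83424, 2191, 5172, 1905],
        "population": [5286, 13822, 7537, 8813],
    },
    index=["Hokkaido", "Tokyo", "Aichi", "Osaka"],
)

# Each pair: (label-based selection, position-based selection)
pairs = [
    (prefecture.loc[["Tokyo"]], prefecture.iloc[[1]]),
    (prefecture.loc[["Tokyo", "Aichi"]], prefecture.iloc[[1, 2]]),
    (prefecture.loc[["Tokyo", "Aichi"], ["name", "capital"]], prefecture.iloc[[1, 2], [0, 1]]),
    (prefecture.loc[:, ["name", "capital"]], prefecture.iloc[:, [0, 1]]),
]

# DataFrame.equals returns True when shape, labels and values all match
results = [by_label.equals(by_position) for by_label, by_position in pairs]
print(results)  # [True, True, True, True]
```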
