[PYTHON] pandas memo

As a learning note, I've collected the pandas code I use most often for my own reference, in place of a cheat sheet.

1. Import pandas

import pandas as pd

2. How to create a DataFrame

There are two methods: one is to create it from a dictionary, and the other is to read it from a CSV file.

① How to create from a dictionary

import pandas as pd
# Each key becomes a column label; each list holds that column's values
data = {"name":["Hokkaido","Tokyo","Aichi","Osaka"],
        "capital":["Sapporo","Shinjuku","Nagoya","Osaka"],
        "area":[83424,2191,5172,1905],
        "population":[5286,13822,7537,8813]}
prefecture = pd.DataFrame(data)
prefecture
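To check what the dictionary turned into, it helps to look at the column labels, the shape, and the row index. The following is a minimal sketch that rebuilds the same data (the variable name data is my own choice) and prints those three things.

```python
import pandas as pd

data = {"name": ["Hokkaido", "Tokyo", "Aichi", "Osaka"],
        "capital": ["Sapporo", "Shinjuku", "Nagoya", "Osaka"],
        "area": [83424, 2191, 5172, 1905],
        "population": [5286, 13822, 7537, 8813]}
prefecture = pd.DataFrame(data)

# Dictionary keys become the column labels, in insertion order
print(list(prefecture.columns))  # ['name', 'capital', 'area', 'population']

# shape is (number of rows, number of columns)
print(prefecture.shape)  # (4, 4)

# With no explicit index, rows are labeled 0, 1, 2, ...
print(list(prefecture.index))  # [0, 1, 2, 3]
```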

② How to read from CSV file

import pandas as pd
# Use the read_csv function
prefecture = pd.read_csv("path/to/prefecture.csv")

However, by default the row labels stored in the CSV file are read as an ordinary data column. To tell pandas that the first column holds the row index, pass index_col as follows.

import pandas as pd
# index_col=0 means the column at position 0 holds the row labels
prefecture = pd.read_csv("path/to/prefecture.csv", index_col=0)
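The difference is easy to see without a real file. The sketch below reads a small CSV from an in-memory string; the CSV text is a made-up stand-in for prefecture.csv, assuming its first column has no header and holds the row labels.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for prefecture.csv;
# the first (unnamed) column holds the row labels
csv_text = """,name,capital,area,population
Hokkaido,Hokkaido,Sapporo,83424,5286
Tokyo,Tokyo,Shinjuku,2191,13822
"""

# Without index_col, the label column is read as ordinary data
plain = pd.read_csv(io.StringIO(csv_text))
print(list(plain.columns))  # ['Unnamed: 0', 'name', 'capital', 'area', 'population']

# With index_col=0, the first column becomes the row index
labeled = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(labeled.index))    # ['Hokkaido', 'Tokyo']
print(list(labeled.columns))  # ['name', 'capital', 'area', 'population']
```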

3. Row label settings

prefecture.index = ["Hokkaido","Tokyo","Aichi","Osaka"]
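As a small check of what this assignment does, the sketch below prints the index before and after. It also shows, on a copy, that the list must have one label per row; the reduced two-column DataFrame is only for illustration.

```python
import pandas as pd

prefecture = pd.DataFrame({"name": ["Hokkaido", "Tokyo", "Aichi", "Osaka"],
                           "area": [83424, 2191, 5172, 1905]})

# Default row labels are 0, 1, 2, ...
before = list(prefecture.index)
print(before)  # [0, 1, 2, 3]

# Assigning a list replaces the row labels in place
prefecture.index = ["Hokkaido", "Tokyo", "Aichi", "Osaka"]
after = list(prefecture.index)
print(after)  # ['Hokkaido', 'Tokyo', 'Aichi', 'Osaka']

# The list needs exactly one label per row; otherwise pandas raises ValueError
raised = False
try:
    trial = prefecture.copy()
    trial.index = ["Hokkaido", "Tokyo"]
except ValueError:
    raised = True
print(raised)  # True
```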

4. Select data from a DataFrame

There are two methods: one uses square brackets ("[]"), and the other uses indexers such as loc and iloc.

① Method using []

# Select only the name column
prefecture["name"]

However, this returns a pandas Series, not a DataFrame. To get the data back as a DataFrame, double the square brackets as shown below.

# Doubling the square brackets keeps the result as a DataFrame
prefecture[["name"]]
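The difference between the two forms can be confirmed by checking the type and shape of each result. This is a minimal sketch on a reduced two-row DataFrame, used only for illustration.

```python
import pandas as pd

prefecture = pd.DataFrame({"name": ["Hokkaido", "Tokyo"],
                           "area": [83424, 2191]})

single = prefecture["name"]    # single brackets: one column as a Series
double = prefecture[["name"]]  # double brackets: a one-column DataFrame

print(type(single).__name__)  # Series
print(type(double).__name__)  # DataFrame

# A Series is one-dimensional, a DataFrame is two-dimensional
print(single.shape)  # (2,)
print(double.shape)  # (2, 1)
```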

You can also retrieve multiple columns as follows:

# Select multiple columns by listing their names
prefecture[["name","capital"]]

Use a slice to retrieve rows. The slice counts positions from 0 and excludes the end, so 1:3 returns the second and third rows.

prefecture[1:3]
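To confirm which rows a slice picks, the sketch below sets the row labels from step 3 and prints the labels of the selected rows. The reduced DataFrame is only for illustration.

```python
import pandas as pd

prefecture = pd.DataFrame({"name": ["Hokkaido", "Tokyo", "Aichi", "Osaka"],
                           "area": [83424, 2191, 5172, 1905]})
prefecture.index = ["Hokkaido", "Tokyo", "Aichi", "Osaka"]

# Positions 1 and 2 are selected; position 3 (the end) is excluded
rows = prefecture[1:3]
print(list(rows.index))  # ['Tokyo', 'Aichi']

# An integer slice in [] is positional, so it matches iloc with the same slice
same = rows.equals(prefecture.iloc[1:3])
print(same)  # True
```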

② Method using loc and iloc

loc selects data by label, and iloc selects data by position. Writing DataFrame_name.loc["row label"], as in the following example, selects the data in that row.

prefecture.loc["Tokyo"]

However, this returns a Series rather than a DataFrame, so use double brackets if you want the result as a DataFrame.

prefecture.loc[["Tokyo"]]
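A quick way to see the difference is to inspect both results. In the sketch below, the single-bracket form gives a Series whose index is the column names, while the double-bracket form gives a one-row DataFrame.

```python
import pandas as pd

prefecture = pd.DataFrame({"name": ["Hokkaido", "Tokyo", "Aichi", "Osaka"],
                           "capital": ["Sapporo", "Shinjuku", "Nagoya", "Osaka"],
                           "area": [83424, 2191, 5172, 1905],
                           "population": [5286, 13822, 7537, 8813]})
prefecture.index = ["Hokkaido", "Tokyo", "Aichi", "Osaka"]

# Single brackets: the row comes back as a Series
row = prefecture.loc["Tokyo"]
print(type(row).__name__)  # Series
print(list(row.index))     # ['name', 'capital', 'area', 'population']
print(row.name)            # Tokyo

# Double brackets: the row comes back as a one-row DataFrame
frame = prefecture.loc[["Tokyo"]]
print(type(frame).__name__)  # DataFrame
print(frame.shape)           # (1, 4)
```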

You can select multiple rows by listing their labels as follows.

prefecture.loc[["Tokyo","Aichi"]]

You can also specify columns in the same call, which selects only the data at the intersection of the given rows and columns.

prefecture.loc[["Tokyo","Aichi"],["name","capital"]]

As with list slicing, a bare ":" means select everything.

# Select all rows with ":"
prefecture.loc[:,["name","capital"]]
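A few more combinations of row and column selectors are sketched below. One point worth noting, which goes slightly beyond the examples above, is that a label slice in loc includes its end label, unlike a positional slice.

```python
import pandas as pd

prefecture = pd.DataFrame({"name": ["Hokkaido", "Tokyo", "Aichi", "Osaka"],
                           "capital": ["Sapporo", "Shinjuku", "Nagoya", "Osaka"],
                           "area": [83424, 2191, 5172, 1905],
                           "population": [5286, 13822, 7537, 8813]})
prefecture.index = ["Hokkaido", "Tokyo", "Aichi", "Osaka"]

# ":" for the rows keeps every row; only the listed columns remain
all_rows = prefecture.loc[:, ["name", "capital"]]
print(all_rows.shape)  # (4, 2)

# A label slice in loc includes both endpoints
label_slice = prefecture.loc["Tokyo":"Aichi", ["name", "capital"]]
print(list(label_slice.index))  # ['Tokyo', 'Aichi']

# One row label and one column label give a single value
value = prefecture.loc["Tokyo", "capital"]
print(value)  # Shinjuku
```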

iloc takes integer positions (counted from 0) where loc takes labels. With the row labels set in step 3, each pair below selects the same data.

# The following two return exactly the same result.
prefecture.loc[["Tokyo"]]
prefecture.iloc[[1]]

# The following two return exactly the same result.
prefecture.loc[["Tokyo","Aichi"]]
prefecture.iloc[[1,2]]

# The following two return exactly the same result.
prefecture.loc[["Tokyo","Aichi"],["name","capital"]]
prefecture.iloc[[1,2],[0,1]]

# The following two return exactly the same result.
prefecture.loc[:,["name","capital"]]
prefecture.iloc[:,[0,1]]
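The equivalence of each pair can be checked with DataFrame.equals rather than by eye. The sketch below rebuilds the DataFrame with the row labels from step 3 and compares the four pairs.

```python
import pandas as pd

prefecture = pd.DataFrame({"name": ["Hokkaido", "Tokyo", "Aichi", "Osaka"],
                           "capital": ["Sapporo", "Shinjuku", "Nagoya", "Osaka"],
                           "area": [83424, 2191, 5172, 1905],
                           "population": [5286, 13822, 7537, 8813]})
prefecture.index = ["Hokkaido", "Tokyo", "Aichi", "Osaka"]

# Each tuple holds a label-based selection and its position-based counterpart
pairs = [
    (prefecture.loc[["Tokyo"]], prefecture.iloc[[1]]),
    (prefecture.loc[["Tokyo", "Aichi"]], prefecture.iloc[[1, 2]]),
    (prefecture.loc[["Tokyo", "Aichi"], ["name", "capital"]],
     prefecture.iloc[[1, 2], [0, 1]]),
    (prefecture.loc[:, ["name", "capital"]], prefecture.iloc[:, [0, 1]]),
]

# equals() compares labels, values, and dtypes
results = [by_label.equals(by_position) for by_label, by_position in pairs]
print(results)  # [True, True, True, True]
```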
