As a learning note, I've collected the pandas code I use most often so that it can serve as my own cheat sheet.
1. Import pandas
import pandas as pd
2. How to create a DataFrame
There are two methods: creating one from a dictionary, or reading one from a CSV file.
① How to create from a dictionary
import pandas as pd
# Avoid naming the variable "dict", which would shadow Python's built-in dict type
data = {"name":["Hokkaido","Tokyo","Aichi","Osaka"],
        "capital":["Sapporo","Shinjuku","Nagoya","Osaka"],
        "area":[83424,2191,5172,1905],
        "population":[5286,13822,7537,8813]}
prefecture = pd.DataFrame(data)
prefecture

② How to read from a CSV file
import pandas as pd
# Use the read_csv function
prefecture = pd.read_csv("path/to/prefecture.csv")
However, by default the row labels stored in the CSV file are read as an ordinary column, so tell pandas that the first column holds the row index, as follows.
import pandas as pd
# index_col=0 tells pandas that the column at position 0 holds the row labels
prefecture = pd.read_csv("path/to/prefecture.csv", index_col=0)
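To see the difference without needing a file on disk, here is a small sketch that reads the same CSV text twice. The CSV content and the names csv_text, without_index and with_index are made up for illustration and are not the real prefecture.csv; io.StringIO simply lets read_csv treat a string like a file.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for prefecture.csv.
# The first column has no header and holds the row labels.
csv_text = """,name,capital
Hokkaido,Hokkaido,Sapporo
Tokyo,Tokyo,Shinjuku
"""

# Without index_col, the label column is read as an ordinary column
# that pandas names "Unnamed: 0".
without_index = pd.read_csv(io.StringIO(csv_text))

# With index_col=0, the first column becomes the row index instead.
with_index = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(without_index.columns.tolist())
print(with_index.index.tolist())
```

In the first result the labels show up as an extra column, while in the second they become the index and only name and capital remain as columns.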
3. Row label settings
prefecture.index = ["Hokkaido","Tokyo","Aichi","Osaka"]
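As a minimal sketch of what this assignment does, the example below uses a cut-down version of the table (only the name and area columns, chosen for brevity). A DataFrame built from a dictionary starts out with integer labels, and assigning a list with one entry per row replaces them.

```python
import pandas as pd

prefecture = pd.DataFrame({
    "name": ["Hokkaido", "Tokyo", "Aichi", "Osaka"],
    "area": [83424, 2191, 5172, 1905],
})

# Before the assignment, the rows carry the default integer labels 0, 1, 2, 3.
default_labels = prefecture.index.tolist()
print(default_labels)

# The list must have exactly one label per row, otherwise pandas raises an error.
prefecture.index = ["Hokkaido", "Tokyo", "Aichi", "Osaka"]
print(prefecture.index.tolist())

# The new labels can now be used to look up a row.
tokyo_area = prefecture.loc["Tokyo", "area"]
print(tokyo_area)
```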

4. Select data from a DataFrame
There are two methods: using square brackets ("[]"), or using indexers such as loc and iloc.
① Method using []
#Select only the name column
prefecture["name"]
However, this returns a pandas Series, not a DataFrame. To retrieve the data as a DataFrame, use double square brackets as shown below.
# Double square brackets keep the result as a DataFrame
prefecture[["name"]]
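A quick way to confirm the difference is to check the type of each result. The sketch below uses a two-row version of the table for brevity; the variable names single and double are only for illustration.

```python
import pandas as pd

prefecture = pd.DataFrame({
    "name": ["Hokkaido", "Tokyo"],
    "capital": ["Sapporo", "Shinjuku"],
})

single = prefecture["name"]     # single brackets: a Series
double = prefecture[["name"]]   # double brackets: a one-column DataFrame

print(type(single).__name__)
print(type(double).__name__)
print(double.shape)             # (rows, columns) of the one-column DataFrame
```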

You can also retrieve multiple columns as follows:
#You can select multiple columns
prefecture[["name","capital"]]

Use a slice to retrieve rows. The slice is positional and excludes the end, so 1:3 returns the second and third rows.
prefecture[1:3]
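Here is a small sketch of that slice, assuming the row labels have been set to the prefecture names as in section 3. It uses only the name and area columns to stay short, and compares the result with the equivalent positional selection through iloc.

```python
import pandas as pd

prefecture = pd.DataFrame(
    {
        "name": ["Hokkaido", "Tokyo", "Aichi", "Osaka"],
        "area": [83424, 2191, 5172, 1905],
    },
    index=["Hokkaido", "Tokyo", "Aichi", "Osaka"],
)

# Positions 1 and 2 are selected; position 3 (the end of the slice) is excluded.
rows = prefecture[1:3]
print(rows.index.tolist())

# The same rows selected by position with iloc.
same_rows = prefecture.iloc[1:3]
print(rows.equals(same_rows))
```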

② Method using loc and iloc
loc selects data by label, and iloc selects data by integer position. Writing DataFrame.loc["row label"], as shown below, selects the data in that row.
prefecture.loc["Tokyo"]
However, this returns a Series rather than a DataFrame, so to select the row as a DataFrame, use double brackets.
prefecture.loc[["Tokyo"]]
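The sketch below shows what each form gives back, again on a two-row version of the table. With single brackets, the row comes back as a Series whose index holds the column names; with double brackets, it stays a one-row DataFrame. The names row_series and row_frame are only for illustration.

```python
import pandas as pd

prefecture = pd.DataFrame(
    {"name": ["Hokkaido", "Tokyo"], "capital": ["Sapporo", "Shinjuku"]},
    index=["Hokkaido", "Tokyo"],
)

row_series = prefecture.loc["Tokyo"]    # Series; its index holds the column names
row_frame = prefecture.loc[["Tokyo"]]   # DataFrame with a single row

print(row_series.index.tolist())
print(row_series["capital"])
print(row_frame.shape)
```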

You can select multiple rows by writing the following.
prefecture.loc[["Tokyo","Aichi"]]

Furthermore, you can also specify columns as follows, which selects only the data at the intersection of the specified rows and columns.
prefecture.loc[["Tokyo","Aichi"],["name","capital"]]
As with list slicing, a bare ":" means select everything.
# Select all rows using ":"
prefecture.loc[:,["name","capital"]]
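To make the row and column selection concrete, here is a sketch that checks the shape of each result. It assumes the row labels are the prefecture names, and it also uses ":" in the column position, which is not shown in the note above but works the same way.

```python
import pandas as pd

prefecture = pd.DataFrame(
    {
        "name": ["Hokkaido", "Tokyo", "Aichi", "Osaka"],
        "capital": ["Sapporo", "Shinjuku", "Nagoya", "Osaka"],
        "area": [83424, 2191, 5172, 1905],
    },
    index=["Hokkaido", "Tokyo", "Aichi", "Osaka"],
)

# Rows and columns both given as lists: only the intersection is returned.
subset = prefecture.loc[["Tokyo", "Aichi"], ["name", "capital"]]

# ":" in the row position keeps every row.
all_rows = prefecture.loc[:, ["name", "capital"]]

# ":" in the column position keeps every column.
all_cols = prefecture.loc[["Tokyo", "Aichi"], :]

print(subset.shape, all_rows.shape, all_cols.shape)
```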

iloc works like loc, but takes integer positions instead of row and column labels.
#The following two return exactly the same result.
prefecture.loc[["Tokyo"]]
prefecture.iloc[[1]]
#The following two return exactly the same result.
prefecture.loc[["Tokyo","Aichi"]]
prefecture.iloc[[1,2]]
#The following two return exactly the same result.
prefecture.loc[["Tokyo","Aichi"],["name","capital"]]
prefecture.iloc[[1,2],[0,1]]
#The following two return exactly the same result.
prefecture.loc[:,["name","capital"]]
prefecture.iloc[:,[0,1]]
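These equivalences can be checked with DataFrame.equals. The sketch below assumes the row labels have been set to the prefecture names as in section 3 and that name and capital are the first two columns; with a different index or column order the pairs would no longer match.

```python
import pandas as pd

prefecture = pd.DataFrame(
    {
        "name": ["Hokkaido", "Tokyo", "Aichi", "Osaka"],
        "capital": ["Sapporo", "Shinjuku", "Nagoya", "Osaka"],
        "area": [83424, 2191, 5172, 1905],
        "population": [5286, 13822, 7537, 8813],
    },
    index=["Hokkaido", "Tokyo", "Aichi", "Osaka"],
)

# Each pair holds a label-based selection and its position-based counterpart.
pairs = [
    (prefecture.loc[["Tokyo"]], prefecture.iloc[[1]]),
    (prefecture.loc[["Tokyo", "Aichi"]], prefecture.iloc[[1, 2]]),
    (prefecture.loc[["Tokyo", "Aichi"], ["name", "capital"]],
     prefecture.iloc[[1, 2], [0, 1]]),
    (prefecture.loc[:, ["name", "capital"]], prefecture.iloc[:, [0, 1]]),
]

# equals() compares shape, labels and values of the two selections.
results = [by_label.equals(by_position) for by_label, by_position in pairs]
print(results)
```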