As a learning note, I've collected basic pandas code here for my own reference, to serve as a cheat sheet.
1. Import pandas
import pandas as pd
2. How to create a DataFrame
There are two methods: creating one from a dictionary, and reading one from a CSV file.
① How to create from a dictionary
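import pandas as pd
# Each key becomes a column name (area in km², population in thousands).
# The variable is named "data" because "dict" would shadow Python's built-in dict.
data = {"name":["Hokkaido","Tokyo","Aichi","Osaka"],
        "capital":["Sapporo","Shinjuku","Nagoya","Osaka"],
        "area":[83424,2191,5172,1905],
        "population":[5286,13822,7537,8813]}
prefecture = pd.DataFrame(data)
prefecture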
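To see what the dictionary turns into, here is a small check. It repeats the sample data above so that it runs on its own. The commented outputs reflect pandas' default behavior as I understand it: keys become columns in insertion order, and when no row labels are given the rows are numbered from 0.

```python
import pandas as pd

# Same sample data as above (area in km², population in thousands)
data = {"name": ["Hokkaido", "Tokyo", "Aichi", "Osaka"],
        "capital": ["Sapporo", "Shinjuku", "Nagoya", "Osaka"],
        "area": [83424, 2191, 5172, 1905],
        "population": [5286, 13822, 7537, 8813]}
prefecture = pd.DataFrame(data)

# Each dictionary key becomes a column, in insertion order
print(list(prefecture.columns))  # ['name', 'capital', 'area', 'population']
# No row labels were given, so pandas numbers the rows 0, 1, 2, ...
print(list(prefecture.index))    # [0, 1, 2, 3]
print(prefecture.shape)          # (4, 4)
```

These default numeric row labels are what step 3 replaces with prefecture names.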
② How to read from a CSV file
import pandas as pd
# Use the read_csv function
prefecture = pd.read_csv("path/to/prefecture.csv")
However, by default the row labels stored in the CSV file are read in as an ordinary column. To avoid this, tell pandas that the first column holds the row labels, as follows.
import pandas as pd
# index_col=0 uses the column at position 0 as the row labels
prefecture = pd.read_csv("path/to/prefecture.csv", index_col = 0)
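The difference index_col makes can be seen without a real file. This sketch assumes the CSV was written by pandas' own to_csv (so the row labels are stored in an unnamed first column), and io.StringIO stands in for "path/to/prefecture.csv" to keep the example self-contained.

```python
import io
import pandas as pd

data = {"name": ["Hokkaido", "Tokyo", "Aichi", "Osaka"],
        "capital": ["Sapporo", "Shinjuku", "Nagoya", "Osaka"],
        "area": [83424, 2191, 5172, 1905],
        "population": [5286, 13822, 7537, 8813]}
prefecture = pd.DataFrame(data)
prefecture.index = ["Hokkaido", "Tokyo", "Aichi", "Osaka"]

# to_csv() with no path returns the CSV text; the row labels go in the first column
csv_text = prefecture.to_csv()

# Without index_col, the row labels come back as an ordinary column
without = pd.read_csv(io.StringIO(csv_text))
print(list(without.columns))  # ['Unnamed: 0', 'name', 'capital', 'area', 'population']

# With index_col=0, the first column becomes the row labels again
with_index = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(with_index.index))  # ['Hokkaido', 'Tokyo', 'Aichi', 'Osaka']
```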
3. Setting row labels
prefecture.index = ["Hokkaido","Tokyo","Aichi","Osaka"]
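Since the labels here are the same as the "name" column, an alternative I find handy is set_index, which takes the labels from an existing column instead of typing them out. This is a sketch of that alternative, not what the code above does; one small difference is that the index then carries the name "name".

```python
import pandas as pd

data = {"name": ["Hokkaido", "Tokyo", "Aichi", "Osaka"],
        "capital": ["Sapporo", "Shinjuku", "Nagoya", "Osaka"],
        "area": [83424, 2191, 5172, 1905],
        "population": [5286, 13822, 7537, 8813]}
prefecture = pd.DataFrame(data)

# Use the "name" column as the row labels; drop=False keeps it as a regular column too.
# set_index returns a new DataFrame and leaves prefecture unchanged.
labeled = prefecture.set_index("name", drop=False)
print(list(labeled.index))        # ['Hokkaido', 'Tokyo', 'Aichi', 'Osaka']
print("name" in labeled.columns)  # True
print(labeled.index.name)         # name
```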
4. How to select data from a DataFrame
There are two methods: using square brackets ("[]"), and using the access methods loc and iloc.
① Method using []
#Select only the name column
prefecture["name"]
However, this method returns a pandas Series, not a DataFrame. To get the data as a DataFrame, use double brackets, as shown below.
# Double square brackets keep the result as a DataFrame
prefecture[["name"]]
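The difference between single and double brackets shows up clearly in the type and shape of the result. The inner brackets are simply a Python list of column names, which is why a DataFrame comes back even for one column.

```python
import pandas as pd

data = {"name": ["Hokkaido", "Tokyo", "Aichi", "Osaka"],
        "capital": ["Sapporo", "Shinjuku", "Nagoya", "Osaka"],
        "area": [83424, 2191, 5172, 1905],
        "population": [5286, 13822, 7537, 8813]}
prefecture = pd.DataFrame(data)
prefecture.index = ["Hokkaido", "Tokyo", "Aichi", "Osaka"]

col_series = prefecture["name"]   # single brackets: one column as a Series
col_frame = prefecture[["name"]]  # a list of column names: a DataFrame

print(type(col_series).__name__)          # Series
print(type(col_frame).__name__)           # DataFrame
print(col_series.shape, col_frame.shape)  # (4,) (4, 1)
```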
You can also retrieve multiple columns as follows:
#You can select multiple columns
prefecture[["name","capital"]]
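Two details worth remembering when passing a list of columns: the result follows the order of the list, and every listed column must exist. In this sketch, "governor" is a hypothetical column name that is not in the data.

```python
import pandas as pd

data = {"name": ["Hokkaido", "Tokyo", "Aichi", "Osaka"],
        "capital": ["Sapporo", "Shinjuku", "Nagoya", "Osaka"],
        "area": [83424, 2191, 5172, 1905],
        "population": [5286, 13822, 7537, 8813]}
prefecture = pd.DataFrame(data)
prefecture.index = ["Hokkaido", "Tokyo", "Aichi", "Osaka"]

# Columns come back in the order they are listed, not their original order
reordered = prefecture[["capital", "name"]]
print(list(reordered.columns))  # ['capital', 'name']

# "governor" is a hypothetical name that is not a column, so this raises KeyError
try:
    prefecture[["name", "governor"]]
    missing_raised = False
except KeyError:
    missing_raised = True
print(missing_raised)  # True
```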
Use a slice to retrieve rows by position; as with Python lists, the end position is excluded.
prefecture[1:3]
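Here is what that slice actually returns with the sample data. Note that inside plain [], a slice selects rows while a column name selects a column.

```python
import pandas as pd

data = {"name": ["Hokkaido", "Tokyo", "Aichi", "Osaka"],
        "capital": ["Sapporo", "Shinjuku", "Nagoya", "Osaka"],
        "area": [83424, 2191, 5172, 1905],
        "population": [5286, 13822, 7537, 8813]}
prefecture = pd.DataFrame(data)
prefecture.index = ["Hokkaido", "Tokyo", "Aichi", "Osaka"]

# Positions 1 and 2 are returned; the stop position 3 (Osaka) is excluded
rows = prefecture[1:3]
print(list(rows.index))  # ['Tokyo', 'Aichi']
```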
② Method using loc and iloc
loc selects data by label, and iloc selects data by integer position. Writing DataFrame_name.loc["row label"], as shown below, selects the data in that row.
prefecture.loc["Tokyo"]
However, this method also returns a Series rather than a DataFrame. To select the row as a DataFrame, use double brackets.
prefecture.loc[["Tokyo"]]
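For a row, the Series returned by a single label uses the column names as its index, so individual values can be read from it by column name. With a list of labels you get a one-row DataFrame instead.

```python
import pandas as pd

data = {"name": ["Hokkaido", "Tokyo", "Aichi", "Osaka"],
        "capital": ["Sapporo", "Shinjuku", "Nagoya", "Osaka"],
        "area": [83424, 2191, 5172, 1905],
        "population": [5286, 13822, 7537, 8813]}
prefecture = pd.DataFrame(data)
prefecture.index = ["Hokkaido", "Tokyo", "Aichi", "Osaka"]

row_series = prefecture.loc["Tokyo"]   # one label: the row as a Series
row_frame = prefecture.loc[["Tokyo"]]  # a list of labels: a one-row DataFrame

# In the Series, the column names have become the index
print(list(row_series.index))  # ['name', 'capital', 'area', 'population']
print(row_series["capital"])   # Shinjuku
print(row_frame.shape)         # (1, 4)
```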
You can select multiple rows as follows.
prefecture.loc[["Tokyo","Aichi"]]
You can also specify columns, as follows, to select only the data at the intersection of the specified rows and columns.
prefecture.loc[["Tokyo","Aichi"],["name","capital"]]
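Taken to the extreme, a single row label plus a single column label (without lists) picks out one cell and returns the value itself rather than a DataFrame. A quick check with the sample data:

```python
import pandas as pd

data = {"name": ["Hokkaido", "Tokyo", "Aichi", "Osaka"],
        "capital": ["Sapporo", "Shinjuku", "Nagoya", "Osaka"],
        "area": [83424, 2191, 5172, 1905],
        "population": [5286, 13822, 7537, 8813]}
prefecture = pd.DataFrame(data)
prefecture.index = ["Hokkaido", "Tokyo", "Aichi", "Osaka"]

# Lists of rows and columns: a 2 x 2 DataFrame at their intersection
subset = prefecture.loc[["Tokyo", "Aichi"], ["name", "capital"]]
print(subset.shape)  # (2, 2)

# A single row label and a single column label: just that value
print(prefecture.loc["Tokyo", "capital"])     # Shinjuku
print(prefecture.loc["Aichi", "population"])  # 7537
```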
As with list slicing, a bare ":" means "select all".
# Use ":" to select all rows
prefecture.loc[:,["name","capital"]]
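Selecting all rows this way gives the same result as plain double brackets, and ":" can be used on the column side in the same way. A quick check using DataFrame.equals:

```python
import pandas as pd

data = {"name": ["Hokkaido", "Tokyo", "Aichi", "Osaka"],
        "capital": ["Sapporo", "Shinjuku", "Nagoya", "Osaka"],
        "area": [83424, 2191, 5172, 1905],
        "population": [5286, 13822, 7537, 8813]}
prefecture = pd.DataFrame(data)
prefecture.index = ["Hokkaido", "Tokyo", "Aichi", "Osaka"]

# ":" in the row position: all rows, the listed columns
by_loc = prefecture.loc[:, ["name", "capital"]]
by_brackets = prefecture[["name", "capital"]]
print(by_loc.equals(by_brackets))  # True

# ":" in the column position: the listed rows, all columns
all_cols = prefecture.loc[["Tokyo"], :]
print(all_cols.equals(prefecture.loc[["Tokyo"]]))  # True
```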
iloc uses integer positions where loc uses labels, for both rows and columns.
#The following two return exactly the same result.
prefecture.loc[["Tokyo"]]
prefecture.iloc[[1]]
#The following two return exactly the same result.
prefecture.loc[["Tokyo","Aichi"]]
prefecture.iloc[[1,2]]
#The following two return exactly the same result.
prefecture.loc[["Tokyo","Aichi"],["name","capital"]]
prefecture.iloc[[1,2],[0,1]]
#The following two return exactly the same result.
prefecture.loc[:,["name","capital"]]
prefecture.iloc[:,[0,1]]
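Rather than eyeballing the outputs, the pairs above can be compared with DataFrame.equals. The last two lines show the one place where loc and iloc behave differently with slices: a label slice with loc includes the end label, while a position slice with iloc excludes the end position.

```python
import pandas as pd

data = {"name": ["Hokkaido", "Tokyo", "Aichi", "Osaka"],
        "capital": ["Sapporo", "Shinjuku", "Nagoya", "Osaka"],
        "area": [83424, 2191, 5172, 1905],
        "population": [5286, 13822, 7537, 8813]}
prefecture = pd.DataFrame(data)
prefecture.index = ["Hokkaido", "Tokyo", "Aichi", "Osaka"]

# The loc/iloc pairs listed above
pairs = [
    (prefecture.loc[["Tokyo"]], prefecture.iloc[[1]]),
    (prefecture.loc[["Tokyo", "Aichi"]], prefecture.iloc[[1, 2]]),
    (prefecture.loc[["Tokyo", "Aichi"], ["name", "capital"]],
     prefecture.iloc[[1, 2], [0, 1]]),
    (prefecture.loc[:, ["name", "capital"]], prefecture.iloc[:, [0, 1]]),
]
print(all(a.equals(b) for a, b in pairs))  # True

# Slices differ: loc includes the end label, iloc excludes the end position
print(list(prefecture.loc["Tokyo":"Aichi"].index))  # ['Tokyo', 'Aichi']
print(list(prefecture.iloc[1:2].index))             # ['Tokyo']
```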