The pandas DataFrame is well suited to handling structured data! (At least, that is what I read in a data science book I browsed at a bookstore.) I'd like to leave this memo for fellow complete beginners.
Environment: Python 3.6.10, pandas 1.0.1, Jupyter Notebook
Installing pandas was nothing difficult:
pip install pandas
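To confirm the installation worked, you can import pandas and print its version from a notebook cell. This is just a minimal sanity check; the version printed depends on your own environment (this memo used 1.0.1).

```python
import pandas as pd

# Print the installed pandas version to confirm the import works
print(pd.__version__)
```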
The data I want to handle this time is in CSV format, so I will read a CSV file. After a little research, I learned that pandas has two main data structures, Series and DataFrame: a Series holds one-dimensional data and a DataFrame holds two-dimensional data. (I don't fully understand Series and DataFrame yet, so I'd like to study them more and write another article.)
For now, I'll read the CSV data as a DataFrame.
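A quick way to see the one-dimensional vs. two-dimensional difference is to build a tiny Series and a tiny DataFrame by hand and check their `ndim`. The values and column names below are made up purely for illustration.

```python
import pandas as pd

# A Series is one-dimensional: a single column of values with an index
s = pd.Series([10, 20, 30])

# A DataFrame is two-dimensional: rows and columns (each column is a Series)
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

print(s.ndim)             # 1
print(df.ndim)            # 2
print(type(df["a"]))      # <class 'pandas.core.series.Series'>
```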
```python
import pandas as pd

# Read a comma-separated file that has no header row
df = pd.read_csv('Data path', header=None)
```
I wanted to read a comma-delimited CSV file, so I used read_csv(). If the delimiter is a tab (\t), read_table() can be used instead.
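As a small sketch of the tab-delimited case, the example below uses an in-memory string (via `io.StringIO`) as a stand-in for a real `.tsv` file; the data is hypothetical. `read_table()` defaults to a tab separator, and `read_csv()` with `sep="\t"` should give the same result.

```python
import io
import pandas as pd

# Hypothetical tab-separated text standing in for a real .tsv file
tsv_text = "1\t2\t3\n4\t5\t6\n"

# read_table uses a tab as its default delimiter
a = pd.read_table(io.StringIO(tsv_text), header=None)

# read_csv does the same thing when the separator is given explicitly
b = pd.read_csv(io.StringIO(tsv_text), sep="\t", header=None)

print(a.equals(b))  # True
```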
Also, since the CSV file I want to read has no heading row, I passed header=None.
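To see why `header=None` matters, here is a sketch with a made-up two-row CSV. By default pandas treats the first row as column names, so one row of data would silently become the header; with `header=None`, every row stays data and the columns are numbered 0, 1, 2, and so on.

```python
import io
import pandas as pd

# Hypothetical CSV with no heading line
csv_text = "10,20,30\n40,50,60\n"

# Default behaviour: the first row is consumed as column names
with_header = pd.read_csv(io.StringIO(csv_text))

# header=None: every row is data, columns are labelled 0, 1, 2, ...
no_header = pd.read_csv(io.StringIO(csv_text), header=None)

print(with_header.shape)          # (1, 3)
print(no_header.shape)            # (2, 3)
print(list(no_header.columns))    # [0, 1, 2]
```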
The loaded data was displayed in Jupyter as shown in the figure below.
I formatted the loaded data as follows:
- Column 0 is unnecessary, so delete it.
- Make the column labeled 1 the index (row headings).
First, drop only column 0 with a slice.
In a Series or DataFrame, the index labels can be strings or arbitrary numbers. Numeric labels in particular are easy to confuse with positions, so it is safer to access data through the indexers loc (label-based) and iloc (position-based).
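Here is a minimal sketch of that confusion, using a hypothetical Series whose numeric labels do not match the positions. `loc` looks values up by label, `iloc` by 0-based position.

```python
import pandas as pd

# Hypothetical Series whose index labels are numbers that differ from positions
s = pd.Series(["a", "b", "c"], index=[10, 20, 30])

# loc looks up by label
print(s.loc[20])   # b

# iloc looks up by 0-based position, ignoring the labels
print(s.iloc[0])   # a

# With an integer index, plain s[0] is treated as a label lookup and fails
# here (there is no label 0); loc/iloc make the intent explicit.
```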
In this case (for now), the DataFrame's row and column labels coincide with Python-style 0-based positions, so iloc and loc give the same result. Column 0 was cut off as follows (only columns 1 onward were kept):
```python
# Load the CSV (no header row)
df = pd.read_csv('Data/test231.csv', header=None)
# Slice: keep every row and every column from position 1 onward
df.iloc[:, 1:]
```
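The claim that iloc and loc agree here can be checked on a small hypothetical frame with the default integer column labels that `header=None` produces. The sketch also shows `drop(columns=0)`, a different technique from slicing that removes the column by label and should give the same result.

```python
import pandas as pd

# Hypothetical frame with default integer column labels 0, 1, 2
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])

by_position = df.iloc[:, 1:]   # positions 1 onward
by_label = df.loc[:, 1:]       # labels 1 through the last label (loc is inclusive)
by_drop = df.drop(columns=0)   # alternative: drop column label 0 explicitly

print(by_position.equals(by_label))  # True
print(by_position.equals(by_drop))   # True
```

If the column labels were not 0, 1, 2, ..., only the `iloc` version would still mean "everything after the first column".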
Actual output:
Compared with the previous figure, the number of columns went from 170 to 169, confirming that only the leftmost column (column 0) was removed.
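Instead of comparing screenshots, the column count can also be checked in code with `shape`, which is `(rows, columns)`. The frame below is a hypothetical three-column stand-in, not the real 170-column file.

```python
import pandas as pd

# Hypothetical stand-in for the loaded CSV (the real file had 170 columns)
df = pd.DataFrame([[0, "x", 1.5], [1, "y", 2.5]])
sliced_df = df.iloc[:, 1:]

# Only the column count should drop by one; the row count stays the same
print(df.shape)         # (2, 3)
print(sliced_df.shape)  # (2, 2)
assert sliced_df.shape[1] == df.shape[1] - 1
assert sliced_df.shape[0] == df.shape[0]
```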
In a pandas DataFrame, the row labels are called the index and the column labels are called the columns.
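Both sets of labels are available as attributes, `.index` for rows and `.columns` for columns. A minimal sketch with made-up labels:

```python
import pandas as pd

# Hypothetical frame with string row and column labels
df = pd.DataFrame({"x": [1, 2], "y": [3, 4]}, index=["r1", "r2"])

# Row labels live in .index, column labels in .columns
print(list(df.index))    # ['r1', 'r2']
print(list(df.columns))  # ['x', 'y']
```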
Running the following made the column labeled "1" in the figure above the index:
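```python
# Slice: drop column 0 by position
sliced_df = df.iloc[:, 1:]
# Use the column labeled 1 as the index.
# set_index returns a new DataFrame, so assign the result to keep it.
indexed_df = sliced_df.set_index(1)
indexed_df
```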
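What `set_index` does can be seen on a small hypothetical frame shaped like `sliced_df` (columns labeled 1, 2, 3 after column 0 was dropped). The chosen column moves into the row index, its label becomes the index name, and the original frame is left unchanged unless you reassign it.

```python
import pandas as pd

# Hypothetical frame shaped like sliced_df: columns labelled 1, 2, 3
sliced_df = pd.DataFrame({1: ["a", "b"], 2: [10, 20], 3: [30, 40]})

# set_index moves column 1 into the row index and returns a new DataFrame
indexed_df = sliced_df.set_index(1)

print(list(indexed_df.index))    # ['a', 'b']
print(indexed_df.index.name)     # 1
print(list(indexed_df.columns))  # [2, 3]

# The original frame is unchanged because the result was assigned elsewhere
print(list(sliced_df.columns))   # [1, 2, 3]
```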
The result was as follows:
Next time, I'd like to write another memo on renaming columns and handling this table as hierarchical data.