Introduction

The pandas DataFrame is well suited to handling structured data! (I read this in a data science book I browsed at a bookstore.) I'd like to leave this memo for complete beginners.

Environment

Python 3.6.10, pandas 1.0.1, Jupyter Notebook

Installation

There was nothing difficult here; just run:

pip install pandas
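A quick way to confirm that the install worked is to import pandas and print its version. This is a minimal sketch, assuming it runs in the same environment (for example, the same Jupyter kernel) where pip installed pandas.

```python
import pandas as pd

# Print the installed pandas version; an ImportError here means the install did not reach this environment
print(pd.__version__)
```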

Read a CSV file

Since the data I want to handle this time is in CSV format, I will read a CSV file. After a little research, I found that pandas has two main data structures, Series and DataFrame: a Series holds one-dimensional data and a DataFrame holds two-dimensional data. (I don't fully understand Series and DataFrame yet, so I'd like to study them again and write another article.)
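To make the one-dimensional vs. two-dimensional distinction concrete, here is a small sketch with made-up values (the numbers and column names "a" and "b" are purely illustrative). It also shows that picking a single column out of a DataFrame gives back a Series.

```python
import pandas as pd

# A Series: one-dimensional labeled data (a single column of values)
s = pd.Series([10, 20, 30])
# A DataFrame: two-dimensional labeled data (rows x columns)
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

print(s.ndim, s.shape)    # 1 (3,)
print(df.ndim, df.shape)  # 2 (3, 2)

# Selecting a single column of a DataFrame returns a Series
col = df["a"]
print(type(col).__name__)  # Series
```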

For now, I'll read the CSV data as a DataFrame.

```python
import pandas as pd

# 'Data path' is a placeholder: replace it with the path to your CSV file
pd.read_csv('Data path', header=None)
```

I wanted to read a comma-delimited CSV file, so I used read_csv(). If the delimiter is a tab (\t), read_table() can be used instead.
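As a sketch of the tab-delimited case, the snippet below reads the same tab-separated text twice: once with read_table(), whose default separator is a tab, and once with read_csv() and an explicit sep="\t". The sample text is hypothetical and is wrapped in io.StringIO so the example runs without a file on disk.

```python
import io
import pandas as pd

# Hypothetical in-memory sample standing in for a tab-separated file
tsv_text = "1\tx\t0.5\n2\ty\t1.5\n"

# read_table() splits on tabs by default
df_table = pd.read_table(io.StringIO(tsv_text), header=None)
# read_csv() reads the same data when the separator is given explicitly
df_csv = pd.read_csv(io.StringIO(tsv_text), sep="\t", header=None)

print(df_table.equals(df_csv))  # True
```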

Also, since the CSV file I want to read has no header row, I passed header=None. Without it, pandas would treat the first line of data as the column names.
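The effect of header=None can be seen on a tiny, made-up CSV with no header line. With the default setting, the first data row is consumed as column names; with header=None, every line stays as data and the columns are numbered 0, 1, 2, and so on.

```python
import io
import pandas as pd

# Hypothetical two-row CSV with no header line
csv_text = "a,1,2\nb,3,4\n"

# Default: the first line is used as column names, leaving one data row
df_default = pd.read_csv(io.StringIO(csv_text))
# header=None: both lines are data, columns get integer labels
df_none = pd.read_csv(io.StringIO(csv_text), header=None)

print(list(df_default.columns))  # ['a', '1', '2']
print(len(df_default))           # 1
print(list(df_none.columns))     # [0, 1, 2]
print(len(df_none))              # 2
```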

The loaded data was displayed in Jupyter as shown below.

[Screenshot 2020-02-27 13.34.31.png: the loaded data in Jupyter]

Data shaping

Reshape the loaded data as follows:
・ Delete the 0th column, which is unnecessary.
・ Make the 1st column the index (row labels).


Delete column

Drop only the 0th column by slicing.

In a Series or DataFrame, index labels can be strings or arbitrary numbers. Because numeric labels are easily confused with positions, it is safer to access data through the indexers iloc (position-based) and loc (label-based).

[Screenshot 2020-02-27 14.31.55.png]
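Here is a minimal sketch of the confusion that iloc and loc avoid, using a hypothetical Series whose numeric labels do not match the element positions. Asking for "0" means the first element to iloc, but the element labeled 0 to loc.

```python
import pandas as pd

# Hypothetical Series whose numeric labels differ from positions
s = pd.Series(["x", "y", "z"], index=[2, 0, 1])

print(s.iloc[0])  # x  (position 0 -> first element)
print(s.loc[0])   # y  (label 0 -> element labeled 0)
```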

In this case, the DataFrame's labels match Python-style positions (0, 1, 2, ...) in both rows and columns, so iloc and loc give the same result. I dropped the 0th column as follows, keeping only the 1st column onward.
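The sketch below checks this on a small hypothetical frame with default integer labels, like the ones read_csv(..., header=None) produces. It also shows the one difference to keep in mind: a loc slice includes its end label, while an iloc slice excludes its end position.

```python
import pandas as pd

# Hypothetical frame with default integer labels 0, 1, 2
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])

by_position = df.iloc[:, 1:]  # positions 1 onward
by_label = df.loc[:, 1:]      # labels 1 onward
print(by_position.equals(by_label))  # True

# loc slices include the end label; iloc slices exclude the end position
print(list(df.iloc[:, 0:2].columns))  # [0, 1]
print(list(df.loc[:, 0:2].columns))   # [0, 1, 2]
```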

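```python
import pandas as pd

# Load the CSV (the file has no header row)
df = pd.read_csv('Data/test231.csv', header=None)
# Slice: all rows, columns from position 1 onward (drops column 0)
df.iloc[:, 1:]
```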
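An alternative, not used in the original steps, is to drop the column by its label with DataFrame.drop(). The sketch below uses a small hypothetical integer frame in place of the CSV data and confirms that both approaches give the same result; either way, a new DataFrame is returned and df is left unchanged.

```python
import pandas as pd

# Hypothetical stand-in for the CSV data (integer column labels 0, 1, 2)
df = pd.DataFrame([[0, 10, 20], [1, 11, 21]])

# Drop by position with iloc, as above...
sliced = df.iloc[:, 1:]
# ...or drop by label with drop()
dropped = df.drop(columns=0)

print(sliced.equals(dropped))  # True
```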

Actual output:

[Screenshot 2020-02-27 14.36.46.png]

Compared with the previous screenshot, the number of columns dropped from 170 to 169, confirming that only the 0th column was removed.
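Instead of comparing screenshots, the column count can be checked with the shape attribute, which is (rows, columns). This sketch builds a hypothetical 3-row, 170-column frame to stand in for the loaded CSV.

```python
import pandas as pd

# Hypothetical 3-row, 170-column frame standing in for the loaded CSV
df = pd.DataFrame([[i * 170 + j for j in range(170)] for i in range(3)])
sliced = df.iloc[:, 1:]

# The column count drops by exactly one; the row count is unchanged
print(df.shape)      # (3, 170)
print(sliced.shape)  # (3, 169)
```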

Make the first column an index

In a pandas DataFrame, the row labels are called the index and the column labels are called the columns.
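Both sets of labels are available as attributes. This is a small sketch with made-up labels ("r1", "c1", and so on are illustrative only).

```python
import pandas as pd

# Hypothetical frame with explicit row and column labels
df = pd.DataFrame([[1, 2], [3, 4]], index=["r1", "r2"], columns=["c1", "c2"])

print(list(df.index))    # ['r1', 'r2']  (row labels)
print(list(df.columns))  # ['c1', 'c2']  (column labels)
```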

Running the following made the column labeled 1 (see the screenshot above) the index.

```python
# Slice off column 0
sliced_df = df.iloc[:, 1:]
# Use the column labeled 1 as the index
# (set_index returns a new DataFrame; sliced_df itself is unchanged)
sliced_df.set_index(1)
```

It ran successfully.
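To see what set_index(1) does without the original CSV, here is a sketch on a hypothetical sliced_df with integer column labels 1, 2, 3. The column labeled 1 moves into the index (and becomes the index name), and because set_index returns a new DataFrame, the result has to be assigned to keep it.

```python
import pandas as pd

# Hypothetical stand-in for sliced_df: integer column labels 1, 2, 3
sliced_df = pd.DataFrame([["id_a", 5, 6], ["id_b", 7, 8]], columns=[1, 2, 3])

# set_index returns a new DataFrame; assign it to keep the result
indexed = sliced_df.set_index(1)

print(list(indexed.index))      # ['id_a', 'id_b']
print(indexed.index.name)       # 1
print(list(indexed.columns))    # [2, 3]
print(list(sliced_df.columns))  # [1, 2, 3]  (original unchanged)
```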

In conclusion

Next, I'd like to write a memo on renaming columns and on processing this table as hierarchical data.

[PYTHON] Load csv with pandas and play with Index