[PYTHON] Load csv with pandas and play with Index

Introduction

The pandas DataFrame is good for handling structured data! (I actually read that in a data science book I was browsing at a bookstore.) I am leaving this memo for super beginners.

Environment

Python 3.6.10, pandas 1.0.1, Jupyter Notebook

First install

There was nothing difficult ...

pip install pandas

Read csv file

The data I want to handle this time is in csv format, so I will read a csv file. After a little research, I found that pandas has two data structures, Series and DataFrame: a Series holds one-dimensional data and a DataFrame holds two-dimensional data. (I do not understand Series and DataFrame well yet, so I hope to study them again and write another article.)
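As a memo for myself, here is a minimal sketch of the difference between the two structures. The values and the column names ("score", "label") are made up purely for illustration.

```python
import pandas as pd

# A Series is one-dimensional: a single run of values with an index.
s = pd.Series([10, 20, 30], name="score")

# A DataFrame is two-dimensional: rows and columns.
df = pd.DataFrame({"score": [10, 20, 30], "label": ["a", "b", "c"]})

print(s.ndim)    # 1
print(df.ndim)   # 2
print(df.shape)  # (3, 2)

# Selecting a single column of a DataFrame gives back a Series.
col = df["score"]
print(type(col).__name__)  # Series
```

Roughly speaking, a DataFrame can be thought of as several Series lined up side by side and sharing one index.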

For the time being, I would like to read the csv data as a DataFrame this time.

python


import pandas as pd
pd.read_csv('Data path', header=None)

I wanted to read a comma-delimited csv file, so I used read_csv(). If the delimiter is a tab (\t), read_table() can be used instead.

Also, the csv file I want to read has no header line, so None is specified for header. Without it, pandas would treat the first line of data as the column names.
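To see what header=None actually changes, here is a small sketch. It reads from an in-memory string through io.StringIO instead of a real file, and the sample rows are invented for illustration.

```python
import io
import pandas as pd

# Made-up sample standing in for a csv file without a header line.
csv_text = "0,a,1.5,2.5\n1,b,3.5,4.5\n"

# header=None: every line is data, and the columns are labeled 0, 1, 2, ...
no_header = pd.read_csv(io.StringIO(csv_text), header=None)
print(no_header.shape)             # (2, 4)
print(no_header.columns.tolist())  # [0, 1, 2, 3]

# Default behaviour: the first line is consumed as the column names.
with_header = pd.read_csv(io.StringIO(csv_text))
print(with_header.shape)           # (1, 4)

# The same data with tabs as the delimiter, read with read_table().
tsv_text = csv_text.replace(",", "\t")
from_table = pd.read_table(io.StringIO(tsv_text), header=None)
print(from_table.shape)            # (2, 4)
```

With the default setting one row of data silently disappears into the header, which is easy to miss on a large file.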

The data that was read is displayed in Jupyter as shown in the figure below.

[Screenshot: the loaded DataFrame displayed in Jupyter]

Data shaping

Format the read data as follows:

・ The 0th column is unnecessary, so delete it.
・ Make the first column the index (row headings).


Delete column

Use a slice to cut off only the 0th column.

In a Series or DataFrame, the index labels can be character strings or arbitrary numbers. When the labels are numbers, it is easy to confuse a label with a position, so access the data through the indexers loc (label-based) and iloc (position-based).

This time (for now), the DataFrame labels match Python-style positions (0, 1, 2, ...) for both rows and columns, so iloc and loc give the same result. The 0th column of the data was cut off as follows (only the 1st and subsequent columns were extracted).

python


# Load the csv (no header line)
df = pd.read_csv('Data/test231.csv', header=None)
# Slice: keep all rows, and every column from position 1 onward
df.iloc[:, 1:]

Actual output:

[Screenshot: the DataFrame after slicing off the 0th column]

Compared with the previous image, the number of columns has changed from 170 to 169, which confirms that only the 0th column was removed.
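The following sketch checks the claim that iloc and loc agree while the labels are the default 0, 1, 2, ..., and shows where they stop agreeing. The tiny DataFrame and the renamed labels (10, 20, 30) are invented for illustration. drop() is not what I used above; it is shown only as another way to get the same columns.

```python
import pandas as pd

# Small made-up frame with default integer labels, like a headerless csv.
df = pd.DataFrame([[0, "a", 1.5], [1, "b", 2.5]])

by_position = df.iloc[:, 1:]  # columns at positions 1 and later
by_label = df.loc[:, 1:]      # columns labeled 1 and later
print(by_position.equals(by_label))  # True

# Once the labels no longer match the positions, the two indexers differ:
# position 1 now carries the label 20.
renamed = df.rename(columns={0: 10, 1: 20, 2: 30})
print(renamed.iloc[:, 1:].columns.tolist())  # [20, 30]
print(renamed.loc[:, 20:].columns.tolist())  # [20, 30]

# Dropping the column by its label is another way to remove it.
dropped = df.drop(columns=0)
print(dropped.columns.tolist())  # [1, 2]
```

Note that none of these calls modify df itself; each one returns a new object.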

Make the first column an index

In a pandas DataFrame, it seems that the row headings are called the index and the column headings are called columns.

With the following code, the column labeled 1 in the figure above could be set as the index.

python



# Slice off the 0th column
sliced_df = df.iloc[:, 1:]
# Use the column labeled 1 as the index (set_index returns a new DataFrame)
sliced_df.set_index(1)

It ran as shown below.

[Screenshot: the DataFrame with column 1 set as the index]
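One thing I noticed: set_index() returns a new DataFrame and leaves the original as it is, so the result has to be assigned to a variable to keep it. Below is a minimal sketch with made-up data. The name indexed_df is only an illustrative choice.

```python
import pandas as pd

# Made-up data standing in for the loaded csv.
df = pd.DataFrame([[0, "a", 1.5], [1, "b", 2.5]])
sliced_df = df.iloc[:, 1:]

# Keep the result; sliced_df itself is not changed.
indexed_df = sliced_df.set_index(1)

print(sliced_df.columns.tolist())   # [1, 2]
print(indexed_df.columns.tolist())  # [2]
print(indexed_df.index.tolist())    # ['a', 'b']
print(indexed_df.index.name)        # 1

# Rows can now be looked up by the values of the former column 1.
print(indexed_df.loc["b", 2])       # 2.5
```

The column label here is the integer 1, not the string "1", because header=None gives integer labels.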

In conclusion

Next, I would like to write another memo about renaming the columns and processing this table as hierarchical data.
