Notes on what I used in IPython to format POS data

Introduction

I had to format some messy data at my part-time job, so this is a memo from that time. I also wrote up separate notes on what I did in R around January.

import

Import the familiar data analysis tools.

However, since I often handle data that spans many files, glob is imported as well!

pandas

Read file

data = pd.read_csv("file name.csv")

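- The pd.read_<format>() functions (read_csv, read_excel, and so on) support various file formats
- header=None reads the file with no header row (older pandas versions accepted header=-1, which current versions reject)
- names=['1', '2', '3'] and the like set the column names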

datas = glob.glob('*')

If you have a large number of files, this gives you a list of the names of all the files in the current directory.

With pandas, a file you read in becomes a DataFrame rather than a NumPy array.
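As a rough sketch of how glob and read_csv fit together, the example below reads every CSV in a directory into a list of DataFrames. The file names, contents, and column names (id, item, price) are made up for illustration, and the temporary directory exists only so the example runs anywhere.

```python
import glob
import os
import tempfile

import pandas as pd

# Hypothetical setup: write two small header-less CSV files to a temp directory.
tmp_dir = tempfile.mkdtemp()
for name, rows in [("day1.csv", "1,apple,100\n2,pear,150\n"),
                   ("day2.csv", "3,apple,100\n")]:
    with open(os.path.join(tmp_dir, name), "w") as f:
        f.write(rows)

# glob returns matching paths in arbitrary order, so sort them for reproducibility.
paths = sorted(glob.glob(os.path.join(tmp_dir, "*.csv")))

# header=None: no row is treated as a header; names= supplies the column labels.
frames = [pd.read_csv(p, header=None, names=["id", "item", "price"]) for p in paths]

print(type(frames[0]))  # a pandas DataFrame, not a NumPy array
print(frames[0])
```

Matching on "*.csv" instead of "*" is a deliberate choice here so that stray files in the directory are not passed to read_csv.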

Erase columns and rows

This strips away the parts that get in the way.

data.drop([1, 2])          # drop the rows labeled 1 and 2
data.drop([1, 2], axis=1)  # drop the columns labeled 1 and 2

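This removes the given rows or columns. Note that drop returns a new DataFrame and leaves the original untouched, so assign the result to a variable if you want to keep it.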
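A minimal sketch of both calls on a small made-up DataFrame (the 4x4 values are hypothetical). It also shows that the original DataFrame keeps its shape, since drop returns a new object.

```python
import pandas as pd

# Hypothetical data with integer row and column labels 0..3.
data = pd.DataFrame([[r * 10 + c for c in range(4)] for r in range(4)])

rows_dropped = data.drop([1, 2])          # removes the rows labeled 1 and 2
cols_dropped = data.drop([1, 2], axis=1)  # removes the columns labeled 1 and 2

print(data.shape)          # the original is unchanged
print(rows_dropped.shape)
print(cols_dropped.shape)
```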

Join columns and rows

pd.concat([data[1], data[0]])          # join rows (stack vertically)
pd.concat([data[1], data[0]], axis=1)  # join columns (place side by side)

This is useful when you have a lot of files and data!
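The sketch below assumes that data is a list of DataFrames, for example one per file; the source does not say what data holds, so that is an assumption, and the item and price values are made up. ignore_index=True is an optional extra that renumbers the rows after stacking.

```python
import pandas as pd

# Hypothetical stand-ins for DataFrames read from two files.
data = [
    pd.DataFrame({"item": ["apple", "pear"], "price": [100, 150]}),
    pd.DataFrame({"item": ["grape", "melon"], "price": [300, 800]}),
]

# Join rows: stack vertically; ignore_index=True renumbers the rows 0..n-1.
stacked = pd.concat([data[1], data[0]], ignore_index=True)

# Join columns: align on the row index and place the frames side by side.
side_by_side = pd.concat([data[1], data[0]], axis=1)

print(stacked)
print(side_by_side)
```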

Keep only the rows you want

Extract just the data you want!

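data[data[1] == 2]

This keeps only the rows where the column named 1 has the value 2. For columns whose names are valid Python identifiers, data.query("name == 2") does the same thing. Written as "1==2", query would compare the literal numbers 1 and 2 instead of looking at column 1.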
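A small sketch of row filtering on made-up data. Because query refers to columns by name, the column labeled with the integer 1 is filtered with boolean indexing, and query is shown on a renamed copy; the names price and qty are hypothetical.

```python
import pandas as pd

# Hypothetical data: integer column labels, as when a file is read with header=None.
data = pd.DataFrame([[10, 2], [20, 3], [30, 2]])

# Boolean indexing works with any column label, including the integer 1.
picked = data[data[1] == 2]

# query() refers to columns by name, so it suits identifier-like names.
named = data.rename(columns={0: "price", 1: "qty"})
picked_by_query = named.query("qty == 2")

print(picked)
print(picked_by_query)
```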

Join on matching data

Indispensable when shaping files to bring out relationships in the data!

pd.merge(data1, data, on='Column name')

This joins the two DataFrames on the rows that share the same value in that column.
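As an illustration, here is a merge on two made-up POS-style tables; the table names, the item_id column, and the values are all hypothetical. With the default how="inner", rows whose key appears in only one table are left out.

```python
import pandas as pd

# Hypothetical POS-style tables sharing an "item_id" column.
sales = pd.DataFrame({"item_id": [1, 2, 2, 3], "qty": [5, 1, 2, 4]})
items = pd.DataFrame({"item_id": [1, 2], "name": ["apple", "pear"]})

# Default how="inner": only item_id values present in both tables are kept.
merged = pd.merge(sales, items, on="item_id")

print(merged)
```

Passing how="left" instead would keep the sale of item 3 and leave its name missing, which can be handy for spotting unmatched records.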

Finally

Tedious data shaping takes patience! Beyond that, you can process everything in one go with a for loop and the like. Swapping rows and columns around goes more smoothly if you create intermediate NumPy arrays and files along the way. Keep the data you want in mind and work toward it. Thank you for reading this rough memo.
