Memorandum @ Python OR Seminar: Pandas

pandas

>>> import pandas as pd

The standard reference for pandas is "[Introduction to Data Analysis with Python](http://www.amazon.co.jp/Python%E3%81%AB%E3%82%88%E3%82%8B%E3%83%87%E3%83%BC%E3%82%BF%E5%88%86%E6%9E%90%E5%85%A5%E9%96%80-%E2%80%95NumPy%E3%80%81pandas%E3%82%92%E4%BD%BF%E3%81%A3%E3%81%9F%E3%83%87%E3%83%BC%E3%82%BF%E5%87%A6%E7%90%86-Wes-McKinney/dp/4873116554)" (Pythonによるデータ分析入門). If you want to study pandas in detail, read that book.

Data type

pandas has two main data types, **Series** and **DataFrame**.

Series: a type that handles one column (or one row) of data.

# Ordinary list
>>> lst = [1, 2, 3, 4, 5]
# Series
>>> s = pd.Series(lst)
>>> s
0    1
1    2
2    3
3    4
4    5
dtype: int64

DataFrame: a type that handles tabular data.

# Ordinary dictionary
>>> dic = {"a": [1, 2, 3], "b": [9, 8, 7]}
# DataFrame
>>> df = pd.DataFrame(dic)
>>> df
   a  b
0  1  9
1  2  8
2  3  7
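
As a rough sketch of how the two types relate: a Series carries an index next to its values, and each column of a DataFrame is itself a Series. The index labels and variable names below are made up for illustration.

```python
import pandas as pd

# A Series with explicit index labels instead of the default 0, 1, 2, ...
s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])
third = s["c"]                      # label-based lookup

# Selecting one column of a DataFrame gives back a Series.
df = pd.DataFrame({"a": [1, 2, 3], "b": [9, 8, 7]})
col = df["a"]

print(third)
print(type(col).__name__)
print(df.shape)                     # (number of rows, number of columns)
```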

I/O

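pandas provides reader functions for many file formats. The list below reflects the pandas version used at the time; some entries, such as read_msgpack, have since been removed.

>>> pd.read*?  # In IPython, a * wildcard followed by ? lists every matching name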
pd.read_clipboard
pd.read_csv
pd.read_excel
pd.read_fwf
pd.read_gbq
pd.read_hdf
pd.read_html
pd.read_json
pd.read_msgpack
pd.read_pickle
pd.read_sql
pd.read_sql_query
pd.read_sql_table
pd.read_stata
pd.read_table

Example of reading a CSV file

>>> dataset = pd.read_csv('sample.csv')

Example of writing a CSV file

>>> dataset.to_csv('write.csv')
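
A minimal round-trip sketch, assuming a small made-up CSV held in memory instead of sample.csv (the rows are hypothetical; only the column names come from this memo). Note that to_csv() writes the row index as an extra column unless index=False is passed.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for sample.csv
csv_text = (
    "UID,dtime,Sousyouhi\n"
    "id1,2014-01-01 00:00,1.5\n"
    "id2,2014-01-01 00:00,2.0\n"
)
dataset = pd.read_csv(io.StringIO(csv_text))

# index=False keeps the row index out of the file,
# so reading it back yields the same columns.
buf = io.StringIO()
dataset.to_csv(buf, index=False)
buf.seek(0)
roundtrip = pd.read_csv(buf)

print(roundtrip.equals(dataset))
```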

Viewing data

When you want to see a few rows of data

>>> dataset.head()  # First 5 rows; pass an integer n to see the first n rows
>>> dataset.tail()  # Last 5 rows; pass an integer n to see the last n rows

>>> dataset.iloc[n]           # See row n (.ix was removed in pandas 1.0)
>>> dataset.iloc[m:n]         # See rows m to n-1
>>> dataset.iloc[[0, 3, 5]]   # See non-adjacent rows
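
The row selections above can be tried on a small table. This is only a sketch with a hypothetical 10-row DataFrame, using position-based `iloc`.

```python
import pandas as pd

df = pd.DataFrame({"x": range(10)})   # hypothetical table, rows 0 to 9

first3 = df.head(3)          # rows 0, 1, 2
last2 = df.tail(2)           # rows 8, 9
row4 = df.iloc[4]            # a single row, returned as a Series
block = df.iloc[2:5]         # rows 2, 3, 4 (the end position is excluded)
picked = df.iloc[[0, 3, 5]]  # arbitrary positions

print(block)
```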

When you want to see a column of data

>>> dataset.column_name  # Attribute access works only when the name is a valid Python identifier
# Or
>>> dataset['Column name']
# When you want to see multiple columns
>>> dataset[['Column name 1', 'Column name 2', ...]]

Search

>>> dataset[dataset.UID == 'id1']  # Rows whose UID column equals the given value
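
The search works because the comparison produces a boolean Series, which then selects the rows marked True. A small sketch with made-up rows (only the column names are taken from this memo):

```python
import pandas as pd

df = pd.DataFrame({
    "UID": ["id1", "id2", "id1", "id3"],
    "Sousyouhi": [1.0, 2.5, 3.0, 0.5],
})

mask = df.UID == "id1"      # boolean Series, one entry per row
only_id1 = df[mask]         # keeps the rows where mask is True

# Conditions combine with & (and) and | (or); each needs its own parentheses.
big_id1 = df[(df.UID == "id1") & (df.Sousyouhi > 2)]

print(mask.tolist())
print(big_id1)
```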

Handling of missing values (NA)

Check for missing values

Combine len() and count().

- len() gets the number of rows in the data.
- count() gets the number of non-NA elements in each column. (With the argument `axis=1`, it counts along each row instead.)

>>> len(dataset) - dataset.count()
UID             0
dtime           0
Sousyouhi       0
Hatsudenryou    0
Jikasyouhi      0
Uriden          0
Kaiden          0
Use_AirCon      0
Use_Kyutou      3
Use_Kaden       0
dtype: int64

If you want to take a closer look, combine `isnull()` and `any()`.

- `isnull()`: returns True where an element is NA and False everywhere else.
- `any()`: returns True for each column that contains at least one True, and False otherwise. (With the argument `axis=1`, it checks each row instead.)

>>> dataset[dataset.isnull().any(axis=1)]
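
Both checks can be seen on a tiny table. The values below are invented; the column names are borrowed from the output above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Use_AirCon": [0.1, 0.2, 0.3, 0.4],
    "Use_Kyutou": [1.0, np.nan, np.nan, 2.0],
})

# Number of NA values per column; df.isnull().sum() gives the same counts.
na_per_column = len(df) - df.count()

# Only the rows that contain at least one NA.
rows_with_na = df[df.isnull().any(axis=1)]

print(na_per_column)
print(rows_with_na)
```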

Processing missing values

There are two main approaches.

- `dropna()`: drops every row that contains NA. (With the argument `axis=1`, it drops columns instead.)
- `fillna('something')`: replaces NA with `something`.

>>> dataset.dropna()

>>> dataset.fillna(0) #Replace NA with 0

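# Forward fill: replace NA with the previous valid value
>>> dataset.ffill()  # older pandas: dataset.fillna(method='ffill')
# Backward fill: replace NA with the next valid value
>>> dataset.bfill()  # older pandas: dataset.fillna(method='bfill')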
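
To compare the options side by side, here is a sketch on a made-up Series with two consecutive NA values:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 4.0])

dropped = s.dropna()    # keeps index 0 and 3 only
zeroed = s.fillna(0)    # 1, 0, 0, 4
forward = s.ffill()     # 1, 1, 1, 4  (carry the previous value forward)
backward = s.bfill()    # 1, 4, 4, 4  (pull the next value backward)

print(forward.tolist(), backward.tolist())
```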

Summary statistics

describe() calculates the most common statistics for each numeric column in one call: count, mean, std, min, 25%, 50%, 75%, max.

>>> dataset.describe()
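
The result of describe() is itself a DataFrame, so individual statistics can be picked out with `.loc`. A sketch on a hypothetical single-column table:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4, 5]})   # made-up data

stats = df.describe()
mean_x = stats.loc["mean", "x"]
median_x = stats.loc["50%", "x"]    # the 50% row is the median

print(stats.index.tolist())
print(mean_x, median_x)
```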

Grouping

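Group rows by the values in a column. groupby() alone only returns a GroupBy object; apply an aggregation such as mean() or size() to it to get results.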

>>> dataset.groupby('Column name')
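
A short sketch of grouping followed by aggregation, with invented rows (the column names follow this memo):

```python
import pandas as pd

df = pd.DataFrame({
    "UID": ["id1", "id2", "id1", "id2"],
    "Sousyouhi": [1.0, 2.0, 3.0, 6.0],
})

grouped = df.groupby("UID")           # nothing is computed yet
means = grouped["Sousyouhi"].mean()   # one value per UID
sizes = grouped.size()                # number of rows per UID

print(means)
```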

Graph

pandas can draw several kinds of graphs on its own (it uses matplotlib internally).

Time series graph

Set dtime as the index

>>> tdataset = dataset.copy()
>>> tdataset.index = pd.to_datetime(tdataset.dtime)
>>> tdataset.drop('dtime', axis=1, inplace=True)
>>> # Rows for one UID, keeping the UID and Sousyouhi columns
>>> b = tdataset[tdataset.UID == 'id1'][['UID', 'Sousyouhi']]
>>> b.plot()

Resampling

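Left at the original interval (every 2 hours), the graph is too fine-grained, so resample to one point per day.

>>> num = b.drop('UID', axis=1)      # keep only the numeric column
>>> c = num.resample('1D').mean()    # daily mean; a monthly alias ('M', or 'ME' in pandas 2.2 and later) gives monthly
>>> c.plot()
>>> num.resample('1D').std().plot()  # standard deviation
>>> num.resample('1D').max().plot()  # maximum value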
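
To see what resampling does to the numbers without plotting, here is a sketch on a made-up 2-hourly series covering two days. The dates and values are assumptions for illustration only.

```python
import pandas as pd

# 24 points at 2-hour steps = exactly two days of hypothetical readings
idx = pd.date_range("2014-01-01", periods=24, freq=pd.Timedelta(hours=2))
s = pd.Series(range(24), index=idx, dtype="float64")

daily_mean = s.resample("D").mean()   # one row per calendar day
daily_max = s.resample("D").max()

print(daily_mean)
```

Each day collects its 12 readings into a single value, which is why the daily graph is much smoother than the 2-hourly one.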

Moving average

>>> c.rolling(12).mean().plot()  # window of 12 rows (12 days, since c is daily); pd.rolling_mean was removed
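
A small sketch of how the rolling window behaves, using a made-up Series and a window of 3 instead of 12. The first window-1 entries are NaN because the window is not yet full; `min_periods=1` is one way to relax that.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

ma = s.rolling(3).mean()                        # NaN, NaN, 2, 3, 4
ma_partial = s.rolling(3, min_periods=1).mean() # 1, 1.5, 2, 3, 4

print(ma.tolist())
```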

Histogram

>>> c.hist()

Box plot

>>> c.boxplot(return_type='axes')

Correlation coefficient

>>> c.corr()
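
corr() returns a square table of pairwise Pearson correlation coefficients between the numeric columns. A sketch with invented columns chosen so that the result is easy to check:

```python
import pandas as pd

df = pd.DataFrame({
    "x": [1, 2, 3, 4],
    "y": [2, 4, 6, 8],   # rises exactly with x
    "z": [4, 3, 2, 1],   # falls exactly as x rises
})

r = df.corr()
print(r)
```

Here x and y are perfectly positively correlated and x and z perfectly negatively correlated, so the off-diagonal entries come out at about 1 and -1.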

Correlogram

>>> import matplotlib.pyplot as plt
>>> import statsmodels.api as sm
>>> plt.plot(sm.tsa.acf(b['Column name']))  # autocorrelation of one column

Scatter plot

>>> pd.plotting.scatter_matrix(c)  # pd.tools.plotting was moved to pd.plotting

Multiple regression

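>>> # Daily means of every numeric column for one UID, so the regressors are available
>>> c = tdataset[tdataset.UID == 'id1'].drop('UID', axis=1).resample('1D').mean()
>>> c = c.fillna(0)
>>> m = sm.OLS(c.Sousyouhi, c[['Hatsudenryou', 'Use_AirCon']])
>>> r = m.fit()
>>> r.summary2()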

Online documentation

For everything else, see the official pandas documentation.
