Basic operations on Python pandas Series and DataFrame (1)

When analyzing data with Python, you will very likely use a module called pandas.

In pandas, data is stored in types called Series and DataFrame. A Series holds one-dimensional data and a DataFrame holds two-dimensional data. They are like high-performance one-dimensional and two-dimensional arrays, respectively. "High-performance" here means that each row and column can be named, and that many methods are available.

| | Hachiman | Yukino | Yui |
|---|---|---|---|
| Math | 8 | 90 | 10 |
| National language | 88 | 100 | 50 |
| English | 38 | 95 | 35 |

When you express this as a plain two-dimensional array, it is hard to handle the labels such as "Hachiman", "Yukino", "Yui", "Math", "National language", and "English". In a DataFrame, these can be represented by `columns` and `index`.
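As a rough sketch of what that looks like, the table above could be built as follows. The variable name `scores` and the lowercase row labels `math`, `japanese`, and `english` are my own choices for illustration (the labels match the ones used for the Series later on).

```python
import pandas as pd

# Score table from the article: one column per student, one row per subject.
scores = pd.DataFrame(
    {
        "Hachiman": [8, 88, 38],
        "Yukino": [90, 100, 95],
        "Yui": [10, 50, 35],
    },
    index=["math", "japanese", "english"],
)

print(scores)
# Look up one cell by row label and column label.
print(scores.loc["japanese", "Yukino"])
```

The student names end up in `scores.columns` and the subjects in `scores.index`, so no separate lookup table is needed to remember what each row and column means.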

However, these types have various troublesome quirks, and I stumbled right from the start. This is a super rudimentary pandas operation manual that I wrote for myself as a pandas super beginner.

The Python version is 3.5.2 (I'm using Anaconda 4.2.0 instead of standard Python), and the pandas version is 0.18.1. The code assumes it is running on IPython 5.1.0.

# Preparation before using pandas

## Install pandas

I installed everything at once with Anaconda. (Anaconda is something like Python plus popular libraries, including NumPy and IPython.) Alternatively, you can install pandas with pip.

## Import pandas

Since pandas is a module, it must be imported.

    In [1]: import pandas

However, every reference site I looked at loads pandas under the name `pd`, so I will follow that here as well.

    In [2]: import pandas as pd

# Series operation

## Make a Series

| | Hachiman |
|---|---|
| Math | 8 |
| National language | 88 |
| English | 38 |

For example, suppose you are given one-dimensional data like this. The first thing that comes to mind is to create a list.

    In [3]: HachimanList = [8, 88, 38]



It's easy to access the elements of this list. If you want the national language score,

    In [4]: HachimanList[1]
    Out[4]: 88

is returned.

The problem is that information such as "this is the math score" is missing. Of course, you could use a dictionary, an object, or a named tuple instead, but none of them are well suited to large-scale data processing.

The solution is the pandas Series.

In[5]: HachimanSeries = pd.Series(HachimanList, index = ["math", "japanese", "english"])


In this way, a Series can be created with

    variable = pd.Series(array of data, index=array of index labels)

When I output this,

    In [6]: HachimanSeries
    Out[6]:
    math         8
    japanese    88
    english     38
    dtype: int64

you can see that each item is printed together with its name. Note that `dtype` is the data type of the whole Series. (NumPy has several different integer types, and int64 is one of them.)
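To see the pieces a Series is made of, here is a small sketch that builds the same data and inspects it. The variable name `hachiman` is mine, and the exact integer dtype may depend on the platform, so the comment below only claims what the article's setup shows.

```python
import pandas as pd

hachiman = pd.Series([8, 88, 38], index=["math", "japanese", "english"])

print(hachiman.index.tolist())   # the labels
print(hachiman.tolist())         # the values as a plain Python list
print(hachiman.dtype)            # an integer dtype (int64 on the article's setup)
print(len(hachiman))             # number of elements
```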

What if you don't specify an index? Will an error be raised?

    In [7]: YukinoSeries = pd.Series([90, 100, 95])
    In [8]: YukinoSeries
    Out[8]:
    0     90
    1    100
    2     95
    dtype: int64

Apparently, `index` defaults to integers counting up from 0. Note that `index` can also be assigned later.

    In [9]: YukinoSeries.index = ["math", "japanese", "english"]
    In [10]: YukinoSeries
    Out[10]:
    math         90
    japanese    100
    english      95
    dtype: int64

You can also build a Series from a dictionary.

    In [11]: YuiSeries = pd.Series({"math":10, "japanese":50, "english":35})
    In [12]: YuiSeries
    Out[12]:
    english     35
    japanese    50
    math        10
    dtype: int64

In this case the order of the items changes: in this environment (Python 3.5, pandas 0.18.1) the keys come out sorted rather than in the order they were written. Newer pandas releases running on Python 3.6 or later keep the dictionary's insertion order instead.
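If the order matters, one way to avoid depending on the dictionary's ordering is to pass the desired labels as `index` as well. This is only a sketch; the names `yui_scores` and `subject_order` are made up for the example.

```python
import pandas as pd

yui_scores = {"math": 10, "japanese": 50, "english": 35}
subject_order = ["math", "japanese", "english"]

# Passing index= together with a dict picks the values by key, in that order.
yui = pd.Series(yui_scores, index=subject_order)
print(yui)
```

Because the values are looked up by key, the result has the same order regardless of how the dictionary itself is ordered.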

## Extract the value of Series
Retrieving values from a Series works almost the same way as with a normal array.
#### Element specification
The nth element of an array is taken out as `array[n-1]`. Similarly, the nth element of a Series is

    In [13]: HachimanSeries[2]
    Out[13]: 38

You can also assign it to a variable and use it in calculations.

    In [15]: HachimanMath = HachimanSeries[0]
    In [16]: 40 <= HachimanMath
    Out[16]: False

However, the type of HachimanMath is `numpy.int64` rather than the usual `int`.

    In [17]: type(HachimanMath)
    Out[17]: numpy.int64
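The following sketch repeats the scalar lookup and shows how to get a built-in `int` back. Note one assumption that is not part of the article: it uses `.iloc` for positional access, because recent pandas releases deprecate plain integer keys such as `HachimanSeries[0]` on a Series whose labels are strings. The variable names are mine.

```python
import numpy as np
import pandas as pd

hachiman = pd.Series([8, 88, 38], index=["math", "japanese", "english"])

# .iloc always means "by position", regardless of the labels.
math_score = hachiman.iloc[0]
print(type(math_score))                    # a NumPy integer type
print(isinstance(math_score, np.integer))

# Convert to a built-in int when a plain Python number is needed.
plain = int(math_score)
print(type(plain), 40 <= plain)
```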


Negative indexes such as `array[-1]` work too. Retrieving multiple elements is also easy.

    In [18]: HachimanSeries[0:2]
    Out[18]: [8, 88]

Readers who ran the sample code in their own environment instead of taking my results on faith will have noticed that I finally showed my true colors and wrote a lie here.

The code `HachimanSeries[0:2]` actually returns a result of the Series type, `pandas.core.series.Series`, not a list.

    In [18]: HachimanSeries[0:2]
    Out[18]:
    math         8
    japanese    88
    dtype: int64

    In [19]: type(HachimanSeries[0:2])
    Out[19]: pandas.core.series.Series

Summary

- If you specify a single element, you get a result of the `int`-like type `numpy.int64`.
- If you specify a range of elements, you get a result of the `Series` type, `pandas.core.series.Series`.

Some people may find this a little unpleasant, but if you think about it carefully, there is a correspondence

- `int` -> `numpy.int64`
- `list` -> `pandas.core.series.Series`

so it behaves just like ordinary array operations. Naturally, then, a `Series` with a single element can be retrieved in the same way you would retrieve a one-element list from a list.

    In [20]: HachimanSeries[1:1+1]
    Out[20]:
    japanese    88
    dtype: int64

    In [21]: type(HachimanSeries[1:1+1])
    Out[21]: pandas.core.series.Series
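The difference between the two kinds of lookup can be checked side by side. This is a minimal sketch with my own variable names, using `.iloc` (an assumption on my part, not the article's notation) to make it explicit that positions are meant.

```python
import pandas as pd

hachiman = pd.Series([8, 88, 38], index=["math", "japanese", "english"])

single = hachiman.iloc[1]       # one position -> a scalar
sliced = hachiman.iloc[1:2]     # a range -> a Series, even with one element

print(type(single).__name__)
print(type(sliced).__name__, len(sliced))
print(sliced.index.tolist())
```

The one-element slice keeps its label, which is exactly what the scalar loses.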


To specify an element, you can also use its `index` name, as with a dictionary.

    In [22]: HachimanSeries["math"]
    Out[22]: 8

By the way, slicing in the style of `array[::-1]` also works and returns the elements in reverse order.

Surprisingly, you can even specify a range using `index` names.

    In [23]: HachimanSeries["math":"english"]
    Out[23]:
    math         8
    japanese    88
    english     38
    dtype: int64

Note that, unlike a positional slice, a slice by label includes the end label. This is something that neither an ordinary dictionary nor collections.OrderedDict can do, which shows how capable Series-tan is.
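Here is a short sketch contrasting a positional slice with a label slice, plus the reversed slice mentioned above. Writing `.iloc` and `.loc` explicitly is my own choice to make the two meanings unambiguous; the article uses plain `[]`.

```python
import pandas as pd

hachiman = pd.Series([8, 88, 38], index=["math", "japanese", "english"])

by_position = hachiman.iloc[0:1]              # end position 1 is excluded
by_label = hachiman.loc["math":"japanese"]    # end label "japanese" is included
reversed_scores = hachiman.iloc[::-1]         # step of -1 reverses the order

print(by_position.index.tolist())
print(by_label.index.tolist())
print(reversed_scores.index.tolist())
```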

If you want to access the `index` names, treat `Series.index` like an array.

    In [24]: HachimanSeries.index
    Out[24]: Index(['math', 'japanese', 'english'], dtype='object')

    In [25]: HachimanSeries.index[1]
    Out[25]: 'japanese'

    In [26]: HachimanSeries.index[1:2]
    Out[26]: Index(['japanese'], dtype='object')
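Beyond array-style access, the index can also answer questions about the labels themselves. The sketch below is an illustration with my own variable names; `get_loc` is a pandas `Index` method that the article does not use.

```python
import pandas as pd

hachiman = pd.Series([8, 88, 38], index=["math", "japanese", "english"])
labels = hachiman.index

print(labels[1])                  # a single label
print(labels[1:2].tolist())       # a slice is still an Index
print("math" in labels)           # membership test on the labels
print(labels.get_loc("english"))  # position of a label
```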



So far, we have seen that elements of a Series can be retrieved as with ordinary arrays and dictionaries.
#### Get the Series you want
What if you want to pick out only some of the data in a `Series`? In other words, you want only `math` and `japanese`, or only `math` and `english`. Or maybe you want `math` twice. (Whether you feel such a need right now is another matter ...)

For `math` and `japanese`, `HachimanSeries[0:2]` will do the trick. But `math` and `english` is more awkward. Here is what I came up with.

    In [27]: HachimanSeries["math"] + HachimanSeries["english"]

How about this!!

    Out[27]: 46

Reality is ruthless, but the output makes sense. After all, the result of `HachimanSeries["math"]` is a `numpy.int64`, so the two numbers are simply added.

In that case, let's try

    In [28]: HachimanSeries[0:0 + 1] + HachimanSeries[2:2 + 1]

    Out[28]:
    english   NaN
    math      NaN
    dtype: float64

As you can see, it spewed out industrial waste. This is because addition between `Series` adds the elements that share the same index label, and fills in `NaN` for labels that are not present on both sides. In fact,

    In [29]: HachimanSeries[0:0 + 2] + YukinoSeries[1:1 + 2]
    Out[29]:
    english       NaN
    japanese    188.0
    math          NaN
    dtype: float64

is the result.
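The label alignment can be reproduced in isolation. In the sketch below, the second half uses `Series.add` with `fill_value=0`, which is not something the article covers; I include it only as one possible way to treat a missing label as 0 instead of getting `NaN`. Variable names are my own.

```python
import pandas as pd

subjects = ["math", "japanese", "english"]
hachiman = pd.Series([8, 88, 38], index=subjects)
yukino = pd.Series([90, 100, 95], index=subjects)

left = hachiman.iloc[0:2]    # math, japanese
right = yukino.iloc[1:3]     # japanese, english

plain_sum = left + right                    # NaN where a label is missing on one side
filled_sum = left.add(right, fill_value=0)  # treat a missing value as 0 instead

print(plain_sum)
print(filled_sum)
```

Only `japanese` exists on both sides, so it is the only label with a real sum in `plain_sum`.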

So how do you extract only `math` and `english` from the same Series? The answer is to write double brackets, `[[]]`.

    In [30]: HachimanSeries[[0, 2]]
    Out[30]:
    math        8
    english    38
    dtype: int64

This `[[]]` probably has nothing to do with the notation for two-dimensional arrays. The inner `[]` is simply a list that is passed as the index. (`HachimanSeries[(0,2)]` does not work, so not just any iterable is accepted, but `HachimanSeries[list((0,2))]` does work, so what is being passed really is treated as a list.)
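A sketch of the same list-based selection, written with explicit `.iloc` (positions) and `.loc` (labels). That explicit spelling is my assumption for clarity and portability across pandas versions; the article itself uses plain `[[]]`.

```python
import pandas as pd

hachiman = pd.Series([8, 88, 38], index=["math", "japanese", "english"])

by_position = hachiman.iloc[[0, 2]]            # list of positions
by_label = hachiman.loc[["math", "english"]]   # list of labels
repeated = hachiman.loc[["math", "math"]]      # a label may appear more than once

print(by_position)
print(by_label)
print(repeated)
```

Each result is a new Series whose rows follow the order of the list that was passed in.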

If you just want to emphasize your math score, you can do this:

    In [31]: HachimanSeries[["math","math","math","math","math"]]
    Out[31]:
    math    8
    math    8
    math    8
    math    8
    math    8
    dtype: int64

(Here I specified the `index` names directly.) The same works for `index`.

    In [32]: HachimanSeries.index[[1,2]]
    Out[32]: Index(['japanese', 'english'], dtype='object')

So far, we have learned how to use `[[]]` to create a new Series that contains only the desired `index` entries.
### Rewriting elements of Series
You may find out later that the contents of a `Series` or its `index` names were wrong. You could overwrite the variable with a corrected `Series` of the same name, but in fact the values can be changed as easily as those of an array.

First, the code to rewrite a single element.

    In [33]: HachimanSeries[1]
    Out[33]: 88

    In [34]: HachimanSeries[1] = 98

    In [35]: HachimanSeries[1]
    Out[35]: 98

Next, rewrite a specified range.

    In [36]: HachimanSeries
    Out[36]:
    math         8
    japanese    98
    english     38
    dtype: int64

    In [37]: HachimanSeries[1:1+2] = [89,33]

    In [38]: HachimanSeries
    Out[38]:
    math         8
    japanese    89
    english     33
    dtype: int64


Here, pandas complains if the number of elements on the left and right sides does not match.

    In [39]: HachimanSeries[1:1+1] = [88,38]

    ValueError
    (omitted)
    ValueError: cannot set using a slice indexer with a different length than the value
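The length mismatch can be caught like any other exception. This is a sketch under my own naming, using `.iloc` for the positional slice (an assumption, not the article's notation); the exact wording of the message may differ between pandas versions.

```python
import pandas as pd

hachiman = pd.Series([8, 98, 38], index=["math", "japanese", "english"])

try:
    # One slot on the left, two values on the right.
    hachiman.iloc[1:2] = [88, 38]
    raised = False
except ValueError as error:
    raised = True
    print("ValueError:", error)

print(raised)
```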


However, you can set every element in the range to the same value.

    In [40]: HachimanSeries[0:0+3] = 0

    In [41]: HachimanSeries
    Out[41]:
    math        0
    japanese    0
    english     0
    dtype: int64
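The three kinds of assignment shown so far (a single element, a range, and a single value spread over a range) can be summarized in one sketch. The snapshot variables and the use of `.iloc` and `.loc` are my own additions for illustration.

```python
import pandas as pd

hachiman = pd.Series([8, 88, 38], index=["math", "japanese", "english"])

hachiman.iloc[1] = 98            # one element by position
hachiman.loc["math"] = 9         # one element by label
after_single = hachiman.tolist()

hachiman.iloc[1:3] = [89, 33]    # a range, with a value list of the same length
after_range = hachiman.tolist()

hachiman.iloc[0:3] = 0           # a scalar is broadcast to every slot in the range
after_scalar = hachiman.tolist()

print(after_single, after_range, after_scalar)
```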


Finally, rewriting `index`. You might expect this to work:

    In [42]: HachimanSeries.index[1] = "Japanese"

and `HachimanSeries` to become

    math        0
    Japanese    0
    english     0
    dtype: int64

Actually, this is not what happens. The assignment fails with

    TypeError: Index does not support mutable operations

As you can see, `index` is immutable. (You get the same kind of complaint if you try this with a string.)

So there is no choice but to replace the whole index.

    In [43]: HachimanSeries.index = ["Math","Japanese","English"]

    In [44]: HachimanSeries
    Out[44]:
    Math        0
    Japanese    0
    English     0
    dtype: int64
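The sketch below confirms the immutability and shows one alternative that the article does not mention: `Series.rename`, which takes a mapping from old labels to new ones and returns a new Series. Treat it as an optional extra rather than the method used here; the variable names are mine.

```python
import pandas as pd

hachiman = pd.Series([8, 88, 38], index=["math", "japanese", "english"])

try:
    hachiman.index[1] = "Japanese"   # Index objects cannot be modified in place
    index_is_mutable = True
except TypeError as error:
    index_is_mutable = False
    print("TypeError:", error)

# rename() returns a new Series with only the mapped labels changed.
renamed = hachiman.rename({"japanese": "Japanese"})
print(renamed.index.tolist())
print(hachiman.index.tolist())       # the original is left untouched
```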


Now that we have reviewed this, let's reset everything.

    In [45]: HachimanSeries[0:0+3] = [8,88,38]

    In [46]: HachimanSeries.index = ["math", "japanese", "english"]

    In [47]: HachimanSeries
    Out[47]:
    math         8
    japanese    88
    english     38
    dtype: int64

That covers the basic operations on Series. This turned out longer than I expected, so I'll cover DataFrame and the Series methods in a subsequent article.





