[Python] Basic operations in pandas

About pandas

pandas is built on top of NumPy, so NumPy operations can be used on it as they are, which is convenient. However, until you get used to it, it is hard to see how to extract rows and columns. I am still not fluent with it myself, so I am writing it down here.

DataFrame and Series

pandas has two data structures, DataFrame and Series. A DataFrame is two-dimensional and a Series is one-dimensional. In practice you mostly work with DataFrames, so this article focuses on them. Selecting a single column from a DataFrame returns a Series, as shown below.

# DataFrame
   foo  bar
a    0    1
b    2    3
c    4    5

# Series
a    0
b    2
c    4
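A minimal sketch that builds the example DataFrame above (assuming exactly the values shown) and checks that selecting one column yields a Series:

```python
import pandas as pd

# Build the 3x2 DataFrame shown above: rows a-c, columns foo/bar
df = pd.DataFrame({'foo': [0, 2, 4], 'bar': [1, 3, 5]}, index=['a', 'b', 'c'])

col = df['foo']              # select a single column by its label
print(type(df).__name__)     # DataFrame
print(type(col).__name__)    # Series
print(col)                   # the index labels a, b, c are kept
```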

DataFrame operations

index and column

In a DataFrame, the position of an element can be given in two ways: by integer position as in NumPy (the nth row and the mth column), or by the user-defined labels of the index (rows) and columns. If no labels are specified, the integers 0, 1, 2, ... are assigned, but in practice these are not used much, since that kind of access is the same as in NumPy. Personally, I also wonder whether it makes sense to use numbers as the index.

Specifying index and columns

To specify index and columns, do as follows.

df.columns = ['foo', 'bar']
df.index = ['a', 'b', 'c']

To check the index and column names of a DataFrame, do as follows.

df.columns
df.index
df.info()  # index range, column names and dtypes, non-null counts, memory usage
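A short sketch, assuming a DataFrame created from a NumPy array without labels, that shows the default integer labels and then replaces them with the names used in this article:

```python
import numpy as np
import pandas as pd

# Without labels, pandas assigns integer labels 0, 1, 2, ...
df = pd.DataFrame(np.arange(6).reshape(3, 2))
print(df.columns.tolist())  # [0, 1]
print(df.index.tolist())    # [0, 1, 2]

# Replace them with user-defined labels (the lengths must match each axis)
df.columns = ['foo', 'bar']
df.index = ['a', 'b', 'c']
print(df.columns.tolist())  # ['foo', 'bar']
print(df.index.tolist())    # ['a', 'b', 'c']
df.info()
```

Assigning a list of the wrong length to `df.columns` or `df.index` raises a ValueError.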

Simplest column extraction

On a DataFrame, `__getitem__` (that is, `df[...]`) selects columns by label. Older pandas versions also accepted column positions given as a list (`df[[0]]`), but current pandas treats the list elements as labels, so use iloc for positional access. Rows cannot be selected by index label this way, although a slice such as `df[0:2]` does select rows. On a Series, `__getitem__` selects by index label, which is natural because a Series has only one column.

df['foo']            # single column -> Series
df[['foo']]          # single column given as a list -> DataFrame
df[['foo', 'bar']]   # multiple columns by label
df.iloc[:, [0, 1]]   # multiple columns by position
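A sketch of what `[]` returns in each case, using the example DataFrame from the top of the article. The `KeyError` for `df[[0]]` reflects current pandas behavior; very old versions fell back to positions instead:

```python
import pandas as pd

df = pd.DataFrame({'foo': [0, 2, 4], 'bar': [1, 3, 5]}, index=['a', 'b', 'c'])

one_series = df['foo']          # one label -> Series
one_frame = df[['foo']]         # a list with one label -> one-column DataFrame
two_cols = df[['foo', 'bar']]   # a list of labels -> DataFrame
by_pos = df.iloc[:, [0, 1]]     # column positions go through iloc
first_rows = df[0:2]            # exception: a slice selects rows (a and b)

try:
    df[[0]]                     # 0 is not a column label in this frame
except KeyError:
    print('df[[0]] raised KeyError')

s = df['foo']
print(s['b'])                   # on a Series, [] selects by index label -> 2
```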

More detailed extraction (ix, iloc, loc)

As mentioned above, a DataFrame carries two kinds of position information: integer positions, as in a NumPy matrix, and user-defined labels. The indexers ix, iloc and loc make explicit which one is used: iloc accepts only integer positions, loc accepts only labels, and ix accepted both. Note that ix was deprecated in pandas 0.20 and removed in pandas 1.0, so on current versions use iloc or loc. Taking the example above, the element at row 0, column 0 can be selected as follows.

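# ix was removed in pandas 1.0; these lines only work on old versions:
# df.ix[[0], [0]]
# df.ix[[0], ['foo']]
# df.ix[['a'], ['foo']]
# df.ix[['a'], [0]]
df.iloc[[0], [0]]                       # by position only
df.loc[['a'], ['foo']]                  # by label only
df.loc[['a'], df.columns[[0]]]          # row label + column position
df.iloc[[df.index.get_loc('a')], [0]]   # the same element via iloc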
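One detail the examples above hide: passing lists keeps the result two-dimensional (a 1x1 DataFrame), while passing scalars returns the element itself. A small sketch on the same example data, also showing the scalar-only accessors `at` (label) and `iat` (position):

```python
import pandas as pd

df = pd.DataFrame({'foo': [0, 2, 4], 'bar': [1, 3, 5]}, index=['a', 'b', 'c'])

sub = df.iloc[[0], [0]]               # lists in both positions -> 1x1 DataFrame
by_pos = df.iloc[0, 0]                # scalars -> the element itself
by_label = df.loc['a', 'foo']         # same element by label
mixed = df.loc['a', df.columns[0]]    # row label + column position, as ix allowed
fast = (df.at['a', 'foo'], df.iat[0, 0])  # scalar-only accessors
print(sub)
print(by_pos, by_label, mixed, fast)
```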

To select several rows at once, pass a slice in the row position, as follows.

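df.iloc[:, [0]]           # all rows, first column
df.iloc[1:5, [0]]         # rows at positions 1-4 (end excluded), first column
df.loc['a':'b', ['foo']]  # label slice: both ends are included
df.iloc[:]                # only the row part given: all rows, all columns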
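The difference between positional and label slices is easy to trip over, so here is a small sketch on a hypothetical six-row frame: iloc slices exclude the end like Python slices, while loc slices include the end label.

```python
import pandas as pd

df = pd.DataFrame({'foo': range(6)}, index=list('abcdef'))

pos = df.iloc[1:3]       # positions 1 and 2: the end is excluded
lab = df.loc['b':'d']    # labels b, c and d: the end label is included
rows_only = df.iloc[:]   # only the row part given: every row and column
print(pos.index.tolist())  # ['b', 'c']
print(lab.index.tolist())  # ['b', 'c', 'd']
```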

Extraction of rows / columns from specified conditions

Extraction of rows

To extract the rows in which a given column satisfies a condition, index with a boolean comparison on that column. All columns of the matching rows are returned.

print(foo.loc[foo['bar'] == condition])  # condition: the value to compare against
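A sketch with hypothetical data showing the boolean mask on its own, how to combine several conditions, and `isin` for matching any of several values. Each comparison must be wrapped in parentheses because `&` and `|` bind more tightly than `==` and `>`:

```python
import pandas as pd

foo = pd.DataFrame({'bar': [0, 1, 2, 1], 'baz': [3, 4, 5, 6]}, index=list('abcd'))

# A comparison on one column gives a boolean Series aligned with the index
mask = foo['bar'] == 1
rows = foo.loc[mask]                                  # rows b and d, all columns

# Combine conditions with & (and), | (or), ~ (not)
both = foo.loc[(foo['bar'] == 1) & (foo['baz'] > 4)]  # row d only

# isin matches any of several values
some = foo.loc[foo['bar'].isin([0, 2])]               # rows a and c
print(rows, both, some, sep='\n\n')
```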

Column extraction

This is done indirectly: elements that do not meet the condition are set to NaN, and then every column containing a NaN is dropped, leaving only the columns in which every element meets the condition.

foo = foo[foo == 1] #All elements that do not meet the conditions are NaN.
foo = foo.dropna(axis=1)
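A sketch on hypothetical data that runs the NaN-based method from the text next to an alternative that swaps in `.all()` on the boolean frame to pick the columns directly. The alternative skips the NaN step, so it cannot upcast integer columns to float:

```python
import pandas as pd

foo = pd.DataFrame({'x': [1, 1], 'y': [1, 0], 'z': [1, 1]}, index=['a', 'b'])

# Method from the text: mask, then drop every column that now contains NaN
via_nan = foo[foo == 1].dropna(axis=1)

# Alternative: keep the columns where every element meets the condition
via_all = foo.loc[:, (foo == 1).all()]

print(via_nan)
print(via_all)
```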

Iterator

To iterate over the rows of a pd.DataFrame:

for index, row in df.iterrows():
    print(index, row)  # row is a pd.Series indexed by the column names
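For comparison, a sketch of the other iteration helpers pandas provides: `itertuples` yields one namedtuple per row (the pandas docs describe it as generally faster than `iterrows`, and it keeps each column's dtype), and `items` yields `(column name, Series)` pairs if you really want to loop over columns:

```python
import pandas as pd

df = pd.DataFrame({'foo': [0, 2], 'bar': [1, 3]}, index=['a', 'b'])

# Rows as Series: the column names become the Series index
for index, row in df.iterrows():
    print(index, row['foo'], row['bar'])

# Rows as namedtuples: the index label is available as .Index
tuples = list(df.itertuples())
for t in tuples:
    print(t.Index, t.foo, t.bar)

# Columns as (name, Series) pairs
names = []
for name, col in df.items():
    names.append(name)
    print(name, col.tolist())
```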

Creating and editing a pd.DataFrame

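import pandas as pd

# Create an empty container: column names only, no rows
foo = pd.DataFrame(columns=['bar', 'baz'])

# Create from a dict of columns, with an explicit index
foo = pd.DataFrame({'bar': [0, 1, 2],
                    'baz': [3, 4, 5]},
                   index=['a', 'b', 'c'])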
# foo
    bar  baz
a    0    3
b    1    4
c    2    5
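The same frame can also be built from a list of rows instead of a dict of columns, in which case the column names are passed separately. A minimal sketch, alongside a check of what the empty container looks like:

```python
import pandas as pd

# Empty container: column names only, zero rows
empty = pd.DataFrame(columns=['bar', 'baz'])

# The frame above, built row by row
rows = [[0, 3], [1, 4], [2, 5]]
foo = pd.DataFrame(rows, columns=['bar', 'baz'], index=['a', 'b', 'c'])
print(empty.empty, len(empty))  # True 0
print(foo)
```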

Add column

Adding a column is easier than adding a row: assign a list (or Series) with one value per row to a new column name.

foo['qux'] = [6, 7, 8]
# foo
    bar  baz  qux
a    0    3    6
b    1    4    7
c    2    5    8
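A few more ways to add columns, sketched on the same frame: a scalar is broadcast to every row, a new column can be computed from existing ones, and `insert` places a column at a given position instead of at the end. The column names `one`, `total` and `first` are hypothetical. A list of the wrong length is rejected:

```python
import pandas as pd

foo = pd.DataFrame({'bar': [0, 1, 2], 'baz': [3, 4, 5]}, index=['a', 'b', 'c'])

foo['qux'] = [6, 7, 8]                   # list: one value per row
foo['one'] = 1                           # scalar: broadcast to every row
foo['total'] = foo['bar'] + foo['baz']   # computed from existing columns
foo.insert(0, 'first', ['x', 'y', 'z'])  # insert at position 0

try:
    foo['bad'] = [1, 2]                  # 2 values for 3 rows
except ValueError as e:
    print('ValueError:', e)
print(foo)
```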

Add a row

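# DataFrame.append was removed in pandas 2.0; pd.concat does the same job.
# Specify the index of the new rows yourself, otherwise they get 0, 1, ...
foo = pd.concat([foo, pd.DataFrame({'bar': [6, 7], 'baz': [8, 9]}, index=['d', 'e'])])
# foo (assuming it has only the bar and baz columns; an existing qux column would be NaN for d and e)
    bar  baz
a    0    3
b    1    4
c    2    5
d    6    8
e    7    9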
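A sketch of two related options: `ignore_index=True` in `pd.concat` discards the old labels and renumbers the rows, and assigning to a new label with `loc` appends a single row (pandas calls this "setting with enlargement"):

```python
import pandas as pd

foo = pd.DataFrame({'bar': [0, 1, 2], 'baz': [3, 4, 5]}, index=['a', 'b', 'c'])
new = pd.DataFrame({'bar': [6, 7], 'baz': [8, 9]}, index=['d', 'e'])

foo2 = pd.concat([foo, new])                            # keeps labels a-e
renumbered = pd.concat([foo, new], ignore_index=True)   # labels 0-4
foo2.loc['f'] = [10, 11]                                # add one row labelled f
print(foo2)
print(renumbered)
```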

Delete rows and columns.

foo.drop('e')             # drop row 'e'; returns a new DataFrame, foo is unchanged
foo.drop('bar', axis=1)   # drop the column; also returns a new DataFrame
del foo['bar']            # delete the column in place (Python's del statement)
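The key difference above is that `drop` returns a new DataFrame while `del` changes the original. A small sketch on hypothetical data that makes this visible, using the `columns=` keyword as an equivalent of `axis=1`:

```python
import pandas as pd

foo = pd.DataFrame({'bar': [0, 1], 'baz': [3, 4]}, index=['d', 'e'])

without_e = foo.drop('e')                # new DataFrame without row e
without_bar = foo.drop(columns='bar')    # same as foo.drop('bar', axis=1)
print(foo.shape)                         # (2, 2): foo itself is unchanged

del foo['bar']                           # modifies foo in place
print(foo.columns.tolist())              # ['baz']
```

To keep the result of `drop`, assign it back, e.g. `foo = foo.drop('e')`.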

Reference URL: http://stackoverflow.com/questions/17071871/select-rows-from-a-dataframe-based-on-values-in-a-column-in-pandas

Reference URL: http://sinhrks.hatenablog.com/entry/2014/11/12/233216
