[PYTHON] Key additions to pandas 1.1.0 and 1.0.0

About this article

pandas was updated from version 1.0 to 1.1.0 on July 28, 2020. This article summarizes the main additions in 1.1.0 and, although it has been covered many times elsewhere, the main additions in the January 2020 update from 0.25.3 to 1.0.0.

Official information

https://pandas.pydata.org/pandas-docs/dev/whatsnew/v1.0.0.html

https://pandas.pydata.org/pandas-docs/dev/whatsnew/v1.1.0.html

Please refer to the official release notes above for details.

Verification environment

Version 1.0 was verified with 1.0.5, version 1.1 with 1.1.0, and version 0.25 with 0.25.1.

1.0

pd.NA

Up to 0.25, missing values were represented by different values depending on the data type: np.nan for float, np.nan or None for object (strings), and pd.NaT for datetime data.

In 1.0, pd.NA was introduced to represent missing values.

For example, the third element of

pd.Series([1, 2, None], dtype="Int64")

is np.nan in version 0.25, but pd.NA in 1.0.

Until 0.25, a numeric column with missing values (np.nan) was normally forced to float64, but in 1.0 it can be kept as an integer column (for example Int8) that holds pd.NA.
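
The difference between the two representations can be checked directly. The following is a minimal sketch, assuming pandas 1.0 or later; the variable names as_float and as_int are only illustrative.

```python
import pandas as pd

# Default inference: the missing value forces the column to float64.
as_float = pd.Series([1, 2, None])

# Nullable integer dtype: the values stay integers and the gap is pd.NA.
as_int = pd.Series([1, 2, None], dtype="Int64")

print(as_float.dtype)      # float64
print(as_int.dtype)        # Int64
print(as_int[2] is pd.NA)  # True
```

Int64 (capital I) is the nullable extension dtype; the lowercase int64 is the plain NumPy dtype, which cannot hold missing values.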

string (StringDtype) type

A string (StringDtype) type that represents a Series (or DataFrame column) of string data has been added. When dealing with a Series (or column) of strings, using the string type is recommended.

Up to 0.25, the object type was the only way to represent a Series (or column) containing string data, so

pd.Series(['abc', True, 'def'], dtype="object")

could only express that anything (here a mixture of strings and booleans) was allowed.

From 1.0,

pd.Series(['abc', 'def'], dtype="string")

restricts the Series (or column) to strings only.

pd.Series(['abc', True, 'def'], dtype="string")

raises an error in 1.0.

pd.Series(['abc', 'def', None], dtype="string")

The third element of this Series is pd.NA.

However,

pd.Series(['abc', True, 'def'])

(no dtype specified) is still an object-type Series as before, so this expression remains allowed.
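
As a rough illustration of the points above, the sketch below builds one Series with the string dtype and one without a dtype, then compares them. It assumes pandas 1.0 or later; the variable names are hypothetical.

```python
import pandas as pd

# Explicit string dtype: only strings and missing values are stored.
strings = pd.Series(["abc", "def", None], dtype="string")

# No dtype: a mixture of strings and booleans falls back to object.
mixed = pd.Series(["abc", True, "def"])

print(strings.dtype)          # string
print(strings.isna().tolist())
print(mixed.dtype)            # object

# The .str accessor works on the string dtype and keeps the missing value missing.
upper = strings.str.upper()
print(upper.tolist())
```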

boolean (BooleanDtype) type

A boolean (BooleanDtype) type that represents boolean data has been added. When dealing with a Series (or column) of boolean values (True or False), using the boolean type is recommended.

pd.Series([True, False, 0], dtype="boolean")

raises an error. (If dtype is not specified, no error is raised. With dtype="bool", 0 is converted to False.)

Regarding the handling of missing values, in

pd.Series([True, False, np.nan])
pd.Series([True, False, None])

the third element is np.nan and None, respectively.

pd.Series([True, False, np.nan], dtype="boolean")
pd.Series([True, False, None], dtype="boolean")

Here, the third element is pd.NA in both cases.

pd.Series([True, False, np.nan], dtype="bool")
pd.Series([True, False, None], dtype="bool")

With dtype="bool", the third element becomes True and False, respectively.
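
The contrast between no dtype and dtype="boolean" can be confirmed with a short script. This is only a sketch, assuming pandas 1.0 or later; plain and nullable are illustrative names.

```python
import numpy as np
import pandas as pd

# No dtype: the Series becomes object and keeps np.nan as it is.
plain = pd.Series([True, False, np.nan])

# Nullable boolean dtype: the missing value is stored as pd.NA.
nullable = pd.Series([True, False, None], dtype="boolean")

print(plain.dtype)            # object
print(nullable.dtype)         # boolean
print(nullable[2] is pd.NA)   # True
print(nullable.isna().tolist())
```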

convert_dtypes method

df = pd.DataFrame({'x': ['abc', None, 'def'],
                   'y': [1, 2, np.nan],
                   'z': [True, False, True]})

This DataFrame has column x as object, column y as float64, and column z as bool, even though the string and boolean types now exist.

To use the new types, call

df.convert_dtypes()

The returned DataFrame has x column: string, y column: Int64, z column: boolean. None and np.nan become pd.NA.
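
To see the conversion at a glance, the dtypes before and after can be placed side by side. The sketch below assumes pandas 1.0 or later; converted and summary are hypothetical names, and the exact label printed for the string column may differ between pandas versions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": ["abc", None, "def"],
                   "y": [1, 2, np.nan],
                   "z": [True, False, True]})

# convert_dtypes returns a new DataFrame; df itself is not modified.
converted = df.convert_dtypes()

# Side-by-side view of the dtype of each column before and after.
summary = pd.DataFrame({"before": df.dtypes.astype(str),
                        "after": converted.dtypes.astype(str)})
print(summary)

# Number of missing values per column after the conversion.
print(converted.isna().sum())
```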

pd.NA and the new types described above are experimental features and are subject to change.

ignore_index argument

The ignore_index argument has been added to DataFrame.sort_values() and DataFrame.drop_duplicates(). With ignore_index=True, the index of the result is reassigned in order from 0. This is good news for anyone who has been annoyed by the pandas index.
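
The effect of ignore_index is easiest to see by comparing the index with and without it. The following is a minimal sketch, assuming pandas 1.0 or later; the sample data and variable names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"name": ["b", "a", "b", "c"], "score": [2, 3, 2, 1]})

# Default: the surviving rows keep their original labels.
deduped_old = df.drop_duplicates()
print(deduped_old.index.tolist())  # [0, 1, 3]

# ignore_index=True relabels the result from 0.
deduped = df.drop_duplicates(ignore_index=True)
print(deduped.index.tolist())      # [0, 1, 2]

# Default sort: the labels travel with the rows.
sorted_old = deduped.sort_values("score")
print(sorted_old.index.tolist())   # [2, 0, 1]

# ignore_index=True: the sorted rows are relabeled from 0.
sorted_new = deduped.sort_values("score", ignore_index=True)
print(sorted_new.index.tolist())   # [0, 1, 2]
```

The same result could previously be obtained by chaining .reset_index(drop=True) after the operation; ignore_index=True simply removes that extra step.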

1.1

dtype="string", astype("string")

pd.Series([1, "abc", np.nan], dtype="string")
pd.Series([1, 2, np.nan], dtype="Int64").astype("string")

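In both cases, every non-missing element is converted to a string, and the missing value becomes pd.NA. Up to 1.0, this raised an error unless every element was already a string or nan.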
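
The two conversions can be inspected as follows. This is a sketch that assumes pandas 1.1 or later (on 1.0 these calls are expected to fail, as described above); from_mixed and from_int are illustrative names.

```python
import numpy as np
import pandas as pd

# Mixed values declared directly as string: the integer 1 becomes "1".
from_mixed = pd.Series([1, "abc", np.nan], dtype="string")

# Nullable integers converted with astype: 1 and 2 become "1" and "2".
from_int = pd.Series([1, 2, np.nan], dtype="Int64").astype("string")

print(from_mixed.tolist())
print(from_int.tolist())

# The missing value stays missing instead of turning into the text "nan".
print(from_mixed.isna().tolist())  # [False, False, True]
```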

groupby

df = pd.DataFrame([[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]], columns=["a", "b", "c"])

df.groupby(by=["b"], dropna=False).sum()

The result is

     a  c
b        
1.0  2  3
2.0  2  5
NaN  1  4

That is, rows whose value in the column specified by by is NA are also aggregated, as their own group. This is similar to the behavior of group_by in R's dplyr.

If dropna=True or dropna is not specified, rows whose value in the column specified by by is NA are excluded from the aggregation.
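
The two settings can be compared on the same DataFrame. The sketch below assumes pandas 1.1 or later; with_na and without_na are hypothetical names.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]],
                  columns=["a", "b", "c"])

# dropna=False: the row where b is missing forms its own NaN group.
with_na = df.groupby("b", dropna=False).sum()
print(len(with_na))      # 3

# Default (dropna=True): that row is left out of the result.
without_na = df.groupby("b").sum()
print(len(without_na))   # 2
```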
