[PYTHON] If you want to get multiple statistics with groupby in pandas v1

Overview

With pandas groupby, I used to pass a dict to agg() to apply multiple functions to a single column and name the results at the same time. That usage was removed in v1 and no longer works, so this post describes the workarounds. The data is based on the example in the pandas documentation.

Data

import pandas as pd

animals = pd.DataFrame({'kind': ['cat', 'dog', 'cat', 'dog'],
                        'height': [9.1, 6.0, 9.5, 34.0],
                        'weight': [7.9, 7.5, 9.9, 198.0]})

What I tried and the error I got

# For each kind, compute the sum and the count of height, and name the results "sum_all" and "count_all"
animals.groupby("kind")["height"].agg({"sum_all": "sum", "count_all": "count"})

--Expected output (this worked before v1): a table with one row per kind and the columns sum_all and count_all

--Actual output

SpecificationError: nested renamer is not supported

What was the cause

--The "What's new" page for v1.0.0 describes the change: https://pandas.pydata.org/pandas-docs/stable/whatsnew/v1.0.0.html
--The feature is called nested renaming, and support for it has been removed:

Removed support for nested renaming in DataFrame.aggregate(), Series.aggregate(), core.groupby.DataFrameGroupBy.aggregate(), core.groupby.SeriesGroupBy.aggregate(), core.window.rolling.Rolling.aggregate() (GH18529)

--As the v0.20.0 release notes linked here explain (the feature was deprecated in that release), a dict can no longer be used to aggregate a single column and rename the results in one step: https://pandas.pydata.org/pandas-docs/stable/whatsnew/v0.20.0.html#whatsnew-0200-api-breaking-deprecate-group-agg-dict
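To make the distinction concrete, here is a minimal sketch, assuming pandas 1.0 or later. A dict that maps existing column names to functions on a DataFrame groupby is still accepted; only the dict that maps new names to functions on a single selected column is rejected. The names by_column and error_name are introduced here for illustration, and the exception is caught broadly because the import path of SpecificationError differs between pandas versions.

```python
import pandas as pd

animals = pd.DataFrame({"kind": ["cat", "dog", "cat", "dog"],
                        "height": [9.1, 6.0, 9.5, 34.0],
                        "weight": [7.9, 7.5, 9.9, 198.0]})

# Still supported: the dict keys are existing columns, the values are functions.
by_column = animals.groupby("kind").agg({"height": "sum", "weight": "mean"})

# Removed in v1: the dict keys are new names for results of a single column.
error_name = None  # hypothetical helper variable, used only to record the outcome
try:
    animals.groupby("kind")["height"].agg({"sum_all": "sum", "count_all": "count"})
except Exception as exc:  # broad on purpose; see the note above
    error_name = type(exc).__name__
    print(f"{error_name}: {exc}")
```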

Solution

--The problem is that aggregation and renaming are done in a single step, so split them into two steps.

# The outer parentheses allow the method chain to span several lines
(animals.groupby("kind")["height"]
        .agg(["sum", "count"])
        .rename(columns={"sum": "sum_all", "count": "count_all"}))

--Use Named Aggregation (available since v0.25)

animals.groupby("kind")["height"].agg(sum_all="sum",
                                      count_all="count")

--(Reference) When no column is selected after groupby, Named Aggregation is written as follows. This style is easy to read because each output name states which column is aggregated and with which function.

animals.groupby("kind").agg(sum_all=("height", "sum"),
                            count_all=("height", "count"))
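As a quick check, the following sketch runs the three approaches above on the same data and confirms that they produce the same table. It assumes pandas 0.25 or later; the variable names grouped, renamed, named_series and named_frame are introduced here for illustration only.

```python
import pandas as pd

animals = pd.DataFrame({"kind": ["cat", "dog", "cat", "dog"],
                        "height": [9.1, 6.0, 9.5, 34.0],
                        "weight": [7.9, 7.5, 9.9, 198.0]})

grouped = animals.groupby("kind")

# 1. Aggregate first, then rename the resulting columns.
renamed = (grouped["height"]
           .agg(["sum", "count"])
           .rename(columns={"sum": "sum_all", "count": "count_all"}))

# 2. Named Aggregation on a single selected column.
named_series = grouped["height"].agg(sum_all="sum", count_all="count")

# 3. Named Aggregation without selecting a column: (column, function) tuples.
named_frame = grouped.agg(sum_all=("height", "sum"),
                          count_all=("height", "count"))

# Each comparison raises AssertionError if the two tables differ.
pd.testing.assert_frame_equal(renamed, named_series)
pd.testing.assert_frame_equal(named_series, named_frame)
print(named_frame)
```

The third form is the only one of the three that can aggregate several different columns in a single call, which is one reason to prefer it when the result needs more than height.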
