[PYTHON] I got tripped up retrieving levels from a pandas MultiIndex

pandas' MultiIndex is convenient, but I got tripped up by treating it as simply a multidimensional version of Index, so I am writing this down as a note.

Where I got stuck

As an example, assume that the following table exists as hoge.csv.

        val
1   a   b
2   c   d
3   a   d
4   b   c
5   a   b

If you read hoge.csv with every column other than val as the index, the result is a DataFrame with a MultiIndex.

>>> import pandas as pd
>>> df = pd.read_csv("hoge.csv", index_col=[0, 1])
>>> df
    val
1 a   b
2 c   d
3 a   d
4 b   c
5 a   b

Now try filtering this DataFrame by val.

>>> tmp_df = df.query("val=='b'")
>>> tmp_df.index
MultiIndex([(1, 'a'),
            (5, 'a')],
           )

Two of the DataFrame's five rows were extracted.

Next, if you take level 0 from the levels property of the filtered result, you would expect to get 1 and 5, but...

>>> tmp_df.index.levels[0]
Int64Index([1, 2, 3, 4, 5], dtype='int64')

Regardless of the filter, **the elements of level 0 of the original DataFrame are returned**. This is a problem, because there are times when you want to filter on the values in the table and then retrieve the values that remain in each level.

Solution

levels is only a list of the distinct values held by each level. The rows of the index themselves seem to be represented separately, by recording how the values of the levels are combined.

Therefore, collapse the MultiIndex into a single Index that keeps only the level you ultimately want to retrieve, and then apply the filter.
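To make the relationship between the levels concrete, here is a minimal sketch. It rebuilds the example table in memory instead of reading hoge.csv (an assumption made so the snippet runs on its own) and inspects the MultiIndex's codes attribute. codes is not mentioned in the note above. It is the pandas attribute that holds, for each row, the position of that row's value within levels.

```python
import pandas as pd

# Rebuild the example table directly rather than reading hoge.csv.
index = pd.MultiIndex.from_tuples(
    [(1, "a"), (2, "c"), (3, "a"), (4, "b"), (5, "a")]
)
df = pd.DataFrame({"val": ["b", "d", "d", "c", "b"]}, index=index)
tmp_df = df.query("val == 'b'")

# levels: the distinct values of level 0, unchanged by the filter.
levels0 = tmp_df.index.levels[0].tolist()
# codes: for each remaining row, the position of its value inside levels0.
codes0 = tmp_df.index.codes[0].tolist()
# Looking the codes up in the levels recovers the values actually present.
present0 = [levels0[c] for c in codes0]

print(levels0)   # [1, 2, 3, 4, 5]
print(codes0)    # [0, 4]
print(present0)  # [1, 5]
```

Filtering only shortens codes. levels is left as it was, which is why the original five values keep appearing.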

>>> df.reset_index(level=1)
  level_1 val
1       a   b
2       c   d
3       a   d
4       b   c
5       a   b
>>> tmp_df = df.reset_index(level=1).query("val=='b'")
>>> tmp_df.index
Int64Index([1, 5], dtype='int64')

Done this way, the Index matches the rows that pass the filter, so if you want to retrieve a particular level after filtering, you need to handle it with reset_index as shown above.

In reset_index, if the levels of the MultiIndex have names, specify the name. If not, specify the number of the level to release. Either way, pass it in the level= argument.
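For comparison with the reset_index approach, pandas also offers two MultiIndex methods that read the values still present without reshaping the DataFrame: get_level_values and remove_unused_levels. They are not part of the original note. The sketch below shows them on the same example table, rebuilt in memory as an assumption so that it runs without hoge.csv.

```python
import pandas as pd

# Rebuild the example table directly rather than reading hoge.csv.
index = pd.MultiIndex.from_tuples(
    [(1, "a"), (2, "c"), (3, "a"), (4, "b"), (5, "a")]
)
df = pd.DataFrame({"val": ["b", "d", "d", "c", "b"]}, index=index)
tmp_df = df.query("val == 'b'")

# Values of level 0 for the rows that actually remain after the filter.
present = tmp_df.index.get_level_values(0).tolist()

# A new MultiIndex whose levels keep only the values still in use.
pruned = tmp_df.index.remove_unused_levels()
pruned_levels0 = pruned.levels[0].tolist()

print(present)         # [1, 5]
print(pruned_levels0)  # [1, 5]
```

Which one fits depends on whether you want the MultiIndex kept intact (these two methods) or flattened to a single Index (reset_index).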
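As a small illustration of the two ways to pass level=, the sketch below gives the index levels the hypothetical names "id" and "group" (these names are assumptions and do not come from hoge.csv) and releases the second level once by name and once by number.

```python
import pandas as pd

# Same example data, but with hypothetical level names "id" and "group".
index = pd.MultiIndex.from_tuples(
    [(1, "a"), (2, "c"), (3, "a"), (4, "b"), (5, "a")],
    names=["id", "group"],
)
df = pd.DataFrame({"val": ["b", "d", "d", "c", "b"]}, index=index)

# Release the second level by name, then filter.
by_name = df.reset_index(level="group").query("val == 'b'")
# Release the same level by position, then filter.
by_number = df.reset_index(level=1).query("val == 'b'")

print(by_name.index.tolist())     # [1, 5]
print(by_name.columns.tolist())   # ['group', 'val']
print(by_name.equals(by_number))  # True
```

When the levels are named, the released column takes the level's name instead of the automatic level_1 label seen earlier.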
