[PYTHON] How to convert horizontally held data to vertically held data with pandas

First, prepare the data frame below.

Preparation of DataFrame

import pandas as pd
values = [['1', 'John', 'somekey1-1', 'somevalue1-1', 'time1-1', 'somekey2-1', 'somevalue2-1', 'time2-1'],
          ['2', 'Tom', 'somekey1-2', 'somevalue1-2', 'time1-2', 'somekey2-2', 'somevalue2-2', 'time2-2'],]
df = pd.DataFrame(values, columns=['id', 'name', 'key1', 'value1', 'time1', 'key2', 'value2', 'time2'])

The following data is created with the above code.

id name key1 value1 time1 key2 value2 time2
0 1 John somekey1-1 somevalue1-1 time1-1 somekey2-1 somevalue2-1 time2-1
1 2 Tom somekey1-2 somevalue1-2 time1-2 somekey2-2 somevalue2-2 time2-2

I will introduce four codes that convert this information to vertically held data as shown below.

id name key value time
0 1 John somekey1-1 somevalue1-1 time1-1
1 2 Tom somekey1-2 somevalue1-2 time1-2
2 1 John somekey2-1 somevalue2-1 time2-1
3 2 Tom somekey2-2 somevalue2-2 time2-2

How to implement with melt method

It seems common to use the melt method, and there was a lot of this information. There are several ways to create an array of columns, so I'll include that as well.

Create an array of column names

#Pattern ①
columns = df.columns.tolist()
[value for value in columns if value.startswith('key')]

#Pattern ②

# ['key1', 'key2']

Pattern ①, which creates and moves an array of columns once, seems to move lighter, so I'm using that below.

Convert to vertical data using melt

columns = df.columns.tolist()

    [pd.melt(df, id_vars=['id', 'name'], value_vars=[value for value in columns if value.startswith('key')], value_name='key'),
     pd.melt(df, value_vars=[value for value in columns if value.startswith('value')], value_name='value'),
     pd.melt(df, value_vars=[value for value in columns if value.startswith('time')], value_name='time')
     ).drop('variable', axis=1)

How to implement with wide_to_long method

wide_to_long is pretty simple because you can create it in one line. I didn't understand at first when I looked at the following site, https://pandas.pydata.org/docs/reference/api/pandas.wide_to_long.html

In the array specified by the second argument `convert columns starting with a specific character to vertical holding`` Because it moves like this, it is possible to complete it in one line.

The remaining part of the column is specified by j If it is key1, create it with 1, and if it is key2, create it with 2 with the specified column name. In the code below, a column called'drop'is created, so after that, it is deleted by the drop method.

wide_to_Convert to vertical data using long

pd.wide_to_long(df, ['key','value','time'], i='id', j='drop').reset_index().drop('drop', axis=1)

If you get an error with wide_to_long

What to do if the following error occurs The following error is an error that occurs when there are duplicate id items.


ValueError: the id variables need to uniquely identify each row

For example, if you change the first data frame a little and set both ids to 1, you will get an error.

DataFrame with an error

import pandas as pd
values = [['1', 'John', 'somekey1-1', 'somevalue1-1', 'time1-1', 'somekey2-1', 'somevalue2-1', 'time2-1'],
          ['1', 'Tom', 'somekey1-2', 'somevalue1-2', 'time1-2', 'somekey2-2', 'somevalue2-2', 'time2-2'],]
df = pd.DataFrame(values, columns=['id', 'name', 'key1', 'value1', 'time1', 'key2', 'value2', 'time2'])

pd.wide_to_long(df,['key','value','time'], i='id', j='drop').reset_index().drop('drop', axis=1)

In that case, you can solve it by creating an index item with reset_index () and specifying it in id.

wide_to_Convert to vertical data using long(Error avoidance method)

pd.wide_to_long(df.reset_index(), ['key','value','time'], i='index', j='drop').reset_index().drop('drop', axis=1).drop('index', axis=1)

How to do with lreshape method

lreshape seems to be a minor method that can be fixed to reshape even if you search on google. Personally, I like it because it's simple, but the site below says that it will disappear in the future, so it seems that it will not be usable soon. Sorry. https://pandas.pydata.org/pandas-docs/version/1.0.0/whatsnew/v1.0.0.html

Convert to vertical data using lreshape

d = {'key': df.columns[df.columns.str.startswith('key')].tolist(),
     'value': df.columns[df.columns.str.startswith('value')].tolist(),
     'time': df.columns[df.columns.str.startswith('time')].tolist(),}

pd.lreshape(df, d)

Also, although it should be written that it is used in practice, for some reason the following It seems better not to use it too much as it may cause an error.


/usr/local/lib/python3.6/dist-packages/pandas/core/reshape/melt.py in <dictcomp>(.0)
    188             mask &= notna(mdata[c])
    189         if not mask.all():
--> 190             mdata = {k: v[mask] for k, v in mdata.items()}
    192     return data._constructor(mdata, columns=id_cols + pivot_cols)

IndexError: boolean index did not match indexed array along dimension 0; dimension is 1210 but corresponding boolean dimension is 24200

How to do your best with concat

Probably not quite good. When I didn't know the above method at first, I did it.

When using the concat method if the column names are the same It is a method to use it because it joins vertically.

Implementation with concat

    df[['id', 'name', 'key1', 'value1', 'time1']].rename(columns={'key1': 'key', 'value1': 'value', 'time1': 'time'}),
    df[['id', 'name', 'key2', 'value2', 'time2']].rename(columns={'key2': 'key', 'value2': 'value', 'time2': 'time'}),

Recommended Posts

How to convert horizontally held data to vertically held data with pandas
Convert 202003 to 2020-03 with pandas
I want to convert vertically held data (long type) to horizontally held data (wide type)
How to extract non-missing value nan data with pandas
How to convert JSON file to CSV file with Python Pandas
How to extract non-missing value nan data with pandas
How to deal with imbalanced data
How to deal with imbalanced data
How to Data Augmentation with PyTorch
Convert Excel data to JSON with python
Use pandas to convert grid data to row-holding (?) Data
Convert FX 1-minute data to 5-minute data with Python
Try converting to tidy data with pandas
How to read problem data with paiza
How to create sample CSV data with hypothesis
Convert data with shape (number of data, 1) to (number of data,) with numpy.
[Python] How to read excel file with pandas
How to scrape horse racing data with BeautifulSoup
How to convert (32,32,3) to 4D tensor (1,32,32,1) with ndarray type
How to use Pandas 2
Data visualization with pandas
Data manipulation with Pandas!
Shuffle data with pandas
Convert Mobile Suica usage history PDF to pandas Data Frame format with tabula-py
How to replace with Pandas DataFrame, which is useful for data analysis (easy)
How to convert / restore a string with [] in python
How to use xgboost: Multi-class classification with iris data
How to scrape image data from flickr with python
How to convert a class object to a dictionary with SQLAlchemy
How to access with cache when reading_json in pandas
How to get more than 1000 data with SQLAlchemy + MySQLdb
How to extract null values and non-null values with pandas
How to output CSV of multi-line header with pandas
[Python] How to deal with pandas read_html read error
[Python] A memo to write CSV vertically with Pandas
Convert numeric variables to categorical with thresholds in pandas
[Python] Convert list to Pandas [Pandas]
How to update with SQLAlchemy?
How to use Pandas Rolling
How to cast with Theano
How to Alter with SQLAlchemy?
How to separate strings with','
How to RDP with Fedora31
How to handle data frames
Data processing tips with Pandas
How to Delete with SQLAlchemy?
How to convert an array to a dictionary with Python [Application]
Ingenuity to handle data with Pandas in a memory-saving manner
How to get an overview of your data in Pandas
How to read an Excel file (.xlsx) with Pandas [Python]
How to create dataframes and mess with elements in pandas
Data science companion in python, how to specify elements in pandas
[Introduction to Python] How to get data with the listdir function
How to cancel RT with tweepy
How to extract features of time series data with PySpark Basics
[Python] How to FFT mp3 data
Python: How to use async with
How to read e-Stat subregion data
How to write soberly in pandas
[Python] How to use Pandas Series
[Stock price analysis] Learn pandas with Nikkei 225 (004: Change read data to Nikkei 225)