First, prepare the data frame below.
Preparation of DataFrame
import pandas as pd
values = [['1', 'John', 'somekey1-1', 'somevalue1-1', 'time1-1', 'somekey2-1', 'somevalue2-1', 'time2-1'],
['2', 'Tom', 'somekey1-2', 'somevalue1-2', 'time1-2', 'somekey2-2', 'somevalue2-2', 'time2-2'],]
df = pd.DataFrame(values, columns=['id', 'name', 'key1', 'value1', 'time1', 'key2', 'value2', 'time2'])
df
The following data is created with the above code.
id | name | key1 | value1 | time1 | key2 | value2 | time2 | |
---|---|---|---|---|---|---|---|---|
0 | 1 | John | somekey1-1 | somevalue1-1 | time1-1 | somekey2-1 | somevalue2-1 | time2-1 |
1 | 2 | Tom | somekey1-2 | somevalue1-2 | time1-2 | somekey2-2 | somevalue2-2 | time2-2 |
I will introduce four codes that convert this information to vertically held data as shown below.
id | name | key | value | time | |
---|---|---|---|---|---|
0 | 1 | John | somekey1-1 | somevalue1-1 | time1-1 |
1 | 2 | Tom | somekey1-2 | somevalue1-2 | time1-2 |
2 | 1 | John | somekey2-1 | somevalue2-1 | time2-1 |
3 | 2 | Tom | somekey2-2 | somevalue2-2 | time2-2 |
It seems common to use the melt method, and there was a lot of this information. There are several ways to create an array of columns, so I'll include that as well.
Create an array of column names
#Pattern ①
columns = df.columns.tolist()
[value for value in columns if value.startswith('key')]
#Pattern ②
df.columns[df.columns.str.startswith('key')].tolist()
#result
# ['key1', 'key2']
Pattern ①, which creates and moves an array of columns once, seems to move lighter, so I'm using that below.
Convert to vertical data using melt
columns = df.columns.tolist()
pd.concat(
[pd.melt(df, id_vars=['id', 'name'], value_vars=[value for value in columns if value.startswith('key')], value_name='key'),
pd.melt(df, value_vars=[value for value in columns if value.startswith('value')], value_name='value'),
pd.melt(df, value_vars=[value for value in columns if value.startswith('time')], value_name='time')
],
axis=1
).drop('variable', axis=1)
wide_to_long is pretty simple because you can create it in one line. I didn't understand at first when I looked at the following site, https://pandas.pydata.org/docs/reference/api/pandas.wide_to_long.html
In the array specified by the second argument `convert columns starting with a specific character to vertical holding`` Because it moves like this, it is possible to complete it in one line.
The remaining part of the column is specified by j
If it is key1
, create it with 1
, and if it is key2
, create it with 2
with the specified column name.
In the code below, a column called'drop'is created, so after that, it is deleted by the drop
method.
wide_to_Convert to vertical data using long
pd.wide_to_long(df, ['key','value','time'], i='id', j='drop').reset_index().drop('drop', axis=1)
What to do if the following error occurs The following error is an error that occurs when there are duplicate id items.
error
ValueError: the id variables need to uniquely identify each row
For example, if you change the first data frame a little and set both ids to 1
, you will get an error.
DataFrame with an error
import pandas as pd
values = [['1', 'John', 'somekey1-1', 'somevalue1-1', 'time1-1', 'somekey2-1', 'somevalue2-1', 'time2-1'],
['1', 'Tom', 'somekey1-2', 'somevalue1-2', 'time1-2', 'somekey2-2', 'somevalue2-2', 'time2-2'],]
df = pd.DataFrame(values, columns=['id', 'name', 'key1', 'value1', 'time1', 'key2', 'value2', 'time2'])
pd.wide_to_long(df,['key','value','time'], i='id', j='drop').reset_index().drop('drop', axis=1)
In that case, you can solve it by creating an index item with reset_index ()
and specifying it in id.
wide_to_Convert to vertical data using long(Error avoidance method)
pd.wide_to_long(df.reset_index(), ['key','value','time'], i='index', j='drop').reset_index().drop('drop', axis=1).drop('index', axis=1)
lreshape
seems to be a minor method that can be fixed to reshape
even if you search on google.
Personally, I like it because it's simple, but the site below says that it will disappear in the future, so it seems that it will not be usable soon. Sorry.
https://pandas.pydata.org/pandas-docs/version/1.0.0/whatsnew/v1.0.0.html
Convert to vertical data using lreshape
d = {'key': df.columns[df.columns.str.startswith('key')].tolist(),
'value': df.columns[df.columns.str.startswith('value')].tolist(),
'time': df.columns[df.columns.str.startswith('time')].tolist(),}
pd.lreshape(df, d)
Also, although it should be written that it is used in practice, for some reason the following It seems better not to use it too much as it may cause an error.
error
/usr/local/lib/python3.6/dist-packages/pandas/core/reshape/melt.py in <dictcomp>(.0)
188 mask &= notna(mdata[c])
189 if not mask.all():
--> 190 mdata = {k: v[mask] for k, v in mdata.items()}
191
192 return data._constructor(mdata, columns=id_cols + pivot_cols)
IndexError: boolean index did not match indexed array along dimension 0; dimension is 1210 but corresponding boolean dimension is 24200
Probably not quite good. When I didn't know the above method at first, I did it.
When using the concat method if the column names are the same It is a method to use it because it joins vertically.
Implementation with concat
pd.concat([
df[['id', 'name', 'key1', 'value1', 'time1']].rename(columns={'key1': 'key', 'value1': 'value', 'time1': 'time'}),
df[['id', 'name', 'key2', 'value2', 'time2']].rename(columns={'key2': 'key', 'value2': 'value', 'time2': 'time'}),
])
Recommended Posts