This time, I will summarize the events that I did not immediately understand when using Pandas. How do I operate Pandas? I often get lost, so I hope it will be helpful as much as possible.
--First, create a data set appropriately. This article will use this simple data as an example.
python
index = ['Product A', 'Product B', 'Product C']
columns = ['spring', 'summer', 'autumn', 'winter']
data = np.array([
[0,10,20,30],
[0,0,100,20],
[50,100,20,40],
])
df = pd.DataFrame(data, index=index, columns=columns)
df.head()
spring | summer | autumn | winter | |
---|---|---|---|---|
Product A | 0 | 10 | 20 | 30 |
Product B | 0 | 0 | 100 | 20 |
Product C | 50 | 100 | 20 | 40 |
--In other words, for product A, winter is the largest, and for product C, summer is the largest, so insert that season into the new column "Selling season".
python
for index, row in df.iterrows():
# df.ix[index, 'Selling season'] = row.argmax()
df.ix[index, 'Selling season'] = row.idxmax() #You can do either
# df.ix[index, 'Unsellable season'] = row.argmin()
df.ix[index, 'Unsellable season'] = row.idxmin() #You can do either
df.ix[index, 'MAX'] = row.max()
df.ix[index, 'MIN'] = row.min()
df.head()
spring | summer | autumn | winter | Selling season | Unsellable season | MAX | MIN | |
---|---|---|---|---|---|---|---|---|
Product A | 0 | 10 | 20 | 30 | winter | spring | 30 | 0 |
Product B | 0 | 0 | 100 | 20 | autumn | spring | 100 | 0 |
Product C | 50 | 100 | 20 | 40 | summer | autumn | 100 | 20 |
――In other words, I want to put out the season when each product starts to sell for the first time. A is summer, B is autumn
python
for index, row in df.iterrows():
#index[0]Take out the first one that meets the conditions with
df.ix[index, 'Season to start selling'] = row[row > 0].index[0]
df.head()
spring | summer | autumn | winter | Season to start selling | |
---|---|---|---|---|---|
Product A | 0 | 10 | 20 | 30 | summer |
Product B | 0 | 0 | 100 | 20 | autumn |
Product C | 50 | 100 | 20 | 40 | spring |
――In other words, I want to extract the line of product C that sells all year round.
python
#not_zero_df = df.query('spring>0 and summer>0 and autumn>0 and winter> 0')
#not_zero_df = df.query('spring!=0 and summer!=0 and autumn!=0 and winter!= 0')
#↑ It's okay, but it can be outside as a variable. Below "@Just add
hoge1,hoge2,hoge3,hoge4 = 0,0,0,0
not_zero_df = df.query('spring> @hoge1 and summer> @hoge2 and autumn> @hoge3 and winter> @hoge4')
not_zero_df.head()
spring | summer | autumn | winter | |
---|---|---|---|---|
Product C | 50 | 100 | 20 | 40 |
--By the way, if you want to extract something other than 0 by focusing only on summer, you can also write like this (extract lines where summer is not 0).
python
hoge = 0
not_zero_df = df.query('summer!= @hoge')
not_zero_df.head()
spring | summer | autumn | winter | |
---|---|---|---|---|
Product A | 0 | 10 | 20 | 30 |
Product C | 50 | 100 | 20 | 30 |
** We will continue to add it as needed **
Recommended Posts