When working with data in pandas, I used to search for specific values by looping over rows one at a time with `for ... in df.iterrows()`. But as the data grows, that gets slow. After looking into better approaches, it turns out that passing a boolean mask (filter) to `df.loc[]` works much better. (The older `df.ix[]` accessor is deprecated and has been removed from recent pandas, so `df.loc[]` is used below.)
For example, suppose you have a data frame like this:
print(df)
A B C D E
TIME
2001-01-08 06:00:00 -0.5925 -1.11 0.055023 -0.074103 0.129126
2001-01-08 06:05:00 -0.4213 NaN NaN NaN NaN
2001-01-08 06:10:00 -0.3613 -2.19 NaN NaN NaN
2001-01-08 06:15:00 -0.3613 NaN NaN NaN 0.224574
2001-01-08 06:20:00 -0.3613 NaN 0.129036 -0.000281 1.094003
2001-01-08 06:25:00 -0.8243 -0.65 0.137242 -0.022502 2.145931
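If you want to follow along, the sample frame can be rebuilt like this (values copied from the printout above):

```python
import pandas as pd
import numpy as np

# Rebuild the sample frame from the printed values
idx = pd.to_datetime([
    "2001-01-08 06:00:00", "2001-01-08 06:05:00", "2001-01-08 06:10:00",
    "2001-01-08 06:15:00", "2001-01-08 06:20:00", "2001-01-08 06:25:00",
])
idx.name = "TIME"
df = pd.DataFrame({
    "A": [-0.5925, -0.4213, -0.3613, -0.3613, -0.3613, -0.8243],
    "B": [-1.11, np.nan, -2.19, np.nan, np.nan, -0.65],
    "C": [0.055023, np.nan, np.nan, np.nan, 0.129036, 0.137242],
    "D": [-0.074103, np.nan, np.nan, np.nan, -0.000281, -0.022502],
    "E": [0.129126, np.nan, np.nan, 0.224574, 1.094003, 2.145931],
}, index=idx)
print(df)
```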
Find the rows where column B is NaN:
print(df.loc[df["B"].isnull()])
A B C D E
TIME
2001-01-08 06:05:00 -0.4213 NaN NaN NaN NaN
2001-01-08 06:15:00 -0.3613 NaN NaN NaN 0.224574
2001-01-08 06:20:00 -0.3613 NaN 0.129036 -0.000281 1.094003
Find the rows where column E is not NaN:
print(df.loc[df["E"].notnull()])
A B C D E
TIME
2001-01-08 06:00:00 -0.5925 -1.11 0.055023 -0.074103 0.129126
2001-01-08 06:15:00 -0.3613 NaN NaN NaN 0.224574
2001-01-08 06:20:00 -0.3613 NaN 0.129036 -0.000281 1.094003
2001-01-08 06:25:00 -0.8243 -0.65 0.137242 -0.022502 2.145931
Find the rows where any column is NaN (note that df.isnull().values.any() collapses to a single bool, so a row-wise any(axis=1) is needed here):
print(df.loc[df.isnull().any(axis=1)])
A B C D E
TIME
2001-01-08 06:05:00 -0.4213 NaN NaN NaN NaN
2001-01-08 06:10:00 -0.3613 -2.19 NaN NaN NaN
2001-01-08 06:15:00 -0.3613 NaN NaN NaN 0.224574
2001-01-08 06:20:00 -0.3613 NaN 0.129036 -0.000281 1.094003
Get only the first three columns of the rows where column C is NaN:
print(df.loc[df["C"].isnull(), df.columns[:3]])
A B C
TIME
2001-01-08 06:05:00 -0.4213 NaN NaN
2001-01-08 06:10:00 -0.3613 -2.19 NaN
2001-01-08 06:15:00 -0.3613 NaN NaN
Get columns A and B of the rows where column D is NaN:
print(df.loc[df["D"].isnull(), ["A", "B"]])
A B
TIME
2001-01-08 06:05:00 -0.4213 NaN
2001-01-08 06:10:00 -0.3613 -2.19
2001-01-08 06:15:00 -0.3613 NaN
Modify the values in column A for the rows where column D is NaN (try subtracting 1):
df.loc[df["D"].isnull(), "A"] -= 1
print(df["A"])
A
TIME
2001-01-08 06:00:00 -0.5925
2001-01-08 06:05:00 -1.4213
2001-01-08 06:10:00 -1.3613
2001-01-08 06:15:00 -1.3613
2001-01-08 06:20:00 -0.3613
2001-01-08 06:25:00 -0.8243
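The point of doing the row selection and column selection in a single .loc call is that the assignment writes straight into the original frame. A minimal sketch with a hypothetical toy frame:

```python
import pandas as pd
import numpy as np

# Hypothetical toy frame just to illustrate in-place assignment via .loc
df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "D": [np.nan, 0.5, np.nan]})

# One .loc call selects rows and the column together, so the
# subtraction modifies df itself
df.loc[df["D"].isnull(), "A"] -= 1
print(df["A"].tolist())  # → [0.0, 2.0, 2.0]

# By contrast, chained indexing such as df[df["D"].isnull()]["A"] -= 1
# assigns into an intermediate copy and usually leaves df unchanged
# (pandas warns about this pattern with SettingWithCopyWarning)
```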
Now run a conditional search on the modified frame:
print(df.loc[df["A"] <= -1])
A B C D E
TIME
2001-01-08 06:05:00 -1.4213 NaN NaN NaN NaN
2001-01-08 06:10:00 -1.3613 -2.19 NaN NaN NaN
2001-01-08 06:15:00 -1.3613 NaN NaN NaN 0.224574
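Masks built this way can also be combined. A minimal sketch with a hypothetical frame (each clause needs its own parentheses, because & and | bind tighter than the comparison operators):

```python
import pandas as pd
import numpy as np

# Hypothetical frame; values chosen only to demonstrate the combinators
df = pd.DataFrame({"A": [-1.5, -0.3, -1.2], "B": [np.nan, 2.0, -0.5]})

# AND: rows where A <= -1 and B is NaN
both = df.loc[(df["A"] <= -1) & (df["B"].isnull())]

# OR uses |; negation uses ~
either = df.loc[(df["A"] <= -1) | (df["B"] > 0)]

print(len(both), len(either))  # → 1 3
```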
The data here is tiny, but this approach stays fast even with large data frames.
Postscript
df.apply(function, axis=1) seems to be the best choice when you want to compute each row's value with a complicated conditional expression.
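A minimal sketch of that row-wise pattern, using a hypothetical frame and condition:

```python
import pandas as pd
import numpy as np

# Hypothetical frame just for the row-wise example
df = pd.DataFrame({"A": [-1.5, 0.3], "B": [np.nan, 2.0]})

# axis=1 hands each row to the function as a Series, so arbitrarily
# complex conditions can be written in plain Python
def classify(row):
    if pd.isnull(row["B"]):
        return "missing"
    return "neg" if row["A"] < 0 else "pos"

print(df.apply(classify, axis=1).tolist())  # → ['missing', 'pos']
```

Note that apply is far slower than a vectorized mask, so it is worth reaching for only when the condition really cannot be expressed with the mask operations above.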