Introduction
In this article, I will introduce various conditional extractions using query, which is one of the pandas functions.
In my experience, I will write the actual code based on 10 examples that are frequently encountered during data preprocessing.
Execution environment
- mac OS Mojave
- Python 3.7.3
!python -V
Python 3.7.3
Module used
The version of pandas uses 1.0.3.
- An error may occur if the version is 0.24.0 or earlier.
import pandas as pd #The leading role this time
import numpy as np #Used to generate inf and nan values
#Check version of pandas
print(pd.__version__)
1.0.3
Data preparation
#Preparation of data frame to be used this time
df = pd.DataFrame({
'a': range(1, 6),
'b': range(10, 0, -2),
'c c' : [1, 3, 5, 7, 9],
'd' : [2, np.nan, np.inf, None, 10],
'e' : ['a', 'b', 'c', 'd', 'e'],
'f' : [False, True, True, False, True]})
#Check data frame
df
|
a |
b |
c c |
d |
e |
f |
0 |
1 |
10 |
1 |
2.0 |
a |
False |
1 |
2 |
8 |
3 |
NaN |
b |
True |
2 |
3 |
6 |
5 |
inf |
c |
True |
3 |
4 |
4 |
7 |
NaN |
d |
False |
4 |
5 |
2 |
9 |
10.0 |
e |
True |
10 selections of data extraction
- [Condition extraction in 1 column](Condition extraction in # 1 column)
- [Condition extraction with multiple columns](# Condition extraction with multiple columns)
- [Conditional extraction of columns with spaces in column names](#Conditional extraction of columns with spaces in column names)
- [Condition extraction by character string specification](# Condition extraction by character string specification)
- [Condition extraction by multiple conditions](# Condition extraction by multiple conditions)
- [Condition extraction by inf specification](Condition extraction by #inf specification)
- [Condition extraction by specifying missing values](# Condition extraction by specifying missing values)
- [Condition extraction by specifying boolean value](#Conditional extraction by specifying boolean value)
- [Condition extraction using list](# Condition extraction using list)
- [Condition extraction using variables](# Condition extraction using variables)
Condition extraction in one column
Basic conditional extraction using only one column of the data frame. Normal comparison operators can be used.
df.query('a == 3') #Extract rows with column a value of 3
|
a |
b |
c c |
d |
e |
f |
2 |
3 |
6 |
5 |
inf |
c |
True |
df.query('a != 3') #Extract rows where the value in column a is not 3
|
a |
b |
c c |
d |
e |
f |
0 |
1 |
10 |
1 |
2.0 |
a |
False |
1 |
2 |
8 |
3 |
NaN |
b |
True |
3 |
4 |
4 |
7 |
NaN |
d |
False |
4 |
5 |
2 |
9 |
10.0 |
e |
True |
df.query('a > 3') #Extract rows with column a value greater than 3
|
a |
b |
c c |
d |
e |
f |
3 |
4 |
4 |
7 |
NaN |
d |
False |
4 |
5 |
2 |
9 |
10.0 |
e |
True |
Condition extraction in multiple columns
Conditional extraction using values between multiple columns
df.query('a > b') #Extract rows where the value in column a is greater than the value in column b
|
a |
b |
c c |
d |
e |
f |
4 |
5 |
2 |
9 |
10.0 |
e |
True |
Conditional extraction of columns containing spaces in column names
If the column name contains spaces, the column name must be enclosed in backticks (\ `).
Since this function was implemented in pandas 0.25.0, an error will occur in version 0.24.0 or earlier.
https://pandas-docs.github.io/pandas-docs-travis/whatsnew/v0.25.0.html#other-enhancements
df.query(' `c c` > b') #c Extract rows where the value in column c is greater than the value in column b
|
a |
b |
c c |
d |
e |
f |
3 |
4 |
4 |
7 |
NaN |
d |
False |
4 |
5 |
2 |
9 |
10.0 |
e |
True |
Condition extraction by specifying a character string
If you specify the condition as a string, you must enclose the value in double quotes (").
df.query('e == "a" ') #The value in column e is a string"a"Extract the line that is
|
a |
b |
c c |
d |
e |
f |
0 |
1 |
10 |
1 |
2.0 |
a |
False |
Condition extraction by multiple conditions
Condition extraction using multiple conditions and logical operators such as and and or.
df.query('a > 2 and b < 3') #Extract rows where the value in column a is greater than 2 and the value in column b is less than 3
|
a |
b |
c c |
d |
e |
f |
4 |
5 |
2 |
9 |
10.0 |
e |
True |
df.query('a > 2 or b < 3') #Extract rows where the value in column a is greater than 2 or the value in column b is less than 3
|
a |
b |
c c |
d |
e |
f |
2 |
3 |
6 |
5 |
inf |
c |
True |
3 |
4 |
4 |
7 |
NaN |
d |
False |
4 |
5 |
2 |
9 |
10.0 |
e |
True |
df.query('a > 2 or e == "d" ') #The value in column a is greater than 2 or the value in column e is a string"d"Extract the line that is
|
a |
b |
c c |
d |
e |
f |
2 |
3 |
6 |
5 |
inf |
c |
True |
3 |
4 |
4 |
7 |
NaN |
d |
False |
4 |
5 |
2 |
9 |
10.0 |
e |
True |
df.query(' 2<= a <= 4') #Extract rows where the value in column a is 2 or more and 4 or less
|
a |
b |
c c |
d |
e |
f |
1 |
2 |
8 |
3 |
NaN |
b |
True |
2 |
3 |
6 |
5 |
inf |
c |
True |
3 |
4 |
4 |
7 |
NaN |
d |
False |
Condition extraction by specifying inf
If the value has inf, extract as follows.
df.query(' d == inf') #Extract rows where the value in column d is inf
|
a |
b |
c c |
d |
e |
f |
2 |
3 |
6 |
5 |
inf |
c |
True |
df.query(' d != inf') #Extract rows where the value in column d is not inf
|
a |
b |
c c |
d |
e |
f |
0 |
1 |
10 |
1 |
2.0 |
a |
False |
1 |
2 |
8 |
3 |
NaN |
b |
True |
3 |
4 |
4 |
7 |
NaN |
d |
False |
4 |
5 |
2 |
9 |
10.0 |
e |
True |
Condition extraction by specifying missing values
If the value has a missing value, extract as follows.
df.query(' d != d') #Extract rows where the value in column d is missing
|
a |
b |
c c |
d |
e |
f |
1 |
2 |
8 |
3 |
NaN |
b |
True |
3 |
4 |
4 |
7 |
NaN |
d |
False |
df.query(' d == d') #Extract rows where the value in column d is not missing
|
a |
b |
c c |
d |
e |
f |
0 |
1 |
10 |
1 |
2.0 |
a |
False |
2 |
3 |
6 |
5 |
inf |
c |
True |
4 |
5 |
2 |
9 |
10.0 |
e |
True |
Condition extraction by specifying boolean value
If the value has a boolean value such as True or False, extract as follows.
df.query('f == True') #Extract rows where the value in column f is True
|
a |
b |
c c |
d |
e |
f |
1 |
2 |
8 |
3 |
NaN |
b |
True |
2 |
3 |
6 |
5 |
inf |
c |
True |
4 |
5 |
2 |
9 |
10.0 |
e |
True |
df.query('f == False') #Extract rows where the value in column f is False
|
a |
b |
c c |
d |
e |
f |
0 |
1 |
10 |
1 |
2.0 |
a |
False |
3 |
4 |
4 |
7 |
NaN |
d |
False |
Conditional extraction using a list
Conditional extraction is possible depending on whether the value is in the list.
df.query(' a in (1, 2, 3)') #The value in column a is(1, 2, 3)Extract a row that is one of
|
a |
b |
c c |
d |
e |
f |
0 |
1 |
10 |
1 |
2.0 |
a |
False |
1 |
2 |
8 |
3 |
NaN |
b |
True |
2 |
3 |
6 |
5 |
inf |
c |
True |
df.query(' a not in (1, 2, 3)') #The value in column a is(1, 2, 3)Extract a row that is none of
|
a |
b |
c c |
d |
e |
f |
3 |
4 |
4 |
7 |
NaN |
d |
False |
4 |
5 |
2 |
9 |
10.0 |
e |
True |
Conditional extraction using variables
It is possible to use variables prepared in advance as the value of the condition.
x = 2 #Store 2 in variable x
df.query(' a == @x') #Extract the row where the value of column a is x
|
a |
b |
c c |
d |
e |
f |
1 |
2 |
8 |
3 |
NaN |
b |
True |
y = [2, 3, 4] #Store list in variable y
df.query('a in @y') #Extract rows where the value in column a is one of y
|
a |
b |
c c |
d |
e |
f |
1 |
2 |
8 |
3 |
NaN |
b |
True |
2 |
3 |
6 |
5 |
inf |
c |
True |
3 |
4 |
4 |
7 |
NaN |
d |
False |