[PYTHON] [Pandas] Find quartiles and detect outliers

This is a personal note, since I occasionally want to use quartiles to detect outliers.

Getting the quartiles of a Series

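# series is a pandas Series of numeric values
Q1 = series.quantile(.25)  # first quartile (25th percentile)
Q3 = series.quantile(.75)  # third quartile (75th percentile)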
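Series.quantile also accepts a list of probabilities and returns both quartiles in one call, as a Series indexed by those probabilities. A small sketch with made-up sample data (the values are only for illustration):

```python
import pandas as pd

# Made-up sample data for illustration
series = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

# Passing a list returns a Series indexed by the requested probabilities
quartiles = series.quantile([0.25, 0.75])
Q1 = quartiles.loc[0.25]
Q3 = quartiles.loc[0.75]
print(Q1, Q3)
```

With pandas' default linear interpolation, this prints 3.25 and 7.75 for the sample above.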

Or

desc = series.describe()  # compute the summary statistics once
Q1 = desc['25%']
Q3 = desc['75%']
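A quick check, assuming pandas' default settings, that the two approaches give the same quartiles. describe() computes its percentiles with the same linear interpolation that quantile() uses by default; the data are made up:

```python
import pandas as pd

series = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

# Approach 1: quantile()
Q1_q = series.quantile(0.25)
Q3_q = series.quantile(0.75)

# Approach 2: describe(), which also returns count, mean, std, min, max, etc.
desc = series.describe()
Q1_d = desc['25%']
Q3_d = desc['75%']

print(Q1_q, Q1_d)
print(Q3_q, Q3_d)
```

If you pass a different interpolation= argument to quantile(), the two can differ, since describe() has no such option.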

Detecting outliers using quartiles

# Q1 and Q3 here are the quartiles of column A (i.e. series = df['A'])
IQR = Q3 - Q1
threshold = Q3 + 1.5 * IQR  # upper fence

# Extract only the rows whose value in column A is an outlier (above the upper fence)
df_outlier = df[df['A'] > threshold]
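The code above only flags values above the upper fence. The usual 1.5 × IQR rule (Tukey's fences) also flags values below Q1 - 1.5 × IQR. A sketch applying both fences, assuming a DataFrame df with a numeric column A (the data are made up):

```python
import pandas as pd

# Made-up data: column A has one unusually small and one unusually large value
df = pd.DataFrame({'A': [-50, 1, 2, 3, 4, 5, 6, 7, 8, 100]})

Q1 = df['A'].quantile(0.25)
Q3 = df['A'].quantile(0.75)
IQR = Q3 - Q1

lower = Q1 - 1.5 * IQR  # lower fence
upper = Q3 + 1.5 * IQR  # upper fence

# Rows whose value in A lies outside [lower, upper]
df_outlier = df[(df['A'] < lower) | (df['A'] > upper)]
print(df_outlier)
```

The parentheses around each comparison are required, because | binds more tightly than < and > in Python. Here the fences come out as -4.5 and 13.5, so the rows holding -50 and 100 are returned.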

Conversely, if you want the rows that are not outliers, negate the condition with ~, as in df[~(df['A'] > threshold)], or simply flip the direction of the inequality: df[df['A'] <= threshold].
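A small sketch of both ways to keep the non-outlier rows, with made-up data and the same upper-fence threshold as above:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]})
Q1 = df['A'].quantile(0.25)
Q3 = df['A'].quantile(0.75)
threshold = Q3 + 1.5 * (Q3 - Q1)

is_outlier = df['A'] > threshold

# Option 1: invert the boolean mask with ~.
# When written inline, keep the parentheses: ~ binds tighter than >,
# so ~df['A'] > threshold would bitwise-invert the values first.
df_inlier = df[~is_outlier]

# Option 2: flip the inequality
df_inlier2 = df[df['A'] <= threshold]
```

The two options match here, but they differ if A contains NaN: ~(A > threshold) keeps NaN rows (since NaN > threshold is False), while A <= threshold drops them.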

Summary

There seems to be a way to get outliers in one shot without going through all of these steps...

I'm a statistics beginner tinkering with data in pandas, so I would be grateful if anyone could point me to a better approach.
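One way to get something close to "one shot" is to wrap the steps in a helper that works on every numeric column at once. This is only a sketch: iqr_outlier_mask is a hypothetical helper written for illustration, not a pandas function, and the data are made up.

```python
import pandas as pd


def iqr_outlier_mask(df: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """Hypothetical helper: True where a value lies outside
    [Q1 - k*IQR, Q3 + k*IQR] of its own column (numeric columns only)."""
    numeric = df.select_dtypes(include='number')
    q1 = numeric.quantile(0.25)  # Series indexed by column name
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    # Comparing a DataFrame with a Series aligns the Series index on the columns
    return (numeric < q1 - k * iqr) | (numeric > q3 + k * iqr)


df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5, 6, 7, 8, 9, 100],
    'B': [-50, 1, 2, 3, 4, 5, 6, 7, 8, 9],
})

mask = iqr_outlier_mask(df)

# Rows that have an outlier in at least one column
rows_with_outliers = df[mask.any(axis=1)]
print(rows_with_outliers)
```

Returning a boolean mask rather than filtered rows keeps the helper flexible: mask.any(axis=1) selects rows with any outlier, and df.mask(mask) replaces the outliers with NaN.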

Reference

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.quantile.html

For background on quartiles themselves, refer to this document: http://www.contents-station.net/gacco/Data_Analysis_Innovation/Week03/3-4.pdf
