[PYTHON] [Pandas] Find quartiles and detect outliers

A note for myself, since I sometimes want to use quartiles to detect outliers.

Getting the quartiles of a Series

# Option 1: Series.quantile
Q1 = series.quantile(.25)
Q3 = series.quantile(.75)

# Option 2: the '25%' and '75%' rows of describe()
Q1 = series.describe()['25%']
Q3 = series.describe()['75%']
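To check that the two options agree, here is a small runnable sketch on made-up data. Both use linear interpolation by default, so they give the same numbers. It also shows that quantile() accepts a list, which returns both quartiles in a single call.

```python
import pandas as pd

# Made-up example data
series = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Option 1: Series.quantile
q1_a = series.quantile(.25)
q3_a = series.quantile(.75)

# Option 2: the '25%' / '75%' rows of describe()
desc = series.describe()
q1_b = desc['25%']
q3_b = desc['75%']

# quantile() with a list returns both quartiles at once (indexed by 0.25 and 0.75)
quartiles = series.quantile([.25, .75])

print(q1_a, q3_a)  # 3.25 7.75
```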

Detecting outliers with quartiles

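# Extract only the rows whose value in column A is an outlier
# (upper side only: values above Q3 + 1.5 * IQR; Q1 and Q3 computed from df['A'])
IQR = Q3 - Q1
threshold = Q3 + 1.5 * IQR

# A vectorized comparison does the same as apply(lambda x: x > threshold), but faster
df_outlier = df[df['A'] > threshold]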
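The code above only flags values that are too large. A common convention (Tukey's fences) also treats values below Q1 - 1.5 * IQR as outliers. Here is a minimal sketch that checks both sides. The DataFrame and its values are made up for illustration.

```python
import pandas as pd

# Made-up data: column 'A' has one very small and one very large value
df = pd.DataFrame({'A': [-40, 10, 11, 12, 13, 14, 15, 16, 100]})

q1 = df['A'].quantile(.25)
q3 = df['A'].quantile(.75)
iqr = q3 - q1

# Tukey's fences: 1.5 * IQR below Q1 and 1.5 * IQR above Q3
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Boolean mask combining both sides with | (element-wise OR)
is_outlier = (df['A'] < lower) | (df['A'] > upper)
df_outlier = df[is_outlier]

print(df_outlier['A'].tolist())  # [-40, 100]
```

With this data Q1 = 11 and Q3 = 15, so IQR = 4 and the fences are 5 and 21. Only -40 and 100 fall outside them.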

Conversely, if you want the rows that are not outliers, take the logical negation of the condition, as in df[~(df['A'] > threshold)], or simply flip the inequality sign.
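Here is a small sketch comparing the two ways of selecting the non-outlier rows. The threshold is fixed by hand here, as an assumption for illustration, instead of being computed from quartiles.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 50]})
threshold = 10  # fixed by hand for illustration

mask = df['A'] > threshold
df_outlier = df[mask]                       # rows above the threshold
df_inlier_neg = df[~mask]                   # logical negation with ~
df_inlier_flip = df[df['A'] <= threshold]   # flipped inequality
```

The two results match here because column A has no missing values. With NaN they differ: ~(NaN > threshold) is True, but NaN <= threshold is False. So the negated version keeps NaN rows and the flipped version drops them.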

Summary

I suspect there is some function that gets the outliers in one shot without going through all of this...

I'm a statistics beginner tinkering with data in pandas, so I'd be grateful if anyone could tell me a better way.
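As far as I know, pandas has no single built-in "get outliers" method, but the steps can be wrapped in a small helper. The function name iqr_outliers and its k parameter are hypothetical, made up for this sketch. It only uses the real Series.quantile and Series.between methods. between() is inclusive on both ends by default, so negating it keeps only values strictly outside the fences.

```python
import pandas as pd

def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Hypothetical helper: return the values of s outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # between() returns False for NaN, so drop missing values first
    # to avoid reporting them as outliers
    s = s.dropna()
    q1, q3 = s.quantile([.25, .75])
    iqr = q3 - q1
    return s[~s.between(q1 - k * iqr, q3 + k * iqr)]

s = pd.Series([-40, 10, 11, 12, 13, 14, 15, 16, 100])
print(iqr_outliers(s).tolist())  # [-40, 100]
```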

Reference

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.quantile.html

For background on quartiles, see this document: http://www.contents-station.net/gacco/Data_Analysis_Innovation/Week03/3-4.pdf