[PYTHON] Fill outliers with NaN based on quartiles in Pandas

pandas is convenient, isn't it? I want to treat values in a pandas DataFrame as outliers when they fall more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile. Instead of deleting an entire row based on the value in one column, the code below detects outliers column by column and fills them with NaN.

Reference: [Find outliers in the interquartile range (IQR) during correlation analysis (Python) - I sell services and buy homes](http://www.ie-kau.net/entry/2016/04/14/%E7%9B%B8%E9%96%A2%E5%88%86%E6%9E%90%E3%81%AE%E6%99%82%E3%81%AB%E5%9B%9B%E5%88%86%E4%BD%8D%E7%AF%84%E5%9B%B2%28IQR%29%E3%81%A7%E5%A4%96%E3%82%8C%E5%80%A4%E3%82%92%E8%A6%8B%E3%81%A4%E3%81%91%E3%82%8B%EF%BC%88Pyt)

`drop_outlier.py`


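def drop_outlier(df):
    """Replace per-column IQR outliers in df with NaN, modifying df in place."""
    # Only numeric columns have quartiles; other columns are left untouched
    for name, col in df.select_dtypes(include='number').items():
        # Quartiles
        q1 = col.quantile(0.25)
        q3 = col.quantile(0.75)
        iqr = q3 - q1  # Interquartile range

        # Outlier reference points
        outlier_min = q1 - iqr * 1.5
        outlier_max = q3 + iqr * 1.5

        # Replace out-of-range values with NaN, writing the result back to df
        df[name] = col.mask((col < outlier_min) | (col > outlier_max))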
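
For comparison, the same idea can also be written without an explicit loop. The following is only a sketch under a few assumptions: the helper name `mask_outliers_iqr`, the parameter `k`, and the sample data are all made up for illustration, and the helper returns a new DataFrame instead of changing the original.

```python
import pandas as pd


def mask_outliers_iqr(df, k=1.5):
    """Hypothetical helper: return a copy with per-column IQR outliers set to NaN."""
    numeric = df.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    # The comparisons align on column labels, so each column gets its own bounds.
    in_range = (numeric >= q1 - k * iqr) & (numeric <= q3 + k * iqr)
    result = df.copy()
    # where() keeps values where the condition is True and writes NaN elsewhere.
    result[numeric.columns] = numeric.where(in_range)
    return result


# Made-up sample data: only the value 100.0 in column "a" lies outside the bounds.
sample = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 100.0],
    "b": [10.0, 11.0, 12.0, 13.0, 14.0],
})
masked = mask_outliers_iqr(sample)
print(masked)
```

Returning a copy keeps the original values available, which can be handy when you want to compare the data before and after masking.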

If you want to pass the data to a machine learning library such as scikit-learn, fill the resulting NaN values with `fillna` or a similar method. Keep in mind that this replaces each outlier with a different value, which can affect your results :joy:

df = df.bfill()
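
One thing to watch with a backward fill is that a NaN in the last row has no later value to copy from, so it stays NaN. Here is a small sketch of that behavior and two possible ways around it. The sample data and variable names are made up for illustration, and which strategy suits your data is a judgment call.

```python
import numpy as np
import pandas as pd

# Made-up data with gaps, including one in the last row of column "a".
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": [np.nan, 5.0, 6.0, 7.0],
})

# Backward fill copies the next valid value upward; the trailing NaN in "a" remains.
back_filled = df.bfill()

# Chaining a forward fill covers the trailing gaps left by the backward fill.
both_filled = df.bfill().ffill()

# Alternative: replace each NaN with the median of its own column.
median_filled = df.fillna(df.median())

print(back_filled)
print(both_filled)
print(median_filled)
```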

Enjoy your pandas life.