ntile (decile) function in python

What i did

Even if the ntile function that was often used in SQL during decyl analysis is searched, It didn't come out, so I created a function.

Dedicated to the older brothers who felt "ntile is the best. Why isn't it Python .." in SQL.

ntile function (edited a little)

Well, as a usage, Have df = pandas dataframe inserted, col = Decyl Enter the column name of the target you want to analyze. n = 10 is set, but if you want to make 100 groups, set n = 100 etc. If n_tile = True, return the dataframe given as columns for the number of decades, If set to False, a dataframe with various aggregate functions will be returned. agg = Aggregate function you want to use when n_tile False above, For those who say "just sum", use agg = ["sum"] as an argument.

def ntile(df, col, n=10, n_tile=True, ascending=False, aggs=["sum", "count", "min", "max", "median", "mean", "std"]):
    # check pandas dataframe type
    if type(df) == pd.core.frame.DataFrame:
        # check int or float
        if type(df[f"{col}"].max()) == np.int64 or type(df[f"{col}"].max()) == np.float64:
            ntile_df = pd.DataFrame()
            length = len(df)
            df = df.sort_values(col, ascending=ascending).reset_index(drop=True)
            div_n = math.ceil(len(df) / n)

            # mark N group 
            for idx, num in enumerate(range(0, length, div_n)):
                n_df = df[num: num+div_n]
                n_df["decile"] = idx + 1
                ntile_df = ntile_df.append(n_df)
            
            # return ntile_df if need only decile data
            if n_tile:
                return ntile_df
            
            # otherwise aggregate col
            agg_df = ntile_df.groupby("decile").agg({f"{col}": aggs}).reset_index()
            #  clean columns
            agg_df.columns = ["".join(c) for c in agg_df.columns]
            # merge two data
            ntile_df = ntile_df.merge(agg_df, on="decile").copy()
            
            return ntile_df
        
        raise TypeError(f"`{col}` must be int or float type")
    raise TypeError(f"`{type(df)}` must be pandas.core.frame.DataFrame type")

Create sample data

import pandas as pd
import random
#Create a random price guy
price = random.sample(range(10000, 900000), k=10000)
#Create a dataframe using the above
data = pd.DataFrame(price, columns=["price"])

Contents is like this.

スクリーンショット 2019-11-10 13.14.05.png

Use it like this

#Put data in the function. n=Leave the default of 10
data = ntile(data, "price")
#Group by with decile and apply various aggregate functions.
data.groupby("decile").agg({"price": ["sum", "count", "min", "max", "median", "mean"]}).reset_index()

n=10 スクリーンショット 2019-11-10 13.17.16.png

n=100 スクリーンショット 2019-11-10 13.17.52.png

If n_tile = False

data = ntile(data, "price", n=5 , n_tile=False)

data.pricemin.value_counts().sort_index().reset_index().rename(columns={"index": "price_segment", "pricemin": "count"})

And look at this.

スクリーンショット 2019-11-10 14.52.49.png

Have you ever done something like the following? And "Isn't there a function that creates a class with n% of all classes in the target column?" Have you ever felt that? スクリーンショット 2019-11-10 14.48.44.png

I'm a beginner, so I've thought about the above several times. I looked at ntile every time and corrected it by hand, but from now on, it is possible to change it just by changing the part of n = 10. After all, python has a smaller turn than SQL.

At the end

I don't know if the search is insufficient, but it is strange that there is no ntile function in SQL like python, Well, I tried to make it as a practice, so if you have changed jobs from SQLer to pythoner, please use it.

Recommended Posts

ntile (decile) function in python
Use callback function in Python
Nonlinear function modeling in Python
Draw implicit function in python
Immediate function in python (lie)
python function ①
[Python] function
python function ②
Implement R's power.prop.test function in python
Function argument type definition in python
Included notation in Python function arguments
Write AWS Lambda function in Python
Measure function execution time in Python
Function synthesis and application in Python
Quadtree in Python --2
Python in optimization
CURL in python
Metaprogramming in Python
Python 3.3 in Anaconda
Geocoding in python
SendKeys in Python
python enumerate function
Meta-analysis in Python
Precautions when pickling a function in python
Unittest in python
OR the List in Python (zip function)
Python> function> Closure
Epoch in Python
Discord in Python
[Python] Generator function
Sudoku in Python
DCI in Python
quicksort in python
nCr in python
N-Gram in Python
Programming in python
Plink in Python
Constant in python
Lifegame in Python.
FizzBuzz in Python
Sqlite in python
StepAIC in Python
N-gram in python
LINE-Bot [0] in Python
Python> function> Inner function
Csv in python
Disassemble in Python
Reflection in Python
Duality in function
Constant in python
nCr in Python.
format in python
Scons in Python3
Puyo Puyo in python
python in virtualenv
PPAP in Python
Python function decorator
Quad-tree in Python
Reflection in Python
Chemistry in Python
Hashable in python