Uplift modeling is a method for maximizing the profit gained from marketing measures. Marketing efficiency can be improved by identifying the customers for whom a measure yields a larger profit and targeting them with measures such as sending emails or delivering advertisements.
This article provides an overview of Uplift modeling and an example implementation.
Uplift modeling
**Uplift modeling is a method of predicting, for a customer with given attributes, the increase in profit due to a measure (the difference between the profit when the measure is implemented and the profit when it is not).** An ordinary machine-learning prediction task simply predicts how a customer reacts to a measure (whether they make a reservation, how large the reservation amount is, and so on), whereas uplift modeling predicts how the reaction changes because of the measure (how the reservation probability or reservation amount changes). By clarifying how each customer's reaction changes, measures can be targeted only at the customers for whom they are highly effective, and marketing can be personalized so that each customer receives a suitable measure. The quantity predicted by uplift modeling is the increase in profit due to the measure, conditioned on the customer's attributes (that is, averaged over customers with homogeneous attributes), so it is also called CATE (Conditional Average Treatment Effect) or ITE (Individual Treatment Effect).
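Written as a formula, with $Y_1$ denoting the profit when intervening and $Y_0$ the profit when not intervening (the potential-outcome notation also used below), the quantity to be predicted is
\tau(x) = E(Y_1 - Y_0 | X=x)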
In the uplift-modeling framework, customers are classified into four segments according to how their conversion (CV; e.g. a click or a reservation) reacts to the intervention (the implementation of the measure).
With intervention | Without intervention | Segment name | Description |
---|---|---|---|
CV | CV | Sure Things | Converts with or without the intervention |
CV | No CV | Persuadables | Converts only when intervened |
No CV | CV | Sleeping Dogs | Converts without the intervention, but not with it |
No CV | No CV | Lost Causes | Does not convert with or without the intervention |
**Sure Things** are customers who convert whether or not they are intervened. If the intervention has a cost, it is wasted on them, so they should not be targeted. For example, an active user with a recent purchase history who would purchase even without a promotional email falls into this segment, and the cost of the email delivery is wasted. **Persuadables** are customers who do not convert without the intervention but do convert with it. For example, users who want to buy but have not yet made a purchase fall into this segment, and distributing discount coupons and the like makes them more likely to purchase. **Sleeping Dogs** are customers who convert without the intervention but stop converting when intervened. Since intervening reduces profit, they should never be targeted. **Lost Causes** are customers who do not convert with or without the intervention. For example, users who have been dormant for a long time since their last purchase and cannot be expected to purchase even if an email is delivered fall into this segment, so stopping the delivery reduces cost.
Uplift modeling realizes efficient marketing by identifying the customers who belong to the Persuadables segment and implementing measures only for them.
There are two main families of uplift modeling algorithms. One is the Meta-Learner family, which builds models (called base learners) that predict the profit with the intervention and the profit without it, and estimates the increase in profit from them. The other is the Uplift Tree, an algorithm that builds a tree that splits the customer population using "does the increase in profit grow?" as the splitting criterion.
The Meta-Learner family is further divided into algorithms such as T-Learner, S-Learner, X-Learner, and R-Learner, depending on how the profits with and without the intervention are predicted.
T-Learner
T-Learner is an algorithm that predicts the increase in profit by building two separate models, one that predicts the profit with the intervention and one that predicts the profit without it, and taking the difference between their predictions. Since separate models are built for the intervention and no-intervention cases, the treatment indicator (whether the customer was intervened) is not used as a feature.
When written in a formula,
\mu_1(x) = E(Y_1 | X=x)
\mu_0(x) = E(Y_0 | X=x)
\hat{\tau}(x) = \hat{\mu}_1(x) - \hat{\mu}_0(x)
Here, the customer's background information $x$ (demographic attributes, purchase history, etc.) is used as features to build a model that predicts the profit $Y_1$ (reservation probability, reservation amount, etc.) when intervening and a model that predicts the profit $Y_0$ when not intervening, and the difference between their predictions $\hat{\mu}_1(x)$ and $\hat{\mu}_0(x)$ gives the estimated increase in profit $\hat{\tau}(x)$.
S-Learner
S-Learner is an algorithm that predicts the increase in profit by building a single model that predicts the profit, predicting the profit both with and without the intervention, and taking the difference between the two predictions. Unlike T-Learner, the treatment indicator is used as a feature.
When written in a formula,
\mu(x, z) = E(Y | X=x, Z=z)
\hat{\tau}(x) = \hat{\mu}(x, Z=1) - \hat{\mu}(x, Z=0)
Here, a model that predicts the profit $Y$ is built from the customer's background information $x$ and a group variable $z$ indicating the presence or absence of the intervention ($z=1$ for intervention, $z=0$ for no intervention). The difference between the prediction $\hat{\mu}(x, Z=1)$ with intervention and the prediction $\hat{\mu}(x, Z=0)$ without intervention gives the estimated increase in profit $\hat{\tau}(x)$.
X-Learner
X-Learner is an algorithm that predicts the increase in profit by estimating pseudo-effects (pseudo increases in profit) in the intervened and non-intervened groups and combining them with a weight.
X-Learner first builds a model that predicts the profit $Y_1$ when intervening and a model that predicts the profit $Y_0$ when not intervening, using the customer background information $x$ as features.
\mu_1(x) = E(Y_1 | X=x)
\mu_0(x) = E(Y_0 | X=x)
Then, the pseudo-effects with and without intervention are estimated by the following formula.
D_i^1 = Y_i^1 - \hat{\mu}_0(x_i^1)
D_i^0 = \hat{\mu}_1(x_i^0) - Y_i^0
The superscript 1 indicates data from intervened customers, and 0 indicates data from non-intervened customers. $D_i^1$ is the difference between the profit $Y_i^1$ actually obtained from an intervened customer and the predicted profit $\hat{\mu}_0(x_i^1)$ that the same customer would have yielded without the intervention, and represents the pseudo increase in profit in the intervened group. $D_i^0$ is the difference between the predicted profit $\hat{\mu}_1(x_i^0)$ that a non-intervened customer would have yielded with the intervention and the profit $Y_i^0$ actually obtained, and represents the pseudo increase in profit in the non-intervened group.
Next, models that predict the pseudo-effects $D^1$ and $D^0$ from the customer background information $x$ are built.
\tau_1(x) = E(D^1 | X=x)
\tau_0(x) = E(D^0 | X=x)
Finally, the estimate $\hat{\tau}_1(x)$ obtained from the intervened group and the estimate $\hat{\tau}_0(x)$ obtained from the non-intervened group are combined with a weight $g(x)$ to predict the increase in profit $\hat{\tau}(x)$. The weight satisfies $g(x) \in [0, 1]$, and the propensity score $e(x) = P(Z=1 | X=x)$ can be used as $g(x)$.
\hat{\tau}(x) = g(x)\hat{\tau}_0(x)+(1-g(x))\hat{\tau}_1(x)
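As a reference, below is a minimal X-Learner sketch using scikit-learn, following the formulas above. It assumes the same data layout (`cv_flg`, `treat_flg`, `feature0`-`feature7`) as the sample data generated in the implementation section later in this article; the helper name `x_learner_scores` and the choice of logistic/linear regression as base learners are my own, not part of the original article.

```python
from sklearn.linear_model import LogisticRegression, LinearRegression

def x_learner_scores(train_data, X_test):
    # A sketch, not a reference implementation: assumes the sample-data
    # columns (cv_flg, treat_flg, feature0..feature7) used later in this article
    feature_cols = ['feature' + str(i) for i in range(8)]
    treated = train_data[train_data['treat_flg'] == 1]
    control = train_data[train_data['treat_flg'] == 0]
    # Stage 1: outcome models mu_1 (treated group) and mu_0 (control group)
    mu1 = LogisticRegression(C=0.01, random_state=0).fit(treated[feature_cols], treated['cv_flg'])
    mu0 = LogisticRegression(C=0.01, random_state=0).fit(control[feature_cols], control['cv_flg'])
    # Pseudo-effects: D^1 = Y^1 - mu_0(x^1), D^0 = mu_1(x^0) - Y^0
    d1 = treated['cv_flg'] - mu0.predict_proba(treated[feature_cols])[:, 1]
    d0 = mu1.predict_proba(control[feature_cols])[:, 1] - control['cv_flg']
    # Stage 2: regress the pseudo-effects on the features
    tau1 = LinearRegression().fit(treated[feature_cols], d1)
    tau0 = LinearRegression().fit(control[feature_cols], d0)
    # Weight g(x): here the propensity score e(x) = P(Z=1 | X=x)
    e_model = LogisticRegression(C=0.01, random_state=0).fit(
        train_data[feature_cols], train_data['treat_flg'])
    g = e_model.predict_proba(X_test[feature_cols])[:, 1]
    # tau(x) = g(x) * tau_0(x) + (1 - g(x)) * tau_1(x)
    return g * tau0.predict(X_test[feature_cols]) + (1 - g) * tau1.predict(X_test[feature_cols])
```

Called as `x_learner_scores(train_data, X_test)`, this would produce uplift scores comparable to those of the T-Learner and S-Learner implementations below.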
R-Learner
R-Learner is an algorithm that first predicts the average profit $m(x)$ obtained from each customer and the propensity score (the probability that a customer receives the intervention) $e(x)$, and then estimates the increase in profit by minimizing its prediction error.
R-Learner first builds models that predict the average profit $m(x)$ and the propensity score $e(x)$, using the customer background information $x$ as features.
m(x) = E(Y | X=x)
e(x) = P(Z=1 | X=x)
Then the increase in profit $\tau(\cdot)$ is found so that its prediction error (loss function) $\hat{L}_n(\tau(\cdot))$ is minimized.
\hat{\tau}(\cdot) = \mathrm{argmin}_{\tau}\{\hat{L}_n(\tau(\cdot)) + \Lambda_n(\tau(\cdot))\}
\hat{L}_n(\tau(\cdot)) = \frac{1}{n}\sum_{i=1}^n \left((Y_i-\hat{m}^{(-i)}(x_i))-(z_i-\hat{e}^{(-i)}(x_i))\tau(x_i)\right)^2
$\Lambda_n(\tau(\cdot))$ is a regularization term. $\hat{m}^{(-i)}(x_i)$ and $\hat{e}^{(-i)}(x_i)$ denote the average profit and propensity score of customer $i$ predicted by models built on the data excluding customer $i$ (so-called cross-fitting).
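As a reference, below is a minimal R-Learner sketch under the same data-layout assumptions as the X-Learner sketch above. Cross-fitting (the $(-i)$ superscript) is approximated with out-of-fold predictions from `cross_val_predict`, and the loss is minimized by a weighted ridge regression on the pseudo-outcome $(Y_i-\hat{m}(x_i))/(z_i-\hat{e}(x_i))$ with weights $(z_i-\hat{e}(x_i))^2$, which is algebraically equivalent to the loss above; the function name `r_learner_scores` is my own.

```python
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.model_selection import cross_val_predict

def r_learner_scores(train_data, X_test):
    # A sketch, not a reference implementation: assumes the sample-data
    # columns used later in this article and overlap (e(x) away from 0 and 1)
    feature_cols = ['feature' + str(i) for i in range(8)]
    X = train_data[feature_cols]
    y = train_data['cv_flg']
    z = train_data['treat_flg']
    # Out-of-fold predictions of m(x) = E(Y | X=x) and e(x) = P(Z=1 | X=x)
    m_hat = cross_val_predict(LogisticRegression(C=0.01, random_state=0),
                              X, y, cv=5, method='predict_proba')[:, 1]
    e_hat = cross_val_predict(LogisticRegression(C=0.01, random_state=0),
                              X, z, cv=5, method='predict_proba')[:, 1]
    # Rewrite the R-loss as a weighted squared error:
    # sum_i (z_i - e_hat_i)^2 * ((y_i - m_hat_i)/(z_i - e_hat_i) - tau(x_i))^2
    pseudo_outcome = (y - m_hat) / (z - e_hat)
    weights = (z - e_hat) ** 2
    # Ridge regularization plays the role of the Lambda_n term
    tau_model = Ridge(alpha=1.0)
    tau_model.fit(X, pseudo_outcome, sample_weight=weights)
    return tau_model.predict(X_test[feature_cols])
```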
Uplift Tree
Uplift Tree is an algorithm that builds a tree that splits the customer population using "does the increase in profit grow?" as the splitting criterion. The decision tree used for binary classification splits the population on a condition over some attribute so that the class impurity in the resulting subgroups decreases compared to before the split. An Uplift Tree, on the other hand, splits the population on a condition over some attribute so that the distance between the profit distribution of the intervened group and that of the non-intervened group in the resulting subgroups increases compared to before the split. In other words, the customer population is divided by the attributes that are strongly related to the increase in profit.
When written in a formula,
D_{gain} = D_{after-split}(P^T, P^C) - D_{before-split}(P^T, P^C)
The tree is built so that the $D_{gain}$ defined above becomes large. $P^T$ and $P^C$ denote the profit distributions of the intervened and non-intervened groups respectively, and $D$ denotes a distance between distributions; the Kullback-Leibler divergence or the squared Euclidean distance is used as $D$.
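As a reference, below is a toy sketch of this split criterion for a binary conversion outcome, using the squared Euclidean distance between the treatment and control outcome distributions. Full implementations (for example the Rzepakowski and Jaroszewicz formulation) add normalization terms and search over all attributes and thresholds; this only evaluates $D_{gain}$ for a single candidate split, and the function names are my own.

```python
import numpy as np

def distribution_distance(cv, treat):
    # Squared Euclidean distance between the outcome distributions
    # P^T = (p, 1-p) and P^C = (q, 1-q) for a binary conversion flag
    p = cv[treat == 1].mean() if (treat == 1).any() else 0.0
    q = cv[treat == 0].mean() if (treat == 0).any() else 0.0
    return 2.0 * (p - q) ** 2

def split_gain(cv, treat, feature, threshold):
    # D_gain = D_after_split - D_before_split for one candidate split,
    # with the child-node distances weighted by their share of the samples
    d_before = distribution_distance(cv, treat)
    left = feature <= threshold
    d_after = 0.0
    for mask in (left, ~left):
        if mask.any():
            d_after += mask.mean() * distribution_distance(cv[mask], treat[mask])
    return d_after - d_before

# Example: evaluate one candidate split on synthetic arrays
rng = np.random.default_rng(0)
cv = rng.integers(0, 2, size=1000)
treat = rng.integers(0, 2, size=1000)
feature = rng.random(1000)
print(split_gain(cv, treat, feature, threshold=0.5))
```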
The performance of uplift modeling is evaluated with a metric called **AUUC (Area Under the Uplift Curve)**. AUUC is a normalized measure of how much more profit is gained by intervening only on the customers with a large predicted increase in profit, compared to intervening on randomly selected customers. The higher the AUUC, the better the performance of the uplift model. The calculation proceeds by sorting customers in descending order of their predicted uplift score and, for each top-$k$ subset, computing the lift, that is, the increase in conversions obtained by intervening on those $k$ customers rather than not intervening; the AUUC is the normalized area between this lift curve and the baseline for random targeting.
When written in a formula,
AUUC = \frac{1}{n}\sum_{k=1}^n AUUC_{\pi}(k)
AUUC_{\pi}(k) = AUL_{\pi}^T(k) - AUL_{\pi}^C(k) = \sum_{i=1}^k (R_{\pi}^T(i) - R_{\pi}^C(i)) - \frac{k}{2}(\bar{R}^T(k) - \bar{R}^C(k))
Here $n$ is the total number of customers, $k$ is the number of customers whose uplift score is above the threshold, and $\pi$ is the ordering of the customers (in descending order of uplift score). $AUL_{\pi}^T(k)$ is the lift when intervening on the top $k$ customers in the order $\pi$, and $AUL_{\pi}^C(k)$ is the increase in profit when intervening on $k$ randomly selected customers. $R_{\pi}^T(i)$ is the profit when intervening on the $i$-th customer in the order $\pi$, and $R_{\pi}^C(i)$ is the profit when not intervening on that customer. $\bar{R}^T(k)$ is the profit when intervening on $k$ randomly selected customers, and $\bar{R}^C(k)$ is the profit when not intervening on them. $AUL_{\pi}^C(k)$ can also be viewed as the area of a triangle with base $k$ and height $\bar{R}^T(k) - \bar{R}^C(k)$.
Implementation
Let us now implement uplift modeling in Python to predict the increase in profit. Here, T-Learner and S-Learner are implemented, based on Chapter 9 of Machine Learning Starting at Work. The execution environment is Python 3.7.6, numpy 1.18.1, pandas 1.0.2, and scikit-learn 0.22.2. The code is below.
- Loading the required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set_style('whitegrid')
import random
from sklearn.linear_model import LogisticRegression
- Generating the data to be used (the profit here is the CVR, i.e. the conversion rate)
def generate_sample_data(num, seed=0):
    cv_flg_list = []  # list of conversion flags
    treat_flg_list = []  # list of flags indicating whether the customer was intervened
    feature_vector_list = []  # list of feature vectors
    feature_num = 8  # number of features
    base_weight = [0.02, 0.03, 0.05, -0.04, 0.00, 0.00, 0.00, 0.00]  # base weights of the features
    lift_weight = [0.00, 0.00, 0.00, 0.05, -0.05, 0.00, 0.0, 0.00]  # change in the weights under intervention
    random_instance = random.Random(seed)
    for i in range(num):
        feature_vector = [random_instance.random() for n in range(feature_num)]  # generate features at random
        treat_flg = random_instance.choice((1, 0))  # generate the intervention flag at random
        cv_rate = sum([feature_vector[n] * base_weight[n] for n in range(feature_num)])  # base value of the CVR
        if treat_flg == 1:
            cv_rate += sum([feature_vector[n] * lift_weight[n] for n in range(feature_num)])  # add the lift weights when intervened
        cv_flg = 1 if cv_rate > random_instance.random() else 0
        cv_flg_list.append(cv_flg)
        treat_flg_list.append(treat_flg)
        feature_vector_list.append(feature_vector)
    df = pd.DataFrame(np.c_[cv_flg_list, treat_flg_list, feature_vector_list],
                      columns=['cv_flg', 'treat_flg', 'feature0', 'feature1', 'feature2',
                               'feature3', 'feature4', 'feature5', 'feature6', 'feature7'])
    return df

train_data = generate_sample_data(num=10000, seed=0)  # data for model building (training data)
test_data = generate_sample_data(num=10000, seed=1)  # data for performance evaluation (validation data)
- Implementation of T-Learner

# Prepare the data for the model that predicts the profit (CVR) with intervention
X_train_treat = train_data[train_data['treat_flg']==1].drop(['cv_flg', 'treat_flg'], axis=1)
Y_train_treat = train_data.loc[train_data['treat_flg']==1, 'cv_flg']
# Prepare the data for the model that predicts the profit (CVR) without intervention
X_train_control = train_data[train_data['treat_flg']==0].drop(['cv_flg', 'treat_flg'], axis=1)
Y_train_control = train_data.loc[train_data['treat_flg']==0, 'cv_flg']
# Build the two models
treat_model = LogisticRegression(C=0.01, random_state=0)
control_model = LogisticRegression(C=0.01, random_state=0)
treat_model.fit(X_train_treat, Y_train_treat)
control_model.fit(X_train_control, Y_train_control)
# Predict the CVR for the validation data
X_test = test_data.drop(['cv_flg', 'treat_flg'], axis=1)
treat_score = treat_model.predict_proba(X_test)[:, 1]
control_score = control_model.predict_proba(X_test)[:, 1]
# Calculate the uplift score
uplift_score = treat_score - control_score
- Calculation of AUUC

# Sort the validation data in descending order of uplift score
result = pd.DataFrame(np.c_[test_data['cv_flg'], test_data['treat_flg'], uplift_score],
                      columns=['cv_flg', 'treat_flg', 'uplift_score'])
result = result.sort_values(by='uplift_score', ascending=False).reset_index(drop=True)
# Calculate the lift
result['treat_num_cumsum'] = result['treat_flg'].cumsum()
result['control_num_cumsum'] = (1 - result['treat_flg']).cumsum()
result['treat_cv_cumsum'] = (result['treat_flg'] * result['cv_flg']).cumsum()
result['control_cv_cumsum'] = ((1 - result['treat_flg']) * result['cv_flg']).cumsum()
result['treat_cvr'] = (result['treat_cv_cumsum'] / result['treat_num_cumsum']).fillna(0)
result['control_cvr'] = (result['control_cv_cumsum'] / result['control_num_cumsum']).fillna(0)
result['lift'] = (result['treat_cvr'] - result['control_cvr']) * result['treat_num_cumsum']
result['base_line'] = result.index * result['lift'][len(result.index) - 1] / len(result.index)
# Calculate AUUC
auuc = (result['lift'] - result['base_line']).sum() / len(result['lift'])
print('AUUC = {:.2f}'.format(auuc))
# Output: AUUC = 37.70
- Drawing the uplift curve
result.plot(y=['lift', 'base_line'])
plt.xlabel('uplift score rank')
plt.ylabel('conversion lift')
plt.show()
- Implementation of S-Learner

# Prepare the training data (create interaction terms between the treatment flag and the features)
X_train = train_data.drop('cv_flg', axis=1)
for feature in ['feature'+str(i) for i in range(8)]:
    X_train['treat_flg_x_' + feature] = X_train['treat_flg'] * X_train[feature]
Y_train = train_data['cv_flg']
# Build the model
model = LogisticRegression(C=0.01, random_state=0)
model.fit(X_train, Y_train)
# Prepare the validation data for the case with intervention
X_test_treat = test_data.drop('cv_flg', axis=1).copy()
X_test_treat['treat_flg'] = 1
for feature in ['feature'+str(i) for i in range(8)]:
    X_test_treat['treat_flg_x_' + feature] = X_test_treat['treat_flg'] * X_test_treat[feature]
# Prepare the validation data for the case without intervention
X_test_control = test_data.drop('cv_flg', axis=1).copy()
X_test_control['treat_flg'] = 0
for feature in ['feature'+str(i) for i in range(8)]:
    X_test_control['treat_flg_x_' + feature] = X_test_control['treat_flg'] * X_test_control[feature]
# Predict the profit (CVR) for the validation data
treat_score = model.predict_proba(X_test_treat)[:, 1]
control_score = model.predict_proba(X_test_control)[:, 1]
# Calculate the uplift score
uplift_score = treat_score - control_score
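The AUUC calculation from the T-Learner section can be reused here. Below it is repackaged as a function; the name `calc_auuc` is my own, and the body is the same code as above.

```python
def calc_auuc(test_data, uplift_score):
    # Identical to the AUUC calculation in the T-Learner section, wrapped
    # in a function so it can be reused for the S-Learner scores
    result = pd.DataFrame(np.c_[test_data['cv_flg'], test_data['treat_flg'], uplift_score],
                          columns=['cv_flg', 'treat_flg', 'uplift_score'])
    result = result.sort_values(by='uplift_score', ascending=False).reset_index(drop=True)
    result['treat_num_cumsum'] = result['treat_flg'].cumsum()
    result['control_num_cumsum'] = (1 - result['treat_flg']).cumsum()
    result['treat_cv_cumsum'] = (result['treat_flg'] * result['cv_flg']).cumsum()
    result['control_cv_cumsum'] = ((1 - result['treat_flg']) * result['cv_flg']).cumsum()
    result['treat_cvr'] = (result['treat_cv_cumsum'] / result['treat_num_cumsum']).fillna(0)
    result['control_cvr'] = (result['control_cv_cumsum'] / result['control_num_cumsum']).fillna(0)
    result['lift'] = (result['treat_cvr'] - result['control_cvr']) * result['treat_num_cumsum']
    result['base_line'] = result.index * result['lift'][len(result.index) - 1] / len(result.index)
    return (result['lift'] - result['base_line']).sum() / len(result['lift'])

print('AUUC = {:.2f}'.format(calc_auuc(test_data, uplift_score)))
```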
Evaluating the AUUC in the same way as for T-Learner gives AUUC = 19.60, so on this data T-Learner shows the higher performance.
We have summarized an overview of Uplift modeling and an example implementation. If you find any mistakes, an edit request would be appreciated.
References
- Machine Learning Starting at Work
- Machine Learning / Economic Models, Best Practices, and Architecture for AI Algorithm Marketing Automation