Uplift modeling is a method for maximizing the profit gained from marketing measures. Marketing efficiency can be improved by identifying the customers for whom a measure yields a larger profit and targeting them with measures such as sending emails or delivering advertisements.
This article provides an overview of Uplift modeling and an example implementation.
Uplift modeling
**Uplift modeling is a method of predicting, for a customer with given attributes, the increase in profit due to a measure (the difference between the profit when the measure is implemented and the profit when it is not).** An ordinary machine-learning prediction task simply predicts how a customer reacts to a measure (whether they make a reservation, how large the reservation amount is, and so on), whereas uplift modeling predicts how the reaction changes because of the measure (how the reservation probability or reservation amount changes). By clarifying how each customer's reaction changes, measures can be targeted only at the customers for whom they are highly effective, and marketing can be personalized so that each customer receives a suitable measure. The quantity predicted by uplift modeling is the increase in profit due to the measure, conditioned on the customer's attributes (that is, averaged over customers with homogeneous attributes), so it is also called CATE (Conditional Average Treatment Effect) or ITE (Individual Treatment Effect).
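Written as a formula, with $Y_1$ denoting the profit when intervening and $Y_0$ the profit when not intervening (the potential-outcome notation also used below), the quantity to be predicted is
\tau(x) = E(Y_1 - Y_0 | X=x)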
In the uplift-modeling framework, customers are classified into four segments according to how their conversion (CV; e.g. a click or a reservation) reacts to the intervention (the implementation of the measure).
With intervention | Without intervention | Segment name | Description |
---|---|---|---|
CV | CV | Sure Things | Converts with or without the intervention |
CV | No CV | Persuadables | Converts only when intervened |
No CV | CV | Sleeping Dogs | Converts without the intervention, but not with it |
No CV | No CV | Lost Causes | Does not convert with or without the intervention |
**Sure Things** are customers who convert whether or not they are intervened. If the intervention has a cost, it is wasted on them, so they should not be targeted. For example, an active user with a recent purchase history who would purchase even without a promotional email falls into this segment, and the cost of the email delivery is wasted. **Persuadables** are customers who do not convert without the intervention but do convert with it. For example, users who want to buy but have not yet made a purchase fall into this segment, and distributing discount coupons and the like makes them more likely to purchase. **Sleeping Dogs** are customers who convert without the intervention but stop converting when intervened. Since intervening reduces profit, they should never be targeted. **Lost Causes** are customers who do not convert with or without the intervention. For example, users who have been dormant for a long time since their last purchase and cannot be expected to purchase even if an email is delivered fall into this segment, so stopping the delivery reduces cost.
Uplift modeling realizes efficient marketing by identifying the customers who belong to the Persuadables segment and implementing measures only for them.
There are two main families of uplift modeling algorithms. One is the Meta-Learner family, which builds models (called base learners) that predict the profit with the intervention and the profit without it, and estimates the increase in profit from them. The other is the Uplift Tree, an algorithm that builds a tree that splits the customer population using "does the increase in profit grow?" as the splitting criterion.
The Meta-Learner family is further divided into algorithms such as T-Learner, S-Learner, X-Learner, and R-Learner, depending on how the profits with and without the intervention are predicted.
T-Learner
T-Learner is an algorithm that predicts the increase in profit by building two separate models, one that predicts the profit with the intervention and one that predicts the profit without it, and taking the difference between their predictions. Since separate models are built for the intervention and no-intervention cases, the treatment indicator (whether the customer was intervened) is not used as a feature.
When written in a formula,
\mu_1(x) = E(Y_1 | X=x)
\mu_0(x) = E(Y_0 | X=x)
\hat{\tau}(x) = \hat{\mu}_1(x) - \hat{\mu}_0(x)
Here, the customer's background information $x$ (demographic attributes, purchase history, etc.) is used as features to build a model that predicts the profit $Y_1$ (reservation probability, reservation amount, etc.) when intervening and a model that predicts the profit $Y_0$ when not intervening, and the difference between their predictions $\hat{\mu}_1(x)$ and $\hat{\mu}_0(x)$ gives the estimated increase in profit $\hat{\tau}(x)$.
S-Learner
S-Learner is an algorithm that predicts the increase in profit by building a single model that predicts the profit, predicting the profit both with and without the intervention, and taking the difference between the two predictions. Unlike T-Learner, the treatment indicator is used as a feature.
When written in a formula,
\mu(x, z) = E(Y | X=x, Z=z)
\hat{\tau}(x) = \hat{\mu}(x, Z=1) - \hat{\mu}(x, Z=0)
Here, a model that predicts the profit $Y$ is built from the customer's background information $x$ and a group variable $z$ indicating the presence or absence of the intervention ($z=1$ for intervention, $z=0$ for no intervention). The difference between the prediction $\hat{\mu}(x, Z=1)$ with intervention and the prediction $\hat{\mu}(x, Z=0)$ without intervention gives the estimated increase in profit $\hat{\tau}(x)$.
X-Learner
X-Learner is an algorithm that predicts the increase in profit by estimating pseudo-effects (pseudo increases in profit) in the intervened and non-intervened groups and combining them with a weight.
X-Learner first builds a model that predicts the profit $Y_1$ when intervening and a model that predicts the profit $Y_0$ when not intervening, using the customer background information $x$ as features.
\mu_1(x) = E(Y_1 | X=x)
\mu_0(x) = E(Y_0 | X=x)
Then, the pseudo-effects with and without intervention are estimated by the following formula.
D_i^1 = Y_i^1 - \hat{\mu}_0(x_i^1)
D_i^0 = \hat{\mu}_1(x_i^0) - Y_i^0
The superscript 1 indicates data from intervened customers, and 0 indicates data from non-intervened customers. $D_i^1$ is the difference between the profit $Y_i^1$ actually obtained from an intervened customer and the predicted profit $\hat{\mu}_0(x_i^1)$ that the same customer would have yielded without the intervention, and represents the pseudo increase in profit in the intervened group. $D_i^0$ is the difference between the predicted profit $\hat{\mu}_1(x_i^0)$ that a non-intervened customer would have yielded with the intervention and the profit $Y_i^0$ actually obtained, and represents the pseudo increase in profit in the non-intervened group.
Next, models that predict the pseudo-effects $D^1$ and $D^0$ from the customer background information $x$ are built.
\tau_1(x) = E(D^1 | X=x)
\tau_0(x) = E(D^0 | X=x)
Finally, the estimate $\hat{\tau}_1(x)$ obtained from the intervened group and the estimate $\hat{\tau}_0(x)$ obtained from the non-intervened group are combined with a weight $g(x)$ to predict the increase in profit $\hat{\tau}(x)$. The weight satisfies $g(x) \in [0, 1]$, and the propensity score $e(x) = P(Z=1 | X=x)$ can be used as $g(x)$.
\hat{\tau}(x) = g(x)\hat{\tau}_0(x)+(1-g(x))\hat{\tau}_1(x)
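As a reference, below is a minimal X-Learner sketch using scikit-learn, following the formulas above. It assumes the same data layout (`cv_flg`, `treat_flg`, `feature0`-`feature7`) as the sample data generated in the implementation section later in this article; the helper name `x_learner_scores` and the choice of logistic/linear regression as base learners are my own, not part of the original article.

```python
from sklearn.linear_model import LogisticRegression, LinearRegression

def x_learner_scores(train_data, X_test):
    # A sketch, not a reference implementation: assumes the sample-data
    # columns (cv_flg, treat_flg, feature0..feature7) used later in this article
    feature_cols = ['feature' + str(i) for i in range(8)]
    treated = train_data[train_data['treat_flg'] == 1]
    control = train_data[train_data['treat_flg'] == 0]
    # Stage 1: outcome models mu_1 (treated group) and mu_0 (control group)
    mu1 = LogisticRegression(C=0.01, random_state=0).fit(treated[feature_cols], treated['cv_flg'])
    mu0 = LogisticRegression(C=0.01, random_state=0).fit(control[feature_cols], control['cv_flg'])
    # Pseudo-effects: D^1 = Y^1 - mu_0(x^1), D^0 = mu_1(x^0) - Y^0
    d1 = treated['cv_flg'] - mu0.predict_proba(treated[feature_cols])[:, 1]
    d0 = mu1.predict_proba(control[feature_cols])[:, 1] - control['cv_flg']
    # Stage 2: regress the pseudo-effects on the features
    tau1 = LinearRegression().fit(treated[feature_cols], d1)
    tau0 = LinearRegression().fit(control[feature_cols], d0)
    # Weight g(x): here the propensity score e(x) = P(Z=1 | X=x)
    e_model = LogisticRegression(C=0.01, random_state=0).fit(
        train_data[feature_cols], train_data['treat_flg'])
    g = e_model.predict_proba(X_test[feature_cols])[:, 1]
    # tau(x) = g(x) * tau_0(x) + (1 - g(x)) * tau_1(x)
    return g * tau0.predict(X_test[feature_cols]) + (1 - g) * tau1.predict(X_test[feature_cols])
```

Called as `x_learner_scores(train_data, X_test)`, this would produce uplift scores comparable to those of the T-Learner and S-Learner implementations below.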
R-Learner
R-Learner is an algorithm that first predicts the average profit $m(x)$ obtained from each customer and the propensity score (the probability that a customer receives the intervention) $e(x)$, and then estimates the increase in profit by minimizing its prediction error.
R-Learner first builds models that predict the average profit $m(x)$ and the propensity score $e(x)$, using the customer background information $x$ as features.
m(x) = E(Y | X=x)
e(x) = P(Z=1 | X=x)
Then the increase in profit $\tau(\cdot)$ is found so that its prediction error (loss function) $\hat{L}_n(\tau(\cdot))$ is minimized.
\hat{\tau}(\cdot) = \mathrm{argmin}_{\tau}\{\hat{L}_n(\tau(\cdot)) + \Lambda_n(\tau(\cdot))\}
\hat{L}_n(\tau(\cdot)) = \frac{1}{n}\sum_{i=1}^n \left((Y_i-\hat{m}^{(-i)}(x_i))-(z_i-\hat{e}^{(-i)}(x_i))\tau(x_i)\right)^2
$\Lambda_n(\tau(\cdot))$ is a regularization term. $\hat{m}^{(-i)}(x_i)$ and $\hat{e}^{(-i)}(x_i)$ denote the average profit and propensity score of customer $i$ predicted by models built on the data excluding customer $i$ (so-called cross-fitting).
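As a reference, below is a minimal R-Learner sketch under the same data-layout assumptions as the X-Learner sketch above. Cross-fitting (the $(-i)$ superscript) is approximated with out-of-fold predictions from `cross_val_predict`, and the loss is minimized by a weighted ridge regression on the pseudo-outcome $(Y_i-\hat{m}(x_i))/(z_i-\hat{e}(x_i))$ with weights $(z_i-\hat{e}(x_i))^2$, which is algebraically equivalent to the loss above; the function name `r_learner_scores` is my own.

```python
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.model_selection import cross_val_predict

def r_learner_scores(train_data, X_test):
    # A sketch, not a reference implementation: assumes the sample-data
    # columns used later in this article and overlap (e(x) away from 0 and 1)
    feature_cols = ['feature' + str(i) for i in range(8)]
    X = train_data[feature_cols]
    y = train_data['cv_flg']
    z = train_data['treat_flg']
    # Out-of-fold predictions of m(x) = E(Y | X=x) and e(x) = P(Z=1 | X=x)
    m_hat = cross_val_predict(LogisticRegression(C=0.01, random_state=0),
                              X, y, cv=5, method='predict_proba')[:, 1]
    e_hat = cross_val_predict(LogisticRegression(C=0.01, random_state=0),
                              X, z, cv=5, method='predict_proba')[:, 1]
    # Rewrite the R-loss as a weighted squared error:
    # sum_i (z_i - e_hat_i)^2 * ((y_i - m_hat_i)/(z_i - e_hat_i) - tau(x_i))^2
    pseudo_outcome = (y - m_hat) / (z - e_hat)
    weights = (z - e_hat) ** 2
    # Ridge regularization plays the role of the Lambda_n term
    tau_model = Ridge(alpha=1.0)
    tau_model.fit(X, pseudo_outcome, sample_weight=weights)
    return tau_model.predict(X_test[feature_cols])
```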
Uplift Tree
Uplift Tree is an algorithm that builds a tree that splits the customer population using "does the increase in profit grow?" as the splitting criterion. The decision tree used for binary classification splits the population on a condition over some attribute so that the class impurity in the resulting subgroups decreases compared to before the split. An Uplift Tree, on the other hand, splits the population on a condition over some attribute so that the distance between the profit distribution of the intervened group and that of the non-intervened group in the resulting subgroups increases compared to before the split. In other words, the customer population is divided by the attributes that are strongly related to the increase in profit.
When written in a formula,
D_{gain} = D_{after-split}(P^T, P^C) - D_{before-split}(P^T, P^C)
The tree is built so that the $D_{gain}$ defined above becomes large. $P^T$ and $P^C$ denote the profit distributions of the intervened and non-intervened groups respectively, and $D$ denotes a distance between distributions; the Kullback-Leibler divergence or the squared Euclidean distance is used as $D$.
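As a reference, below is a toy sketch of this split criterion for a binary conversion outcome, using the squared Euclidean distance between the treatment and control outcome distributions. Full implementations (for example the Rzepakowski and Jaroszewicz formulation) add normalization terms and search over all attributes and thresholds; this only evaluates $D_{gain}$ for a single candidate split, and the function names are my own.

```python
import numpy as np

def distribution_distance(cv, treat):
    # Squared Euclidean distance between the outcome distributions
    # P^T = (p, 1-p) and P^C = (q, 1-q) for a binary conversion flag
    p = cv[treat == 1].mean() if (treat == 1).any() else 0.0
    q = cv[treat == 0].mean() if (treat == 0).any() else 0.0
    return 2.0 * (p - q) ** 2

def split_gain(cv, treat, feature, threshold):
    # D_gain = D_after_split - D_before_split for one candidate split,
    # with the child-node distances weighted by their share of the samples
    d_before = distribution_distance(cv, treat)
    left = feature <= threshold
    d_after = 0.0
    for mask in (left, ~left):
        if mask.any():
            d_after += mask.mean() * distribution_distance(cv[mask], treat[mask])
    return d_after - d_before

# Example: evaluate one candidate split on synthetic arrays
rng = np.random.default_rng(0)
cv = rng.integers(0, 2, size=1000)
treat = rng.integers(0, 2, size=1000)
feature = rng.random(1000)
print(split_gain(cv, treat, feature, threshold=0.5))
```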
The performance of uplift modeling is evaluated with a metric called **AUUC (Area Under the Uplift Curve)**. AUUC is a normalized measure of how much more profit is gained by intervening only on the customers with a large predicted increase in profit, compared to intervening on randomly selected customers. The higher the AUUC, the better the performance of the uplift model. The calculation proceeds by sorting customers in descending order of their predicted uplift score and, for each top-$k$ subset, computing the lift, that is, the increase in conversions obtained by intervening on those $k$ customers rather than not intervening; the AUUC is the normalized area between this lift curve and the baseline for random targeting.
When written in a formula,
AUUC = \frac{1}{n}\sum_{k=1}^n AUUC_{\pi}(k)
AUUC_{\pi}(k) = AUL_{\pi}^T(k) - AUL_{\pi}^C(k) = \sum_{i=1}^k (R_{\pi}^T(i) - R_{\pi}^C(i)) - \frac{k}{2}(\bar{R}^T(k) - \bar{R}^C(k))
Here $n$ is the total number of customers, $k$ is the number of customers whose uplift score is above the threshold, and $\pi$ is the ordering of the customers (in descending order of uplift score). $AUL_{\pi}^T(k)$ is the lift when intervening on the top $k$ customers in the order $\pi$, and $AUL_{\pi}^C(k)$ is the increase in profit when intervening on $k$ randomly selected customers. $R_{\pi}^T(i)$ is the profit when intervening on the $i$-th customer in the order $\pi$, and $R_{\pi}^C(i)$ is the profit when not intervening on that customer. $\bar{R}^T(k)$ is the profit when intervening on $k$ randomly selected customers, and $\bar{R}^C(k)$ is the profit when not intervening on them. $AUL_{\pi}^C(k)$ can also be viewed as the area of a triangle with base $k$ and height $\bar{R}^T(k) - \bar{R}^C(k)$.
Implementation
Let us now implement uplift modeling in Python to predict the increase in profit. Here, T-Learner and S-Learner are implemented, based on Chapter 9 of Machine Learning Starting at Work. The execution environment is Python 3.7.6, numpy 1.18.1, pandas 1.0.2, and scikit-learn 0.22.2. The code is below.
- Loading the required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set_style('whitegrid')
import random
from sklearn.linear_model import LogisticRegression
- Generating the data to be used (the profit here is the CVR, i.e. the conversion rate)
def generate_sample_data(num, seed=0):
    cv_flg_list = []  # list of conversion flags
    treat_flg_list = []  # list of flags indicating whether the customer was intervened
    feature_vector_list = []  # list of feature vectors
    feature_num = 8  # number of features
    base_weight = [0.02, 0.03, 0.05, -0.04, 0.00, 0.00, 0.00, 0.00]  # base weights of the features
    lift_weight = [0.00, 0.00, 0.00, 0.05, -0.05, 0.00, 0.0, 0.00]  # change in the weights under intervention
    random_instance = random.Random(seed)
    for i in range(num):
        feature_vector = [random_instance.random() for n in range(feature_num)]  # generate features at random
        treat_flg = random_instance.choice((1, 0))  # generate the intervention flag at random
        cv_rate = sum([feature_vector[n] * base_weight[n] for n in range(feature_num)])  # base value of the CVR
        if treat_flg == 1:
            cv_rate += sum([feature_vector[n] * lift_weight[n] for n in range(feature_num)])  # add the lift weights when intervened
        cv_flg = 1 if cv_rate > random_instance.random() else 0
        cv_flg_list.append(cv_flg)
        treat_flg_list.append(treat_flg)
        feature_vector_list.append(feature_vector)
    df = pd.DataFrame(np.c_[cv_flg_list, treat_flg_list, feature_vector_list],
                      columns=['cv_flg', 'treat_flg', 'feature0', 'feature1', 'feature2',
                               'feature3', 'feature4', 'feature5', 'feature6', 'feature7'])
    return df

train_data = generate_sample_data(num=10000, seed=0)  # data for model building (training data)
test_data = generate_sample_data(num=10000, seed=1)  # data for performance evaluation (validation data)
- Implementation of T-Learner

# Prepare the data for the model that predicts the profit (CVR) with intervention
X_train_treat = train_data[train_data['treat_flg']==1].drop(['cv_flg', 'treat_flg'], axis=1)
Y_train_treat = train_data.loc[train_data['treat_flg']==1, 'cv_flg']
# Prepare the data for the model that predicts the profit (CVR) without intervention
X_train_control = train_data[train_data['treat_flg']==0].drop(['cv_flg', 'treat_flg'], axis=1)
Y_train_control = train_data.loc[train_data['treat_flg']==0, 'cv_flg']
# Build the two models
treat_model = LogisticRegression(C=0.01, random_state=0)
control_model = LogisticRegression(C=0.01, random_state=0)
treat_model.fit(X_train_treat, Y_train_treat)
control_model.fit(X_train_control, Y_train_control)
# Predict the CVR for the validation data
X_test = test_data.drop(['cv_flg', 'treat_flg'], axis=1)
treat_score = treat_model.predict_proba(X_test)[:, 1]
control_score = control_model.predict_proba(X_test)[:, 1]
# Calculate the uplift score
uplift_score = treat_score - control_score
- Calculation of AUUC

# Sort the validation data in descending order of uplift score
result = pd.DataFrame(np.c_[test_data['cv_flg'], test_data['treat_flg'], uplift_score],
                      columns=['cv_flg', 'treat_flg', 'uplift_score'])
result = result.sort_values(by='uplift_score', ascending=False).reset_index(drop=True)
# Calculate the lift
result['treat_num_cumsum'] = result['treat_flg'].cumsum()
result['control_num_cumsum'] = (1 - result['treat_flg']).cumsum()
result['treat_cv_cumsum'] = (result['treat_flg'] * result['cv_flg']).cumsum()
result['control_cv_cumsum'] = ((1 - result['treat_flg']) * result['cv_flg']).cumsum()
result['treat_cvr'] = (result['treat_cv_cumsum'] / result['treat_num_cumsum']).fillna(0)
result['control_cvr'] = (result['control_cv_cumsum'] / result['control_num_cumsum']).fillna(0)
result['lift'] = (result['treat_cvr'] - result['control_cvr']) * result['treat_num_cumsum']
result['base_line'] = result.index * result['lift'][len(result.index) - 1] / len(result.index)
# Calculate AUUC
auuc = (result['lift'] - result['base_line']).sum() / len(result['lift'])
print('AUUC = {:.2f}'.format(auuc))
# Output: AUUC = 37.70
- Drawing the uplift curve
result.plot(y=['lift', 'base_line'])
plt.xlabel('uplift score rank')
plt.ylabel('conversion lift')
plt.show()
- Implementation of S-Learner

# Prepare the training data (create interaction terms between the treatment flag and the features)
X_train = train_data.drop('cv_flg', axis=1)
for feature in ['feature'+str(i) for i in range(8)]:
    X_train['treat_flg_x_' + feature] = X_train['treat_flg'] * X_train[feature]
Y_train = train_data['cv_flg']
# Build the model
model = LogisticRegression(C=0.01, random_state=0)
model.fit(X_train, Y_train)
# Prepare the validation data for the case with intervention
X_test_treat = test_data.drop('cv_flg', axis=1).copy()
X_test_treat['treat_flg'] = 1
for feature in ['feature'+str(i) for i in range(8)]:
    X_test_treat['treat_flg_x_' + feature] = X_test_treat['treat_flg'] * X_test_treat[feature]
# Prepare the validation data for the case without intervention
X_test_control = test_data.drop('cv_flg', axis=1).copy()
X_test_control['treat_flg'] = 0
for feature in ['feature'+str(i) for i in range(8)]:
    X_test_control['treat_flg_x_' + feature] = X_test_control['treat_flg'] * X_test_control[feature]
# Predict the profit (CVR) for the validation data
treat_score = model.predict_proba(X_test_treat)[:, 1]
control_score = model.predict_proba(X_test_control)[:, 1]
# Calculate the uplift score
uplift_score = treat_score - control_score
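The AUUC calculation from the T-Learner section can be reused here. Below it is repackaged as a function; the name `calc_auuc` is my own, and the body is the same code as above.

```python
def calc_auuc(test_data, uplift_score):
    # Identical to the AUUC calculation in the T-Learner section, wrapped
    # in a function so it can be reused for the S-Learner scores
    result = pd.DataFrame(np.c_[test_data['cv_flg'], test_data['treat_flg'], uplift_score],
                          columns=['cv_flg', 'treat_flg', 'uplift_score'])
    result = result.sort_values(by='uplift_score', ascending=False).reset_index(drop=True)
    result['treat_num_cumsum'] = result['treat_flg'].cumsum()
    result['control_num_cumsum'] = (1 - result['treat_flg']).cumsum()
    result['treat_cv_cumsum'] = (result['treat_flg'] * result['cv_flg']).cumsum()
    result['control_cv_cumsum'] = ((1 - result['treat_flg']) * result['cv_flg']).cumsum()
    result['treat_cvr'] = (result['treat_cv_cumsum'] / result['treat_num_cumsum']).fillna(0)
    result['control_cvr'] = (result['control_cv_cumsum'] / result['control_num_cumsum']).fillna(0)
    result['lift'] = (result['treat_cvr'] - result['control_cvr']) * result['treat_num_cumsum']
    result['base_line'] = result.index * result['lift'][len(result.index) - 1] / len(result.index)
    return (result['lift'] - result['base_line']).sum() / len(result['lift'])

print('AUUC = {:.2f}'.format(calc_auuc(test_data, uplift_score)))
```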
Evaluating the AUUC in the same way as for T-Learner gives AUUC = 19.60, so on this data T-Learner shows the higher performance.
We have summarized an overview of Uplift modeling and an example implementation. If you find any mistakes, an edit request would be appreciated.
References
- Machine Learning Starting at Work
- Machine Learning / Economic Models, Best Practices, and Architecture for AI Algorithm Marketing Automation