[PYTHON] Recommendation optimization

1. Recommendation

In recommendations, we often create a model by machine learning based on learning data, give a prediction score for user x items, and show the item TOP? With a high prediction score for each user. However, this method has the problem that the prediction score of popular items is high and the prediction score of unpopular items is small, so only popular items are advertised and unpopular items are not advertised. Advertising sites often need to advertise for unpopular items in order to get some CV.

2. Optimization

This can be resolved by limiting the number of users who advertise on an item-by-item basis. For example, you can set the maximum number of users to advertise for popular items and the minimum number of users to advertise for popular items.

3. Example

Let's assume that the number of users is 30 and the number of items is 20. Consider 5 items recommended for each user.

However, only 10 people or less advertise for all items Items 0, 1 and 2 are popular items, so only 3 or less people will advertise Items 18 and 19 are unpopular items, so let's say you want to advertise more than 7 people.

Less than $ scores_ {ui} $: User u, the predicted score for item i.

If the above condition is made into a mathematical formula

variable

$ choices_ {ui} $: Whether to advertise item i to user u (1 or 0)

Objective function

$ \sum_{u} \sum_{i} scores_{ui} * choices_{ui} $ To maximize.

Constraints

  1. \sum_{i} choices_{ui} <= 5 (\forall u)
  2. \sum_{u} choices_{ui} <= 10 (\forall i)
  3. \sum_{u} choices_{ui} <= 3 (i=0,1)
  4. \sum_{u} choices_{ui} >= 9 (i=18,19)

It can be solved by linear programming.

When it is a real number with pulp

USER =30
ITEM = 20
Users = list(range(0,USER))
Items = list(range(0,ITEM))

prob = pulp.LpProblem("test",pulp.LpMaximize)

#Variable declaration
choices = pulp.LpVariable.dicts("Choice",(Users,Items) , 0, 1, pulp.LpInteger)

#Objective function
prob += pulp.lpSum([scores[u][i] * choices[u][i] for u in Users for i in Items ])

#Constraints
#1. $\sum_{i} choice_{ui} <= 5 (\forall u)$
for u in Users:
    prob += pulp.lpSum([choices[u][i] for i in Items]) <=5

#2. $\sum_{u} choice_{ui} <= 10 (\forall i)$
for i in Items:
    prob += pulp.lpSum([choices[u][i] for u in Users]) <= 10

#3. $\sum_{u} choice_{ui} <= 3 (i=0,1)$
for i in [0,1]:
    prob += pulp.lpSum([choices[u][i] for u in Users]) <= 3

#4. $\sum_{u} choice_{ui} >= 9 (i=18,19)$
for i in [18,19]:
    prob += pulp.lpSum([choices[u][i] for u in Users]) >= 9

status = prob.solve()

Can be solved with.

Now scores are as shown below. Items 1 and 2 are popular items, so increase the score, and scores 18 and 19 are unpopular items, so decrease the score.

np.random.seed(10)
scores = np.random.rand(USER, ITEM)
scores[:,0] += 0.3
scores[:,1] += 0.3
scores[:,18] -= 0.3
scores[:,19] -= 0.3
scores = np.clip(scores, 0, 1)

The sample source is https://github.com/tohmae/pulp_sample/blob/master/score_optimize.py It is in.

When optimizing with the above score, if there are no constraints 2,3,4

Item0 Item1 Item2 Item3 Item4 Item5 Item6 Item7 Item8 Item9 Item10 Item11 Item12 Item13 Item14 Item15 Item16 Item17 Item18 Item19
User0 1 0 0 1 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0
User1 1 0 0 1 0 0 0 0 0 0 1 0 1 0 0 0 0 1 0 0
User2 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0
User3 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 1 0 0
User4 1 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0
User5 1 1 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0
User6 0 1 0 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 1
User7 1 1 0 1 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0
User8 0 0 0 0 0 0 0 0 1 0 1 1 0 1 0 0 1 0 0 0
User9 1 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0
User10 1 1 0 0 0 0 0 0 0 0 0 1 1 0 1 0 0 0 0 0
User11 0 1 0 0 1 0 0 1 0 0 0 1 1 0 0 0 0 0 0 0
User12 0 0 1 0 0 1 0 0 1 0 0 0 1 0 1 0 0 0 0 0
User13 0 0 0 1 0 0 0 1 1 0 1 0 1 0 0 0 0 0 0 0
User14 0 0 1 1 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0
User15 1 0 1 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 0 0
User16 1 0 0 1 0 0 0 1 0 0 0 0 0 1 0 1 0 0 0 0
User17 0 1 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0
User18 0 0 1 1 0 0 1 0 0 1 0 0 0 0 0 0 0 1 0 0
User19 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0
User20 0 1 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 1 0 0
User21 0 0 0 0 1 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0
User22 1 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 1 0 0
User23 0 0 0 1 1 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0
User24 0 0 0 0 1 1 0 0 0 1 0 0 0 1 0 0 1 0 0 0
User25 0 0 0 0 0 1 0 0 0 0 1 0 0 1 1 0 0 1 0 0
User26 0 1 1 0 1 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0
User27 0 0 0 0 0 0 1 0 0 0 1 1 0 0 1 0 1 0 0 0
User28 0 1 0 0 0 1 0 0 0 0 0 1 1 0 0 0 1 0 0 0
User29 1 1 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0

When constraints 2, 3 and 4 are entered

Item0 Item1 Item2 Item3 Item4 Item5 Item6 Item7 Item8 Item9 Item10 Item11 Item12 Item13 Item14 Item15 Item16 Item17 Item18 Item19
User0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 1 1
User1 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 0 1
User2 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 1 1 0 1 0
User3 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 1 1 0
User4 1 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0
User5 0 1 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0
User6 0 1 0 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 1
User7 0 0 0 1 0 1 0 0 1 0 0 0 0 1 0 1 0 0 0 0
User8 0 0 0 0 0 0 0 0 1 0 1 1 0 1 0 0 1 0 0 0
User9 0 0 0 0 1 1 0 0 0 0 0 1 0 0 0 0 1 1 0 0
User10 0 0 0 1 0 0 1 0 0 0 0 1 1 0 1 0 0 0 0 0
User11 0 0 0 0 1 0 0 1 0 0 0 1 1 0 0 0 0 0 0 1
User12 0 0 0 0 0 1 0 0 1 0 0 0 1 0 1 0 0 0 1 0
User13 0 0 0 1 0 0 0 1 1 0 1 0 1 0 0 0 0 0 0 0
User14 0 0 1 1 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0
User15 0 0 1 0 1 0 0 0 0 0 0 0 1 0 1 1 0 0 0 0
User16 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 0 1
User17 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 1 1 0
User18 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0 0 1 0 1
User19 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0
User20 0 0 0 0 1 0 0 0 0 1 1 0 0 0 0 1 0 1 0 0
User21 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 1 0 0 0 1
User22 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 1 1 0
User23 0 0 0 1 1 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0
User24 0 0 0 0 1 1 0 0 0 1 0 0 0 1 0 0 1 0 0 0
User25 0 0 0 0 0 1 0 0 0 0 1 0 0 1 1 0 0 1 0 0
User26 0 0 1 0 1 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1
User27 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 1 0 1 0
User28 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 1 0 1 0
User29 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 1 0 1

It can be seen that the number of advertisements for items 1 and 2 decreases and the number of advertisements for items 18 and 19 increases when constraints 2, 3 and 4 are added.

I was able to optimize the recommendations using linear programming.

Recommended Posts

Recommendation optimization
Mathematical optimization RPG
Python in optimization
Bid optimization simulation
Recommendation of Poetry
Use combinatorial optimization
Arch Linux Recommendation