I'm working on an algorithm for automatic bidding for ads (it's going to be in the past soon), but There is no log stored at all. Therefore, I had no choice but to generate data and perform a simulation.
What is multipurpose optimization? Book There is a description as follows.
In multi-objective optimization, consider minimizing multiple objective functions $ \ bf {f} $. Ideally, you might want to be able to minimize all objective functions $ f_1 $, ..., $ f_k $ at the same time. However, that means that it was not necessary to solve it as a multi-objective optimization in the first place. This is because if one objective function is minimized, the other objective functions are minimized at the same time. Therefore, the problem that should usually be dealt with in multi-objective optimization is that there is a trade-off relationship between the objective functions $ \ bf {f} $, such that trying to make one objective function smaller makes the other objective function larger. There is a case. (Quoted from Mathematical Optimization Practice Guide p.110)
In other words, let's minimize (maximize) the value of the objective function in the situation where "if you set it up, you can't stand up here". Basically, you will find a Pareto solution. There are various methods, and the method differs depending on whether the Pareto front is convex or non-convex. 50 optimization problems useful for application (application optimization series) = 1211 & creativeASIN = 4254117884 & linkCode = as2 & tag = shimashimao06-22 "> 50 optimization problems useful for application (application optimization series) ) Will be helpful. However, this time I chose a heuristic method because I wanted to perform calculations at high speed and I wanted to optimize for a fairly fine time slot.
References are here. Real time bid optimization with smooth budget delivery in online advertising, Lee+, Turn Inc.,'13 [PDF] Please refer to slidshare for the contents. .. (However, if you look at this now, the interpretation part is a little wrong ... I will correct it if you feel like it w)
The code is created based on this paper. However, the parts that take time to generate data are omitted. As I wrote at the beginning, since there is no data, I set up a user agent and a bid agent to make a pseudo bid. The time slot is set to 1 minute, and the budget is distributed and the bid amount is optimized for each time slot.
The code I created is shown below. Since it is written by a non-engineer, I think there are many things to do, but please be patient.
Create a site for a user agent to visit the site.
python
# coding: utf-8
import random
class Agent:
def __init__(self,num):
random.seed(num) #Occurrence of random seed
self.ctr = random.uniform(0,1)
def get_ctr(self):
return self.ctr
def get_visiter_ctr(self, pred):
if random.uniform(0,1) > (1-pred):
return self.get_ctr()
else: return False
Next, create an agent (competitor) to bid.
python
# coding: utf-8
import random
from math import ceil
class Agent:
def __init__(self,n):
random.seed(n) #Occurrence of random seed
self.ctr = random.uniform(0,1)
self.maxprice = random.randint(40,90)
def get_maxprice(self):
return self.maxprice
def get_bid_price(self, ctr, pred):
price = self.maxprice*ctr*pred
if random.uniform(0,1) > 0.35 :
return ceil(price) + random.randint(0,5)
else:
return False
How to write the code of the optimization part.
python
# coding:utf-8
#cython: boundscheck=False
#cython: wraparound=False
import pyximport; pyximport.install()
from math import sin, cos, pi, fmod, sqrt, floor
import random
import useragent
import bidagent
import numpy as np
cimport numpy as np
cimport cython
DTYPE = np.float64
INTTYPE = np.int64
ctypedef np.float64_t DTYPE_t
ctypedef np.int64_t INTTYPE_t
cdef extern from 'stdlib.h':
ctypedef unsigned long size_t
void *malloc(size_t size)
void free(void *prt)
class CreatePredict:
pi2 = 2*pi
#random.seed(3)
PHI = 2*pi/8 #Representing phase shift
# TBit = 10000000.0
def __init__(self, period, agent_num):
self.minbit = 1000*agent_num/1000
self.E = 1250*agent_num/1000
self.PERIOD = period
self.OMEGA = self.pi2/self.PERIOD
self.AGENT_NUM = agent_num
def getvalue(self,period):
"""
Probability of coming to the site
sin(theta) = sin(OMEGA*t + phi)
T = 2*pai/OMEGA
60*60*24=86400sec
OMEGA = 2pai/86400
"""
cdef sec
cdef np.ndarray[np.double_t, ndim=1] tmp = np.zeros(period,dtype=np.double)
for sec in xrange(period):
tmp[sec] = self.E * ( sin(self.OMEGA*sec - self.PHI) + 1 ) + self.minbit
return tmp/self.AGENT_NUM #Calculation of pred
cdef class ComputeBid:
cdef int time, day
gamma = 1.96 #Used to calculate confidence intervals(95%Assuming confidence interval)
beta1=0.3
beta2=0.8
theta_bin = 100
cdef np.ndarray \
bid_num,win_num,cmsp_bag,theta,ctr_hist,mu,sigma
cdef int period, daybuget, gcpc, bidcap, bin_n
cdef double ctrbin, avgprice
def __init__(self,period,daybuget,gcpc,bidcap,ctrbin):
self.period = period
self.ctrbin = ctrbin
self.daybuget = daybuget
self.gcpc = gcpc
self.bidcap = bidcap
self.bid_num = np.zeros(period, dtype=np.int64)
self.theta = np.zeros(self.theta_bin, dtype=np.float64)
#Storage of ctr hist grams
self.bin_n = int( float(1.0)/ctrbin )
self.ctr_hist = np.zeros((period,self.bin_n), dtype=np.int64)
self.mu = np.zeros(period, dtype=np.float64) #Average value for each slot of tau of bid threshold
self.sigma = np.zeros(period, dtype=np.float64) #Variance of bid threshold tau for each slot
self.avgprice = 0.0
def __dealloc__(self):
pass
def output(self, file,user_agent_num,bid_agent_num,days):
cdef int t,n,i,c,b
#Calculation of visit prediction
pred_creater = CreatePredict(self.period, user_agent_num)
#Occurrence of user agent
user_agent = [ useragent.Agent(n) for n in xrange(user_agent_num) ]
#Occurrence of bid agent
bid_agent = [ bidagent.Agent(n) for n in xrange(bid_agent_num) ]
for t in xrange(self.period*days):
self.time = fmod(t,self.period)
if fmod(t,self.period)==0:
if self.day!=0: self.avgprice /= sum(self.win_num)
self.bid_num = np.zeros(self.period, dtype=np.int64)
self.win_num = np.zeros(self.period, dtype=np.int64)
self.cmsp_bag = np.zeros(self.period, dtype=np.float64)
self.day+=1
print "progress:",t
#Calculation of visit prediction
pred = pred_creater.getvalue(self.period)
#Calculation of budget allocation for each slot
slot_bag = self.calc_slot_bag(pred)
#Get visitor ctr
ctr = []
bidp = []
visit = 0.0
dbidprice = {}
for i in xrange(user_agent_num):
ctr.append( user_agent[i].get_visiter_ctr(pred[self.time]) )
#Calculation of bid amount for each agent
if ctr[i] == False: continue
visit += 1.0
"""Create a histogram of ctr"""
if self.day==1: self.get_hist(ctr[i])
maxprice = 0.0
for b in xrange(bid_agent_num):
price = bid_agent[b].get_bid_price(ctr[i], pred[self.time])
if price > maxprice:
maxprice = price
bidp.append(maxprice)
"""Counting the number of bids and determining the bid amount"""
#Pacing used for bidding pricing_Calculate rate
if self.bid_num[self.time]==0:
pacing_rate = 0.0
else:
pacing_rate = self.bid_num[self.time]/float(visit)
#Bid price determination
bidprice = self.calc_bidprice(self.day, ctr[i], pacing_rate)
print 'price',bidprice
#Judgment whether to bid or not
if bidprice == False: continue
"""Creating a dictionary for confirming the calculated bid amount"""
if str(floor(bidprice)) not in dbidprice: dbidprice[str(floor(bidprice))] = 1
else: dbidprice[str(floor(bidprice))] += 1
#If the bid amount exceeds cap, it will be the value of cap.
if bidprice>self.bidcap: bidprice = self.bidcap
#If you want to bid, check the number of bids
self.bid_num[self.time] += 1
#Judgment of winning bid
if bidprice > maxprice:
#Judgment of whether to bid
if slot_bag[self.time] - self.cmsp_bag[self.time] - bidprice > 0:
self.cmsp_bag[self.time] += bidprice
print 'cmsp=',self.cmsp_bag[self.time]
self.calc_theta(bidprice,maxprice)
self.win_num[self.time] += 1
#Calculation of average cost around 1bid
if self.day==1: self.avgprice += bidprice
print dbidprice
print np.float64(self.win_num)/self.bid_num
num1 = []
num2 = []
for i in xrange(self.period):
num1.append( str(slot_bag[i]) )
num2.append( str(self.cmsp_bag[i]) )
file.write(u'slot bags\n'+u','.join(num1)+u'\n')
file.write(u'comsumption bags\n'+u','.join(num2)+u'\n')
def get_hist(self,ctr):
"""Create a histogram of ctr"""
cdef int bin
for bin in xrange(self.bin_n):
if bin*self.ctrbin<=ctr<(bin+1)*self.ctrbin:
self.ctr_hist[self.time,bin] += 1
break
def calc_bidprice(self, day, ctr, pacing_rate): #ctr[i]
cdef int i
tau = 0.0
if day<2:
if ctr>=0.2:
return random.randint(20,80)
else: return False
else:
if self.cmsp_bag[self.time]!=0.0:
#Prediction imp calculation
imps = float(self.cmsp_bag[self.time])/self.avgprice
print 'imps',imps
#Calculation of predicted number of bids
#win_rate = bid_num[self.time]/win_num[self.time]
bids = float(imps*self.bid_num[self.time])/self.win_num[self.time]
print 'bids',bids
#Calculation of the number of predicted requests
print 'pacing_rate',pacing_rate
reqs = bids/pacing_rate
else:
reqs=0.0
print 'reqs',reqs
#Get bid threshold
for i in xrange(len(self.ctr_hist[self.time,:])):
if sum(self.ctr_hist[self.time,-(i+1):]) - reqs >0.0:
tau = float((self.theta_bin-1)-i)/self.theta_bin
break
#Calculation of mean value
if self.time == 0: t = self.period - 1
else: t = self.time-1 #T here is t-Set to 1
div_t = float(1)/(self.time+1)
self.mu[self.time] = self.mu[t]+ div_t*(tau-self.mu[t]) #mu may need to save the current slot and the past slot
#Variance calculation
self.sigma[self.time] = div_t*t*self.sigma[t] \
+ div_t*(tau-self.mu[t])*(tau-self.mu[self.time])
#Provisional pricing
u = ctr*self.gcpc # Ui = AR*G(Predictive CTR x Target CPC)
#Determine the price according to the bid rate
# beta1=0.3, beta2=0.Set to 8
price = self.get_price(pacing_rate,u)
print 'price1',price
#Judgment of whether to bid
return self.judge_bid(pacing_rate,price,tau)
def calc_theta(self,bidprice,secprice):
cdef int b
theta = float(secprice)/bidprice
for b in xrange(100):
if 0.01*b<=theta<=0.01*(b+1):
self.theta[b] += 1
break
def calc_slot_bag(self,pred):
s = sum(pred)
prop = np.array(pred, dtype=DTYPE)/s
return prop*self.daybuget
def get_price(self,pacing_rate,u):
price = 0.0
if pacing_rate<=self.beta1:
s_theta = sum(self.theta)
s = 0.0
for i in xrange(len(self.theta)):
s += self.theta[i]
if float(s)/s_theta>0.01:
return i*(float(3.0)/(2*self.theta_bin))*u #price = self.theta[i]*u
elif pacing_rate>=self.beta2: # pacing_You may need to make rage predictions
lo =1.0 + (self.bidcap/self.avgprice-1.0)/(1.0-self.beta2) \
*(pacing_rate-self.beta2)
return lo*u
else:
return u
def judge_bid(self, pacing_rate, price, tau):
term2 = self.gamma*self.sigma[self.time]/sqrt(self.day)
print 'tau',tau
if tau<self.mu[self.time]-term2: #Bid rejection limit
return False
elif tau>=self.mu[self.time]+term2: #Lower bid acceptance limit
return price
else:
if pacing_rate>random.uniform(0,1):
return price
else: return False
Since compilation is required when using Cython, setup.py is also described for the time being.
setup.py
python
# coding: utf-8
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext
import numpy as np
setup(
name="CPWOPTcritical",
cmdclass={"build_ext":build_ext},
ext_modules=[Extension("CySimulation",["CySimulation.pyx"])],
include_dirs=[np.get_include()]#Specify the include path here
)
Also, because it is Cython, the main part is separated. It is described below.
main
python
# coding: utf-8
#import pyximport; pyximport.install()
import io
if __name__=='__main__':
import CySimulation
period = 1440 #Number of slots per day
daybuget = 1000000 #Daily budget
goalcpc = 100 #Goal cpc
bidcap = 90 #Bid cap
ctrbin = 0.01 #CTR histogram bin size
days = 2 #Data generation days
user_agent_num = 600 #Number of user agents
bid_agent_num = 100 #Number of bid agents
bid = CySimulation.ComputeBid(period, daybuget, goalcpc, bidcap,ctrbin)
with io.open('sin_test','a') as f:
bid.output(f,user_agent_num,bid_agent_num,days)
It seems that it will go well, but I feel that the value that I expected in the part where the data generation was omitted or the part predicted from the past data was skipped did not come out. Since the settings are quite detailed, it seems difficult to get sufficient results unless you create them precisely.
It's a fairly heuristic method, but I think it's the most usable method (paper) under conditions where a speed of 50ms or less is required in combination with CVR estimation. For this job, the first paper I read about bid optimization was the one introduced above. I have read several other papers, but the impact of this paper was the strongest. If logs other than IP and time stamp are stored, I wanted to optimize with actual data based on this method. I'm sorry.
If you find something strange, we would appreciate it if you could point it out.