[PYTHON] Calculation of time series customer loyalty

Overview

As the title suggests, I implemented a time-series customer loyalty calculation with reference to the following book.

That said, it stitches together pieces I wrote at different times, so there may be some odd parts (it ran fine in my local environment). If you spot anything that looks wrong, please leave a comment.

code

The code is below. Please note that there is quite a lot of it.

First is the main script.

one_to_one_mrk_run.py


# coding: utf8

import time
import sys
import numpy.random as npr
from numpy.oldnumeric.linear_algebra import inverse
import scipy
from scipy import linalg
import random
import scipy.stats.distributions as dis
from scipy.stats import uniform,chi2,invgamma
import scipy.stats as ss
import numpy as np
import math
from scipy.linalg import inv,cholesky,det
import scipy.optimize as so
import one_to_one_mrk as hbm
from multiprocessing import *
from numpy import genfromtxt

#Data import file path definition
inpt_D_filename = 'D_sliced.csv'
inpt_Z_filename = 'Z_sliced.csv'
inpt_payment_filename = 'pymData_sliced.csv'
inpt_C_filename = 'C_data_sliced.csv'
inpt_visit_filename = 'visit_sliced.csv'

###Data input-------------------
#Acquisition of D data (data required to estimate individual parameters using commonality)
D = genfromtxt(inpt_D_filename, delimiter=',')
#Acquisition of C data (daily average consumption amount)
C = genfromtxt(inpt_C_filename, delimiter=',')
#Acquisition of PM data (amount by purchase date to obtain INV)
PM = genfromtxt(inpt_payment_filename, delimiter=',')
#y data acquisition (visit flags; used to set the truncation points a and b of the normal distribution)
y = genfromtxt(inpt_visit_filename, delimiter=',')
print "done input"
### ------------------------------

#Random seed generation
random.seed(555)

##--Definition of constants-------
TIMES = len(PM[0,:])
nhh = len(PM[:,0])
SEASONALYTY = 7 #Seasonal cycle
RP = 50000  #Number of samplings in MCMC
keep = 10 #Display data interval
nz = 2 #Number of explanatory variables in the formula for store visit utility(Z_number of i)
nD = len(D[0,:]) #Vector length of D
m = 1 + 1 + nz
nvar = 1 #Number of objects to be calculated by personal attribute (2 for fresh food and other sundries)
limy = 1e20 #Boundary of numerical values to be regarded as missing
k = 1 #Degree of trend component model
zc = k+1+ SEASONALYTY-2 + nz
pr = 4 #Number of processes
##-------------------

###data generation process-----------------------------------
##Log of the number of days since the last visit, presence/absence of an event (dummy variable)------
##With or without discount (dummy variable)
Ztld=np.zeros((zc,TIMES,nhh))
## ----------------------
#Ztld shaping
Ztld[0,:,:] = [[1]*nhh]*TIMES
Ztld[1,:,:] = [[1]*nhh]*TIMES
Ztld[zc-2,:,:] = genfromtxt(inpt_Z_filename, delimiter=',').T #Site visit interval

Ztld[(k-1)+SEASONALYTY,:,:] = hbm.standardization(Ztld[(k-1)+SEASONALYTY,:,:])
Ztld[(k-1)+SEASONALYTY+nz-1,:,:] = hbm.calc_INV(TIMES, PM, C, nhh)
Ztld[(k-1)+SEASONALYTY+nz-1,:,:] = hbm.standardization(Ztld[(k-1)+SEASONALYTY+nz-1,:,:])
print "done data shape"
###------------------------------------------------------------

##Prior distribution parameters-----------------
# '''step1: For latent utility sampling'''
A = 0.01 * np.identity(zc) ##A is the inverse of B_0
b0 = np.zeros((zc,nvar),dtype=np.float)

#step3: For distributed sampling of system noise
mu0 = 0; kaps0 = 25
nu0 = 0.02; s0 = 0.02
Sita_sys0 = np.array([np.float(10)]*m)

#step5: Data frame of regression parameter H of consumer heterogeneity
m0 = np.zeros((nD,nvar),dtype=np.float)
A0 = 0.01 * np.identity(nD)  #In Higuchi book, A

#step6: Data frame of regression parameter V of consumer heterogeneity
f0 = nvar+3
V0 = f0 * np.identity(nvar)
##------------------------------------

##Create a frame of data required for posterior distribution sampling-----------
#step1: Data frame for latent utility sampling
u = np.zeros((nhh,TIMES),dtype=np.float)

# step2:Data frame for calculation of state vector
#For convenience of processing, set in the initial value setting part

#step3: Data frame for distributed sampling of system noise
Sita_sys = np.array([np.float(10)*np.identity(m) for i in xrange(nhh)])    #Since it is 3D, it cannot be converted to matrix, so it is converted to matrix at the time of calculation.

#step4: Data frame for parameter sampling to specify pseudo-household inventory
#When θ has 2 variables, make it matrix instead of vector
Lsita_dlt = np.zeros((nhh),dtype=np.float)
Lsita_lmbd = np.zeros((nhh),dtype=np.float)
Hdlt = np.zeros((nvar,nD),dtype=np.float)
Hlmbd = np.zeros((nvar,nD),dtype=np.float)
Vsita_dlt = 0.01*np.identity(nvar)
Vsita_lmbd = 0.01*np.identity(nvar)
Sita_dlt = np.zeros((nhh),dtype=np.float)
Sita_lmbd = np.zeros((nhh),dtype=np.float)
sigma_dlt = 1.5*np.identity(nvar)
sigma_lmbd = 1.5*np.identity(nvar)
rej_dlt = np.zeros((nhh),dtype=np.float)
rej_lmbd = np.zeros((nhh),dtype=np.float)
##---------------------------------------------------------


##Initial value setting------------------------------
#For step1
Xs = np.zeros((nhh, zc, TIMES),dtype=np.float) #household, number of variables, time
sigma = 1.0

##For step2
param = hbm.FGHset(0, 1, SEASONALYTY, 0, nz)
L = 1
R = np.identity(L)
F = np.array(param['MatF'])
G = np.array(param['MatG'])
#A frame for storing the distribution of the system model for each individual
Q0 =np.array(param['MatQ'])

#For step3
mu = 0.
sigs = 1.
##-------------------------------------------
##Specifying the truncation range
at = np.array([[-100 if y[hh,t]==0 else 0 for t in xrange(TIMES)] for hh in xrange(nhh)])
bt = np.array([[0 if y[hh,t]==0 else 100 for t in xrange(TIMES)] for hh in xrange(nhh)])
y = None

##-------------------
udraw = np.zeros((nhh,1,RP), dtype=np.float)

def calculate_utility((hh, Xs, Ztld)):
    # step1--------------------------------------------------
    #u sampling(No loops, but individual calculations)
    mu = np.sum(Ztld[:,:,hh] * Xs[hh,:,:], axis = 0)
    u[hh,:] = hbm.rtnorm(mu, sigma, at[hh,:], bt[hh,:])
    return u[hh,:]
    #------------------------------------------------------------

def Kalman_filtering((hh, u, Sita_sys, Ztld)):
    ##Calculation of system model parameters in step2----------------------    
    #Numerical calculation for maximum likelihood estimation of TAU2------------------------------
    ISW = 0; XPS=0;
    #mybounds=[(1e-4,1e2),(1e-4,1e2),(1e-4,1e2),(1e-4,1e2),(1e-4,1e2)]
    #LLF1 = so.fmin_l_bfgs_b(hbm.LogL, x0=tau0,
    #                        args=(np.array(u[hh,:]), F, np.array(Ztld[:,:,hh]), G, R, limy, ISW, zc, m, TIMES, Q0),
    #                        bounds=mybounds, approx_grad=True)         
    #Maximum likelihood estimation of TAU2

    #Kalman filter
    #Q = hbm.Qset(Q0 ,TAU2);
    XF0 = np.zeros(zc)
    VF0 = np.float(10) * np.identity(zc); OSW = 1
    LLF2 = hbm.KF(u[hh,:], XF0, VF0, F, Ztld[:,:,hh], G, Sita_sys[hh,:,:], R, limy, ISW, OSW, zc, TIMES)
    XPS = LLF2['XPS']; XFS = LLF2['XFS']
    VPS = LLF2['VPS']; VFS = LLF2['VFS']
    SIG2 = LLF2['Ovar']; GSIG2 = 1
    #Smoothing----------------------------------------------------------
    LLF3 = hbm.SMO(XPS, XFS, VPS, VFS, F, GSIG2, 1, SEASONALYTY, 1, zc, TIMES)
    Xs[hh,:,:] = np.array(LLF3['XSS']).T #Forcibly convert to match the type
    return Xs[hh,:,:]
    #------------------------------------------------------------

def calculate_difference((hh, Xs)):
    #Calculation of the difference in step3--------------------------------------
    #Initialization of variables used in the calculation of the sum of the difference calculation in step3
    dift = 0.
    difw = 0.
    difbeta = np.array([np.float(0)]*nz)

    dift = sum( (Xs[hh,0,1:TIMES] - Xs[hh,0,0:TIMES-1])**2 )
    difw = sum( (Xs[hh,k,1:TIMES]+sum(Xs[hh, k:(k-1)+SEASONALYTY-1, 0:TIMES-1]))**2 )
    for d in xrange(nz):
        difbeta[d] = sum( (Xs[hh, (k-1)+SEASONALYTY+d, 1:TIMES] 
                                    - Xs[hh, (k-1)+ SEASONALYTY+d, 0:TIMES-1])**2 )      
    # step3--------------------------------------
    Sita_sys[hh,0,0] = invgamma.rvs((nu0+TIMES)/2, scale=(s0+dift)/2, size=1)[0]
    Sita_sys[hh,1,1] = invgamma.rvs((nu0+TIMES)/2, scale=(s0+difw)/2, size=1)[0]
    for d in xrange(nz):
        Sita_sys[hh,2+d,2+d] = invgamma.rvs((nu0+TIMES)/2, scale=(s0+difbeta[d])/2, size=1)[0]
    return Sita_sys[hh,:,:]
    #--------------------------------------------

def calculate_difference_m(Xs): #Compatible with all humans
    #Calculation of the difference in step3--------------------------------------
    #Initialization of variables used in the calculation of the sum of the difference calculation in step3
    dift = np.zeros(nhh, dtype=np.float)
    difw = np.zeros(nhh, dtype=np.float)
    difbeta = np.zeros((nhh,nz), dtype=np.float)

    dift = np.sum( (Xs[:,0,1:TIMES] - Xs[:,0,0:TIMES-1])**2, axis=1 )
    difw = np.sum( (Xs[:,k,1:TIMES]+np.sum(Xs[:, k:(k-1)+SEASONALYTY-1, 0:TIMES-1], axis=1))**2, axis=1 )
    for d in xrange(nz):
        difbeta[:,d] = np.sum( (Xs[:, (k-1)+SEASONALYTY+d, 1:TIMES] 
                                    - Xs[:, (k-1)+ SEASONALYTY+d, 0:TIMES-1])**2, axis=1 )      
    # step3--------------------------------------
    Sita_sys[:,0,0] = invgamma.rvs((nu0+TIMES)/2, scale=(s0+dift)/2, size=nhh)
    Sita_sys[:,1,1] = invgamma.rvs((nu0+TIMES)/2, scale=(s0+difw)/2, size=nhh)
    for d in xrange(nz):
        Sita_sys[:,2+d,2+d] = invgamma.rvs((nu0+TIMES)/2, scale=(s0+difbeta[:,d])/2, size=nhh)
    return Sita_sys[:,:,:]
    #--------------------------------------------

def calculate_spending_habits_param_delta((hh, u, Xs, Sita_dlt, Hdlt, Vsita_dlt, Ztld)):  #Θ of step4_Calculation of δ
    ### step4--------------------------------------
    ## '''Calculation on the dlt side'''
    #Posterior distribution of θ in step4 Initialization of variables used when calculating the sum of the first term of the kernel
    Lsita = 0.
    #Utility value error calculation(Likelihood calculation of θ)
    Lsita = sum( (u[hh,:] - np.diag(np.dot(Ztld[:,:,hh].T, Xs[hh,:,:])))**2  )
    #Secure the current θ
    old_sita_dlt = Sita_dlt[hh]
    #Sampling new θ (sampling)
    new_sita_dlt = Sita_dlt[hh] + ss.norm.rvs(loc=0, scale=sigma_dlt,size=1)[0]
    
    #Likelihood calculation (adjusted by Jacobian for log-likelihood)
    new_Lsita_dlt = Lsita + hbm.Nkernel(new_sita_dlt, Hdlt, D[hh,:], Vsita_dlt)
    new_Lsita_dlt = math.exp(-0.5*new_Lsita_dlt)
    #new_Lsita_dlt = new_Lsita_dlt*(1/math.exp(new_Lsita_dlt))
    old_Lsita_dlt = Lsita + hbm.Nkernel(old_sita_dlt, Hdlt, D[hh,:], Vsita_dlt)
    old_Lsita_dlt = math.exp(-0.5*old_Lsita_dlt)
    #old_Lsita_dlt = old_Lsita_dlt*(1/math.exp(old_Lsita_dlt))
    if old_Lsita_dlt == 0: old_Lsita_dlt=1e-6
        
    #MH step
    alpha = min(1, new_Lsita_dlt/old_Lsita_dlt)
    if alpha != alpha: alpha = -1  #NaN guard (always reject)
    uni = ss.uniform.rvs(loc=0 , scale=1, size=1)
    if uni < alpha and new_sita_dlt > 0:
        Sita_dlt[hh] = new_sita_dlt
    else:
        rej_dlt[hh] = rej_dlt[hh] + 1
    return Sita_dlt[hh], rej_dlt[hh]

def calculate_spending_habits_param_delta_m((u, Xs, Sita_dlt, Hdlt, Vsita_dlt, Ztld, rej)):  # step4-Multivariate version
    ### step4--------------------------------------
    ## '''Calculation on the dlt side'''
    #Posterior distribution of θ in step4 Initialization of variables used when calculating the sum of the first term of the kernel
    Lsita = np.zeros(nhh,dtype=np.float)
    #Utility value error calculation(Likelihood calculation of θ)
    Lsita = [np.sum( (u[hh,:] - np.sum(Ztld[:,:,hh]*Xs[hh,:,:], axis = 0) )**2 ) for hh in xrange(nhh)]
    #Secure the current θ
    old_sita_dlt = Sita_dlt[:]
    #Sampling new θ (sampling)
    new_sita_dlt = Sita_dlt[:] + ss.norm.rvs(loc=0, scale=sigma_dlt, size=nhh)
   
    #Likelihood calculation (adjusted by Jacobian for log-likelihood)
    new_Lsita_dlt = Lsita + hbm.Nkernel(new_sita_dlt, Hdlt, D[:,:], Vsita_dlt)
    new_Lsita_dlt = np.exp(-0.5*new_Lsita_dlt)
    #new_Lsita_dlt = new_Lsita_dlt*(1/math.exp(new_Lsita_dlt))
    old_Lsita_dlt = Lsita + hbm.Nkernel(old_sita_dlt, Hdlt, D[:,:], Vsita_dlt)
    old_Lsita_dlt = np.exp(-0.5*old_Lsita_dlt)
    #old_Lsita_dlt = old_Lsita_dlt*(1/math.exp(old_Lsita_dlt))
    old_Lsita_dlt[old_Lsita_dlt == 0] = 1e-6

    #MH step
    alpha = np.array(new_Lsita_dlt/old_Lsita_dlt)
    alpha = alpha*(alpha<1) #Ratios >= 1 are zeroed here; ratios < 1 keep their value.
    alpha[alpha==0] = 1 #The zeroed entries (ratio >= 1) get acceptance probability 1.
    alpha[alpha!=alpha] = -1  #Entries that were NaN are set to -1 (always reject).
    uni = ss.uniform.rvs(loc=0 , scale=1, size=nhh)
    tmp = alpha - uni
    Sita_tmp = new_sita_dlt*(tmp>0) + old_sita_dlt*(tmp<=0) #MH adoption conditions
    Sita_dlt = Sita_tmp*(Sita_tmp>=0) + old_sita_dlt*(Sita_tmp<0) #The condition that δ is non-negative in this analysis
    tmp = Sita_dlt - old_sita_dlt #  
    rej += 1*(tmp[0,:]==0) #If the value did not change from the previous one, add 1 to the reject count.
    #return np.ndarray.flatten(Sita_dlt), np.ndarray.flatten(rej)
    return Sita_dlt[0,:], rej 

def calculate_spending_habits_param_lambd((hh, u, Xs, Sita_lmbd, Hlmbd, Vsita_lmbd, Ztld)):  #Θ of step4_Calculation of λ
    ### step4--------------------------------------
    ## '''Calculation on the lmbd side'''
    #Posterior distribution of θ in step4 Initialization of variables used when calculating the sum of the first term of the kernel
    Lsita = 0.
    #Utility value error calculation(Likelihood calculation of θ)
    Lsita = sum( (u[hh,:] - np.diag(np.dot(Ztld[:,:,hh].T, Xs[hh,:,:])))**2  )
    #Secure the current θ
    old_sita_lmbd = Sita_lmbd[hh]
    #Sampling new θ (sampling)
    new_sita_lmbd = Sita_lmbd[hh] + ss.norm.rvs(loc=0, scale=sigma_lmbd, size=1)[0]
        
    #Likelihood calculation (adjusted by Jacobian for log-likelihood)
    new_Lsita_lmbd = Lsita + hbm.Nkernel(new_sita_lmbd, Hlmbd, D[hh,:], Vsita_lmbd)
    new_Lsita_lmbd = math.exp(-0.5*new_Lsita_lmbd)
    old_Lsita_lmbd = Lsita + hbm.Nkernel(old_sita_lmbd, Hlmbd, D[hh,:], Vsita_lmbd)
    old_Lsita_lmbd = math.exp(-0.5*old_Lsita_lmbd)
    if old_Lsita_lmbd == 0: old_Lsita_lmbd=1e-6
        
    #MH step
    alpha = min(1, new_Lsita_lmbd/old_Lsita_lmbd)
    if alpha != alpha: alpha = -1  #NaN guard (always reject)
    uni = ss.uniform.rvs(loc=0, scale=1, size=1)
    if uni < alpha:
        Sita_lmbd[hh] = new_sita_lmbd
    else:
        rej_lmbd[hh] = rej_lmbd[hh] + 1
    return Sita_lmbd[hh], rej_lmbd[hh]

def calculate_spending_habits_param_lambd_m((u, Xs, Sita_lmbd, Hlmbd, Vsita_lmbd, Ztld, rej)):  # step4-Multivariate
    ### step4--------------------------------------
    ## '''Calculation on the lmbd side'''
    #Posterior distribution of θ in step4 Initialization of variables used when calculating the sum of the first term of the kernel
    Lsita = np.zeros(nhh,dtype=np.float)
    #Utility value error calculation(Likelihood calculation of θ)
    Lsita = [np.sum( (u[hh,:] - np.sum(Ztld[:,:,hh]*Xs[hh,:,:], axis = 0) )**2 ) for hh in xrange(nhh)]
    #Secure the current θ
    old_sita_lmbd = Sita_lmbd[:]
    #Sampling new θ (sampling)
    new_sita_lmbd = Sita_lmbd[:] + ss.norm.rvs(loc=0, scale=sigma_lmbd, size=nhh)
   
    #Likelihood calculation (adjusted by Jacobian for log-likelihood)
    new_Lsita_lmbd = Lsita + hbm.Nkernel(new_sita_lmbd, Hlmbd, D[:,:], Vsita_lmbd)
    new_Lsita_lmbd = np.exp(-0.5*new_Lsita_lmbd)
    old_Lsita_lmbd = Lsita + hbm.Nkernel(old_sita_lmbd, Hlmbd, D[:,:], Vsita_lmbd)
    old_Lsita_lmbd = np.exp(-0.5*old_Lsita_lmbd)
    old_Lsita_lmbd[old_Lsita_lmbd == 0] = 1e-6

    #MH step
    alpha = np.array(new_Lsita_lmbd/old_Lsita_lmbd)
    alpha = alpha*(alpha<1) #Ratios >= 1 are zeroed here; ratios < 1 keep their value.
    alpha[alpha==0] = 1 #The zeroed entries (ratio >= 1) get acceptance probability 1.
    alpha[alpha!=alpha] = -1  #Entries that were NaN are set to -1 (always reject).
    uni = ss.uniform.rvs(loc=0 , scale=1, size=nhh)
    tmp = alpha - uni
    Sita_lmbd = new_sita_lmbd*(tmp>0) + old_sita_lmbd*(tmp<=0) #MH adoption conditions
    rej += 1*(tmp[0,:]<=0) #If the proposal was not accepted (alpha <= uniform draw), add 1 to the reject count.
    return Sita_lmbd[0,:], rej  

## step7-----------------------------------------------------------
def calc_INV((hh, dlt, lmbd)):
    INV = np.zeros(TIMES,dtype = np.double)
    for t in xrange(1,TIMES):
        if abs(INV[t-1]) < 1e-6 : Cons = 0.0
        else:
            denom = dlt[hh]*C[hh] + (INV[t-1])**lmbd[hh]
            if denom == 0.0: denom = 1e-6
            Cons = INV[t-1]*dlt[hh]*C[hh]/denom
        INV[t] = INV[t-1] + PM[hh, t-1]- Cons
    return INV

def calc_INV_m((dlt, lmbd)):
    INV = np.zeros((TIMES,nhh), dtype = np.double)
    Cons = np.zeros(nhh, dtype = np.double)
    for t in xrange(1,TIMES):
        tmp = INV[t-1,:]*(abs(INV[t-1,:])>=1e-6) + 1e-6*(abs(INV[t-1,:])<1e-6) #If |INV[t-1]| >= 1e-6, use the INV value; if it is smaller, the value gets too small (a warning appears), so substitute 1e-6
        denom = np.array(dlt[:]*C[:] + (tmp)**lmbd[:])
        denom[denom == 0.0] = 1e-6
        Cons = (INV[t-1,:]*dlt*C/denom)*(abs(INV[t-1,:])>=1e-6) #Cons is 0 when |INV[t-1]| < 1e-6
        INV[t] = INV[t-1,:] + PM[:,t-1]- Cons
    return INV
## ----------------------------------------------------------------

def pool_map(func, itr, args):
    pool = Pool(processes=pr)
    result = pool.map( func, ((i, args) for i in range(itr)) )
    pool.close()
    pool.join()
    return result
#-------------------------------------------- 
print "done ini"   
##Sampling loop
if __name__ == "__main__":
    rej_dlt = [0]*nhh
    rej_lmbd = [0]*nhh

    #start = time.clock()
    for nd in xrange(RP):
        # step1--calculate utility
        pool = Pool(processes=pr)
        u = np.array( pool.map(calculate_utility, ((hh, Xs, Ztld) for hh in xrange(nhh))) )
        pool.close()
        pool.join()     
        udraw[:,:,nd] = np.matrix(u[:,10]).T
        # step2--calculate implicit system v variable=Xa(do kalman filter)
        pool = Pool(processes=pr)
        Xs = np.array( pool.map(Kalman_filtering, ((hh, u, Sita_sys, Ztld) for hh in xrange(nhh))) ) 
        pool.close()
        pool.join()
        # step3--calculate difference
        Sita_sys = calculate_difference_m(Xs)
        # step4--calculate_spending_habits_param_delta=Sita_dlt
        Sita_dlt, rej_dlt = calculate_spending_habits_param_delta_m((u, Xs, Sita_dlt, Hdlt, Vsita_dlt, Ztld, rej_dlt))                
        # step4--calculate_spending_habits_param_lambd=Sita_lmbd
        Sita_lmbd, rej_lmbd = calculate_spending_habits_param_lambd_m((u, Xs,  Sita_lmbd, Hlmbd, Vsita_lmbd, Ztld, rej_lmbd))
        ### step5--------------------------------------
        ##Calculation on the dlt side----
        #Calculation of parameters for multivariate normal distribution
        D2 = np.dot(D.T, D)
        D2pA0 = D2 + A0
        Hhat_dlt = np.ndarray.flatten( np.dot(np.dot(inverse(D2), D.T) , Sita_dlt.T) )
        Dtld = np.dot( inverse(D2pA0) , (np.dot(D2, Hhat_dlt) + np.dot(A0, np.ndarray.flatten(m0))) )
        rtld = np.ndarray.flatten(Dtld)
        sig =  np.array( np.kron(Vsita_dlt, inverse(D2pA0)).T )
        #Sampling with multivariate normal distribution
        Hdlt = np.ndarray.flatten( hbm.randn_multivariate(rtld, np.matrix(sig), n=nvar) )
        ##-----------------
        ##Calculation on the lmbd side----
        #Calculation of parameters for multivariate normal distribution
        Hhat_lmbd = np.ndarray.flatten( np.dot( np.dot(inverse(D2), D.T) , Sita_lmbd.T) )
        Dtld = np.dot( inverse(D2pA0) , (np.dot(D2, Hhat_lmbd) + np.dot(A0, np.ndarray.flatten(m0))) )
        rtld = np.ndarray.flatten(Dtld)
        sig =  np.array( np.kron(Vsita_lmbd, inverse(D2pA0)).T )
        #Sampling with multivariate normal distribution
        Hlmbd = np.ndarray.flatten( hbm.randn_multivariate(rtld, np.matrix(sig), n=nvar) ) 
        ##-----------------
        #--------------------------------------------
    
        ### step6--------------------------------------
        ##Calculation on the dlt side
        #Calculation of parameters for the inverse Wishart distribution
        #div = np.array(Sita_dlt) - np.dot( D, np.ndarray.flatten(Hdlt).T ).T
        div = np.array(Sita_dlt) - np.dot( D, Hdlt.T ).T
        S = np.dot(div, div.T) #div above is a row vector, so the trailing .T makes S a scalar.
        #Sampling with inverse Wishart distribution
        Vsita_dlt = hbm.invwishartrand(f0 + nhh, V0 + S)  
        ##------------
        ##Calculation on the lmbd side
        #Calculation of parameters for the inverse Wishart distribution
        div = np.array(Sita_lmbd) - np.dot( D, Hlmbd.T ).T
        S = np.dot(div, div.T)      
        #Sampling with inverse Wishart distribution
        Vsita_lmbd = hbm.invwishartrand(f0 + nhh, V0 + S)  
        ##------------  
        #--------------------------------------------

        ## step7-------------------------------------------
        ##Updated Z spending amount
        INV = calc_INV_m((Sita_dlt, Sita_lmbd))
        Ztld[(k-1)+SEASONALYTY+nz-1,:,:] = hbm.standardization(INV)       
        ##-------------------------------------------
        print "nd=%d", nd
    #stop = time.clock()
    #print stop - start
    ##-------------------------------------------
#    cProfile.run("worker(int(sys.argv[1]), int(sys.argv[2]))")
    ## output area---------------------------------------------------------
    outf = open('test_udraw.csv','w')
    nums = []
    for hh in xrange(nhh):
        prsd = np.std(udraw[hh,0,0:int(math.ceil(0.1*RP))]); posd = np.std(udraw[hh,0,RP-int(math.floor(0.5*RP)):])
        prav = np.average(udraw[hh,0,0:int(math.ceil(0.1*RP))]); poav = np.average(udraw[hh,0,RP-int(math.floor(0.5*RP)):])
        prf = np.sum(abs(udraw[hh,0,0:int(math.ceil(0.1*RP))])); pof = np.sum(abs(udraw[hh,0,RP-int(math.floor(0.5*RP)):]))
        Z = math.sqrt(RP)*(prav - poav)/math.sqrt( prsd**2/0.1 + posd**2/0.5 )
        outf.write("UsrNon="+ str(hh) + ",Z="+ str(Z) +"\n")
        for t in xrange(1):
            nums = [str(udraw[hh,t,p]) for p in xrange(RP) if p%keep == 0 ]
            outf.write((",".join(nums))+"\n") 
    outf.close()
    ## --------------------------------------------------------------------
    ## output area---------------------------------------------------------
    outf = open('test_crm.csv','w')
    nums = []
    for hh in xrange(nhh):
        outf.write("UsrNon="+ str(hh) +"\n")
        for r in xrange(zc):
            nums = [str(Xs[hh,r,t]) for t in xrange(TIMES)]
            outf.write((",".join(nums))+"\n")
    outf.close()
    ## --------------------------------------------------------------------
    ## output area---------------------------------------------------------
    outf01 = open('test_u.csv','w')
    outf02 = open('test_INV.csv','w')
    outf03 = open('test_rej_dlt.csv','w')
    outf04 = open('test_rej_lmbd.csv','w')
    outf05 = open('test_sita_dlt.csv','w')
    outf06 = open('test_sita_lmbd.csv','w')
    nums01 = []; nums02 = []; nums03 = []; nums04 = []; nums05 = []; nums06 = []
    for hh in xrange(nhh):
        #Store in order
        nums01 = [str(u[hh,t]) for t in xrange(TIMES)]
        nums02 = [str(INV[t,hh]) for t in xrange(TIMES)]
        nums03 = [str(rej_dlt[hh])]
        nums04 = [str(rej_lmbd[hh])]
        nums05 = [str(Sita_dlt[hh])]
        nums06 = [str(Sita_lmbd[hh])]

        #writing
        outf01.write((",".join(nums01))+"\n")
        outf02.write((",".join(nums02))+"\n")
        outf03.write((",".join(nums03))+"\n")
        outf04.write((",".join(nums04))+"\n")
        outf05.write((",".join(nums05))+"\n")
        outf06.write((",".join(nums06))+"\n")
    outf01.close()
    outf02.close()
    outf03.close()
    outf04.close()
    outf05.close()
    outf06.close()
    ## --------------------------------------------------------------------

    print("done all")

Next is the file of helper functions.

one_to_one_mrk.py


# coding: utf8

import numpy.random as npr
from numpy.oldnumeric.linear_algebra import inverse
import scipy
from scipy import linalg
import random
import scipy.stats.distributions as dis
from scipy.stats import uniform,chi2,invgamma
import scipy.stats as ss
import numpy as np
import math
from scipy.linalg import inv,cholesky,det

###Function definition------------------------------------
#Bayesian inference of binary probit model(For step1)
def rtnorm(mu, sigma, a, b):
    lngth = len(mu)
    FA = dis.norm.cdf(a, loc=mu, scale=sigma, size = lngth)
    FB = dis.norm.cdf(b, loc=mu, scale=sigma, size = lngth)
    result = np.array([
                      dis.norm.ppf( 
                                   np.dot(ss.uniform.rvs(loc=0, scale=1,size=len(np.matrix(mu[t]))),
                                  (FB[t] - FA[t])) + FA[t],
                                   loc=mu[t], scale=sigma, size=len(np.matrix(mu[t])) ) #percent point =Q value
                        for t in xrange(lngth)
                        ]
                      )
    result[result == -np.inf] = -100
    result[result == np.inf] = 100
    return result[:,0]

#A function that standardizes the value of Z(For step1)
def standardization(z):
    return np.log( 0.001 + (z - np.min(z))/(np.max(z) - np.min(z))*(0.999 - 0.001) )

#Calculation of the multiplier part of the kernel part of the univariate normal distribution(For step4)
def Nkernel(sita, H, D, Vsita):
    if Vsita == 0: Vsita=1e-6
    return ((sita - np.dot(H,D.T))**2)/Vsita

#Calculation of the multiplier part of the kernel part of the multivariate normal distribution(For step4)
def NMkernel(sita, H, D, Vsita):
    res = sita - np.dot(H.T, D)
    return np.dot( np.dot(res.T, inverse(Vsita)), res )

###Matrix setting of state-space representation of seasonally adjusted model
def FGHset(al, k, p, q, nz):  #al is the α vector of the AR model, k;p;q is trend, season, AR
    m = k + p + q + nz -1
    if(q>0): G = np.zeros((m,3+nz)) #When the state model includes trend, season, and AR
    else: G = np.zeros((m,2+nz))    #When it does not contain AR component(q=0)
    F = np.zeros((m,m))
    #H = matrix(0,1,m)          #Ztld is used instead of H, so H is unnecessary
  
    ##Construction of block matrix of trend model
    G[0,0] = 1
    if k==1: F[0,0] = 1
    if k==2: F[0,0] = 2; F[0,1] = -1; F[1,0] = 1
    if k==3: F[0,0] = 3; F[0,1] = -3; F[0,2] = 1; F[1,0] = 1; F[2,1] = 1
    LS = k
    NS = 2
  
    ##Construction of block matrix of seasonally adjusted components
    G[LS, NS-1] = 1
    for i in xrange(p-1): F[LS, LS+i] = -1
    for i in xrange(p-2): F[LS+i+1, LS+i] = 1

    ##Construction of block matrix of Z component
    LS = LS + p -1
    NS = 2
    for i in xrange(nz): F[LS+i, LS+i] = 1
    for i in xrange(nz): G[LS+i, NS+i] = 1
  
    if q>0:
        NS = NS +1
        G[LS, NS-1] = 1
        for i in xrange(q): F[LS, LS+i-1] = al[i]
        if q>1:
            for i in xrange(q-1): F[LS+i, LS+i-1] = 1
  
    ##Calculation of the frame of the variance-covariance matrix Q of the system model
    Q = np.identity(NS+nz)
  
    return {'m':m, 'MatF':F, 'MatG':G, 'MatQ':Q}

#Setting the matrix Q in the state space representation------------------------
def Qset(Q0,parm):
    NS = len(Q0)
    Q = Q0
    #Calculation of the frame of the variance-covariance matrix Q of the system model
    for i in xrange(NS): Q[i,i] = parm[i]
    return np.array(Q)

#Kalman filter function------------------------------------------
def KF(y, XF0, VF0, F, H, G, Q, R, limy, ISW, OSW, m, N):
    if OSW == 1:
        XPS = np.zeros((N,m),dtype=np.float); XFS = np.zeros((N,m),dtype=np.float)
        VPS = np.zeros((N,m,m),dtype=np.float); VFS = np.zeros((N,m,m),dtype=np.float)
    XF = XF0; VF = VF0; NSUM = 0.0; SIG2 = 0.0; LDET = 0.0    
    for  n in xrange(N):
        #Forecast for the first term
        XP = np.ndarray.flatten( np.dot(F, XF.T) ) #From the second iteration onward this becomes a column vector, so always flatten it to a row vector
        VP = np.dot( np.dot(F, VF), F.T ) +  np.dot( np.dot(G, Q), G.T)
        #filter
        #R is mathematically a column vector; note that here in Python it is handled as a row vector!
        if y[n] < limy: 
            NSUM = NSUM + 1
            B = np.dot( np.dot(H[:,n], VP), H[:,n].T)  + R  #H is mathematically a horizontal vector
            B1 = inverse(B) #nvar dimensional vertical vector
            K = np.matrix(np.dot(VP, H[:,n].T)).T * np.matrix(B1) #K becomes a vertical vector(matrix)           
            e = np.array(y[n]).T - np.dot(H[:,n], XP.T) #nvar dimensional vertical vector
            XF = np.array(XP) + np.array( K * np.matrix(e) ).T #Horizontal vector
            VF = np.array(VP) - np.array( K* np.matrix(H[:,n]) * VP)           
            SIG2 = SIG2 + np.ndarray.flatten(np.array( np.matrix(e) * np.matrix(B1) * np.matrix(e).T ))[0] #Make matrix so that it can be calculated even in one dimension
            LDET = LDET + math.log(linalg.det(B))
        else:
            XF = XP; VF = VP
        if OSW == 1:
            XPS[n,:] = XP; XFS[n,:] = XF; VPS[n,:,:] = VP; VFS[n,:,:] = VF
    SIG2 = SIG2 / NSUM
    if ISW == 0:
        FF = -0.5 * (NSUM * (math.log(2 * np.pi * SIG2) + 1) + LDET)
    else:
        FF = -0.5 * (NSUM * (math.log(2 * np.pi) + SIG2) + LDET)
    if OSW == 0:
        return {'LLF':FF, 'Ovar':SIG2}
    if OSW == 1:
        return {'XPS':XPS, 'XFS':XFS, 'VPS':VPS, 'VFS':VFS, 'LLF':FF, 'Ovar':SIG2}

#Smoothing function----------------------------------------------------
def SMO(XPS, XFS, VPS, VFS, F, GSIG2, k, p, q, m, N):
    XSS =  np.zeros((N,m),dtype=np.float); VSS =  np.zeros((N,m,m),dtype=np.float)
    XS1 = XFS[N-1,:]; VS1 = VFS[N-1,:,:]
    XSS[N-1,:] = XS1; VSS[N-1,:,:] = VS1
    for n1 in xrange(N-1):        
        n = (N-1) - n1; XP = XPS[n,:]; XF = XFS[n-1,:]
        VP = VPS[n,:,:]; VF = VFS[n-1,:,:]; VPI = inverse(VP)
        A = np.dot( np.dot(VF, F.T), VPI)
        XS2 = XF + np.dot(A, (XS1 - XP))
        VS2 = VF + np.dot( np.dot(A, (VS1 - VP)), A.T )
        XS1 = XS2; VS1 = VS2
        XSS[n-1,:] = XS1; VSS[n-1,:,:] = VS1
    return {'XSS':XSS, 'VSS':VSS}

#Definition of TAU2x log-likelihood function----------------------------------------
def LogL(parm, *args):
    y=args[0]; F=args[1]; H=args[2]; G=args[3]; R=args[4]; limy=args[5]
    ISW=args[6]; k=args[7]; m=args[8]; N=args[9]; Q0=args[10]
    Q = Qset(Q0 ,parm)
    XF0 = np.array([0]*k); VF0 = np.array(10 * np.identity(k)); OSW = 0
    LLF = KF(y, XF0, VF0, F, H, G, Q, R, limy, ISW, OSW, k, N)
    LL = LLF['LLF']
    return -LL #Since optimize is a minimization routine, return the log-likelihood with its sign flipped.

#Definition of the generation function of the multivariate normal distribution--------------------------------------
def randn_multivariate(mu,Sigma,n=1):
    X = np.random.randn(n,len(mu))    
    A = linalg.cholesky(Sigma)    
    Y = np.dot(np.array(X),np.array(A)) + mu
    return Y

#Definition of inverse Wishart function----------------------------------------------
def invwishartrand_prec(nu,phi):
    return inv(wishartrand(nu,phi))

def invwishartrand(nu, phi):
    return inv(wishartrand(nu, inv(phi)))
 
def wishartrand(nu, phi):
    dim = phi.shape[0]
    chol = cholesky(phi)
    foo = np.zeros((dim,dim))
    
    for i in xrange(dim):
        for j in xrange(i+1):
            if i == j:
                foo[i,j] = np.sqrt(chi2.rvs(nu-(i+1)+1))
            else:
                foo[i,j]  = npr.normal(0,1)
    return np.dot(chol, np.dot(foo, np.dot(foo.T, chol.T)))
# -------------------------------------------------------

#Calculation of pseudo-household inventory amount----------------------------------------------
def calc_INV(TIMES, PM, C, nhh):
    ##Customer consumption trend Data
    INV = np.zeros((TIMES,nhh),dtype = np.double)
    dlt = np.array([np.float(1.0)]*nhh)
    lmbd = np.array([np.float(1.0)]*nhh)
    for t in xrange(1,TIMES):
        denom = np.array(dlt[:]*C[:] + (INV[t-1,:])**lmbd[:])
        denom[denom==0.0] = 1e-6
        Cons = INV[t-1,:]*dlt[:]*C[:]/denom
        INV[t,:] = INV[t-1,:] + PM[:,t-1] - Cons
    return np.array(INV)
# -------------------------------------------------------

For simplicity, I reduced the number of explanatory variables used when calculating loyalty, so note that the code differs from the book in that respect. In addition, although the book gives no specific description of it, an MCMC convergence check is also included; the judgment is based on Geweke's diagnostic. For an overview of MCMC, please refer to iAnalysis - Dad's Analysis Diary - About MCMC (Markov Chain Monte Carlo).
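For reference, below is a minimal, self-contained sketch of the kind of Geweke Z-score computed in the output section of one_to_one_mrk_run.py: it compares the first 10% of the chain with the last 50% and uses the naive sample variance rather than a spectral-density estimate. The function name geweke_z is mine, not from the book.

import numpy as np

def geweke_z(draws, first=0.1, last=0.5):
    #Z-score comparing the mean of the early segment with the mean of the late segment
    n = len(draws)
    a = draws[:int(np.ceil(first * n))]       #early segment (first 10% by default)
    b = draws[n - int(np.floor(last * n)):]   #late segment (last 50% by default)
    num = np.mean(a) - np.mean(b)
    den = np.sqrt(np.var(a, ddof=1) / len(a) + np.var(b, ddof=1) / len(b))
    return num / den

#Usage: for a well-mixed chain, |Z| should stay below roughly 1.96 (the 5% level).
chain = np.random.normal(size=5000)
print("Z = %.3f" % geweke_z(chain))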

If you want data to test with, you can get it from the user support page of the book; that is probably the easiest way.

Summary

I refrained from posting the output because it was a hassle... which is to say, out of negligence. Sorry about that.

If you are interested, please try using it in practice. (My implementation is admittedly rough, and since Python is slow this is hard to use practically as-is; it is probably better to rewrite it in a compiled language. ^^;)
