[With simple explanations] Scratch implementation of a deep Boltzmann machine in Python ①

This article is the 18th day of the Wacul Advent Calendar.

Self-introduction

I have been working on the analysis team at WACUL Co., Ltd. for about a year. My Python experience: roughly three weeks.

What I will do

As Python practice, and mainly to study probabilistic deep learning, I will try to implement a deep Boltzmann machine from scratch. All of the theoretical background can be found in the following books.

" Deep Learning (Machine Learning Professional Series) "Takayuki Okatani (Author) " Deep Learning "(supervised by the Japanese Society for Artificial Intelligence)

In this article, we will first implement a restricted Boltzmann machine as preparation. Next time, we will build a deep Boltzmann machine by stacking restricted Boltzmann machines.

A brief description of the Boltzmann machine

A probability distribution is simply a function that maps a set of values to the probability density (or probability mass) with which that set of values is obtained, so taken as a whole it can be regarded as the mechanism behind how those values are generated. Such a mechanism is called a ***generative model***. Once a generative model has been built, the relationships between variables can be examined and expected values can be calculated, which widens the range of applications. The Boltzmann machine discussed here is a generative model that computes the probability density using the Boltzmann distribution. Concretely, the probability density is obtained by multiplying the energy function Φ defined below by minus one, applying the exponential function, and then normalizing the result (multiplying by a constant so that it behaves as a proper probability within the closed interval [0, 1]). The energy function is a weighted linear sum of the biases of the nodes and the links between them.
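Concretely, a sketch of these definitions (following the sign convention also used by the energy() method implemented later, with x_i the value of node i, b_i its bias, and w_{ij} the weight of the link between nodes i and j):

\Phi(x) = -\Bigl(\sum_i b_i x_i + \sum_{i<j} w_{ij} x_i x_j\Bigr)

p(x) = \frac{\exp(-\Phi(x))}{Z}, \qquad Z = \sum_{x} \exp(-\Phi(x))

The normalizing constant Z (the partition function) is what makes the values behave as probabilities.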

(Figure: スライド1.png)

Visible and hidden variables

Visible variables are variables that can be observed directly; hidden variables are variables that cannot. Introducing hidden variables improves the expressive power of the Boltzmann machine. For a model that includes hidden variables, the distribution over the visible variables alone, obtained by marginalizing out the hidden variables, is used as the objective function for maximum likelihood estimation.
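In formulas (a sketch; v denotes the visible variables, h the hidden variables, θ the parameters, and v^{(n)} the n-th training example):

p(v \mid \theta) = \sum_{h} p(v, h \mid \theta)

L(\theta) = \sum_{n} \log p(v^{(n)} \mid \theta)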

Restricted Boltzmann machine

The restricted Boltzmann machine is a model that simplifies parameter estimation by introducing hidden variables into a general Boltzmann machine and then imposing certain constraints. There are two restrictions: ① there are no links between visible nodes, and ② there are no links between hidden nodes. Under these two conditions, the restricted Boltzmann machine has two layers, a "visible layer" consisting only of visible nodes and a "hidden layer" consisting only of hidden nodes, with links running only between the two layers.

This means that each layer is ***conditionally independent*** given the other: conditioned on the visible layer, the hidden nodes are independent of one another, and vice versa. Concretely, you can repeat the following cycle (see the formulas below):

- Obtain a sample of all nodes in the hidden layer from the current expected values of all nodes in the visible layer
- Approximate the expected value of each hidden node from that sample of the hidden layer
- Obtain a sample of all nodes in the visible layer from the expected values of the hidden nodes
- Approximate the expected value of each visible node from that sample of the visible layer

This calculation is relatively easy, and an entire layer can be sampled at once. The restricted Boltzmann machine can be said to gain this benefit in exchange for some of the expressive power of the model.
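Written as formulas, a sketch matching the sampling code below (b_i are the visible biases, c_j the hidden biases, w_{ij} the links, and \sigma(z) = 1/(1 + e^{-z}) is the sigmoid function):

p(h_j = 1 \mid v) = \sigma\Bigl(c_j + \sum_i w_{ij} v_i\Bigr)

p(v_i = 1 \mid h) = \sigma\Bigl(b_i + \sum_j w_{ij} h_j\Bigr)

Given v, every h_j can be sampled independently (and vice versa), which is exactly the layer-at-a-time sampling described above.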

(Figure: スライド2.png)

Learning a restricted Boltzmann machine

Once the shape of the model has been decided, the next step is to learn its parameters. Parameter learning for the restricted Boltzmann machine uses the orthodox gradient (ascent) method to maximize the log-likelihood function. Gradient ascent repeatedly updates the parameters a little at a time in the direction that increases the objective function, which here is the log-likelihood of the training data under the marginal distribution of the visible variables. To maximize it, the objective function is differentiated with respect to each parameter to obtain the gradient. Computing the gradient requires expected values over the visible and hidden layers, and since these are difficult to compute directly, they are approximated by actually generating samples with Gibbs sampling and averaging over them. Thanks to the "conditional independence between layers" of the restricted Boltzmann machine described above, this sampling is easy to carry out.
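For the log-likelihood averaged over the training data, the gradients come out as the difference between a data term and a model term (a sketch in the same notation as above; the model terms are the expectations that Gibbs sampling approximates):

\frac{\partial L}{\partial b_i} = \langle v_i \rangle_{\mathrm{data}} - \langle v_i \rangle_{\mathrm{model}}

\frac{\partial L}{\partial c_j} = \langle p(h_j = 1 \mid v) \rangle_{\mathrm{data}} - \langle h_j \rangle_{\mathrm{model}}

\frac{\partial L}{\partial w_{ij}} = \langle v_i \, p(h_j = 1 \mid v) \rangle_{\mathrm{data}} - \langle v_i h_j \rangle_{\mathrm{model}}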

Learning goes through the following process when viewed broadly.

⚫️ Randomly initialize parameters


⚫️ Repeat the following:

① Alternately sample the visible layer and the hidden layer using the current parameter values

② Using ①, calculate the expected value of each node in the visible layer and the hidden layer

③ Using ②, calculate the gradients (derivatives) of the visible-layer biases, the hidden-layer biases, and each link

④ Update the parameters using ③

Python implementation of restricted Boltzmann machine

Persistent contrastive divergence is used for sampling. Processing efficiency is not taken into account, and the convergence check is omitted.
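To recall what "persistent" means here: the Gibbs chain used to approximate the model expectations is not restarted from the data at every parameter update; its state is carried over from the previous update. A minimal, self-contained sketch of just that idea (the toy dimensions, parameter values, and variable names are illustrative and are not the class implemented below):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Toy dimensions and randomly initialized parameters (illustrative only)
Vn, Hn = 3, 4
rng = np.random.default_rng(0)
bias_v, bias_h = rng.random(Vn), rng.random(Hn)
link = rng.random((Vn, Hn))

# The chain state is created once and then carried over between updates
persistent_v = rng.integers(0, 2, size=Vn).astype(float)

for step in range(10):
    # One Gibbs sweep starting from the previous chain state, not from the data
    h = (sigmoid(bias_h + persistent_v @ link) > rng.random(Hn)).astype(float)
    persistent_v = (sigmoid(bias_v + link @ h) > rng.random(Vn)).astype(float)
    # (persistent_v, h) would be used for the model-expectation term of the
    # gradient, while the data term is computed from the training data as usual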

RBM class implementation

import itertools

import numpy as np
from typing import List

Input = List[int]

def sigmoid(z):
    return 1/(1+np.exp(-z))


class RBM:
    learningRate = 0.005
    sample_num = 100
    
    #Vn is the number of visible nodes, Hn is the number of hidden nodes
    def __init__(self , Vn : int , Hn : int):
        self.Vn = Vn
        self.Hn = Hn
        self.bias_v = np.array(np.random.rand(Vn)) #Randomly initialized
        self.bias_h = np.array(np.random.rand(Hn)) #Randomly initialized
        self.link  = np.array(np.random.rand(Vn,Hn)) #Randomly initialized
        
        self.hidden_value = []
        
    def train(self, data_for_learning , iteration_times):
        for _ in range(iteration_times):
            
            # samples_vh is v*h sample
            samples_h =[]
            samples_v = []
            samples_vh = np.zeros((self.Vn,self.Hn))
            
            messege_in_sampling = 0
            sigmoid_belief = 0
        
            #① Alternately sampling the visible layer and the hidden layer using the current parameter values
            for index in range(RBM.sample_num):
                
                current_v = np.array(data_for_learning[0], dtype=float) #Copy so that sampling below does not overwrite the training data
                current_h = np.zeros(self.Hn)
                                
                ##Sampling hidden layers
                for j in  range(self.Hn) :  
                    messege_in_sampling = self.bias_h[j] + sum(list(map(lambda i: current_v[i] * self.link[i][j]  , range(self.Vn))))                
                    
                    sigmoid_belief = 1/(1+ np.exp(-1 * messege_in_sampling))
                    
                    r = np.random.rand(1)
                    
                    if sigmoid_belief > r[0]  :
                        current_h[j] = 1
                    else : 
                        current_h[j] = 0
                
                samples_h.append(current_h)
                  
                ##Sampling the visible layer
                for i in range(self.Vn):
                    messege_in_sampling = self.bias_v[i] + sum(list(map(lambda j: current_h[j] * self.link[i][j]  , range(self.Hn))))                
                    
                    sigmoid_belief = 1/(1+ np.exp(-1 * messege_in_sampling))
                    
                    r = np.random.rand(1)
                    if sigmoid_belief > r[0]  :
                        current_v[i] = 1
                    else : 
                        current_v[i] = 0
                                        
                samples_v.append(current_v)  
                
                ##Visible layer ✖️ Hidden layer sampling (outer product of the two sample vectors)
                samples_vh += np.outer(current_v, current_h)
            
            #② Calculate the expected value of each node in the visible layer and hidden layer
            E_V  = np.sum(np.array(samples_v),axis=0) / RBM.sample_num
            E_H = np.sum(np.array(samples_h),axis=0) / RBM.sample_num
            E_VH = samples_vh / RBM.sample_num

            #③ Calculate the bias of the visible layer, the bias of the hidden layer, and the gradient (derivative) of each link.
            ##Derivative of visible layer bias
            gradient_v = sum(np.array(data_for_learning)) / len(data_for_learning) - E_V
                
            ##Derivative of hidden layer bias
            gradient_h = []
            for j in range(len(self.bias_h)):
                gradient_h_1 = []
                sigmoid_beliefs_h_1 = []

                for n in range(len(data_for_learning)):
                    messege_h_1 = self.bias_h[j] + sum(list(map(lambda i: data_for_learning[n][i] * self.link[i][j]  , range(self.Vn))))
                    sigmoid_belief_h_1 = 1/(1+ np.exp(-1 * messege_h_1))
                    sigmoid_beliefs_h_1.append(sigmoid_belief_h_1)
                
                gradient_h_1 = sum(sigmoid_beliefs_h_1) / len(data_for_learning)      
                gradient_h_2 = E_H[j]

                gradient_h_j = gradient_h_1 - gradient_h_2 #Data term minus model term
                gradient_h.append(gradient_h_j)

            ##Derivative of the link
            gradient_vh = []
            for i in range(len(self.bias_v)):
                for j in range(len(self.bias_h)):
                    gradient_vh_1 = []
                    sigmoid_beliefs_vh_1 = []

                    for n in range(len(data_for_learning)):
                        messege_vh = self.bias_h[j] + sum(list(map(lambda k: data_for_learning[n][k] * self.link[k][j], range(self.Vn)))) #Use k so the outer loop variable i is not shadowed
                        sigmoid_belief_vh_1 = 1/(1+ np.exp(-1 * messege_vh))
                        sigmoid_beliefs_vh_1.append(sigmoid_belief_vh_1 * data_for_learning[n][i])

                    gradient_vh_1 = sum(sigmoid_beliefs_vh_1) / len(data_for_learning)      
                    gradient_vh_2 = E_VH[i][j]

                    gradient_vh_ij = gradient_vh_1 - gradient_vh_2 #Data term minus model term

                    gradient_vh.append(gradient_vh_ij)

            #④ Update parameters using ③
            self.bias_v += RBM.learningRate *  np.array(gradient_v)
            self.bias_h += RBM.learningRate *  np.array(gradient_h)
            self.link +=  RBM.learningRate *  np.array(gradient_vh).reshape(self.Vn,self.Hn)            

    def energy(self,input : Input) -> float:
        #Visible layer energy
        energy_v = sum(list(map(lambda i : self.bias_v[i] * input[i]  , range(self.Vn))))
        
        #Hidden layer energy
        ##Estimated value of hidden layer
        hidden_layer = self.getHiddenValue(input)
                
        energy_h = sum(list(map(lambda j : self.bias_h[j] * hidden_layer[j]  , range(self.Hn))))
        
        #link energy
        accumlator = []        
        for i in range(self.Vn):
            for j in range(self.Hn):
                accumlator.append(input[i]*hidden_layer[j] * self.link[i][j])
                
        energy_vh = sum(accumlator)       
                
        energy = -1 * (energy_v + energy_h + energy_vh)
        
        return energy
    
    def estimate(self,input:Input) -> float:            
    
        #Calculate energy function
        energy = self.energy(input)
        
        #Calculate partition function
        all_patterns = list(itertools.product(range(0,2), repeat=self.Vn))
        partition = sum(list(map(lambda x: np.exp(-1 *self.energy(list(x))) , all_patterns)))
            
        return  np.exp(-1*  energy ) / partition
            
    def getHiddenValue(self,input:Input) :
        
        hidden_layer = np.zeros(self.Hn)
            
        for j in  range(self.Hn) :  
            messege_in_sampling = self.bias_h[j] + sum(list(map(lambda i: input[i] * self.link[i][j]  , range(self.Vn))))                
            sigmoid_belief = 1/(1+ np.exp(-1 * messege_in_sampling))
            
            r = np.random.rand(1)
            if sigmoid_belief > r[0]:
                hidden_layer[j] = 1
            else : 
                hidden_layer[j] = 0
                
        self.hidden_value = hidden_layer
        
        return hidden_layer

Execution example

#Build
visible_node = 3
hidden_node = 4
rbm = RBM(visible_node, hidden_node)

#Learning data creation
training_data = np.array(list(map(lambda x : np.random.randint(0,2) , range(0,300)))).reshape(100,3)

#Parameter learning
rbm.train(training_data,10)
print(rbm.link)

input  = [1,0,1]

#Get the value of the hidden layer
rbm.getHiddenValue(input)

#Energy function output
rbm.energy(input)

#Probability density output
rbm.estimate(input)
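As a rough sanity check (my addition, not part of the original flow): since estimate() is meant to return a probability, summing it over all 2^3 visible patterns should give a value in the neighborhood of 1. Because getHiddenValue() samples the hidden layer stochastically, the total will fluctuate from run to run.

import itertools

#Sum of the estimated probabilities over every visible pattern; should be roughly 1
all_patterns = list(itertools.product(range(2), repeat=rbm.Vn))
total = sum(rbm.estimate(list(pattern)) for pattern in all_patterns)
print(total)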

It seems to run, but whether it has actually converged is unclear... I will move on for now.

next time

Next time, we will stack these restricted Boltzmann machines to build a deep Boltzmann machine. The deeper the model, the harder it is to train, so the idea is to first obtain good initial parameter values by pre-training with restricted Boltzmann machines, and then use them to fine-tune the model as a whole.
