An RBM (restricted Boltzmann machine) is a method used for pre-training when stacking multiple layers in deep learning. The reference source code, written in Theano, is published at http://deeplearning.net/tutorial/rbm.html. This time I ported that code to Chainer.

In an RBM, the error function (loss) is defined in terms of the free energy, which makes it complicated (?). However, when you differentiate it with respect to the weight matrix and the bias variables for backpropagation, a clean formula falls out: the gradient can be computed from the values produced by contrastive divergence (CD). When implementing this in Chainer, rather than bothering to define an error function (loss) and run backpropagation (loss.backward), it would be simpler to implement it directly with cupy (numpy + GPU), but I deliberately implemented it through the error function. I will release the cupy-only (numpy + GPU) version of the source code when the opportunity arises. Note that the source code below also uses cupy to speed up the CD processing.
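
Concretely, based on equation (5) of the tutorial (as referenced in the code below), the free energy of a binary visible vector and the loss used here can be written as follows (a: vbias, b: hbias, W: weight matrix, N: batch size):

```
F(v) = -\sum_i a_i v_i - \sum_j \log\left(1 + e^{b_j + \sum_i W_{ji} v_i}\right)

loss = \frac{1}{N} \sum_{n=1}^{N} \left( F(v_{data}^{(n)}) - F(v_{CD\text{-}k}^{(n)}) \right)
```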

```
# -*- coding: utf-8 -*-
import os,sys
import argparse
import time
import numpy as np
import chainer
from chainer import computational_graph
from chainer import cuda, Variable
import chainer.functions as F
import chainer.links as L
import chainer.optimizers as O
import pickle
import gzip
parser = argparse.ArgumentParser()
parser.add_argument('--gpu', '-g', default=-1, type=int,
                    help='GPU ID (negative value indicates CPU)')
parser.add_argument('--pcd', '-p', default=1, type=int,
                    help='pcd_flag')
parser.add_argument('--kcd', '-k', default=1, type=int,
                    help='cd-k')
'''When computing on the GPU, use cupy (numpy + GPU) for the processing.'''
args = parser.parse_args()
if args.gpu >= 0:
    cuda.check_cuda_available()
xp = cuda.cupy if args.gpu >= 0 else np
def sigmoid(x):
    return 1. / (1 + xp.exp(-x))
class RBM(chainer.Chain):
    '''
    n_visible: dimension of the visible layer
    n_hidden:  dimension of the hidden layer
    l.W: weight matrix
    l.b: hbias
    l.a: vbias
    real: switches between a {0, 1} visible layer and a real-valued one.
          A real-valued layer is assumed only for the input layer; intermediate layers
          are assumed to be {0, 1} only.
    Because chainer computes h = l(v), the weight matrix that appears in a typical RBM
    is used transposed.
    '''
    def __init__(self, n_visible, n_hidden, real=0, k=1, pcd_flag=0):
        super(RBM, self).__init__(
            l=L.Linear(n_visible, n_hidden),
            # This usage was deprecated in the chainer 1.5 manual, so use add_param instead.
            # a=L.Parameter(xp.zeros(n_visible, dtype=xp.float32)),
        )
        self.l.add_param("a", (n_visible), dtype=xp.float32)
        '''l.a is not initialized by chainer, so it has to be initialized here or it will not be learned(?).'''
        self.l.a.data.fill(0)
        self.n_visible = n_visible
        self.n_hidden = n_hidden
        self.real = real
        self.k = k
        self.pcd_flag = pcd_flag
    def __call__(self, v_data, v_prev_data=None):
        batch_size = v_data.shape[0]
        if self.pcd_flag == 0:
            v_prev_data = v_data
        elif self.pcd_flag == 1 and v_prev_data is None:
            v_prev_data = v_data
        vh_data = self.constrastive_divergence(v_prev_data, self.k)
        v = Variable(v_data)
        vh = Variable(vh_data.astype(np.float32))
        '''
        From expression (5) of http://deeplearning.net/tutorial/rbm.html,
        the error function (loss) is the difference between (the free energy of the
        visible-layer data) and (the free energy of the CD-k sample of the visible-layer data).
        loss = (self.free_energy(v) - self.free_energy(vh)) / batch_size
        The mapping v -> vh is not part of the learning, so the data are wrapped in
        Variables only after the CD-k processing is finished.
        '''
        loss = (self.free_energy(v) - self.free_energy(vh)) / batch_size
        return loss
    def free_energy(self, v):
        ''' Function to compute the free energy '''
        '''
        The input value v must be a Variable.
        Strictly, the SUM should be taken per row and then over the columns (the batch),
        but since everything is summed in the end, the SUM is taken all at once.
        '''
        batch_size = v.data.shape[0]
        n_visible = self.n_visible
        real = self.real
        if real == 0:
            '''
            When the visible layer is {0, 1}:
            vbias_term = SUM(a(i) * v(i))   (the minus sign is applied in the return value)
            '''
            vbias_term = F.sum(F.matmul(v, self.l.a))
        else:
            '''
            When the visible layer is real-valued:
            vbias_term = -0.5 * SUM((v(i) - a(i)) * (v(i) - a(i)))
            To subtract the bias from each row of the (batch size * n_visible) input
            in chainer, forcibly use the product m * n.
            '''
            m = Variable(xp.ones((batch_size, 1), dtype=xp.float32))
            n = F.reshape(self.l.a, (1, n_visible))
            v_ = v - F.matmul(m, n)
            vbias_term = -F.sum(0.5 * v_ * v_)
        wx_b = self.l(v)
        hidden_term = F.sum(F.log(1 + F.exp(wx_b)))
        return -vbias_term - hidden_term
    def propup(self, vis):
        '''
        Compute the probability distribution of the hidden layer from visible-layer data.
        The hidden layer is {0, 1}, so this returns the probability of being 1.
        The input value vis is always xp (np).
        '''
        pre_sigmoid_activation = xp.dot(vis, self.l.W.data.T) + self.l.b.data
        return sigmoid(pre_sigmoid_activation)
    def propdown(self, hid):
        '''
        Compute the probability distribution of the visible layer from hidden-layer data.
        When the visible layer is {0, 1}, this returns the probability of being 1.
        When the visible layer is real-valued, it returns the mean of a normal distribution;
        the variance is fixed at 1.
        The input value hid is always xp (np).
        '''
        real = self.real
        if real == 0:
            pre_sigmoid_activation = xp.dot(hid, self.l.W.data) + self.l.a.data
            v_mean = sigmoid(pre_sigmoid_activation)
        else:
            v_mean = xp.dot(hid, self.l.W.data) + self.l.a.data
        return v_mean
    def sample_h_given_v(self, v0_sample):
        '''
        Gibbs sampling step v0_sample -> h1_sample.
        Draw a random value p[i] in [0, 1] and compare it with the probability of 1
        computed by propup:
            if h1_mean[i] > p[i]  then h1_sample[i] = 1
            if h1_mean[i] <= p[i] then h1_sample[i] = 0
        '''
        h1_mean = self.propup(v0_sample)
        h1_sample = xp.random.binomial(size=h1_mean.shape, n=1, p=h1_mean)
        return h1_mean, h1_sample
    def sample_v_given_h(self, h0_sample):
        '''
        Gibbs sampling step h0_sample -> v1_sample.
        When the visible layer is {0, 1}:
            Draw a random value p[i] in [0, 1] and compare it with the probability of 1
            computed by propdown:
                if v1_mean[i] > p[i]  then v1_sample[i] = 1
                if v1_mean[i] <= p[i] then v1_sample[i] = 0
        When the visible layer is real-valued:
            v1_sample[i] follows a normal distribution with mean v1_mean[i] and variance 1,
            so add a random value u[i] drawn from a normal distribution with mean 0 and
            variance 1 to the value computed by propdown:
                v1_sample[i] = v1_mean[i] + u[i]
        '''
        v1_mean = self.propdown(h0_sample)
        if self.real == 0:
            v1_sample = xp.random.binomial(size=v1_mean.shape, n=1, p=v1_mean)
        else:
            batch_number = h0_sample.shape[0]
            v1_sample = v1_mean + xp.random.randn(batch_number, self.n_visible)
        return v1_mean, v1_sample
    def gibbs_hvh(self, h0_sample):
        ''' Gibbs sampling h -> v -> h '''
        v1_mean, v1_sample = self.sample_v_given_h(h0_sample)
        h1_mean, h1_sample = self.sample_h_given_v(v1_sample)
        return v1_mean, v1_sample, h1_mean, h1_sample
    def gibbs_vhv(self, v0_sample):
        ''' Gibbs sampling v -> h -> v '''
        h1_mean, h1_sample = self.sample_h_given_v(v0_sample)
        v1_mean, v1_sample = self.sample_v_given_h(h1_sample)
        return h1_mean, h1_sample, v1_mean, v1_sample
    def constrastive_divergence(self, v0_sample, k=1):
        ''' CD-k processing; cupy is required to run it on the GPU '''
        vh_sample = v0_sample
        for step in range(k):
            ph_mean, ph_sample, vh_mean, vh_sample = self.gibbs_vhv(vh_sample)
        return vh_sample
    def reconstruct(self, v):
        h = sigmoid(xp.dot(v, self.l.W.data.T) + self.l.b.data)
        reconstructed_v = sigmoid(xp.dot(h, self.l.W.data) + self.l.a.data)
        return reconstructed_v
```

In Chainer, you can use chainer.links.Linear to define the weight matrix (W) and bias (b) of the mapping between layers. An RBM needs one additional bias, so I added it as a parameter with l.add_param("a", (n_visible), dtype=xp.float32). As a result, l.W is the weight matrix, l.b is the hbias, and l.a is the vbias. To manage the mapping from the visible layer to the hidden layer with chainer.links.Linear, the weight matrix found in a typical RBM textbook is transposed (Wij → Wji).
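
As a minimal standalone sketch of this setup (using the same old Chainer API as the code above; the dimensions are just for illustration):

```
import numpy as np
import chainer.links as L

n_visible, n_hidden = 784, 500
l = L.Linear(n_visible, n_hidden)                  # holds W (n_hidden x n_visible) and b (hbias)
l.add_param("a", (n_visible,), dtype=np.float32)   # extra visible bias (vbias)
l.a.data.fill(0)

v = np.zeros((3, n_visible), dtype=np.float32)
h = l(v)             # h = v.dot(W.T) + b, i.e. the usual RBM weight matrix transposed
print(h.data.shape)  # (3, 500)
```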

I experimented with the source code above using the MNIST data. Since I had no GPU server, training was done on the CPU, so the number of training epochs is only 50. This time the hidden layer is 500-dimensional, and the results shown use a learning rate of 0.01, momentum of 0.5, and PCD-10.
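
A minimal sketch of what a training loop might look like under these settings (this is not the author's actual training script: train_data is a stand-in for the MNIST matrix, and the persistent-chain bookkeeping for PCD is omitted, so this runs plain CD with k=10):

```
import numpy as np
import chainer.optimizers as O

n_visible = 28 * 28
train_data = np.random.rand(600, n_visible).astype(np.float32)  # stand-in for the 60,000 MNIST rows

rbm = RBM(n_visible, 500, real=0, k=10, pcd_flag=0)  # real=0: treat pixels as binary
optimizer = O.MomentumSGD(lr=0.01, momentum=0.5)
optimizer.setup(rbm)

batch_size = 100
for epoch in range(50):
    perm = np.random.permutation(len(train_data))
    for i in range(0, len(train_data), batch_size):
        v = train_data[perm[i:i + batch_size]]
        rbm.zerograds()            # clear gradients from the previous step
        loss = rbm(v)              # free-energy gap between data and its CD-k sample
        loss.backward()
        optimizer.update()
```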

Below is the original data. There are actually 60,000 samples, but only 150 are displayed.

Below is the reconstruction of the image above. It is reproduced almost exactly. Only the same items are displayed.
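
The reconstruction figure can be produced with the reconstruct method defined above; a minimal sketch (the plotting code and grid layout are my own assumptions, not from the original post):

```
import matplotlib.pyplot as plt

originals = train_data[:150]
recon = rbm.reconstruct(originals)      # v -> h -> v using mean activations

fig, axes = plt.subplots(10, 15, figsize=(15, 10))
for ax, img in zip(axes.ravel(), recon):
    ax.imshow(img.reshape(28, 28), cmap='gray')
    ax.axis('off')
plt.show()
```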

Next is an image that visualizes the weight matrix. The hidden layer is 500-dimensional, and each basis vector is converted into a 28 * 28 image.
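
A sketch of how this visualization might be produced (assumed plotting code, not the author's): each row of l.W has 784 elements and is reshaped to 28 * 28.

```
import matplotlib.pyplot as plt

W = rbm.l.W.data                       # shape (500, 784): one row per hidden unit
fig, axes = plt.subplots(10, 10, figsize=(10, 10))
for ax, w in zip(axes.ravel(), W[:100]):
    ax.imshow(w.reshape(28, 28), cmap='gray')
    ax.axis('off')
plt.show()
```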
