[PYTHON] Solve a simple traveling salesman problem using a Boltzmann machine with simulated annealing

: blue_car: Purpose of this article

By applying ** Simulated Annealing **, we solve the traveling salesman problem, which is one of the typical problems of combinatorial optimization problems. Furthermore, the Boltzmann machine model is adopted as the calculation model, and programming is performed in Python to find the optimum solution.

: blue_car: About Simulated Annealing Theory

The mathematical formulas and basic theory related to Simulated Annealing, which are taken up here, are explained in detail in the reference book "Optimization Algorithm" (P120-) described in the lower "Source", so I will omit them here. I would like to write an article mainly focusing on programming.

: blue_car: Simple traveling salesman problem

As a traveling salesman problem, we will use the same problem that we addressed here.

■ Solving the traveling salesman problem with the genetic algorithm (GA) and its library (vcopt) https://qiita.com/ttlabo/items/9b00a29702d1205f6996

Consider Table 1, where each row shows the city name and each column shows the order of visits, as shown below. If you visit cities in a certain order, the value in the table will be 1, otherwise the value will be 0. For example, in Table 1, the order of patrol is ** City E-> City A-> City D-> City C-> City B-> City E ** (finally returns to the city of departure). Table 2 also defines the distances between cities.

The Boltzmann machine model is applied to the Traveling Salesman Problem (TSP) that travels around these five cities to solve the problem.

: blue_car: Formula of the energy formula (Hamiltonian) of the objective function

Quoting from the reference book, the basic idea of Simulated Annealing is defined as follows.

** Basic concept of Simulated Annealing **

Objective function $ f (x) $: System energy (Hamiltonian) $ F (x) $ variable $ x $: The state of the system that changes according to a certain probability distribution Temperature parameter $ T $: Variable that controls the state transition probability

Regarding the above, gradually change the temperature parameter $ T $ from a high value to a low value. Search for the stable point (minimum value) of $ f (x) $.

Now we need to define the objective function Hamiltonian. Hamiltonian devises and formulates so that the optimum solution can be found when the value becomes the minimum. For example, add a term that increases the Hamiltonian value when you take an inappropriate route or violation condition when solving a problem. This term is called the penalty term.

In this issue, consider the following as Hamiltonian $ H $.

\begin{eqnarray}
Hamiltonian H&=& H_A + H_B\\
H_A &=&Term for finding the distance between cities\\
H_B &=&Penalty term
\end{eqnarray}

First, consider $ H_A $. Let the distance between the city $ i $ and the city $ j $ be $ d_ {i, j} $, and fit the variable $ x $ into each cell in Table 1 to make $ x_ {i, k} $. Where $ x_ {i, k} $ is the variable that determines whether to visit the $ k $ th city for the city $ i $. A variable that takes 1 if you visit and 0 if you don't, and is called a binary variable.

The total route can be calculated by adding the distance $ d_ {i, j} $ when the city $ i $ is visited in the $ k $ th place and the city $ j $ is visited in the $ k + 1 $ th place in order. That's fine, so define $ H_A $ as follows:

H_A = K_1\sum_{i=1}^{5}\sum_{j=1}^{5}\sum_{k=1}^{5}d_{i,j}x_{i,k}x_{j,k+1}… Ceremony(1)

In the above formula, $ K_1 $ is a constant. Be careful about the range of $ k $ in the third $ \ sum $ (sigma symbol). When $ k $ = 5, the $ x $ subject to sigma calculation will be $ x_ {i, 5} x_ {j, 6} $, and the sixth city to visit is required. In this issue, we will return to the first city we visited (first).

Here, since you cannot visit the same city in succession, set a large value for $ d_ {i, i} $ and penalize it so that it will not be selected as a solution due to the large value of $ H_A $. .. That is, set the value for $ d_ {1,1} $, $ d_ {2,2} $ ,,, $ d_ {5,5} $. Also, set the distance for the blank area at the bottom left of Table 2 by considering the combination between cities. You can set it like a symmetric matrix. Therefore, we redefine Table 2 as follows:

Next, consider $ H_B $. For Table 1, each row and column contains 1 number of 1s and 4 numbers of 0s. This means that you will not visit the same city multiple times or visit multiple cities in the same order at the same time. In other words, the sum for each row and each column is always 1. If it is not 1, it can be considered as a penalty term that increases the value, and $ H_B $ is set as follows.

H_B = K_2\Biggl[\sum_{i=1}^{5}\bigg(\sum_{k=1}^{5}x_{i,k}-1\bigg)^2 + \sum_{k=1}^{5}\bigg(\sum_{i=1}^{5}x_{i,k}-1\bigg)^2\Biggl]… Ceremony(2)

In the above formula, let $ K_2 $ be a constant.

Now that we have defined $ H_A $ and $ H_B $, the Hamiltonian $ H $ is:

H = K_1\sum_{i=1}^{5}\sum_{j=1}^{5}\sum_{k=1}^{5}d_{i,j}x_{i,k}x_{j,k+1} + K_2\Biggl[\sum_{i=1}^{5}\bigg(\sum_{k=1}^{5}x_{i,k}-1\bigg)^2 + \sum_{k=1}^{5}\bigg(\sum_{i=1}^{5}x_{i,k}-1\bigg)^2\Biggl]… Ceremony(3)

: blue_car: Think about the algorithm of the program

We will create an algorithm based on the reference book. First, consider the following formula, which is the basis of the Boltzmann machine model. The following formula is called the Gibbs-Boltzmann distribution, and is a formula that describes the behavior of particle systems such as temperature and gas in physics (statistical mechanics). $ f (x) $ is the energy of the system and $ T $ is the parameter corresponding to the temperature.

g(x,T) = \frac{1}{\sum_{x}^{}exp\big(\frac{-f(x)}{T}\big)}exp\big(\frac{-f(x)}{T}\big)… Ceremony(4)

In the calculation of the Boltzmann machine model, the temperature $ T $ is brought closer to 0. By approaching the temperature to 0, it is possible to approach one equilibrium state distribution by the Markov chain ergodic theorem (reference book P123), and the optimum solution can be obtained.

Here, the above temperature $ T $ is defined as a function of time $ t $ as follows. Instead of lowering the temperature, change $ t $ to perform the calculation.

T(t) = \frac{k}{ln(t+2)}… Ceremony(5)\\
(t=0,1,2,...)

In the above equation, $ k $ is a constant and $ ln $ is a natural logarithm.

The energy of the system (Hamiltonian) changes each time the temperature T is lowered, so when this energy changes, set an index to determine whether to incorporate that change. This indicator is called the acceptance probability. The acceptance probability is derived from the Gibbs-Boltzmann distribution equation (4) and the acceptance function $ F_B $ below as follows (reference book P129). Here, $ A (x, y) $ is the acceptance probability, $ E (x) $ or $ E (y) $ is the energy of the system (Hamiltonian), and $ g (x, T) $ is the Gibbs-Boltzmann distribution formula. will do. $ X $ is the state before the energy change, and $ y $ is the state after the energy change.

Acceptance probability A(x,y) = F_B\bigg(\frac{g(y,T)}{g(x,T)}\bigg)\\
= \frac{1}{1+exp\big(\frac{E(y)-E(x)}{T}\big)}… Ceremony(6)\\
Acceptance function F_B(s) = \frac{s}{1+s}

Now that we have the necessary expressions, we will build the algorithm.

Consider the initial conditions of the program. As mentioned above, the time $ t $ starts from 0, increments, and increases by 1. Increment is performed in the loop processing, and the number of loops is loop. $ t $ varies from 0 to loop-1.

Consider the initial state of the 2D array type variable $ x $ (it is dict type in the Python program). $ X $ is a variable that decides whether to visit each city or not, and is called a decision variable. It is a binary variable that takes 0 or 1, and this time it is considered as a 5x5 2D array (index number starts from 0).

Initialize this decision variable array $ x $. The initialization method should be set to 0 or 1 at random.

After the initialization process, the loop process starts. This looping process is equivalent to lowering the temperature.

The number of loops is held by the variable loop, but there is no specific number. Determined according to the operating status of the program.

First, get the temperature $ T $. The temperature $ T $ is determined by equation (5).

Randomly select one from the decision variable $ x $. Since $ x $ is a two-dimensional array of 5x5 matrices, we get random coordinates $ (i, j) $ in each of the x and y directions.

Inverts the randomly obtained $ x (i, j) $ value. Inversion means changing the value of $ x (i, j) $ to 0 if it is 1 and 1 if it is 0. Inverting this $ x (i, j) $ is called "changing the state". It calculates how the energy of the system (Hamiltonian) changes before and after changing the state, and decides whether to incorporate the change. In this program, a new two-dimensional array whose state has changed is secured as $ y $.

The probability of deciding whether to incorporate this change is called the acceptance probability (reference book P128). The acceptance probability is calculated by equation (6). The state before the change (system energy) is $ E (x) $, and the state after the change is $ E (y) $. The energy of the system (Hamiltonian) is calculated by Eq. (3).

If the above change exceeds a certain threshold, it will be accepted, and if it does not exceed it, it will not be accepted. Accepting means accepting the inversion of a randomly selected $ x (i, j) $ value, and not accepting means leaving the $ x (i, j) $ value as it is. In the program, the threshold is set with a variable called juri_hantei. Also, if accepted, transfer the 2D array $ y $ to $ x $.

After setting the changed state according to the acceptance probability, advance the time one step forward with $ t = t + 1 $.

Repeat the above loop. The final solution is $ x $ at the end of the loop.

: blue_car: Arrangement of algorithms

The above algorithms are arranged in procedural order as shown in the figure below.

`sa_simulation.py`


1.Variable definition
　
Loop count=Set to 50000.
Distance between cities di,Define j.
Hamiltonian constant K1,Define K2.
Constant k to determine temperature(Equation 5)To define.
　　k1,The values of k2 and k are determined while observing the movement of the program.
Acceptance judgment threshold based on acceptance probability juri_Define hantei.
　　juri_The value of hantei is determined by observing the movement of the program.
　
2.Initialization
　
Time t=Set to 0.
Variable x(A two-dimensional array)Randomly set 0 or 1 for each element of.
Define the functions used in the program.
　
3.Start loop processing
　
formula(5)Get more temperature Tp.
Randomly x[i,j]Select. 1D(y axis)Direction i, 2D(x axis)In each direction j
Randomly identify x[i,j]Select.
Same number of elements as x(Number of dimensions)With x[i,j]Only the value of(1 for 0, 0 for 1)And
Create a two-dimensional array y with other values the same as x.
Acceptance probability(Equation 6)Is calculated.
If the acceptance probability exceeds the threshold, y is substituted for x.(If not, leave x as it is and do nothing)
　t=t+Let it be 1.
Return to the starting position of the loop.
　
4.Get results
　
The finally obtained x is displayed on the screen as the optimum solution.
Calculate the total distance and display it on the screen.

: blue_car: Program implementation

The programming this time is as follows.

`sa_simulation.py`


import random
import math

# =====================================================================
# 1.Constant definition
# =====================================================================

#Number of loops
loop = 50000

#Definition of distance
d = [
    [100,  30,  30,  25,  10],
    [ 30, 100,  30,  45,  20],
    [ 30,  30, 100,  25,  20],
    [ 25,  45,  25, 100,  30],
    [ 10,  20,  20,  30, 100]
]

#1D of 2D array x(y direction)Number of elements of
len_x_y = 5
#2D of 2D array x(x direction)Number of elements of
len_x_x = 5

#Hamiltonian constant
k1 = 0.01
k2 = 1

#Constants for determining temperature
k = 5

#Threshold for judging whether or not to accept based on the acceptance probability
juri_hantei = 0.05


# =====================================================================
# 2.Initialization
# =====================================================================

#Time initialization
t = 0

#Randomly 0 for 2D array elements,Set to 1
x = {}
for i in range(len_x_y):
    for j in range(len_x_x):
        x[i,j] = random.randint(0,1)

# =====================================================================
#Function definition
# =====================================================================

#Hamiltonian
def H(x):
	HA = 0
	sumA = 0
	sumB = 0
	
	for i in range(len_x_y):
		for j in range(len_x_y):
			for k in range(len_x_x):
				k_from = k
				if k == len_x_x - 1:
					k_to = 0
				else:
					k_to = k + 1
				sumA += d[i][j] * x[i,k_from] * x[j, k_to]
	HA += k1 * sumA
    
	sumB1 = 0
	for i in range(len_x_y):
		for k in range(len_x_x):
			sumB1 += x[i,k]
		sumB += sumB1 * sumB1
		sumB1 = 0
		
	sumB2 = 0
	for i in range(len_x_y):
		for k in range(len_x_x):
			sumB2 += x[i,k]
		sumB -= 2 * sumB2
		sumB2 = 0

	for i in range(len_x_y):
		sumB += 1
	
	sumB3 = 0
	for k in range(len_x_x):
		for i in range(len_x_y):
			sumB3 += x[i,k]
		sumB += sumB3 * sumB3
		sumB3 = 0
		
	sumB4 = 0
	for k in range(len_x_x):
		for i in range(len_x_y):
			sumB4 += x[i,k]
		sumB -= 2 * sumB4
		sumB4 = 0
		
	for k in range(len_x_x):
		sumB += 1
	
	HA += k2 * sumB	
	return HA
    

#Definition of temperature function
def T(t):
    T = k / math.log(t + 2)
    return T

#Calculation of acceptance probability
def A(x,y,T):
    tmp = H(y) - H(x)
    A = 1 / (1 + math.exp(tmp / T))
    return A


# =====================================================================
# 3.Loop processing
# =====================================================================

while t < loop:
    
    #Get the temperature
    Tp = T(t)
    
    #Randomly select one from the two-dimensional array x(Random coordinates i,Get j)
    r_i = random.randint(0,len_x_x - 1)
    r_j = random.randint(0,len_x_x - 1)
    
    #Create a two-dimensional array y in state y with only one randomly selected coordinate inverted.
    y = {}
    for i in range(len_x_y):
        for j in range(len_x_x):
            if i == r_i and j == r_j:
                #Invert
                y[i,j] = -1 * (x[i,j] - 1)
            else:
                y[i,j] = x[i,j]
            
    #Calculate acceptance probability
    juri = round(A(x,y,Tp), 2)
    #Display time variables and acceptance probabilities
    print('t=',t,'juri=',juri)
    #Determine whether to accept based on the acceptance probability
    if juri >= juri_hantei:
        x = y
        print('▼ Accepted.')
    
    #Advance time t
    t = t + 1


# =====================================================================
# 4.Result display(Visited city)
# =====================================================================

print('-----Result display-------')
for i in range(len_x_y):
    line = ''
    for j in range(len_x_x):
        
        line += ' '
        if x[i,j] == 1:
            line += 'o'
        else:
            line += '-'
    
    print(line)

# =====================================================================
# 5.Result display(Total distance)
# =====================================================================

#List of cities to visit
city = []

for k in range(len_x_x):
    for i in range(len_x_y):
        if x[i,k] == 1:
            city.append(i)

#Add a route back from the last visited city to the first visited city
city.append(city[0])

#Calculate the total distance
td = 0
for c in range(len(city) - 1):
    td += d[city[c]][city[c+1]]

print('total distance:', td)

: blue_car: Verification of execution result

The solution obtained by Simulated Annealing (Simulated Annealing) is an approximate solution, not an exact solution. Iterative calculations are done to finally determine a better solution. This solution is called the optimal solution.

Therefore, the optimum solution obtained may differ slightly each time depending on the number of iterations. Repeat the iterative calculation several times and conclude that the most frequently occurring solution is the final optimal solution.

The optimal solution this time was as follows.

It turned out that the city's patrol order should be ** E → A → D → C → B → E ** starting from city E. By the way, if city A is the starting point, it will be A → D → C → B → E → A. The total distance is ** 110 **.

The most difficult points this time were ** determining variables k1 and k2 for Hamiltonian, variables k for temperature, and threshold juri_hantei **.

If the magnitude relationship between k1 and k2 is not adjusted properly, the penalty term will not work effectively, and the solution will have overlapping patrol routes and a small number of visited cities. If the variable k related to temperature is not set to an appropriate value, an overflow may occur in the processing of the exp function, or the acceptance probability may be too small to determine. By setting the threshold value juri_hantei to just the right value, the ideal form of the processing scenario is that the first half of the 50,000 loop processes accepts more state changes, and the second half accepts less because the Hamiltonian has converged. I was able to fit it in.

the end

: blue_car: Source

■ Reference book "Optimization algorithm" Tomoharu Nagao [Author] Shokodo

: blue_car: Related information

■ Solve a simple vertex cover problem with Blueqat https://qiita.com/ttlabo/items/b4ab222ef9c5ee99233c

■ An example of solving the knapsack problem with Blueqat https://qiita.com/ttlabo/items/537564c90056a6d56879

■ Solving the traveling salesman problem with the genetic algorithm (GA) and its library (vcopt) https://qiita.com/ttlabo/items/9b00a29702d1205f6996

■ Solving knapsack problems with Google's OR-Tools-Practice the basics of combinatorial optimization problems https://qiita.com/ttlabo/items/bcadc03004418f32f49d

■ [Part 1] What is optimization? --Study materials for learning mathematical optimization https://qiita.com/ttlabo/items/e6970c6e85cce9ff4e34

: blue_car: Opinions etc.

If you have any opinions or corrections, please let us know.