A saddle point is a point where the gradient is zero and the function attains a minimum along some directions and a maximum along others. Saddle points are an important theme in many fields, as represented by GANs in machine learning and transition states in chemical reaction path searches. A GAN consists of a generator, which produces data that looks like the real thing, and a discriminator, which distinguishes real data from generated data; training is formulated as a minimax game between the two, so its equilibria are saddle points of the objective. In a chemical reaction, stable reactants pass through saddle points on the potential energy surface and change into more stable products. In this article, we search for saddle points using the gradient method. The method works well for simple models, but it behaves somewhat unstably for complex ones (e.g. quantum chemistry), so I will update the article as soon as it can be improved. If you find this article interesting, please give it an LGTM!
We describe the saddle point as follows. Let the objective function be $y = f(\boldsymbol{x})$ with $\boldsymbol{x} = (x_1, x_2, \cdots, x_n)^\mathrm{T}$, let $\boldsymbol{b} = (b_1, b_2, \cdots, b_n)^\mathrm{T}$ be the unit vector along which the function has a minimum, and let $\boldsymbol{c} = (c_1, c_2, \cdots, c_n)^\mathrm{T}$ be the unit vector along which it has a maximum. A saddle point is then a point where the gradient of $f$ is $0$, $y = f(\boldsymbol{x} + \boldsymbol{b}t)$ has a minimum at $t = 0$ with respect to $t$, and $y = f(\boldsymbol{x} + \boldsymbol{c}t)$ has a maximum at $t = 0$ with respect to $t$. Expressed as formulas:
At the saddle point $\boldsymbol{x} = \boldsymbol{x}_0$, the relations
\left.\frac{\partial f(\boldsymbol{x})}{\partial \boldsymbol{x}}\right |_{\boldsymbol{x}=\boldsymbol{x}_0}= 0\\
\left.\frac{\partial^2 f(\boldsymbol{x}+\boldsymbol{b}t)}{\partial t^2}\right |_{\boldsymbol{x}=\boldsymbol{x}_0, t=0} > 0\\
\left.\frac{\partial^2 f(\boldsymbol{x}+\boldsymbol{c}t)}{\partial t^2}\right |_{\boldsymbol{x}=\boldsymbol{x}_0, t=0} < 0
hold.
Consider $y = x_1^2 - x_2^2$ as an example. At $x_1 = 0$, $x_2 = 0$ the gradient is $0$; with $\boldsymbol{b} = (1, 0)^\mathrm{T}$, $y = (0 + 1t)^2 - (0 + 0t)^2 = t^2$ has a minimum at $t = 0$; and with $\boldsymbol{c} = (0, 1)^\mathrm{T}$, $y = (0 + 0t)^2 - (0 + 1t)^2 = -t^2$ has a maximum at $t = 0$. Therefore this point is a saddle point. A quick numerical check of these conditions is sketched below.
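The three conditions can be verified with central differences (a minimal sketch; the step size and the function name f_check are illustrative choices, not part of the original code):

import numpy as np

def f_check(x):
    return x[0]**2 - x[1]**2

x0 = np.array([0.0, 0.0])
b = np.array([1.0, 0.0])   # candidate minimum direction
c = np.array([0.0, 1.0])   # candidate maximum direction
delta = 1e-3

# Central differences: gradient at x0 and second derivatives along b and c
grad = np.array([(f_check(x0 + delta*e) - f_check(x0 - delta*e)) / (2*delta) for e in np.eye(2)])
curv_b = (f_check(x0 + delta*b) - 2*f_check(x0) + f_check(x0 - delta*b)) / delta**2
curv_c = (f_check(x0 + delta*c) - 2*f_check(x0) + f_check(x0 - delta*c)) / delta**2
print(grad)            # ~ [0, 0]: stationary point
print(curv_b, curv_c)  # ~ +2 (minimum along b) and ~ -2 (maximum along c)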
The method consists of two steps: initialization and search. The initialization finds suitable initial values for $\boldsymbol{b}$ and $\boldsymbol{c}$. The search then follows the gradient toward the saddle point.
As mentioned earlier, $\boldsymbol{b}$ corresponds to the minimum direction and $\boldsymbol{c}$ to the maximum direction. First, we determine the $\boldsymbol{b}$ that maximizes the second derivative of $y = f(\boldsymbol{x} + \boldsymbol{b}t)$ with respect to $t$. For simplicity, we use a simple steepest-descent-style iterative update. Expressed as formulas:
\mathrm{grad}\ \boldsymbol{b}_1 \leftarrow \frac{1}{\delta} \left( \nabla f \left(\boldsymbol{x}+\delta \boldsymbol{b} \right) - \nabla f \left(\boldsymbol{x}-\delta \boldsymbol{b} \right) \right)\\
\mathrm{grad}\ \boldsymbol{b}_2 \leftarrow \mathrm{grad}\ \boldsymbol{b}_1 - \left( \boldsymbol{b} \cdot \mathrm{grad}\ \boldsymbol{b}_1 \right)\boldsymbol{b}\\
\boldsymbol{b} \leftarrow \boldsymbol{b} + \epsilon_1 \ \mathrm{grad}\ \boldsymbol{b}_2\\
\boldsymbol{b} \leftarrow \frac{\boldsymbol{b}}{\mathrm{norm} \left(\boldsymbol{b}\right)}\\
Here $\delta$ is a small displacement and $\epsilon_1$ is the learning rate. The first equation is a finite-difference approximation of the derivative of the directional second derivative with respect to $\boldsymbol{b}$, rearranged so that only gradients of $f$ are needed. Used as-is, this is not an appropriate update for the unit vector $\boldsymbol{b}$, so the second equation projects it onto the tangent of the unit sphere to obtain an update along the sphere. The third equation updates $\boldsymbol{b}$ toward the maximum of the second derivative, and the fourth normalizes $\boldsymbol{b}$ back to a unit vector. Similarly, we determine the $\boldsymbol{c}$ that minimizes the second derivative of $y = f(\boldsymbol{x} + \boldsymbol{c}t)$ with respect to $t$. Expressed as formulas:
\mathrm{grad}\ \boldsymbol{c}_1 \leftarrow \frac{1}{\delta} \left( \nabla f \left(\boldsymbol{x}+\delta \boldsymbol{c} \right) - \nabla f \left(\boldsymbol{x}-\delta \boldsymbol{c} \right) \right)\\
\mathrm{grad}\ \boldsymbol{c}_2 \leftarrow \mathrm{grad}\ \boldsymbol{c}_1 - \left( \boldsymbol{c} \cdot \mathrm{grad}\ \boldsymbol{c}_1 \right)\boldsymbol{c}\\
\boldsymbol{c} \leftarrow \boldsymbol{c} - \epsilon_1 \ \mathrm{grad}\ \boldsymbol{c}_2\\
\boldsymbol{c} \leftarrow \frac{\boldsymbol{c}}{\mathrm{norm} \left(\boldsymbol{c}\right)}
The difference from the $\boldsymbol{b}$ update is that the third equation moves $\boldsymbol{c}$ toward the minimum of the second derivative. The norms of $\mathrm{grad}\ \boldsymbol{b}_2$ and $\mathrm{grad}\ \boldsymbol{c}_2$ can be used as convergence tests. A minimal NumPy sketch of this initialization loop is shown below.
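The sketch below runs the initialization for the example $y = x_1^2 - x_2^2$ (the function, starting point, and hyperparameters are illustrative choices; the full implementation is given at the end of the article):

import numpy as np

def grad_example(x):
    # Gradient of the example y = x1^2 - x2^2 (illustration only)
    return np.array([2*x[0], -2*x[1]])

def init_step(x, v, grad_f, delta=1e-3, eps1=0.1, toward_max=True):
    # One unit-vector update along the sphere (equations above)
    g1 = (grad_f(x + delta*v) - grad_f(x - delta*v)) / delta   # grad v_1
    g2 = g1 - np.dot(v, g1)*v                                  # grad v_2 (tangent part)
    v = v + eps1*g2 if toward_max else v - eps1*g2             # b: toward max, c: toward min
    return v / np.linalg.norm(v)

x = np.array([-2.0, -1.0])
b = np.array([0.6, 0.8])    # arbitrary initial unit vectors
c = np.array([0.8, -0.6])
for _ in range(200):
    b = init_step(x, b, grad_example, toward_max=True)
    c = init_step(x, c, grad_example, toward_max=False)
print(b, c)   # b approaches (±1, 0), c approaches (0, ±1)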
The search step uses the gradient of $f$ along $\boldsymbol{b}$ and $\boldsymbol{c}$: since $f$ has a minimum along $\boldsymbol{b}$ and a maximum along $\boldsymbol{c}$, descending along $\boldsymbol{b}$ and ascending along $\boldsymbol{c}$ brings $\boldsymbol{x}$ to the saddle point. Writing the directional derivatives as $\boldsymbol{b} \cdot \nabla f$ and $\boldsymbol{c} \cdot \nabla f$, the update is:
\boldsymbol{x} \leftarrow \boldsymbol{x} + \epsilon_2 \left( -\left(\boldsymbol{b} \cdot \nabla f\right)\boldsymbol{b} + \left(\boldsymbol{c} \cdot \nabla f\right)\boldsymbol{c} \right)
Because $\boldsymbol{b}$ and $\boldsymbol{c}$ are no longer optimal once $\boldsymbol{x}$ has been updated, the same updates of $\boldsymbol{b}$ and $\boldsymbol{c}$ used in the initialization must also be performed at every search step. A sketch of one search step is shown below.
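Continuing the sketch above, one search step only needs the directional derivatives of $f$ along $\boldsymbol{b}$ and $\boldsymbol{c}$ (again a sketch; f_example is an illustrative objective, not part of the original code):

def f_example(x):
    return x[0]**2 - x[1]**2

def search_step(x, b, c, f, delta=1e-3, eps2=0.1):
    # Descend along b (minimum direction) and ascend along c (maximum direction)
    df_b = (f(x + delta*b) - f(x - delta*b)) / (2*delta)   # b . grad f
    df_c = (f(x + delta*c) - f(x - delta*c)) / (2*delta)   # c . grad f
    return x + eps2*(-df_b*b + df_c*c)

# In the full algorithm this is alternated with the b and c updates from the initialization:
# x = search_step(x, b, c, f_example), then b and c are re-updated at the new x.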
The saddle points of a few functions were derived with the above algorithm. The updates use the steepest-descent rule and were run in Python. In the figures, the red line is the minimum-direction vector and the green line is the maximum-direction vector.
・Function $y = x_1^2 - x_2^2$
Initial values $x_1 = -2$, $x_2 = -1$
Small displacement $\delta = 0.001$
Learning rates $\epsilon_1 = 0.1$, $\epsilon_2 = 0.1$
Result
Saddle point $x_1 = -0.0003$, $x_2 = -0.001$
(Figure: Initialization)
(Figure: Saddle point search)
・Function $y = x_1^2 + x_2^3 - x_2$
Initial values $x_1 = -2$, $x_2 = -0.5$
Small displacement $\delta = 0.05$
Learning rates $\epsilon_1 = 0.1$, $\epsilon_2 = 0.1$
Result
Saddle point $x_1 = -0.0011$, $x_2 = -0.5774$
(Figure: Initialization)
(Figure: Saddle point search)
The main problem is stability. I had wondered whether this could be used for the saddle-point searches needed in chemical reaction prediction, but it converged to strange structures. Also, depending on the initial values, it may not converge to the saddle point at all; such an example is given after the following sketch, with settings that differ from the previous run. As an improvement, since this is a gradient method, the conjugate gradient method, the Wegstein method, and the various optimizers used in machine learning (e.g. Momentum, RMSProp, Adam, RAdam) could be tried; a Momentum-style drop-in replacement for the SDG class is sketched below.
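A Momentum optimizer with the same solve(coord, gradient) interface as the SDG class in the source below might look like this (a sketch under that assumption; the momentum coefficient 0.9 is an arbitrary choice, not a value from my experiments):

import numpy as np

class Momentum:
    # Drop-in replacement for the SDG class: same solve(coord, gradient) interface
    def __init__(self, learning_rate, beta=0.9):
        self.learning_rate = learning_rate
        self.beta = beta
        self.velocity = None

    def solve(self, coord, gradient):
        if self.velocity is None:
            self.velocity = np.zeros_like(coord)
        # Accumulate an exponentially decaying sum of past gradients
        self.velocity = self.beta*self.velocity - self.learning_rate*gradient
        return coord + self.velocity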
・Function $y = x_1^2 + x_2^3 - x_2$
Initial values $x_1 = -2$, $x_2 = -0.5$
Small displacement $\delta = 0.05$
Learning rates $\epsilon_1 = 0.1$, $\epsilon_2 = 0.1$
(Figure: Initialization)
(Figure: Saddle point search)
In this run, $\boldsymbol{b}$ and $\boldsymbol{c}$ converged so that the $x_2$ direction was treated as the minimum direction and the $x_1$ direction as the maximum direction. As a result, the search moves toward the minimum of the cubic term along $x_2$ while climbing the slope of the quadratic term along $x_1$.
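For reference, the true curvatures at the initial point can be checked with a few lines (a minimal sketch that only evaluates second differences of the objective along the coordinate axes; g_example is an illustrative name):

import numpy as np

def g_example(x):
    return x[0]**2 + x[1]**3 - x[1]

x0 = np.array([-2.0, -0.5])
delta = 0.05
e1, e2 = np.eye(2)
curv_x1 = (g_example(x0 + delta*e1) - 2*g_example(x0) + g_example(x0 - delta*e1)) / delta**2
curv_x2 = (g_example(x0 + delta*e2) - 2*g_example(x0) + g_example(x0 - delta*e2)) / delta**2
print(curv_x1, curv_x2)   # ~ +2 along x1 (should be b) and ~ -3 along x2 (should be c)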
In this article, we searched for saddle points using the gradient method. Two-variable functions were used as examples for visibility, but functions of any number of variables can be used in practice. If you have any questions, I will reply in the comments. Feel free to comment if you would like more detail on the formula derivations or the source code.
The settings (function, initial values, learning rate, etc.) are in "Calculation 1 Initialization". You can replace the contents of the SDG class with other optimizers, such as the Momentum variant sketched earlier.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from datetime import datetime
np.set_printoptions(precision=4, floatmode='maxprec')
class SDG:
    # Plain gradient-descent update: step = -learning_rate * gradient
    def __init__(self, learning_rate):
        self.learning_rate = learning_rate

    def solve(self, coord, gradient):
        return coord - self.learning_rate*gradient

class SaddlePoint:
    def __init__(self, function, gradient, coordinate, delta=1.0e-3,
                 learning_rate=0.5, max_iteration=1000,
                 judgment1=1.0e-8, judgment2=1.0e-8, log_span=100):
        """
        Constructor.
        function      -- objective function
        gradient      -- function returning (value, first derivative)
        coordinate    -- initial coordinates
        delta         -- small displacement
        learning_rate -- learning rate
        max_iteration -- maximum number of iterations
        judgment1     -- convergence threshold for the initialization
        judgment2     -- convergence threshold for the saddle-point search
        log_span      -- log display interval
        """
        self.function = function
        self.gradient = gradient
        self.coordinate = coordinate.copy()
        self.dim = len(coordinate)
        self.delta = delta
        self.learning_rate = learning_rate
        self.max_iteration = max_iteration
        self.judgment1 = judgment1
        self.judgment2 = judgment2
        self.log_span = log_span
        # Unit vectors (minimum direction b, maximum direction c)
        self.b = np.random.rand(self.dim)
        self.b /= np.linalg.norm(self.b)
        self.c = np.random.rand(self.dim)
        self.c /= np.linalg.norm(self.c)
        # SDG optimizers
        self.sdg_b_init = SDG(learning_rate)
        self.sdg_c_init = SDG(learning_rate)
        self.sdg_b_solv = SDG(learning_rate)
        self.sdg_c_solv = SDG(learning_rate)
        self.sdg_a_solv = SDG(learning_rate)

    def initialize(self):
        """
        Initialization: determine suitable b and c.
        Returns -- minimum-direction vector b, maximum-direction vector c
        """
        # Gradient function
        gradient = self.gradient
        # Coordinates
        coordinate = self.coordinate
        # Unit vectors
        b = self.b.copy()
        c = self.c.copy()
        # Update amounts
        diff_b = np.zeros_like(b)
        diff_c = np.zeros_like(c)
        # Learning rate
        learning_rate = self.learning_rate
        # Small displacement
        delta = self.delta
        # Normalization
        norm = np.linalg.norm
        # Convergence threshold
        judgement1 = self.judgment1
        # Log interval
        log_span = self.log_span
        # SDG optimizers
        sdg_b = self.sdg_b_init
        sdg_c = self.sdg_c_init
        z, _ = gradient(coordinate)
        print("-----Initialization of b has started.-----")
        for i in range(self.max_iteration):
            # First derivatives at x +/- delta*b
            z_b1, grad_b1 = gradient(coordinate + delta*b)
            z_b2, grad_b2 = gradient(coordinate - delta*b)
            # Update amount (tangent component on the unit sphere)
            nabla_b = (grad_b1 - grad_b2)/delta
            grad_b = nabla_b - (np.dot(b, nabla_b))*b
            # Update toward maximum curvature
            b = sdg_b.solve(b, -grad_b)
            # Normalization
            b /= norm(b)
            # Convergence test
            error = np.linalg.norm(grad_b)
            if i%log_span == 0:
                print("Iteration = {}, Error = {}".format(i, error))
            if error < judgement1:
                print("Converged! Iteration = {}, Error = {}".format(i, error))
                break
        self.b = b.copy()
        print()
        print("-----Initialization of c has started.-----")
        for i in range(self.max_iteration):
            # First derivatives at x +/- delta*c
            z_c1, grad_c1 = gradient(coordinate + delta*c)
            z_c2, grad_c2 = gradient(coordinate - delta*c)
            # Update amount (tangent component on the unit sphere)
            nabla_c = (grad_c1 - grad_c2)/delta
            grad_c = nabla_c - (np.dot(c, nabla_c))*c
            # Update toward minimum curvature
            c = sdg_c.solve(c, grad_c)
            # Normalization
            c /= norm(c)
            # Convergence test
            error = np.linalg.norm(grad_c)
            if i%log_span == 0:
                print("Iteration = {}, Error = {}".format(i, error))
            if error < judgement1:
                print("Converged! Iteration = {}, Error = {}".format(i, error))
                break
        self.c = c.copy()
        print()
        print("Result")
        print("b = {}".format(self.b))
        print("c = {}".format(self.c))
        print()
        return self.b, self.c

    def solve(self):
        """
        Saddle-point search.
        Returns -- saddle-point coordinates, trajectory array,
                   minimum-direction vector b, maximum-direction vector c
        """
        # Gradient function
        gradient = self.gradient
        # Coordinates and their trajectory
        coordinate = self.coordinate.copy()
        coordinate_Array = coordinate.copy()
        # Unit vectors
        b = self.b.copy()
        c = self.c.copy()
        # Update amounts
        diff_b = np.zeros_like(b)
        diff_c = np.zeros_like(c)
        # Learning rate
        learning_rate = self.learning_rate
        # Small displacement
        delta = self.delta
        # Normalization
        norm = np.linalg.norm
        # Convergence thresholds
        judgement1 = self.judgment1
        judgement2 = self.judgment2
        # Log interval
        log_span = self.log_span
        # SDG optimizers
        sdg_a = self.sdg_a_solv
        sdg_b = self.sdg_b_solv
        sdg_c = self.sdg_c_solv
        print("-----Saddle-point solver has started.-----")
        for i in range(self.max_iteration):
            # First derivatives
            z_b1, grad_b1 = gradient(coordinate + delta*b)
            z_b2, grad_b2 = gradient(coordinate - delta*b)
            z_c1, grad_c1 = gradient(coordinate + delta*c)
            z_c2, grad_c2 = gradient(coordinate - delta*c)
            grad_through_b = (z_b1-z_b2) / (2.0*delta)   # directional derivative along b
            grad_through_c = (z_c1-z_c2) / (2.0*delta)   # directional derivative along c
            # Second derivatives (used only by the commented-out scaled update below)
            z, _ = gradient(coordinate)
            grad2_through_b = (z_b1-2.0*z+z_b2) / delta**2.0
            grad2_through_c = (z_c1-2.0*z+z_c2) / delta**2.0
            # Coordinate update: descend along b, ascend along c
            # coordinate = sdg_a.solve(coordinate,
            #     grad_through_b*b/(np.linalg.norm(grad_through_b)+np.linalg.norm(grad2_through_b))
            #     -grad_through_c*c/(np.linalg.norm(grad_through_c)+np.linalg.norm(grad2_through_c)))
            coordinate = sdg_a.solve(coordinate, grad_through_b*b - grad_through_c*c)
            coordinate_Array = np.vstack([coordinate_Array, coordinate])
            # Convergence measure for the coordinates
            error_coordinate = np.linalg.norm(grad_through_b**2 + grad_through_c**2)
            # Update of b (toward maximum curvature)
            nabla_b = -(grad_b1 - grad_b2)/delta
            grad_b = nabla_b - (np.dot(b, nabla_b))*b
            b = sdg_b.solve(b, grad_b)
            b /= norm(b)
            error_b = np.linalg.norm(grad_b)
            # Update of c (toward minimum curvature)
            nabla_c = (grad_c1 - grad_c2)/delta
            grad_c = nabla_c - (np.dot(c, nabla_c))*c
            c = sdg_c.solve(c, grad_c)
            c /= norm(c)
            error_c = np.linalg.norm(grad_c)
            if i%log_span == 0:
                print("Iteration = {}, Error_b = {}".format(i, error_b))
                print("Iteration = {}, Error_c = {}".format(i, error_c))
                print("Iteration = {}, Error = {}".format(i, error_coordinate))
                print()
            if error_coordinate < judgement2:
                print("Converged! Iteration = {}, Error = {}".format(i, error_coordinate))
                break
        self.coordinate = coordinate.copy()
        self.b = b.copy()
        self.c = c.copy()
        print()
        print("Result")
        print("coordinate = {}".format(self.coordinate))
        print("b = {}".format(self.b))
        print("c = {}".format(self.c))
        print()
        return self.coordinate, coordinate_Array, self.b, self.c
# =============================================================================
#Calculation 1 Initialization
# =============================================================================
def f(x):
    # Objective function
    return x[0]**2 - x[1]**2

def gradient_f(x):
    # Returns the function value and its first derivative
    return f(x), np.array([2*x[0], -2*x[1]])

x_init = np.array([-2.0, -1.0], dtype="float")
saddlePoint = SaddlePoint(f, gradient_f, x_init, delta=1e-3,
                          learning_rate=0.1, max_iteration=100,
                          judgment1=1.0e-5, judgment2=1.0e-5, log_span=1)
b, c = saddlePoint.initialize()   # Initialization
# =============================================================================
#Graph drawing(2D)
# =============================================================================
t = np.linspace(-1.0, 1.0, 100)
tb = np.linspace(x_init-1.0*b, x_init+1.0*b, 100)
tc = np.linspace(x_init-1.0*c, x_init+1.0*c, 100)
fb = f(tb.T)
fc = f(tc.T)
plt.xlabel("t")
plt.ylabel("z")
plt.plot(t, fb, c="red")
plt.plot(t, fc, c="green")
plt.savefig("file/" + str(datetime.now().strftime("%Y_%m_%d_%H_%M_%S")) + "_2d1.png ", dpi=300)
plt.show()
# =============================================================================
#Graph drawing(3D)
# =============================================================================
#Wire frame
fig = plt.figure(figsize=(6, 6))
ax = fig.add_subplot(111, projection='3d')
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_zlabel("z")
X = np.linspace(-2.5, 2.5, 50)
Y = np.linspace(-2.5, 2.5, 50)
X, Y = np.meshgrid(X, Y)
Z = f([X,Y])
#Basis vector
width = 1.0
bt = np.linspace(x_init-width*b,x_init+width*b,10)
bz = f(bt.T)
ct = np.linspace(x_init-width*c,x_init+width*c,10)
cz = f(ct.T)
#Trajectory
zArray = f(x_init)
#display
ax.plot_wireframe(X,Y,Z,color="gray",linewidth=0.2) #Wire frame
ax.plot(bt[:,0],bt[:,1],bz,color="red") #Minimal
ax.plot(ct[:,0],ct[:,1],cz,color="green") #maximum
ax.scatter(x_init[0],x_init[1],f(x_init),color="blue") #Trajectory
plt.savefig("file/" + str(datetime.now().strftime("%Y_%m_%d_%H_%M_%S")) + "_3d1.png ", dpi=300)
plt.show()
# =============================================================================
#Calculation 2 Saddle point search
# =============================================================================
x, xArray, b, c = saddlePoint.solve() #Saddle point calculation
# =============================================================================
#Graph drawing(2D)
# =============================================================================
t = np.linspace(-1.0, 1.0, 100)
tb = np.linspace(x-1.0*b, x+1.0*b, 100)
tc = np.linspace(x-1.0*c, x+1.0*c, 100)
fb = f(tb.T)
fc = f(tc.T)
plt.xlabel("t")
plt.ylabel("z")
plt.plot(t, fb, c="red")
plt.plot(t, fc, c="green")
plt.savefig("file/" + str(datetime.now().strftime("%Y_%m_%d_%H_%M_%S")) + "_2d2.png ", dpi=300)
plt.show()
# =============================================================================
#Graph drawing(3D)
# =============================================================================
#Wire frame
fig = plt.figure(figsize=(6, 6))
ax = fig.add_subplot(111, projection='3d')
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_zlabel("z")
X = np.linspace(-2.5, 2.5, 50)
Y = np.linspace(-2.5, 2.5, 50)
X, Y = np.meshgrid(X, Y)
Z = f([X,Y])
#Basis vector
width = 1.0
bt = np.linspace(x-width*b,x+width*b,10)
bz = f(bt.T)
ct = np.linspace(x-width*c,x+width*c,10)
cz = f(ct.T)
#Trajectory
zArray = f(xArray.T)
#display
ax.plot_wireframe(X,Y,Z,color="gray",linewidth=0.2) #Wire frame
ax.plot(bt[:,0],bt[:,1],bz,color="red") #Minimal
ax.plot(ct[:,0],ct[:,1],cz,color="green") #maximum
ax.scatter(xArray[:,0],xArray[:,1],zArray,color="blue") #Trajectory
ax.text(xArray[0,0],xArray[0,1],zArray[0], "start")
ax.text(xArray[-1,0],xArray[-1,1],zArray[-1], "goal")
plt.savefig("file/" + str(datetime.now().strftime("%Y_%m_%d_%H_%M_%S")) + "_3d2.png ", dpi=300)
plt.show()