In TensorFlow, even when a matrix diagonalization (eigenvalue computation) sits in the middle of the computation, automatic differentiation and gradient-descent minimization work without any problem.

I cover both approaches: manually calling `apply_gradients` with gradients obtained from `GradientTape`, and letting `minimize` differentiate the loss function automatically.

Version

- Tensorflow 2.1.0
- Python 3.7.3 (Anaconda)
- Windows 10

Suppose we prepare a 3-by-3 matrix $A$ and compute its eigenvalues $\lambda$. We want the derivative $d\lambda_0/dA$ of the smallest eigenvalue $\lambda_0$ with respect to each component of $A$ (the derivative is itself a 3-by-3 matrix, one entry per component).

You can calculate $d\lambda_0/dA$ with the following code. Since we are in eager mode, `GradientTape()` is used to record the computation graph for the gradient calculation.

```
import tensorflow as tf

A = tf.random.uniform(shape=(3, 3))
with tf.GradientTape() as g:
    g.watch(A)
    val, vec = tf.linalg.eigh(A)
    val0 = val[0]
grad0 = g.gradient(val0, A)
print(A)
print(val)
print(grad0)
```

`A`

```
tf.Tensor(
[[0.6102723 0.17637432 0.38962376]
[0.3735156 0.6306771 0.19141042]
[0.34370267 0.7677151 0.4024818 ]], shape=(3, 3), dtype=float32)
```

`val`

```
tf.Tensor([-0.25994763 0.34044334 1.5629349 ], shape=(3,), dtype=float32)
```

`grad0`

```
tf.Tensor(
[[ 5.3867564e-04 0.0000000e+00 0.0000000e+00]
[ 3.0038984e-02 4.1877732e-01 0.0000000e+00]
[-3.5372321e-02 -9.8626012e-01 5.8068389e-01]], shape=(3, 3), dtype=float32)
```

Since only the lower triangle of `grad0` is non-zero, we can guess that `eigh` reads only the lower triangle of its input.
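As a quick cross-check of this lower-triangle behavior, here is a sketch using NumPy rather than TensorFlow: NumPy's `eigvalsh` likewise reads only one triangle (the lower one by default), so two matrices that differ only in the upper triangle yield identical eigenvalues.

```python
import numpy as np

# Two matrices that differ only in the upper triangle.
A1 = np.array([[2.0, 99.0, 99.0],
               [1.0, 3.0, 99.0],
               [0.5, 0.7, 4.0]])
A2 = np.array([[2.0, 0.0, 0.0],
               [1.0, 3.0, 0.0],
               [0.5, 0.7, 4.0]])

# eigvalsh (UPLO='L' by default) ignores the upper triangle,
# so both calls see the same symmetric matrix.
w1 = np.linalg.eigvalsh(A1)
w2 = np.linalg.eigvalsh(A2)
print(np.allclose(w1, w2))  # True
```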

The slicing operation `val0 = val[0]` is actually the key point here: it means we do more than just compute the eigenvalues. Calling `g.gradient(val, A)` without extracting a single eigenvalue also works fine, but I prioritized keeping the mathematical explanation simple.

Note also that `val0 = val[0]` is executed inside the `GradientTape()` context. If any operation you want to chain derivatives through is performed outside this context, `g.gradient()` returns `None`.
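A minimal sketch of this pitfall (assuming TensorFlow 2.x eager mode): the same computation recorded fully inside the tape yields a gradient, while moving the final addition outside the tape yields `None`.

```python
import tensorflow as tf

x = tf.constant(3.0)

# Final operation performed OUTSIDE the tape context.
with tf.GradientTape() as g:
    g.watch(x)
    y = x * x
z = y + 1.0
grad_outside = g.gradient(z, x)   # None: z was never recorded on the tape

# Same computation recorded entirely inside the tape.
with tf.GradientTape() as g2:
    g2.watch(x)
    y2 = x * x
    z2 = y2 + 1.0
grad_inside = g2.gradient(z2, x)  # d(x^2 + 1)/dx = 2x = 6.0

print(grad_outside, grad_inside)
```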

Since this is automatic differentiation, we can insert operations before and after the eigenvalue computation. Let's try that next.

Prepare six values, arrange them into a symmetric matrix, and compute its eigenvalues. Then build an error function measuring how far the eigenvalues are from target values, and differentiate it.

- Computation before the eigenvalues: assembling the matrix
- Computation after the eigenvalues: sum of squared errors between the eigenvalues and the target values

Then, in the spirit of TensorFlow as a machine-learning library, we update the original values by gradient descent.

$r$ holds the 6 independent components of the 3-by-3 symmetric matrix, and $t = (t_0, t_1, t_2)$ are the desired eigenvalues.

```
r = \left(r_0, r_1, \cdots, r_5\right)
```

Arranging these as

```
A_h=\left(\begin{array}{ccc}
\frac{1}{2} r_0 & 0 & 0 \\
r_3 & \frac{1}{2} r_1 & 0 \\
r_5 & r_4 & \frac{1}{2} r_2
\end{array}
\right)
```

```
A = A_h + A_h^T
```

we construct the matrix $A$ and find its eigenvalues. Letting $\lambda = (\lambda_0, \lambda_1, \lambda_2)$ be the array of the three eigenvalues of $A$,

```
L = \sum_{i=0}^2\left(t_i - \lambda_i\right)^2
```

is the loss function. To learn $r$,

```
\frac{\partial L}{\partial r_n} = \sum_{i,j,k}\frac{\partial A_{ij}}{\partial r_n}\frac{\partial \lambda_k}{\partial A_{ij}}\frac{\partial L}{\partial \lambda_k}
```

is the necessary calculation. The factor $\frac{\partial \lambda_k}{\partial A_{ij}}$ in the middle is the derivative of the eigenvalues with respect to the components of the matrix.

Analytically, it can be obtained by solving $\det(A - \lambda I) = 0$ for $\lambda$, but honestly, beyond 3 dimensions this becomes unmanageable unless the matrix is very sparse. So I rely on TensorFlow, which can do the calculation numerically.
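Incidentally, this middle factor has a well-known closed form for a simple eigenvalue of a symmetric matrix: first-order perturbation theory gives $\partial \lambda_k / \partial A_{ij} = v_{ki} v_{kj}$, where $v_k$ is the corresponding eigenvector, so a symmetric perturbation of an off-diagonal pair has sensitivity $2 v_{ki} v_{kj}$. A quick NumPy sketch checking this against a central finite difference:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((3, 3))
A = B + B.T  # a random symmetric matrix

w, V = np.linalg.eigh(A)
lam0, v0 = w[0], V[:, 0]  # smallest eigenvalue and its eigenvector

# Perturbation theory: for a simple eigenvalue, d(lam0)/dA = v0 v0^T,
# so nudging the symmetric pair (i, j) and (j, i) by eps each
# changes lam0 at the rate 2 * v0[i] * v0[j].
i, j = 2, 0
eps = 1e-5
dA = np.zeros((3, 3))
dA[i, j] = dA[j, i] = eps
numeric = (np.linalg.eigvalsh(A + dA)[0] - np.linalg.eigvalsh(A - dA)[0]) / (2 * eps)
analytic = 2 * v0[i] * v0[j]
print(numeric, analytic)
```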

First, the slightly tedious approach: following the process step by step.

```
r = tf.Variable(initial_value=tf.random.uniform(shape=[6]))
t = tf.constant([1.0, 1.0, 1.0])
```

`r` holds the 6 components of the 3-by-3 symmetric matrix, and `t` holds the desired eigenvalues. Here `r` is initialized with uniform random numbers, and `t` is all ones.

```
with tf.GradientTape() as g:
    g.watch(r)
    A = 0.5*tf.linalg.diag(r[0:3])
    A = A + tf.pad(tf.linalg.diag(r[3:5]), [[0, 1], [1, 0]], "CONSTANT")
    A = A + tf.sparse.to_dense(tf.sparse.SparseTensor(indices=[[0, 2]], values=[r[5]], dense_shape=(3, 3)))
    A = A + tf.transpose(A)
    eigval, eigvec = tf.linalg.eigh(A)
    d = tf.reduce_sum((eigval - t)**2)
```

The processing from here on is the target of automatic differentiation, so it is done inside `GradientTape`.

There are two tricks here for placing specific components of `r` into `A`: one uses `diag`, the other uses `SparseTensor`.

I used `tf.linalg.diag` to place the three diagonal components, and combined it with `pad` to place the two components adjacent to the diagonal. The documentation describes a `k` option that places the values at an offset of `k` from the diagonal, with which you could write

```
A = A + tf.linalg.diag(r[3:5], k=1)
```

but for some reason the `k` option did not work for me, so I deliberately used `pad` instead.

Also, to set the single component $r_5$ in the corner of the matrix: a `Tensor` does not support index-based assignment like NumPy, so, although it is cumbersome, I set it by adding a sparse matrix converted to dense form.
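As an aside, a possible alternative (a sketch; the article's code keeps the `SparseTensor` route) is `tf.scatter_nd`, which builds a dense zero tensor with the given entries set and should be equivalent here:

```python
import tensorflow as tf

r = tf.constant([0.1, 0.2, 0.3, 0.4, 0.5, 0.6])

# The SparseTensor route used in the article: one entry at (0, 2).
corner_sparse = tf.sparse.to_dense(
    tf.sparse.SparseTensor(indices=[[0, 2]], values=[r[5]], dense_shape=(3, 3)))

# Alternative: tf.scatter_nd scatters updates into a zero tensor of the given shape.
corner_scatter = tf.scatter_nd(indices=[[0, 2]], updates=[r[5]], shape=(3, 3))

print(bool(tf.reduce_all(tf.equal(corner_sparse, corner_scatter))))  # True
```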

After constructing `A`, the diagonalization `eigh` and the loss function are computed as usual. Here the loss function $L$ is called `d`; sorry for the confusing naming.

```
grad_dr = g.gradient(d, r)
```

Again, `d` is the loss function $L$. A tensor of length 6 is assigned to `grad_dr`. This already achieves the goal of this article.

Use a suitable optimizer.

```
opt = tf.keras.optimizers.Adam()
opt.apply_gradients([(grad_dr, r), ])
```

Since the loss is differentiable here, you might expect to be able to write

```
opt.minimize(d, var_list=[r])
```

but then

```
TypeError: 'tensorflow.python.framework.ops.EagerTensor' object is not callable
```

is the error you get. This is because the first argument of `minimize` must be an argumentless function that returns the loss, not a `Tensor` like `d`. This method is described later.

For example, suppose the initial `r` is

```
<tf.Variable 'Variable:0' shape=(6,) dtype=float32, numpy=
array([0.12108588, 0.8856114 , 0.00449729, 0.22199583, 0.8411281 ,
0.54751956], dtype=float32)>
```

At that time, `A` is

```
tf.Tensor(
[[0.12108588 0.22199583 0.54751956]
[0.22199583 0.8856114 0.8411281 ]
[0.54751956 0.8411281 0.00449729]], shape=(3, 3), dtype=float32)
```

Then the derivative $\frac{dL}{dr}$ of the loss function is

```
tf.Tensor([-1.757829 -0.22877683 -1.991005 0.88798404 3.3645139 2.1900787 ], shape=(6,), dtype=float32)
```

If you apply this with `opt.apply_gradients([(grad_dr, r), ])`, the value of `r` becomes

```
<tf.Variable 'Variable:0' shape=(6,) dtype=float32, numpy=
array([0.12208588, 0.8866114 , 0.00549729, 0.22099583, 0.8401281 ,
0.5465196 ], dtype=float32)>
```

You can see that it is slightly different from the initial value `[0.12108588, 0.8856114, 0.00449729, 0.22199583, 0.8411281, 0.54751956]`.

Why repeat this? Because if the optimization succeeds, we obtain a symmetric matrix whose eigenvalues are all 1, as specified by `t`.

```
while d > 1e-8:
    with tf.GradientTape() as g:
        g.watch(r)
        A = 0.5*tf.linalg.diag(r[0:3])
        A = A + tf.pad(tf.linalg.diag(r[3:5]), [[0, 1], [1, 0]], "CONSTANT")
        A = A + tf.sparse.to_dense(tf.sparse.SparseTensor(indices=[[0, 2]], values=[r[5]], dense_shape=(3, 3)))
        A = A + tf.transpose(A)
        eigval, eigvec = tf.linalg.eigh(A)
        d = tf.reduce_sum((eigval - t)**2)
    grad_dr = g.gradient(d, r)
    opt.apply_gradients([(grad_dr, r), ])
    print("---------------")
    print(r)
    print(eigval)
    print(d)
```

`d` is the squared error, and the `while` loop repeats until it falls below a threshold. Running this prints:

```
---------------
<tf.Variable 'Variable:0' shape=(6,) dtype=float32, numpy=
array([0.10630785, 0.18287621, 0.14753745, 0.16277793, 0.7271476 ,
0.08771187], dtype=float32)>
tf.Tensor([-0.56813365 0.07035071 0.9315046 ], shape=(3,), dtype=float32)
tf.Tensor(3.3279824, shape=(), dtype=float32)
---------------
<tf.Variable 'Variable:0' shape=(6,) dtype=float32, numpy=
array([0.10730778, 0.18387613, 0.14853737, 0.16177836, 0.72614765,
0.0867127 ], dtype=float32)>
tf.Tensor([-0.5661403 0.07189684 0.9309651 ], shape=(3,), dtype=float32)
tf.Tensor(3.3189366, shape=(), dtype=float32)
---------------
<tf.Variable 'Variable:0' shape=(6,) dtype=float32, numpy=
array([0.10830763, 0.18487597, 0.1495372 , 0.1607792 , 0.72514784,
0.08571426], dtype=float32)>
tf.Tensor([-0.564147 0.07343995 0.9304282 ], shape=(3,), dtype=float32)
tf.Tensor(3.3099096, shape=(), dtype=float32)
(the rest is omitted)
```

You can see that `r` and `A` change little by little. At convergence, we get:

```
<tf.Variable 'Variable:0' shape=(6,) dtype=float32, numpy=
array([ 9.9999946e-01, 9.9999988e-01, 9.9999732e-01, 6.9962436e-05,
4.2644251e-07, -1.1688111e-14], dtype=float32)>
tf.Tensor([0.9999294 0.9999973 1.0000702], shape=(3,), dtype=float32)
tf.Tensor(9.917631e-09, shape=(), dtype=float32)
```

`A` appears to be the identity matrix. The result was the same over several attempts, so it seems that seeking a matrix whose eigenvalues are all one this way leads to the identity matrix.

Counting the loop iterations, it takes thousands of steps: for example 3066, 2341, or 3035.

The point here is to use `minimize()`.

For the construction of the symmetric matrix `A`, refer to the "Calculations for automatic differentiation" section above.

```
r = tf.Variable(initial_value=tf.random.uniform(shape=[6]))
t = tf.constant([1.0, 1.0, 1.0])

def calc_eigval(r):
    A = 0.5*tf.linalg.diag(r[0:3])
    A = A + tf.pad(tf.linalg.diag(r[3:5]), [[0, 1], [1, 0]], "CONSTANT")
    A = A + tf.sparse.to_dense(tf.sparse.SparseTensor(indices=[[0, 2]], values=[r[5]], dense_shape=(3, 3)))
    A = A + tf.transpose(A)
    eigval, eigvec = tf.linalg.eigh(A)
    return eigval

def calc_loss(r):
    eigval = calc_eigval(r)
    d = tf.reduce_sum((eigval - t)**2)
    return d

opt = tf.keras.optimizers.Adam()
loss = lambda: calc_loss(r)
d = calc_loss(r)  # evaluate once so the loop condition is defined
while d > 1e-8:
    opt.minimize(loss, var_list=[r])
    d = calc_loss(r)  # re-evaluate the loss for the stopping test
    print("---------------")
    print(r)
    print(calc_eigval(r))
    print(d)
```

`calc_eigval` is a function that returns the eigenvalues, and `calc_loss` is a function that computes the loss. `loss` is an argumentless function that, when called as `loss()`, computes the loss from the current value of `r` and returns its `Tensor`. The first argument of `minimize` requires exactly such an argumentless function; that is why passing the `d` computed with `GradientTape` to `minimize` raised an error earlier.

You could inline `calc_eigval` inside `calc_loss`, but I wanted to watch how the eigenvalues change during the loop, so I kept it as a separate function. The combination of `def calc_loss(r)` and `loss = lambda: calc_loss(r)` is the key to using `minimize`. If you define `calc_loss` with no arguments from the start, you can pass it to `minimize` as it is. Either way, I no longer have to manage the gradients myself.

If you run the above code as the `main` function, the progress of the convergence is printed.

As a bonus, here are four runs of the eigenvalue convergence started from random initial values, with one line per eigenvalue (eigenvalue 1, eigenvalue 2, eigenvalue 3). At the right end they reach 1, the convergence target, so it works properly.

Higher-order differentiation is also possible by nesting `GradientTape()`. The coverage of automatic differentiation is wider than I expected.
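For example, a second derivative can be obtained by nesting two tapes (a minimal sketch with $y = x^3$, so $dy/dx = 3x^2$ and $d^2y/dx^2 = 6x$):

```python
import tensorflow as tf

x = tf.constant(2.0)
with tf.GradientTape() as g_outer:
    g_outer.watch(x)
    with tf.GradientTape() as g_inner:
        g_inner.watch(x)
        y = x ** 3
    # The inner gradient is itself computed inside the outer tape,
    # so it can be differentiated again.
    dy = g_inner.gradient(y, x)   # 3 * x**2 = 12.0
d2y = g_outer.gradient(dy, x)     # 6 * x = 12.0
print(dy, d2y)
```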

This article focuses on `GradientTape` because at first I did not know how to use `minimize` and thought I had to call `apply_gradients` manually. Only after writing most of the article did I learn how to do it with `minimize`.
