[PYTHON] I don't want to admit it... the dynamical-systems representation of Neural Networks

This article is the 18th-day entry of the NTT Communications Advent Calendar 2019. Yesterday was @yusuke84's article, "Thinking about Support for WebRTC Platform SkyWay". **Merry Christmas!** :santa: :christmas_tree: :gift:

Introduction

Since this is the company's Advent Calendar, I first thought I should write some practical technical tips. I went looking for a Neural Network topic along those lines, but in the end I settled on a story that simply interests me, one more theoretical than implementation-oriented, and this article is the result. (Well, surely it's fine for one person to stray off course.)

So I will write about something I have been interested in for a while: expressing physical systems with Neural Networks. I am also planning to develop a novel method starting from this topic into a paper, so I harbor the delusion that the company will reward it accordingly. (Sparkle)

Who are you!?

I am a second-year employee who researches and develops basic technologies for data science and AI in the Technology Development Department. I am posting from @BootCamp_2019 because my main account (almost read-only) is rather cluttered.

As the account name suggests, I was a lecturer for the data science & AI course of this year's in-house training program, "Boot Camp".

Since then, at every opportunity, I have been showing off qualifications that are useless in the research world, such as "Statistical Test, Grade 1" and "Mathematics Test, Grade 1".

At work I mainly research and develop multivariate time series analysis, sparse estimation, factor analysis, and the like, but personally I like nonlinear dynamics, contraction theory, and information statistical mechanics.

As a result, this article is also based on my personal tastes. (Tehepero)

Motivation

Recently, or rather for some time now, applications of deep learning have been discussed in various fields. This is very pleasing, and for me, who originally came from the theoretical side of mathematical science, it is a great motivation to get serious about deep learning. Deep learning is rumored to have entered a period of disillusionment, and incorporating theories and schemes valued in other fields is one way of blowing fresh air into it; I think it is wonderful that its next evolution can be explored from so many perspectives.

This article was written in the hope that you will come to share that sentiment.

I have tried to keep the models and explanations as simple as possible, so I would be happy if you come away thinking the general idea is interesting. (If you are familiar with this area, please tell me about the Hamiltonian mechanics of many-body phase oscillator models; that is what interests me most right now.)

Dynamical system representation using Neural Network

Neural Network and differential equations

The world is made up of differential equations.

For example, the equation of motion of the simple harmonic oscillation of a spring, learned in high school physics, is written as follows. (Let $m$ be the mass, $a$ the acceleration, $k$ the spring constant, and $x$ the position of the object, with the equilibrium position at the origin and the motion restricted to one dimension.)

$$ma = -kx$$

tan.png[^1]

[^1]: Partially quoted from "Unknown solutions of the physics equation of motion [simple harmonic oscillation]". Using simple harmonic oscillation as an example, it shows and compares multiple solution methods such as Taylor expansion, exponential functions, and conserved quantities.

Digging a little deeper, we learn that the velocity $v$ of an object is the first derivative of its position $x$, and the acceleration $a$ is the first derivative of the velocity $v$. In other words, the acceleration $a$ is the second derivative of the position $x$. Stating explicitly that the position is a function of time, $x(t)$, the equation of motion can be rewritten as follows.

$$m\frac{d^2x(t)}{dt^2} = -kx(t)$$

A differential equation is a functional equation written as a relation between an unknown function and its derivatives. The equation of motion of the spring's simple harmonic oscillation is also a differential equation.

A better mathematical model is essential for a better theoretical interpretation of physical phenomena. Using differential equations, a real-world phenomenon can be formulated as a mathematical problem (under certain assumptions and approximations), and by interpreting the meaning of the solution to that problem, you can give a theoretical interpretation of the original phenomenon itself.

For example, if you solve the differential equation of the spring's simple harmonic oscillation mathematically, you obtain the following solution. (Here $A$ is the amplitude and $\alpha$ is the initial phase, both determined from the initial position $x(0)$ and the initial velocity $v(0)$.)

$$x(t)=A\sin\left(\sqrt{\frac{k}{m}}\,t+\alpha\right)$$

When plotted on the plane of time $t$ and position $x$, as in the figure below, this solution can be seen to match the behavior of actual simple harmonic motion perfectly.

image-phy5-1.gif[^2]

[^2]: From "Exam memo: The hardest part of mechanics! What is simple harmonic oscillation? A thorough commentary by University of Tokyo graduate students! [High school physics]". It covers the theory of simple harmonic oscillation evenly, and its figures are easy to understand.
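As a quick sanity check, the analytic solution can be compared against a numerical integration of the equation of motion. The following is a minimal sketch of my own (not from the article's references); the parameter values are arbitrary choices for illustration.

```python
import numpy as np
from scipy.integrate import solve_ivp

m, k = 1.0, 2.0      # mass and spring constant (arbitrary values)
x0, v0 = 1.0, 0.0    # initial position and velocity

# Rewrite m x'' = -k x as a first-order system: d/dt [x, v] = [v, -(k/m) x]
def rhs(t, state):
    x, v = state
    return [v, -(k / m) * x]

t = np.linspace(0.0, 10.0, 500)
sol = solve_ivp(rhs, (t[0], t[-1]), [x0, v0], t_eval=t, rtol=1e-8)

# Analytic solution: A and alpha determined from x(0) and v(0)
omega = np.sqrt(k / m)
A = np.sqrt(x0**2 + (v0 / omega)**2)
alpha = np.arctan2(x0 * omega, v0)
x_analytic = A * np.sin(omega * t + alpha)

print(np.max(np.abs(sol.y[0] - x_analytic)))  # tiny: the two curves agree
```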

In fact, formulating mathematical models with differential equations is useful not only for physical phenomena but also for a wide range of phenomena in biology, chemistry, politics, economics, and society. In other words, the phenomena of the world can, surprisingly, be expressed by differential equations, and conversely, discovering a valid differential equation can be said to be a problem equivalent to giving a theoretical interpretation of the phenomenon. (Let me dream a little and put it that way!)

Unfortunately, however, higher education in Japan teaches how to solve differential equations but not much about how to construct them. With the right tools anyone can solve one, but creating a new one is very difficult, and far more important than solving it. If you are interested in constructing differential equations, this book [^3] is recommended. It is a good book with a large number of application examples, ideal as training in model building.

41oNUWfit7L.jpg

[^3]: "Let's Make Mathematical Models with Differential Equations", Nihon Hyoronsha (1990/4/9).

Differential equations and ResNet

I forgot to mention that the differential equations treated in this article belong, strictly speaking, to a class called ordinary differential equations. What is an ordinary differential equation? Explaining that would take a while, so I will leave it to other sources, and since it is tedious to keep spelling out, I will simply keep calling ordinary differential equations "differential equations".

Now, a basic first-order differential equation is generally given in the following form.

$$\frac{dx(t)}{dt}=f(x(t))$$

By definition, the derivative is obtained as the following limit.

$$\frac{dx(t)}{dt}=\lim_{\Delta t\to0}\frac{x(t+\Delta t)-x(t)}{\Delta t}$$

Therefore, if you want to simulate the behavior of a differential equation on a computer, the simplest way is to discretize time at intervals of $\Delta t$ and compute numerically on the discrete times $t_n = n\Delta t$.

$$\frac{x(t_{n+1})-x(t_n)}{\Delta t}=f(x(t_n))$$

Rearranging this slightly:

$$x(t_{n+1})=x(t_n)+f(x(t_n))\Delta t$$

And, although this substitution is a little top-down, if you put $g(x(t_n)) = f(x(t_n))\Delta t$, you can finally express it as follows.

$$x(t_{n+1})=x(t_n)+g(x(t_n))$$

Perceptive deep learning practitioners may have seen this form somewhere before. In fact, if you stare at it, it is an equation of the same form as ResNet [^4], which swept through image recognition tasks.
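This discretization is exactly the explicit Euler method. Here is a minimal sketch of my own, with $f(x)=-x$ as an arbitrary example:

```python
import numpy as np

def euler(f, x0, dt, n_steps):
    """Iterate x_{n+1} = x_n + f(x_n) * dt  (i.e. x_{n+1} = x_n + g(x_n))."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(xs[-1] + f(xs[-1]) * dt)
    return np.array(xs)

f = lambda x: -x  # dx/dt = -x, whose exact solution is x(t) = x(0) * exp(-t)
xs = euler(f, x0=1.0, dt=0.01, n_steps=1000)
print(xs[-1], np.exp(-10.0))  # the Euler result approaches the exact value at t = 10
```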

Conversely, let us now start from the neural network side and work our way back, step by step, to this equation.

The following diagram is taken from Figure 2 of the ResNet paper [^4].

20171104001512.png

In any architecture, what each layer ultimately does is, for some input $x$, apply a suitable combination of linear transformations (such as affine transformations) and nonlinear transformations (such as activation functions), output the resulting transformation $\mathcal{F}(x)$, and propagate it to the next layer. The central vertical propagation path in the figure illustrates exactly this. (The figure says ReLU, but in general other activations are possible too.)

The heart of ResNet is the propagation path (shortcut) on the right, marked "$x$ identity". I leave the details to the original paper [^4], but this propagation path is an identity map, and the input is merged as-is into the output. It is known that, as a result, learning proceeds efficiently even in networks with a very large number of layers, and the reason is thought to be that error backpropagation proceeds efficiently.

Here, writing the input of the $n$-th ResNet block as $x_n$ and its output as $x_{n+1}$, the previous figure can be rewritten as follows.

x_{n+1}=x_n+\mathcal{F}^{(n)}(x_n)

This is exactly the same form of equation as the discretized differential equation shown earlier. To make it more explicit, the correspondence is between $x(t_n)$ in the differential equation and $x_n$ in the ResNet block.
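In code, the correspondence is a one-liner: the forward pass literally computes $x_{n+1} = x_n + \mathcal{F}(x_n)$. A minimal sketch of my own (not the official ResNet implementation; the layer sizes are arbitrary):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        # F(x): the learnable transformation on the central path
        self.f = nn.Sequential(
            nn.Linear(dim, dim),
            nn.ReLU(),
            nn.Linear(dim, dim),
        )

    def forward(self, x):
        return x + self.f(x)  # identity shortcut: x_{n+1} = x_n + F(x_n)

x = torch.randn(8, 32)
print(ResidualBlock(32)(x).shape)  # torch.Size([8, 32])
```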

Furthermore, in RevNet [^5], known as a generalization of ResNet, each block has the following symmetric structure.

\begin{eqnarray}
x_{n+1}&=&x_n+\mathcal{F}(y_n)\\
y_{n+1}&=&y_n+\mathcal{G}(x_n)
\end{eqnarray}

Following the same argument as before, we can see that this is a discretization of the following simultaneous differential equations.

$$\begin{eqnarray} \frac{dx(t)}{dt}&=&f(y(t))\\ \frac{dy(t)}{dt}&=&g(x(t)) \end{eqnarray}$$
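The coupled updates translate just as directly into code. A minimal sketch of my own, following the equations exactly as written above (for exact invertibility, the actual RevNet applies $\mathcal{G}$ to the already-updated $x_{n+1}$; see the paper [^5] for details):

```python
import torch
import torch.nn as nn

class CoupledBlock(nn.Module):
    """x_{n+1} = x_n + F(y_n),  y_{n+1} = y_n + G(x_n)."""
    def __init__(self, dim):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.Tanh())
        self.g = nn.Sequential(nn.Linear(dim, dim), nn.Tanh())

    def forward(self, x, y):
        return x + self.f(y), y + self.g(x)

x, y = torch.randn(8, 16), torch.randn(8, 16)
x1, y1 = CoupledBlock(16)(x, y)
print(x1.shape, y1.shape)
```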

Note from the discussion so far that not every differential equation can be described by ResNet (or RevNet). For more on this area, see this recently popular book [^6]. It is a very good book, so if you want to explore the deep relationship between deep learning and physics, please do pick it up. In particular, to simplify the discussion, this article has cut out any consideration of the number of units, so please check the details in that book or elsewhere.

[^6]: "Deep Learning and Physics: Once You Understand the Principles, You Can Apply Them", Kodansha (2019/6/22).

ODENet (Neural Ordinary Differential Equations)

This paper [^8], selected as a best paper of NeurIPS 2018 [^7], proposes an epoch-making method that connects time evolution equations driven by differential equations with neural networks. The authors' implementation is published on GitHub [^9].

ODENet regards the processing of the layers (the hidden layers of ResNet, RNN, and so on) as a time evolution equation taken to its continuous limit in the time direction, and constructs a neural network by explicitly solving it as an (ordinary) differential equation. By going continuous, the notion of layers in deep learning disappears, memory and computation become efficient, and an (ordinary) differential equation solver can be used as the optimization counterpart of backpropagation; the paper proposes these and various other innovations.
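In code the idea looks roughly like this. A minimal sketch assuming the authors' torchdiffeq package (from the repository [^9], installable via pip install torchdiffeq); the network sizes and time grid are arbitrary choices of mine:

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint  # odeint_adjoint gives the memory-efficient backward pass

class ODEFunc(nn.Module):
    """The learnable vector field f_theta(t, x) defining dx/dt = f_theta(t, x)."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.Tanh(), nn.Linear(64, dim))

    def forward(self, t, x):
        return self.net(x)

x0 = torch.randn(8, 2)             # initial state plays the role of the input layer
t = torch.linspace(0.0, 1.0, 10)   # continuous "depth" instead of discrete layers
xt = odeint(ODEFunc(2), x0, t)     # solve the ODE; shape (10, 8, 2)
print(xt.shape)
```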

This is a diagram taken from Figure 1 of the ODENet paper [^8]. The left side depicts discrete time evolution with an ordinary ResNet, while ODENet on the right can represent continuous time evolution; the concept of discrete layers is gone. (If your impression is simply "there are a lot of arrows (a vector field)", that is exactly right.)

68747470733a2f2f692e696d6775722e636f6d2f753650645337522e706e67.png

In addition, the paper shows that probability densities can be learned efficiently with the continuous normalizing flow (CNF), an extension of the normalizing flow [^10], and proposes a method for learning time series data more efficiently by making the time evolution of latent variables continuous.

Please refer to these sites [^11] [^12] [^13] for detailed theoretical explanations of ODENet.

As an aside, code for this normalizing flow is published in another repository by the same authors [^14]. It is fun to actually construct strange probability densities.

For example, the probability density of a certain character who always watches over us warmly can be reproduced fairly neatly, like this.

traj.gif

[^7]: Neural Information Processing Systems, NeurIPS for short. Even now, the old abbreviation NIPS still feels more natural to me.

Neural Network and Hamiltonian mechanics

What is Hamiltonian mechanics?

Hamiltonian mechanics is a form of analytical mechanics that analyzes phenomena using a mysterious characteristic function called the Hamiltonian. In physics, the Hamiltonian is a quantity corresponding to energy, and many properties of a physical system can be described through it. (The Hamiltonian is not a concept limited to physical phenomena, but the discussion below is easier to picture when the subject is a physical phenomenon, so here I will call it energy.)

Now, let me shelve the difficult story and just think of it as the quantity corresponding to the total energy of high school physics. That is, writing the Hamiltonian, the kinetic energy, and the potential energy as $\mathcal{H}, \mathcal{K}, \mathcal{U}$ respectively, they are related as follows.

\mathcal{H}=\mathcal{K}+\mathcal{U}

Another important concept is that Hamiltonian mechanics introduces generalized coordinates and generalized momenta. Here, however, it is enough to simply think of the generalized coordinate as the ordinary coordinate and the generalized momentum as the ordinary momentum (the product of mass and velocity). Writing these as $q$ and $p$ respectively, they relate to the ordinary physical quantities as follows.

\begin{eqnarray}
q(t)&=&x(t)\\
p(t)&=&mv(t)
\end{eqnarray}

This is again rather top-down, but taking this notation as given, let us actually find the Hamiltonian for the differential equation of the spring's simple harmonic oscillation.

This is very easy: it suffices to rewrite the kinetic energy and the potential energy using the generalized coordinate and generalized momentum, as follows.

$$\begin{eqnarray} \mathcal{K}&=&\frac{1}{2}mv^2(t)=\frac{1}{2m}p^2(t)\\ \mathcal{U}&=&\frac{1}{2}kx^2(t)=\frac{1}{2}kq^2(t) \end{eqnarray}$$

That is, the Hamiltonian of the spring's simple harmonic oscillation looks like this.

$$\mathcal{H}=\frac{1}{2m}p^2(t)+\frac{1}{2}kq^2(t)$$

The real reason for introducing a quantity like the Hamiltonian lies in the beauty of canonical transformations and the symmetry of the canonical equations, but there is also this convenient relationship. It is frustrating, but I will leave the details to other sources; here, please just believe that the following relations hold.

$$\begin{eqnarray} \frac{dq(t)}{dt}&=&\frac{\partial \mathcal{H}}{\partial p}\\ \frac{dp(t)}{dt}&=&-\frac{\partial \mathcal{H}}{\partial q} \end{eqnarray}$$

As an example, let us apply them to the Hamiltonian of the spring's simple harmonic oscillation.

$$\begin{eqnarray} \frac{dq(t)}{dt}&=&\frac{\partial \mathcal{H}}{\partial p}=\frac{p(t)}{m}\\ \frac{dp(t)}{dt}&=&-\frac{\partial \mathcal{H}}{\partial q}=-kq(t) \end{eqnarray}$$

Since this is a little hard to read as it stands, let us return the generalized coordinate and generalized momentum to the original physical quantities.

$$\begin{eqnarray} \frac{dx(t)}{dt}&=&\frac{mv(t)}{m}=v(t)\\ m\frac{dv(t)}{dt}&=&-kx(t) \end{eqnarray}$$

The first relation expresses that velocity is the time derivative of position. The interesting one is the second relation, which is in fact nothing other than the original equation of motion of the spring's simple harmonic oscillation. Substituting the fact that velocity is the time derivative of position (the first relation), it becomes clear that it matches the equation of motion given at the beginning of this article.

$$m\frac{dv(t)}{dt}=m\frac{d^2x(t)}{dt^2}=-kx(t)$$
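To see the canonical equations at work, here is a minimal sketch of my own: integrating them numerically for the spring Hamiltonian and checking that $\mathcal{H}$ stays constant along the trajectory (the parameter values are arbitrary):

```python
import numpy as np
from scipy.integrate import solve_ivp

m, k = 1.0, 2.0

def hamiltonian(q, p):
    return p**2 / (2 * m) + 0.5 * k * q**2

# Canonical equations: dq/dt = dH/dp = p/m,  dp/dt = -dH/dq = -k q
def canonical_rhs(t, state):
    q, p = state
    return [p / m, -k * q]

t = np.linspace(0.0, 20.0, 2000)
sol = solve_ivp(canonical_rhs, (t[0], t[-1]), [1.0, 0.0], t_eval=t, rtol=1e-9)
H = hamiltonian(sol.y[0], sol.y[1])
print(H.max() - H.min())  # ~0: the Hamiltonian (energy) is conserved
```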

In fact, the Hamiltonian packs in an almost excessive amount of the important information needed to describe a phenomenon, and analyzing a phenomenon from the Hamiltonian's point of view is (at the risk of a slight misunderstanding) exactly the essence of Hamiltonian mechanics. Since we only dealt with the single motion of a spring here, its value is hard to convey, but it shows its true worth for complicated phenomena such as many-body motion and interconnected systems.

For the Hamiltonian (and the Lagrangian), it might be a good idea to start by reading around here [^15].

[^15]: See the articles on the Lagrangian and Hamiltonian formalisms of analytical mechanics on the blog "A Mantis in Space", as appropriate. The comparison with the Lagrangian formalism is especially easy to understand. By the way, the kind of mechanics that comes before all this, as in high school physics, is called Newtonian mechanics.

Hamiltonian Neural Network

Here I introduce an approach to expressing phenomena with a Neural Network that differs from the ODENet introduced above. This method was also accepted at NeurIPS 2019 [^7], and you can try it immediately from the original paper [^16] and the authors' implementation [^17].

The method is intuitively very simple and easy to understand: using the canonical equations, the heart of Hamiltonian mechanics, as the loss function, it learns a Neural Network that expresses the Hamiltonian well from inputs of generalized coordinates and generalized momenta.

The point, for better or worse, is that you are not necessarily learning the Hamiltonian itself; what matters is that you are learning something that behaves like the Hamiltonian. Therefore, even when the concrete Hamiltonian of a phenomenon is unknown, you can expect to construct a Neural Network that expresses its time evolution well.

The general flow of training a Hamiltonian Neural Network is as follows (a minimal code sketch of the loss follows the list).

(1) The input data are generalized coordinates and generalized momenta discretized in the time direction.

(2) The Neural Network is characterized by parameters $\theta$.

(3) Forward propagation outputs a scalar value $\mathcal{H}_\theta$ (note that nothing at this output stage yet demands that it be a Hamiltonian).

(4) Backpropagation with automatic differentiation computes the following values.

$$\frac{\partial \mathcal{H}_\theta}{\partial p},\quad \frac{\partial \mathcal{H}_\theta}{\partial q}$$

(5) The time derivatives of the generalized coordinates and generalized momenta are obtained using the information one time step ahead (this corresponds to the teacher data).

$$\frac{dq(t)}{dt},\quad \frac{dp(t)}{dt}$$

(6) Based on the canonical equations, the network is trained to minimize the following loss function.

$$L_{HNN}=\left(\frac{\partial \mathcal{H}_\theta}{\partial p}-\frac{dq(t)}{dt}\right)^2+\left(\frac{\partial \mathcal{H}_\theta}{\partial q}+\frac{dp(t)}{dt}\right)^2$$
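Here is the promised sketch: a minimal PyTorch version of this loss, written by me rather than taken from the authors' code [^17] (the network sizes and the toy batch are arbitrary).

```python
import torch
import torch.nn as nn

# Step (2): H_theta maps (q, p) to a scalar
h_theta = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 1))

def hnn_loss(qp, dqdt, dpdt):
    qp = qp.requires_grad_(True)
    H = h_theta(qp).sum()                                  # step (3): scalar output
    dH = torch.autograd.grad(H, qp, create_graph=True)[0]  # step (4): dH/dq, dH/dp
    dH_dq, dH_dp = dH[:, 0], dH[:, 1]
    # step (6): the canonical equations as a squared-error loss
    return ((dH_dp - dqdt) ** 2 + (dH_dq + dpdt) ** 2).mean()

# Toy batch: states (q, p) and their observed time derivatives (step (5) would
# compute these by finite differences from the next time step)
qp = torch.randn(32, 2)
dqdt, dpdt = torch.randn(32), torch.randn(32)
loss = hnn_loss(qp, dqdt, dpdt)
loss.backward()  # then update theta with any optimizer as usual
```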

The following diagram is taken from Figure 1 of the Hamiltonian Neural Networks paper [^16].

overall-idea.png

Compared with the simple Neural Network used as the baseline, you can see that the proposed method expresses the behavior of the original physical phenomenon well.

As an interesting application, as shown in Figure 4 below, they even attempt to reproduce the phenomenon when only image data of the physical phenomenon is given as input.

Screen-Shot-2019-11-18-at-15.24.38.png

The remaining problem is that while single-body motion can be expressed in a way roughly matching the measured values, many-body motion with interactions has not been learned very well.

As you can see from Figure B.3 below, the behavior of the three-body motion collapses at a relatively early stage. (It looks as though the bodies collide with each other under their mutual attraction...) Still, compared with the baseline it does seem to be trying to maintain circular orbits, so I think it is doing relatively well.

スクリーンショット 2019-12-18 15.14.38.png

Even so, for such an intuitive and simple Neural Network, it is very interesting that single-body motion can be expressed so precisely. I think there is still plenty of room for development here, and research and development will surely progress.

For this paper, the explanations on these sites [^18] [^19] are easy to understand, so please refer to them.

[^18]: AI-SCHOLAR, "Hamiltonian Neural Networks, which enable prediction of object motion that satisfies the law of conservation of energy". I am always indebted to this site.

[^19]: GitHub, yoheikikuta/paper-reading, "[2019] Hamiltonian Neural Networks". This one is also easy to understand.

Summary

This has been an article summarizing content that will hit home for those it hits home for. I meant to write a more virtuous summary, but I am at the company's development camp in Nanki-Shirahama right now, with less than 10 minutes until we head out for dinner, so I will add more if I remember! I'm hungry...

It seems the god of our team has also returned from the public bath, so that's it for now.

Tomorrow brings a very virtuous article by @kirikei, a great senior from both my team and my university!!! We await it sitting at attention!!! I will study it well!!!
