This article is part of the documentation for Google's Magenta project: a translation of David Ha's review of "Generating Sequences With Recurrent Neural Networks". The review discusses the paper of the same name by Alex Graves. It is distributed under the Apache License, Version 2.0.
Google Brain has launched Magenta, a project to generate art and music with deep learning. One of Magenta's goals is to showcase the research behind the project by publishing review articles for several papers.
A recurrent neural network (RNN) is a neural network suited to time series data, and it is also used as a composition model in Google's Magenta project. In this article, I will explain the basic idea of time series generation with RNNs.
"Generating Sequences With Recurrent Neural Networks", written by Alex Graves in 2013, became one of the major papers on time series generation with recurrent neural networks. The paper is about modeling the probability distribution of time series data. With this approach, instead of predicting exactly what will happen next, we get an RNN that predicts a probability distribution over the future from all the information known about the past.
Even for humans, it is easier to predict what might happen and how likely it is than to predict the future exactly. For machines, however, this is still a difficult problem, especially for time series that do not have the [Markov property](https://ja.wikipedia.org/wiki/%E3%83%9E%E3%83%AB%E3%82%B3%E3%83%95%E6%80%A7). Such a prediction can be defined as computing the probability distribution of the next step given the entire past time series.
P( Y[n+1]=y[n+1]\ |\ Y[n]=y[n], Y[n-1]=y[n-1], Y[n-2]=y[n-2], \ldots) \ \ \ \ (1)
A simple method such as an N-gram model takes the previous N characters and predicts the next character. Everything before step t-N is truncated, so the distribution is only approximated, and when N becomes large the model can no longer be fit well.
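To make the truncation concrete, here is a minimal sketch (my own illustration, not from the paper) of a character-level N-gram model in plain Python. It conditions only on a fixed number of preceding characters, so anything further back is simply discarded; the helper names are hypothetical.

```python
from collections import defaultdict, Counter

def train_char_ngram(text, context_len=2):
    """Count how often each character follows each context of `context_len` characters."""
    counts = defaultdict(Counter)
    for i in range(context_len, len(text)):
        context, nxt = text[i - context_len:i], text[i]
        counts[context][nxt] += 1
    return counts

def next_char_distribution(counts, context):
    """Normalize the counts into P(next char | previous `context_len` chars)."""
    c = counts.get(context, Counter())
    total = sum(c.values())
    return {ch: k / total for ch, k in c.items()} if total else {}

counts = train_char_ngram("to be or not to be", context_len=2)
print(next_char_distribution(counts, "t "))  # {'t': 1.0} for this toy corpus
```

Every character earlier than the context window has no influence at all, which is exactly the approximation that breaks down for long-range structure.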
In this paper, Graves explains how to use an RNN to approximate the probability distribution function (PDF) in Eq. (1). Because RNNs are recurrent, they can memorize rich representations of past events. The paper proposes using LSTM cells in the RNN so that it can retain information from the distant past. With this change, the probability distribution of the next value in the time series can be approximated as a function of the current value and the hidden state of the RNN.
P( Y[n+1]=y[n+1]\ |\ Y[n]=y[n], H[n]=h[n]) \ \ \ \ (2)
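As an illustration of Eq. (2), here is a minimal character-level model sketch in PyTorch (an assumption on my part; the paper predates this library and uses stacked LSTMs with skip connections). An LSTM carries the hidden state H[n], and a softmax output layer gives the distribution over the next character.

```python
import torch
import torch.nn as nn

class CharRNN(nn.Module):
    """Approximate P(Y[n+1] | Y[n], H[n]) with an LSTM and a softmax output layer."""
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, y, hidden=None):
        # y: (batch, seq_len) integer character ids
        emb = self.embed(y)
        out, hidden = self.lstm(emb, hidden)   # `hidden` summarizes the entire past
        logits = self.head(out)                # unnormalized log-probabilities for the next char
        return logits, hidden
```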
Graves details how to train and fit RNNs on many time series datasets, including Shakespeare's works, Wikipedia articles, and an online handwriting database. Training uses backpropagation through time (BPTT) with the cross-entropy error between the RNN's predictions and the actual data. In addition, gradient clipping is used to prevent the gradients and weights from diverging.
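A sketch of one such training step, assuming the `CharRNN` above and a batch of integer-encoded character sequences. The paper clips gradients in a slightly different way (element-wise); norm-based clipping is used here as a common alternative.

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, batch, clip_norm=5.0):
    """One BPTT step: predict each next character, backpropagate the
    cross-entropy error, and clip gradients so they cannot diverge."""
    inputs, targets = batch[:, :-1], batch[:, 1:]        # shift the sequence by one step
    logits, _ = model(inputs)                            # (batch, seq_len, vocab)
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()                                      # backpropagation through time
    torch.nn.utils.clip_grad_norm_(model.parameters(), clip_norm)
    optimizer.step()
    return loss.item()
```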
After training, something interesting happens. If the probability distribution produced by the RNN is close enough to the empirical distribution of the actual data, the RNN can generate a plausible, if imitative, time series by sampling from this distribution. This technique has become well known over the last few years: it has been used to generate political satire (Obama-RNN, [@deepdrumpf](https://twitter.com/deepdrumpf)) and ASCII art.
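Sampling is then just repeatedly drawing from the predicted distribution and feeding the draw back into the network. A minimal sketch, again assuming the `CharRNN` above (the temperature parameter is a common addition, not something the paper requires):

```python
import torch

@torch.no_grad()
def sample(model, start_id, length=200, temperature=1.0):
    """Generate a sequence by sampling each next character from the RNN's distribution."""
    ids = [start_id]
    y, hidden = torch.tensor([[start_id]]), None
    for _ in range(length):
        logits, hidden = model(y, hidden)
        probs = torch.softmax(logits[:, -1, :] / temperature, dim=-1)
        y = torch.multinomial(probs, num_samples=1)      # sample rather than take the argmax
        ids.append(int(y.item()))
    return ids
```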
(Figure: conceptual diagram of sampling a time series from an RNN.)
Text generation has been the most widely used application of this technique in recent years, since text data is readily available and the probability distribution can be modeled with a softmax layer. The approach of generating real-valued time series, such as sound waveforms, handwriting, and vector drawings, has been studied less.
In this paper, Graves experiments with training RNNs on an online handwriting database. The data consists of actual handwriting recorded from a tablet, represented stroke by stroke in vector format. (Figure: an example from the IAM online handwriting dataset.)
The handwriting samples are recorded by the tablet as a collection of small vectors of coordinate offsets. Alongside each vector there is a binary state indicating whether the stroke has ended (that is, whether the pen leaves the surface). The vector that follows the end of one stroke gives the offset to the starting point of the next stroke.
The training data looks like this when each small vector is drawn in a random color.
(Figure: each stroke visualized in a random color as well, which looks more attractive than visualizing each small vector.)
Note that the model is trained on individual vectors, not on whole strokes. (A stroke is the collection of vectors up to the point where the pen lifts.)
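To make the per-vector format concrete, here is a toy sketch (hypothetical data and helper, not the paper's actual preprocessing) that flattens strokes of absolute pen positions into the (Δx, Δy, end-of-stroke) rows the model is trained on:

```python
import numpy as np

# Hypothetical strokes: each stroke is a list of absolute (x, y) pen positions.
strokes = [[(0, 0), (5, 1), (9, 4)],       # first stroke
           [(12, 0), (14, 6)]]             # pen lifted, then a second stroke

def strokes_to_offsets(strokes):
    """Flatten strokes into (dx, dy, end_of_stroke) rows, the per-vector training format."""
    rows, prev = [], (0, 0)
    for stroke in strokes:
        for i, (x, y) in enumerate(stroke):
            eos = 1 if i == len(stroke) - 1 else 0   # 1 marks "pen leaves the surface"
            rows.append((x - prev[0], y - prev[1], eos))
            prev = (x, y)
    return np.array(rows, dtype=np.float32)

print(strokes_to_offsets(strokes))
```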
What the RNN does is model the conditional probability distribution of the next coordinate offset (a pair of real numbers giving the size of the movement) together with the conditional probability that the stroke ends (a binary value, S):
P( X[n+1]=x[n+1], Y[n+1]=y[n+1], S[n+1]=s[n+1]\ |\ X[n]=x[n], Y[n]=y[n], S[n]=s[n], H[n]=h[n] ) \ \ \ \ (3)
In the method described in the paper, the conditional probability distributions of X and Y are approximated by a Gaussian mixture, the sum of many small Gaussian distributions, and S is approximated by a Bernoulli random variable. The technique of using a neural network to produce the parameters of a mixture distribution was originally developed by Bishop for feedforward networks ([Mixture density networks](https://www.researchgate.net/publication/40497979_Mixture_density_networks)); this paper extends the approach to RNNs. At each step, the RNN converts (x[n], y[n], s[n], h[n]) into the parameters of the Gaussian mixture, which therefore change over time as new x and y values arrive. For example, imagine the RNN looking at some previous points (the gray points) and predicting the probability distribution of the location of the next point (the pink region).
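A simplified sketch of such an output layer in PyTorch: it maps the hidden state to mixture weights, means, and standard deviations for (Δx, Δy), plus a Bernoulli end-of-stroke probability. The paper's version also includes a correlation parameter per bivariate Gaussian, which is omitted here.

```python
import torch
import torch.nn as nn

class MixtureDensityHead(nn.Module):
    """Map the RNN hidden state to mixture-of-Gaussians parameters for (dx, dy)
    and a Bernoulli end-of-stroke probability. Simplified: no correlation term."""
    def __init__(self, hidden_dim, num_mixtures=20):
        super().__init__()
        self.num_mixtures = num_mixtures
        # per mixture: weight, mean_x, mean_y, log_std_x, log_std_y -> 5 * M, plus 1 for end-of-stroke
        self.proj = nn.Linear(hidden_dim, 5 * num_mixtures + 1)

    def forward(self, h):
        out = self.proj(h)
        pi, mu_x, mu_y, log_sx, log_sy = torch.split(
            out[..., :-1], self.num_mixtures, dim=-1)
        return {
            "pi": torch.softmax(pi, dim=-1),                            # mixture weights sum to 1
            "mu": torch.stack([mu_x, mu_y], dim=-1),                    # mixture means
            "sigma": torch.exp(torch.stack([log_sx, log_sy], dim=-1)),  # positive std devs
            "end_of_stroke": torch.sigmoid(out[..., -1]),               # Bernoulli parameter
        }
```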
The pen may continue the current stroke, or finish it and move to the right to start a new letter; the RNN models this uncertainty. After fitting the model to the full IAM database, we can sample fake handwriting from the RNN as described above.
We can also examine in detail the probability distributions the RNN outputs during sampling. To get a feel for what the RNN is "thinking", the figure visualizes, alongside the sampled handwriting, the probability distribution of the coordinate offsets (red dots) and the probability that the stroke ends (the density of the gray lines).
This is very powerful, and in the future we can explore many directions by extending the way such time series are sampled. For example, a slightly modified RNN trained on a dataset of Chinese characters can generate fictitious kanji.
The rest of Graves' paper describes several ways to do conditional sampling. Given a model that is fed the character currently being written together with the characters before and after it, the RNN can capture the subtle differences in how characters connect.
P( X[n+1]=x[n+1], Y[n+1]=y[n+1], S[n+1]=s[n+1]\ |\ X[n]=x[n], Y[n]=y[n], S[n]=s[n], C[n+1]=c[n+1], C[n]=c[n], C[n-1]=c[n-1], H[n]=h[n] ) \ \ \ \ (4)
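A rough sketch of the conditioning idea (my simplification: the paper learns a soft attention "window" over the character sequence, whereas here a precomputed character-context vector is simply concatenated to each stroke input):

```python
import torch
import torch.nn as nn

class ConditionedStrokeRNN(nn.Module):
    """Condition stroke generation on the text being written by concatenating
    a character-context vector to each (dx, dy, end_of_stroke) input."""
    def __init__(self, char_dim, hidden_dim=256):
        super().__init__()
        self.lstm = nn.LSTM(3 + char_dim, hidden_dim, batch_first=True)

    def forward(self, stroke_inputs, char_context, hidden=None):
        # stroke_inputs: (batch, seq_len, 3) rows of (dx, dy, end_of_stroke)
        # char_context:  (batch, seq_len, char_dim) features of the nearby characters
        x = torch.cat([stroke_inputs, char_context], dim=-1)
        out, hidden = self.lstm(x, hidden)
        return out, hidden   # feed `out` into a mixture density head as above
```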
Like all models, this generative RNN is not without limitations. For example, it is difficult to train on more complex datasets such as vector drawings of animals, because each image is complex and varied: to draw an animal, the model needs to learn higher-order concepts such as eyes, ears, a nose, a body, feet, and a tail. When people write or draw, they almost always have an idea of what they want to produce in advance. One drawback of this model is that its randomness is concentrated entirely in the output layer, so it may fail to capture and generate such high-level concepts.
A promising extension of this RNN technique is to convert the RNN into a Variational RNN (VRNN) to learn the conditional probability distributions. With this newer method, [latent variables](https://en.wikipedia.org/wiki/Latent_variable) and [thought vectors](http://www.iamwire.com/2015/09/google-thought-vectors-inceptionism-artificial-intelligence-artificial-neural-networks-ai-dreams-122293/122293) can be embedded in the model to control the type of content and the style of the output. In the VRNN paper, applying this model to the same handwriting experiment gave convincing results: the handwriting samples generated by the VRNN maintain a consistent style, rather than drifting from one style to another.
In conclusion, this paper introduces a methodology that allows RNNs to act as generative models, and it opens up interesting directions in the field of computer-generated content.