[PYTHON] I was able to machine-learn a sorting algorithm (with variable length & order flag) with an RNN that manipulates memory!

Differentiable neural computer?

The paper "Hybrid computing using a neural network with dynamic external memory" submitted by Google DeepMind to Nature is somehow. It has a dangerous scent. In the official introductory article "Differentiable neural computers", the story begins with Platon's memory theory, and in the dissertation, Kaiba, who controls the memory of the brain. It's quite magnificent because it's likened to. We were able to get one step closer to the ideal artificial intelligence from the perspective of philosophy and neuroscience, not just improving the performance of neural networks. I feel that this is an invention of a new way of computer.

The mechanism builds on the now-popular concept of attention: there is a matrix that represents memory, and a controller that inputs and outputs vectors while selectively reading and writing it. The controller is an RNN (recurrent neural network), and the memory operations are also differentiable, so it should in principle be possible to machine-learn general algorithms over data structures, just like a Turing machine. Maybe. In the experiments in the paper, it apparently learned graph route finding with high accuracy. How amazing is that.
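As a rough mental model (my own sketch, not the paper's exact equations), a content-based read works like this: the controller emits a key, the key is compared against every memory row by cosine similarity, and the read vector is the attention-weighted sum of the rows, so the whole lookup stays differentiable. A minimal numpy sketch under those assumptions:

import numpy as np

def content_read(memory, key, beta=1.0):
    """Differentiable content-based read: softmax over cosine similarity.

    memory: (N, W) matrix of N memory slots, each W elements wide
    key:    (W,) read key emitted by the controller
    beta:   sharpness of the attention distribution
    """
    eps = 1e-8
    sim = memory.dot(key) / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + eps)
    w = np.exp(beta * sim)
    w /= w.sum()            # attention weights over the N slots, sum to 1
    return w.dot(memory)    # read vector: weighted mixture of memory rows

memory = np.random.randn(20, 12)   # N=20 slots of width W=12, same sizes as used below
key = np.random.randn(12)
print(content_read(memory, key).shape)  # -> (12,)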

I was shocked. I haven't worked through the formulas in the paper yet, but someone has already published an implementation, "Overview of DNC (Differentiable Neural Computers) + Implementation by Chainer", so let's play with it. The original code learns echo, so I'll extend it to learn sorting.

Result

Here are the test results after training on 20,000 samples. Yes, every case is answered correctly.

# 20000

in  [7, 4] ASC
ans [4, 7]
out [4, 7] 1.0

in  [1, 6, 3] ASC
ans [1, 3, 6]
out [1, 3, 6] 1.0

in  [2, 6, 1, 2] ASC
ans [1, 2, 2, 6]
out [1, 2, 2, 6] 1.0

in  [1, 0, 6, 1, 7] DESC
ans [7, 6, 1, 1, 0]
out [7, 6, 1, 1, 0] 1.0

in  [5, 1, 2, 1, 4, 6] ASC
ans [1, 1, 2, 4, 5, 6]
out [1, 1, 2, 4, 5, 6] 1.0

in  [5, 4, 3, 5, 5, 8, 5] DESC
ans [8, 5, 5, 5, 5, 4, 3]
out [8, 5, 5, 5, 5, 4, 3] 1.0

in  [9, 3, 6, 9, 5, 2, 9, 5] ASC
ans [2, 3, 5, 5, 6, 9, 9, 9]
out [2, 3, 5, 5, 6, 9, 9, 9] 1.0

in  [0, 6, 5, 8, 4, 6, 0, 8, 0] ASC
ans [0, 0, 0, 4, 5, 6, 6, 8, 8]
out [0, 0, 0, 4, 5, 6, 6, 8, 8] 1.0

"In" is the input column and the sorting order. "ASC" is in ascending order and "DESC" is in descending order. "Ans" is the correct answer, "out" is the output sequence, and the match rate with the correct answer. From this result, we can see that DNC was able to learn sorts, and also sorts with order flags and variable column lengths.

Also, although the training data only contains sequences of length 2 to 8, sequences of length 9 are sorted correctly in the test, so it appears to generalize over sequence length as well. That was a surprise.

Parameters & Model

I set the DNC parameters as follows.

Parameter                 Value
Input vector size         12
Output vector size        10
Number of memory slots    20
Size of one memory slot   12
Number of read heads      2

The controller RNN is left as in the original code: the first layer is an LSTM layer (in: 36, out: 83) and the second is a fully connected layer (in: 83, out: 83), so it is not deep. I haven't tuned anything here, so there is probably room for improvement in various ways. If anything, it was a little scary that it learned in one shot without any tuning at all.
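For reference, here is a minimal sketch of a controller with that shape in Chainer 1.x (my assumption, not the repository's actual class). The 36 inputs would correspond to the external input (X = 12) concatenated with the R = 2 read vectors of width W = 12.

import numpy as np
import chainer
import chainer.links as L
from chainer import Variable

class Controller(chainer.Chain):
    """Sketch of a two-layer controller: LSTM (36 -> 83) followed by a fully connected layer (83 -> 83)."""
    def __init__(self):
        super(Controller, self).__init__(
            l1=L.LSTM(36, 83),    # layer 1: LSTM
            l2=L.Linear(83, 83),  # layer 2: fully connected
        )

    def __call__(self, x):
        return self.l2(self.l1(x))

    def reset_state(self):
        self.l1.reset_state()

ctrl = Controller()
y = ctrl(Variable(np.zeros((1, 36), dtype=np.float32)))
print(y.data.shape)  # -> (1, 83)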

Encode & decode

in  [1, 6, 3] ASC
ans [1, 3, 6]
out [1, 3, 6] 1.0

This example is actually fed to and read from the DNC as the sequence below: the order flag goes in first, then the sequence, and once the whole sequence is in, the sorted result comes out.

Execution step  1    2  3  4  5  6  7
input           ASC  1  6  3  -  -  -
output          x    x  x  x  1  3  6

To be precise, the input is one-hot encoded, and "-" is a zero vector. The output is decoded by taking the index of the largest element of the output vector; the "x" steps are ignored. This part also follows the original code.
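To make the encoding concrete, here is a small sketch (my reconstruction, not the original code) of how "[1, 6, 3] ASC" is laid out over the 7 execution steps above: slots 0-9 of the 12-dimensional input vector are the digits, slot 10 is the ASC flag, slot 11 is the DESC flag, and the answer steps are zero vectors.

import numpy as np

len_number = 10                 # digits 0..9
X = len_number + 2              # slots 0..9: digits, slot 10: ASC, slot 11: DESC
content = [1, 6, 3]
order = 0                       # 0 = ASC -> slot 10, 1 = DESC -> slot 11

eye = np.eye(X, dtype=np.float32)
x_seq = [eye[len_number + order]] \
      + [eye[i] for i in content] \
      + [np.zeros(X, dtype=np.float32)] * len(content)   # the "-" steps

for step, x in enumerate(x_seq, 1):
    print("step %d: %s" % (step, x.tolist()))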

From the DNC's point of view, the length of the sequence to be sorted only becomes known when the first zero vector arrives, and at that very step it must already output the minimum (or maximum) number. Before training I expected I would have to give it a few grace steps between input and output, but there was no problem at all. Smart.

Poem

The development and application of neural networks in recent years has been remarkable, but they are still pattern-recognition monsters in specific domains such as images and natural language, and still far from true artificial intelligence. What, DCGAN can generate 2D illustrations? (Let the computer draw an illustration using Chainer) LSTM can generate Shakespeare-like sentences? (The Unreasonable Effectiveness of Recurrent Neural Networks) Well, seen from a distance you might be fooled. Ha ha.

That is how I had been arguing from on high, but now I am under the delusion that these things are beginning to acquire intelligence. If data structures and algorithms really could be machine-learned in a general-purpose way with a DNC-like approach, I have a vague fear that programmers' jobs would be taken away, that what people believe to be their very own intelligence would be automated, and that I could no longer laugh at the craftsmen who smashed machines in the Industrial Revolution... Well, practically speaking, bringing a DNC to something like this sort is obvious over-engineering, and as the no-free-lunch theorem teaches, tasks that call for domain knowledge such as mathematical optimization are best programmed by humans. Still, depending on the genre of the task, an era of "don't program, train" may well arrive.

Code

yos1up / DNC / main.py | GitHub

The execution part of that file is replaced with the following. The code imitates the look of the original Chainer code, so if anything is wrong, please point it out m(_ _)m

len_number = 10          # digits 0..9
X = len_number + 2       # input size: 10 digits + 2 order-flag slots (ASC / DESC)
Y = len_number           # output size: 10 digits
N = 20                   # number of memory slots
W = X                    # width of one memory slot
R = 2                    # number of read heads
mdl = DNC(X, Y, N, W, R)
opt = optimizers.Adam()
opt.setup(mdl)


def run(low, high, train=False):
    # build one random problem: an order flag plus a sequence of random digits
    order = np.random.randint(0, 2)  # 0 = ASC, 1 = DESC
    content = [np.random.randint(0, len_number) for _ in range(np.random.randint(low, high))]
    sorted_content = sorted(content, reverse=(order != 0))
    # input: one-hot order flag, one-hot digits, then zero vectors while the answer is read out
    x_seq_list = map(lambda i: onehot(i, X), [len_number + order] + content) + [np.zeros(X).astype(np.float32)] * len(content)
    # target: nothing during the input steps, then the one-hot sorted digits
    t_seq_list = ([None] * (1 + len(content))) + map(lambda i: onehot(i, Y), sorted_content)

    result = []
    loss = 0.0

    for (x_seq, t_seq) in zip(x_seq_list, t_seq_list):
        y = mdl(Variable(x_seq.reshape(1, X)))
        if t_seq is not None:
            t = Variable(t_seq.reshape(1, Y))
            if train:
                loss += (y - t) ** 2              # accumulate squared error over the output steps
            else:
                result.append(np.argmax(y.data))  # decode: index of the largest output element

    mdl.reset_state()
    if train:
        mdl.cleargrads()
        loss.grad = np.ones(loss.data.shape, dtype=np.float32)
        loss.backward()
        loss.unchain_backward()
        opt.update()
    else:
        print 'in ', content, 'ASC' if order == 0 else 'DESC'
        print 'ans', sorted_content
        print 'out', result, sum(1.0 if s == r else 0.0 for (s, r) in zip(sorted_content, result)) / len(result)


# train on sequences of length 2..8; every 100 iterations, test on lengths 2..9
for i in range(100000):
    run(2, 9, train=True)
    if i % 100 == 0:
        print
        print '#', i
        for j in range(2, 10):
            print
            run(j, j + 1)
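Note that the code above relies on the DNC class and the onehot helper from the original main.py. For completeness, here is what I assume onehot does (a guess at the helper's behaviour, not copied from the repository):

import numpy as np

def onehot(i, size):
    # assumed helper: float32 vector of length `size` with a 1.0 at index i
    v = np.zeros(size, dtype=np.float32)
    v[i] = 1.0
    return v

print(onehot(3, 12).tolist())  # 1.0 at index 3, zeros elsewhere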

Execution environment

$ python --version
Python 2.7.11

$ pip freeze
chainer==1.17.0
filelock==2.0.7
nose==1.3.7
numpy==1.11.2
protobuf==3.1.0.post1
six==1.10.0
