[PYTHON] [Evangelion] Try to automatically generate Asuka-like lines with Deep Learning

Introduction

Congratulations on the 20th anniversary of Evangelion! 🎉 Also, happy birthday to Asuka. (Four days late.)

I tried automatic sentence generation, the kind of thing used for Twitter bots, using a recurrent neural network (hereinafter RNN), a popular kind of deep learning.

Data collection

Nothing can start without data. I was prepared to transcribe the dialogue myself, but thankfully there is a site that collects every line from the anime. Thanks.

I extracted all the lines from there. The dialogue is formatted like this: the character name, followed by the line enclosed in「」brackets.

Broadcast "Today, 12:In the 30th minute, a special state of emergency was issued throughout the central Kanto region, centered on the Tokai region. Residents should evacuate to the designated shelter immediately. "
Broadcast "I will tell you repeatedly ..."
Misato "I didn't want to lose sight of it at this time."

Telephone "All regular lines are currently out of service due to the issuance of a special state of emergency."
Shinji "No, I didn't come ... I can't meet ... I can't help it, let's go to the shelter."

Operator "Unidentified moving objects are still in progress for the headquarters"
Shigeru "Check the target on the video. Turn it to the main monitor."
Fuyutsuki "It's been 15 years since then"
Gendou "Oh, no doubt, an apostle."

・ ・ ・

Extraction of lines for each character

This data is split into per-character files, which become the source data for automatically generating lines.

I wrote the script in Python.

```python
# -*- coding: utf-8 -*-

import sys
import os
import chardet

# Input directory and output directory are given as command-line arguments
readdir = sys.argv[1]
outdir = sys.argv[2]

print "readdir:\t", readdir
print "outdir:\t", outdir

# Get the list of files in the input directory
txts = os.listdir(readdir)

for txt in txts:
    if txt.split(".")[-1] != "txt":   # Ignore extensions other than .txt
        continue
    txt = os.path.join(readdir, txt)
    print txt

    # Detect the character encoding of the file
    fp = open(txt, "rb")
    f_encode = chardet.detect(fp.read())["encoding"]
    fp.close()

    fp = open(txt, "rb")
    lines = fp.readlines()
    fp.close()

    for line in lines:
        # Convert to unicode
        line_u = unicode(line, f_encode)

        # Skip lines without dialogue brackets
        if line_u.find(u"「") < 0:
            continue

        # The character name is everything before「
        char_name = line_u[:line_u.find(u"「")]
        outfname = os.path.join(outdir, char_name + ".txt")

        # Append if a file for this character already exists;
        # otherwise create a new one
        if os.path.exists(outfname):
            outfp = open(outfname, "a")
        else:
            outfp = open(outfname, "w")

        # Extract only the line itself (the text between「and」)
        line_format = line_u[line_u.find(u"「") + 1:line_u.find(u"」")] + "\n"
        outfp.write(line_format.encode("utf-8"))
        outfp.close()
```

The script reads the text file line by line; the text before「is the character name, and the text between「and」is the line itself. If a file for that character already exists, it is opened in append mode and the line is added; if not, a new file is created.

The generated files look like this. I assumed Cessna was a character, but when I checked the contents, it was the radio chatter of the person flying the Cessna.

・
・
Asuka.txt
Kaworu.txt
Keel.txt
class.txt
Cessna.txt
Naoko.txt
・
・

The contents are like this.

Asuka.txt


Hello, Misato! Have you been well?
That's right. I'm getting more feminine in other places as well.
It's a viewing fee. It's cheap, isn't it?
What are you doing!
So, which one is the rumored Third Child? No way, that one just now ...
Hoon, it's dull.
・
・

The lines themselves have been extracted cleanly.

Now that we have the training data, let's get to the main topic and build the program that learns from it.

Automatic generation of lines

There are several language models and sentence generation methods; this time we will use a recurrent neural network (RNN). For comparison, we will also implement a method based on Markov chains.

What is RNN?

RNN is the general term for neural networks that contain a cycle (a recurrent connection).

For example, as shown in the figure, the contents of the hidden layer at time t are treated as part of the input at the next time t+1. This structure lets an RNN temporarily store information and pass it on to the next input, which makes it possible to capture the "flow of time" present in the data. In this program, instead of plain RNN units, we adopt Long Short-Term Memory (LSTM), a block that can retain values over time.
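To make the recurrence concrete, here is a toy numpy-only sketch (random, untrained weights; an illustration of the data flow, not the model used below) showing that the hidden state at time t is computed from both the current input and the previous hidden state:

```python
# -*- coding: utf-8 -*-
# Toy illustration of the RNN recurrence: h_t depends on x_t and h_{t-1}.
# The weights are random and untrained; this only shows the data flow.
import numpy as np

n_in, n_units = 4, 8
W_xh = np.random.randn(n_in, n_units) * 0.1     # input -> hidden
W_hh = np.random.randn(n_units, n_units) * 0.1  # hidden -> hidden (the cycle)

h = np.zeros(n_units)                             # hidden state carried over time
for t, x in enumerate(np.random.randn(5, n_in)):  # five time steps of input
    h = np.tanh(x.dot(W_xh) + h.dot(W_hh))        # mix current input with memory
    print 't =', t, 'h[:3] =', h[:3]
```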

Explaining this properly would take a very long time, so for a detailed overview and easy-to-understand explanations, please see these materials by other authors.

- Predict time series data with neural networks
- [Outline of Recurrent Neural Networks and Their Operating Principle (PDF)](http://wbawakate.jp/wp-content/uploads/2015/03/RNN%E3%83%95%E3%82%9A%E3%83%AC%E3%82%BB%E3%82%99%E3%83%B3.pdf)

What is a Markov chain?

A Markov chain is a Markov process (a type of stochastic process) whose possible states are discrete (finite or countable); the term usually refers to the discrete-time case (there are also continuous-time Markov processes). In a Markov chain, future behavior depends only on the current state and is independent of past behavior (the Markov property). (From Wikipedia)

We build 3-grams from the dialogue and run a Markov chain over them. A 3-gram is a chunk of three consecutive words (or characters) cut out of a string.

For example

I wonder why boys are stupid and lewd!

Given a sentence like the one above, and since we segment it into words with MeCab this time, we slide a window one word at a time to cut out chunks of three words each.

| word 1 | word 2 | word 3 |
|--------|--------|--------|
| (BOS) | I | wonder |
| I | wonder | why |
| wonder | why | boys |
| why | boys | are |
| boys | are | stupid |
| are | stupid | and |
| stupid | and | lewd |
| and | lewd | ! |
| lewd | ! | (EOS) |

BOS: abbreviation for Beginning Of Sentence
EOS: abbreviation for End Of Sentence
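As a minimal sketch (assuming the sentence has already been split into words, e.g. by MeCab), the sliding-window extraction in the table above can be written like this:

```python
# -*- coding: utf-8 -*-
# Cut 3-grams out of a tokenized sentence with a window that slides one word.
tokens = [u'(BOS)', u'I', u'wonder', u'why', u'boys',
          u'are', u'stupid', u'and', u'lewd', u'!', u'(EOS)']

trigrams = [tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)]
for tri in trigrams:
    print u' | '.join(tri)
```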

For detailed explanations of Markov chains, please refer to these easy-to-understand materials by other authors.

- Introduction to the Markov Chain Monte Carlo Method 1
- I tried to chain Markov
- Automatic generation of sentences by Markov chain

Implementation

RNN

There are a number of libraries that support RNNs. Google's TensorFlow is popular, but I deliberately chose Chainer (made in Japan).

I created my program by modifying an English sentence generation program that uses Chainer: yusuketomoto/chainer-char-rnn (GitHub).

I will briefly explain only the core part of the program.

In CharRNN.py:

```python
# Excerpt from the model definition (inside the CharRNN constructor)
embed = F.EmbedID(n_vocab, n_units),
l1_x = F.Linear(n_units, 4*n_units),
l1_h = F.Linear(n_units, 4*n_units),
l2_h = F.Linear(n_units, 4*n_units),
l2_x = F.Linear(n_units, 4*n_units),
l3   = F.Linear(n_units, n_vocab),
```

This part defines the model. n_vocab is the number of distinct words in the corpus, and n_units is the number of hidden units; I set n_units = 128 for this run. Each hidden layer outputs 4*n_units values because Chainer's F.lstm splits its input into the LSTM's four internal gates.

In CharRNN.py:

```python
    def forward_one_step(self, x_data, y_data, state, train=True, dropout_ratio=0.5):
        x = Variable(x_data.astype(np.int32), volatile=not train)
        t = Variable(y_data.astype(np.int32), volatile=not train)

        # Look up the embeddings of the input word ids
        h0      = self.embed(x)
        # First LSTM layer: current input plus the previous hidden state
        h1_in   = self.l1_x(F.dropout(h0, ratio=dropout_ratio, train=train)) + self.l1_h(state['h1'])
        c1, h1  = F.lstm(state['c1'], h1_in)
        # Second LSTM layer, stacked on top of the first
        h2_in   = self.l2_x(F.dropout(h1, ratio=dropout_ratio, train=train)) + self.l2_h(state['h2'])
        c2, h2  = F.lstm(state['c2'], h2_in)
        # Output layer: scores over the vocabulary
        y       = self.l3(F.dropout(h2, ratio=dropout_ratio, train=train))
        # Carry the cell and hidden states over to the next time step
        state   = {'c1': c1, 'h1': h1, 'c2': c2, 'h2': h2}

        return state, F.softmax_cross_entropy(y, t)
```

This handles one step of training. x_data and y_data are given one batch at a time; the hidden layers are LSTMs, and the output is scored with the softmax cross-entropy loss against the next word.
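As an aside, here is a tiny self-contained numpy illustration (not part of the actual program) of what the softmax cross-entropy at the output computes for a single prediction:

```python
# -*- coding: utf-8 -*-
# Softmax cross entropy by hand, for one prediction over a 3-word vocabulary.
import numpy as np

y = np.array([2.0, 0.5, -1.0])        # raw scores from the output layer
t = 0                                 # id of the correct next word

p = np.exp(y) / np.sum(np.exp(y))     # softmax turns scores into probabilities
loss = -np.log(p[t])                  # penalize low probability on the truth
print 'probabilities:', p
print 'loss:', loss
```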

In train.py:

```python
def load_data(args):
    vocab = {}
    print '%s/input.txt' % args.data_dir
    f_words = open('%s/input.txt' % args.data_dir, 'r')
    mt = MeCab.Tagger('-Ochasen')

    # Run each line through MeCab and collect the surface form of every word
    words = []
    for line in f_words:
        result = mt.parseToNode(line)
        while result:
            words.append(unicode(result.surface, 'utf-8'))
            result = result.next
    dataset = np.ndarray((len(words),), dtype=np.int32)

    # Assign an id to each distinct word and encode the corpus as id sequences
    for i, word in enumerate(words):
        if word not in vocab:
            vocab[word] = len(vocab)
        dataset[i] = vocab[word]
    print 'corpus length:', len(words)
    print 'vocab size:', len(vocab)
    return dataset, words, vocab
```

Here, when the data is loaded, it is morphologically analyzed with MeCab so that the text is fed to the model one word at a time.
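For reference, here is a minimal self-contained example of the same MeCab segmentation (assuming MeCab and its Python binding are installed; the exact segmentation may vary by dictionary):

```python
# -*- coding: utf-8 -*-
# Segment one line of dialogue into words with MeCab, as load_data does.
import MeCab

mt = MeCab.Tagger('-Ochasen')
node = mt.parseToNode('あんたバカぁ?')   # a famous Asuka line
words = []
while node:
    if node.surface:                      # skip the empty BOS/EOS nodes
        words.append(node.surface)
    node = node.next
print ' / '.join(words)
```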

There is also a commentary on Chainer's own recurrent neural network sample code, so please refer to that as well: "I tried to explain the sample code for creating a recurrent neural language model using Chainer".

Markov chain

For the Markov chain, I used a program created by another person as-is: "I made an automatic sentence generation program for Python rehabilitation", o-tomox/TextGenerator (GitHub). Thank you very much.

The flow of sentence generation is like this.

  1. Start generation from a triple that begins with (BOS).
  2. Find another triple whose first word matches the third word of the current triple. If more than one is found, select one of them at random.
  3. Repeat this until the third word is (EOS).
  4. Join the triples selected along the way to complete the sentence (see the sketch below).
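Here is a hedged sketch of that walk (my own minimal reimplementation for illustration, not the actual o-tomox/TextGenerator code):

```python
# -*- coding: utf-8 -*-
# Sketch of 3-gram Markov generation: chain triples that overlap by one word.
import random
from collections import defaultdict

BOS, EOS = u'(BOS)', u'(EOS)'

def build_table(tokenized_sentences):
    table = defaultdict(list)          # first word -> triples starting with it
    for words in tokenized_sentences:
        tokens = [BOS] + words + [EOS]
        for i in range(len(tokens) - 2):
            table[tokens[i]].append(tuple(tokens[i:i + 3]))
    return table

def generate(table):
    result = list(random.choice(table[BOS]))   # 1. start from a (BOS) triple
    while result[-1] != EOS:                   # 3. stop at (EOS)
        candidates = table.get(result[-1])     # 2. triples starting with the
        if not candidates:                     #    current third word
            break                              # dead end: give up early
        result.extend(random.choice(candidates)[1:])
    return u' '.join(w for w in result if w not in (BOS, EOS))  # 4. join

corpus = [[u'I', u'wonder', u'why', u'boys', u'are', u'stupid', u'and', u'lewd', u'!']]
print generate(build_table(corpus))
```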

Results

I automatically generated 20 sentences with each method.

RNN

You! It's here. I like it.
No!
I'm dead on the contrary, already! Not only now!
So, close me!
I hate it
Stupid, wait.
Truthful, stupid, no!
Muu!
That's stupid! That's it.
What's wrong!
How about being in the above.
Hoon, it's dull.
Nasty! Anti-vomiting, stupid, wake-up machine is just a little person.
Hey ah ah ah!
That's a little ...? I'm going to develop!
Let's go to Unit 2 each other.
Of course it doesn't work.
I won't kill you. I am. No one needs to hate it.
What is the commander, what is decided for me is to beat Eva to the stuffed animal!

**What is the commander, what is decided for me is to beat Eva to the stuffed animal!** The theory that Asuka is actually from Kansai.

Markov chain

Let's go!
Unwilling to go?
I, Asuka, let's go, just do it comfortably without making the right arm clear?
Show me a little ... I can't solve this kind of formula!
Stop it anymore ... I wonder if the tamed man will snow tomorrow!?
Ah, come early!
Thermal expansion? I don't need childish things anymore!
Stop saying this, mom!
I was chosen. Leave it in the cage.
I've been waiting! I've dropped the knife, I'm not relieved!
It's a sight-seeing fee to talk to you.
There is no other person.
I live alone like that!
There is no other person?
All right, I haven't heard it yet.
What did I do?
Yeah ... I can't help it anymore ...
Don't partition yourself, I'll go at full capacity from the beginning and at maximum speed.
I wonder what they are doing.
Hey, don't look ahead.

Some sentences come out exactly as they appear in the original script. Grammatically, the Markov chain produces far more correct sentences.

Finally

This time, the Markov chain gave the grammatically better results. I did not tune any parameters, so the RNN has not been used to its full potential. If there is a better way to generate sentences with an RNN, I would love to hear about it.

I think automatic generation still has plenty of potential, so I look forward to where it goes from here.

Bonus

I also tried the same thing with Shinji and Rei.

Shinji

RNN

Asuka!
But everyone hates me.
Thank you again today and I'm scared to throw it away! I'm ... I'm not.
I ... there are people I can be proud of, my dad was staring at me on the ice. But I wonder. Will not stop!
Yup. Yeah, my smell.
Yes, I wonder if I'm already on Eva ... from ... san. I don't need me in the world. I ride, but I'm from me!
I didn't run away before I was learning. I think I can do it!
that's right. I'm scared of my goal.
Yes, I'm fine with me.
I'm me, I'm mine? What is the world!
Rain, I don't know what to remember. Good!
Would you like to ride the waves?
Hmmm, I wonder if that world is really ... Can't you understand me with Katsuragi wave??

"I" collapses Gestalt

Markov chain

Who? I couldn't forgive you.
I'm not worth it I think it's good?
I think I know.
So Misato!
It's a lonely city ...?
Why do I have to make a fuss?
But I couldn't help it ... I want value, but all of them are real Shinji Ikari, Rei Ayanami!
to come. Quit!
I think I ran away!
Um ... how can everyone be happy with that ... dad?!
Ayanami. I'm doing my best.
The bad thing is that weapons that are no longer on Eva are useful.
It's pitch black, I don't want to eat supper ... I don't.
What are you talking about!
I try to understand, so I often remember difficult things. Otherwise, our enemies.
Funuuuuuuuu!

Rei

RNN

why?
You are together Blood container.
It's making?
You are, and you are lonely in your heart!
Your world is alive.
After all, I'm late! I'm late on the first day
Yes
You don't lie.
that's right. With ruin, freedom disappears.
I can't see myself without other people.
What do you hate, from disappearing. I am with everyone.
You are Rei Ayanami, your mother.
I really feel with you?
Operation Yashima says you've come to peep
So, Rei Ayanami's?

**After all, I'm late! I'm late on the first day** This is output straight from the lines of that scene, but coming from Rei it feels strange. (Though the exact wording may differ.)

Markov chain

... emergency call ... I'll go ahead.
It can't be helped.
It feels so good.
1730 (ichi-nana san-maru), gathering at the cage. Rainy days are depressing.
But no, I want to return to nothing.
The rest is something I will protect, so everyone thinks so too.
I'm glad I'm sleeping?
for whom?
I have nothing.
Very strange.
Ikari commander now.
No, it feels the same as me.
Hmm ah ah!
However, there is nothing in you, you can't see it.
You can go home alone, so don't come in that style
