[PYTHON] Verify "Fucking Deca" + "Rashomon" with word2vec

Preface

After reading "Fucking Deca Rashomon", I noticed a pattern in the text: every word of the original Rashomon has been made "fucking deca" (absurdly huge).

In other words, it should be possible to reproduce "Fucking Deca Rashomon" with word2vec as "Fucking Deca" + "Rashomon".
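To make the "word + word" idea concrete, here is a minimal sketch of what such a query does: average the unit-length vectors of the query words, then rank the remaining words by cosine similarity to that average. The 2-D vectors and word names are made up for illustration (real chiVe vectors have hundreds of dimensions), and the helper functions are assumptions of this sketch rather than gensim's code.

```python
import math

def unit(v):
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a, b):
    """Cosine similarity of two vectors."""
    ua, ub = unit(a), unit(b)
    return sum(x * y for x, y in zip(ua, ub))

def most_similar(vectors, positive, topn=3):
    """Rank words by cosine similarity to the mean of the unit-normalized query vectors."""
    units = [unit(vectors[w]) for w in positive]
    query = [sum(col) / len(units) for col in zip(*units)]
    # The query words themselves are excluded from the ranking.
    scored = [(w, cosine(query, v)) for w, v in vectors.items() if w not in positive]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]

# Hypothetical toy vectors: axis 0 is roughly "size", axis 1 roughly "gate-ness".
toy_vectors = {
    "huge": [1.0, 0.0],
    "gate": [0.0, 1.0],
    "huge_gate": [1.0, 1.0],
    "small": [-1.0, 0.0],
}

ranking = most_similar(toy_vectors, positive=["huge", "gate"])
print(ranking[0][0])
```

In this toy space the word that combines both directions ranks first, which is the effect the experiment hopes to get from real word vectors.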

Along the way, I also learned how to use gensim and sudachipy.

Verification

Set up the Python environment.

pip install sudachipy         #Morphological analyzer sudachi
pip install sudachidict_core  #sudachi dictionary
pip install gensim            #Library for running word2vec

Then I downloaded chiVe, a set of pretrained Japanese word vectors (v1.1, mc90).

Code

First, check that morphological analysis works with Sudachi.

python


from sudachipy import tokenizer
from sudachipy import dictionary

tokenizer_obj = dictionary.Dictionary().create()
mode = tokenizer.Tokenizer.SplitMode.A

[m.surface() for m in tokenizer_obj.tokenize("Of the thighs and thighs", mode)]
# ['Plum', 'Also', 'AlsoAlso', 'AlsoAlso', 'of', 'home']

Next, check if word2vec works with gensim.

python


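from gensim.models import KeyedVectors

# chiVe vectors in word2vec text format; adjust the path to where you unpacked them.
file_path = "./chive-1.1-mc90-20200318/chive-1.1-mc90-20200318.txt"
# gensim.test.utils.datapath() points into gensim's bundled test-data directory,
# so the path is passed directly instead.
wv = KeyedVectors.load_word2vec_format(file_path, binary=False)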

for i in wv.most_similar(positive=['Dodekai']):
    print(i)

# ('Big', 0.7684822082519531)
# ('huge', 0.677775502204895)
# ('Stupid', 0.5706542730331421)
# ('Dokan', 0.5430377125740051)
# ('Huge', 0.5240563154220581)
# ('Dodon', 0.5237661600112915)
# ('Oversized', 0.5200765132904053)
# ('Dokan', 0.5147513151168823)
# ('big', 0.5112403631210327)
# ('Deke', 0.4992992877960205)

The text of Rashomon was taken from Aozora Bunko. I copied the HTML source and processed it as follows to remove the ruby annotations.

python


import re

text = """It's the way of life one day. one person's<ruby><rb>Commoner</rb><rp>(</rp><rt>Genin</rt><rp>)</rp></ruby>But,...
...
"""

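# Keep each ruby element's base text (<rb>) and drop the reading. The non-greedy
# match stops at the first </ruby> instead of running on to the last one on the line.
plain_text = re.sub(r'<ruby><rb>(.*?)</rb>.*?</ruby>', r'\1', text)
# Strip line-break tags, newlines, and full-width (ideographic) spaces.
plain_text = re.sub('<br />|\n|\u3000', '', plain_text)
wakati_text = [[m.surface(), m.part_of_speech()] for m in tokenizer_obj.tokenize(plain_text, mode)]
wakati_text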

# [['is there', ['verb', 'Non-independent', '*', '*', 'Five steps-La line', 'End-form-General']],
#  ['Day', ['noun', 'Common noun', 'Adverbs possible', '*', '*', '*']],
#  ['of', ['Particle', 'Case particle', '*', '*', '*', '*']],
#  ['How to live', ['noun', 'Common noun', 'General', '*', '*', '*']],
#  ['of', ['Particle', 'Case particle', '*', '*', '*', '*']],
# ...
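How the ruby pattern is written matters more than it looks. The sketch below runs three variants on a made-up fragment shaped like Aozora Bunko ruby markup (the sample string is an assumption, not the real text): a greedy `.*`, a non-greedy `.*?`, and a version that keeps the base text in `<rb>` while dropping only the reading.

```python
import re

# Made-up fragment in the same shape as Aozora Bunko ruby markup (not the real text).
sample = (
    "A <ruby><rb>servant</rb><rp>(</rp><rt>genin</rt><rp>)</rp></ruby>"
    " waited under the "
    "<ruby><rb>gate</rb><rp>(</rp><rt>mon</rt><rp>)</rp></ruby>."
)

# Greedy: a single match runs from the first <ruby> to the last </ruby>,
# so the plain text between the two elements is deleted as well.
greedy = re.sub(r"<ruby>.*</ruby>", "", sample)

# Non-greedy: each element is matched separately, but its base text is still removed.
non_greedy = re.sub(r"<ruby>.*?</ruby>", "", sample)

# Keep the base text and drop only the reading and its parentheses.
keep_base = re.sub(r"<ruby><rb>(.*?)</rb>.*?</ruby>", r"\1", sample)

print(greedy)
print(non_greedy)
print(keep_base)
```

Only the last variant leaves the annotated words in the sentence, so it is probably the one to use when the goal is to remove just the readings.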

Now for the main event: it's time to try "Fucking Deca" + "Rashomon".

python


kusodeka_text = []
for word in wakati_text:
    if word[1][0] in ['Particle', 'Auxiliary symbol', 'conjunction']:
        kusodeka_text.append(word[0])
    else:
        try:
            # Take the nearest neighbor of 'Dodekai' + the current word.
            kusodeka_text.append(wv.most_similar(positive=['Dodekai', word[0]])[0][0])
        except KeyError:
            kusodeka_text.append(word[0])  # Word is not in the model's vocabulary; keep it unchanged.
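The control flow of the loop above can be tried out without loading the large vector file. The following is a minimal sketch, assuming a hypothetical `enlarge` helper and a dictionary-backed stand-in for the model. The names `enlarge`, `fake_neighbors`, and `fake_nearest` are made up for illustration and are not part of gensim or sudachipy.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

def enlarge(
    tokens: Iterable[Tuple[str, Sequence[str]]],
    nearest: Callable[[str], str],
    keep_pos: Sequence[str] = ("Particle", "Auxiliary symbol", "conjunction"),
) -> List[str]:
    """Replace each content word with nearest(word); keep function words and unknown words."""
    out = []
    for surface, pos in tokens:
        if pos[0] in keep_pos:
            out.append(surface)  # function words pass through untouched
            continue
        try:
            out.append(nearest(surface))
        except KeyError:
            out.append(surface)  # out-of-vocabulary word: keep the original
    return out

# Hypothetical stand-in for the word-vector lookup.
fake_neighbors = {"gate": "huge gate", "rain": "downpour"}

def fake_nearest(word: str) -> str:
    return fake_neighbors[word]  # raises KeyError for unknown words

tokens = [
    ("gate", ["noun"]),
    ("of", ["Particle"]),
    ("rain", ["noun"]),
    ("genin", ["noun"]),
]
result = enlarge(tokens, fake_nearest)
print(result)
```

With the real model, the stand-in could be swapped for something like `lambda w: wv.most_similar(positive=['Dodekai', w])[0][0]`, and the pieces joined with `''.join(...)` to get the final text.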

Result

It's a big one at the end of a month.
I was waiting for the rain in a big throat.
There is no such thing as a big throat in a big throat.
However, it's huge, but it's huge, it's huge, it's huge, it's huge.
It's huge, but this big one isn't too big.
When it comes to big things, this big big earthquake lives in Osaka.
In the end, it's huge, even if it's big, it's big, it's big, it's big, it's big, it's big, it's big, it's big, it's big, it's big.
Then, when the big one disappeared, he wanted to make the big one worse, and he was struck by the big one with his feet.
Instead of being huge, I was looking at the huge falling throat while I was huge.
The big one was big, "The big one was waiting for the rain."
However, even if it rains big, it should be big, but it should be big.
It's usually big, but it's big, big, big, and big.
Because it's huge, it's huge, and it's huge in April and May.
Writing in a big way Big big big big, big Osaka's big big is trying to make a big life-I'm trying to make a big big one While following the huge and big throats, I heard the big and big sounds of the big and big and big falling.
It's big, and it's big, so it's big, and it's big.
The big one is big, the big one is big, the big one is big, the big one is big, the big one is big, the big one is big, and the big one is big, and the big one is big, and the big one is big.
It's just a big denial of the big thing, saying, "It's big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big." , There was no big deal.
The big one went to somewhere big and big.
The big throat was big, big, big, big, big, big, big, big, big, big, big, big, big, big, big.
Because it's huge, it's in front of the big one.
Big Throat Big Throat Big Big Throat On the top of the big big throat, a big big throat squeezes a big throat and kills the big throat It's huge, and it's reflected while rolling, so it's immediately known to be huge.
In the middle of this rainy night, it's a big, big, big, big, big, big, big, big, big, big, big, big.
The big one is huge.
For big and big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big.
It's huge.
However, the big one is big, the big one is big, the big one is big, and the big one is big.
It's a big one, but almost everything has taken away the sense of smell of this big one.
The big reflex was big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big.
If the hair is huge, it will be a huge man.
Big-ass, the big-ass big-ass of big-ass and big-ass big-ass of curiosity being moved big-ass and big-ass, the big-ass big-ass big-ass, big big-ass big-ass a big-ass big hair by big-ass.
It seems that the hair comes off according to the huge size.
As the big hair comes off one by one, the big one disappears one by one from the big one.
It's huge, it's huge, it's huge, it's big, it's big, it's big, it's big, it's big, it's big.
――It was huge, and it was burning down like this.
For big, big, big, big hair, big, big, big, big, big.
Therefore, in a rational big way, I don't know if it's okay to dispose of big things into big things.
But for the big guys, this big, big, big hair, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big hair.
However, the big one is huge, but the big one is big and big, and the big one is big and big.
So, the big one has a big one on one leg, and suddenly it goes from big to big.
Big big sword big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big big
Needless to say, I was surprised at how big it was.
Big is big, big, big, big, big, big, big, big, big, big, big, big, big, big.
"Where are you coming?
The big one is the big one, the big one, the big one, the big one, the big one, the big one, the big one, the big one, the big one, the big one, the big one, the big one, the big one, the big one, the big one, the big one, the big one.
"I was huge.
Say.
This is also big, so to speak. "
The big one is big, and the big one is suddenly left behind in the big sword, but the big one is just a big one, and the big one is big, the big one is big, the big one is big. It's huge, but it's huge.
So, the big one is a big one, softening the big one while making a big one.
"It must be huge and big, trying to make it big.
However, it's big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big. "
Big and big, big open-big big reflex, big big, big big big big big guard big.
A huge pant voice, a huge and huge voice transmitted to a huge and huge.
"It's big to pull out this hair, it's big to pull out this hair, and so on."
The big question is big, and the big one is disappointing.
Big but discouraged, big and big, and big hatred, cold and big, muttering, big and big, mumbling, big and big.
"It's huge, I heard he bought it.
The big one is not the big one of this man.
It's huge, it's big, it's big, it's big, it's big, it's big, it's big, it's big.
By the way, it's huge, it's huge, it's huge, it's huge, it's huge, it's huge.
This is also very big, it's big, it's big, it's big, it's big, it's big, it's big.
Well, a big, big, big, big man, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big. "
Big is big, big, big, big, big.
The big sword is big, but you need to listen to it.
However, it is necessary to hear this, and for the big and big, there is a big and big and big.
The big one is big and big, and this big one is big and big.
Big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big. is there.
The big one shouldn't have just wondered whether it's big or big.
Most of the big, big-hearted, big-hearted, big-hearted ones were kicked out by the big-hearted, big-hearted, big-hearted ones.
"Big, big."
It's big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big.
"Then, I have a big grudge to try.
It's big and big, but it's also big and big. "
The big one is big, quickly stripping off the big kimono.
It's big, it's big, it's big, it's big, it's big, it's big, it's big, it's big, it's big, it's big.
It's a big step, a big step to a big throat.
The big one is a big one with a big kimono stripped aside, and another big one, a big one, a big one, and a big one in the middle of the night.
For a while, it's dead, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big.
The big one is muttering big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big, big.
It's huge, and from there there's a huge, huge night.
I don't even know how big it is.
