[Let's play with Python] Aiming for automatic sentence generation ~ Perform morphological analysis ~

Introduction

There are stories of AI writing text, and recently a manga was produced by training on Tezuka Osamu's manga. That level is hard to reach, but I managed to generate sentences automatically while following a book, so I will summarize what I did. This will span several posts, and I plan to take it slowly.

Overview of sentence generation

The overall flow for generating sentences looks like this.

--Prepare the source data
--Clean up and format the data
--Break sentences down into words
--Generate sentences using Markov chains

Roughly speaking, that is the flow. This time I will try the third step: breaking the text down into words.
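
To make the last step of the flow above a little more concrete, here is a minimal sketch of a first-order Markov chain over a word list. It is not the implementation used in this series: the names build_chain, generate and sample_words are made up for illustration, and the English word list is only a stand-in for the output of the decomposition step.

```python
import random
from collections import defaultdict


def build_chain(words):
    """Map each word to the list of words observed right after it."""
    chain = defaultdict(list)
    for current_word, next_word in zip(words, words[1:]):
        chain[current_word].append(next_word)
    return chain


def generate(chain, start, max_words=10, rng=random):
    """Walk the chain from `start`, picking a random successor at each step."""
    output = [start]
    # Stop at max_words or when the last word has no recorded successor
    while len(output) < max_words and output[-1] in chain:
        output.append(rng.choice(chain[output[-1]]))
    return output


# Hypothetical word list standing in for the output of the decomposition step
sample_words = ["I", "like", "cats", "and", "I", "like", "tea"]
chain = build_chain(sample_words)
print(" ".join(generate(chain, "I", rng=random.Random(0))))
```

With so few words the output is very repetitive, which is why a much larger word list is needed for the real thing.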

Try morphological analysis

Morphological analysis is the task of splitting natural-language text (sentences) that carries no grammatical annotation into a sequence of morphemes (roughly speaking, the smallest units that have meaning in the language) and determining the part of speech of each morpheme, based on the grammar of the target language and on word information, such as parts of speech, stored in what is called a dictionary. (Source: Wikipedia, the free encyclopedia.) That is what it says, apparently. For now, look at the code and the results!

from janome.tokenizer import Tokenizer

t = Tokenizer()  # Create the tokenizer once and reuse it
t  # In a notebook, this displays the Tokenizer object

I use janome's Tokenizer class for this.
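
To get a feel for what "dictionary plus text in, morphemes with parts of speech out" means, here is a toy sketch that segments a sentence by greedy longest match against a tiny hand-made dictionary. This is only an illustration of the idea and is not how janome works internally; TOY_DICTIONARY, toy_segment and the sample sentence are all hypothetical.

```python
# Toy dictionary: surface form -> part of speech (hypothetical, illustration only)
TOY_DICTIONARY = {
    "私": "pronoun",
    "は": "particle",
    "猫": "noun",
    "が": "particle",
    "好き": "adjectival noun",
    "です": "auxiliary verb",
}


def toy_segment(sentence, dictionary=TOY_DICTIONARY):
    """Split a sentence into (surface, part_of_speech) pairs by greedy longest match."""
    max_len = max(len(word) for word in dictionary)
    result = []
    pos = 0
    while pos < len(sentence):
        # Try the longest candidate first, then shorter ones
        for length in range(min(max_len, len(sentence) - pos), 0, -1):
            candidate = sentence[pos:pos + length]
            if candidate in dictionary:
                result.append((candidate, dictionary[candidate]))
                pos += length
                break
        else:
            # No dictionary entry matched: emit the single character as unknown
            result.append((sentence[pos], "unknown"))
            pos += 1
    return result


morphemes = toy_segment("私は猫が好きです")
for surface, part_of_speech in morphemes:
    print(surface, part_of_speech)
```

A real analyzer has a far larger dictionary and a smarter way of choosing between competing segmentations, which is the work the Tokenizer does for us.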

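# janome is a Japanese tokenizer, so in practice put a Japanese sentence here
text = 'Kongo Dace was built by Vickers in the UK as a super-dreadnought battleship to introduce construction technology! Expect it!'
tokens = list(t.tokenize(text))  # Morphological analysis; list() because tokenize() may return a generator
len(tokens)  # Number of tokens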

Put in the text you want to examine and analyze it. (The content of the sentence is arbitrary.)

for token in tokens:
    print(token)
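
Each printed token is one line: the surface form, a tab, then comma-separated features such as the part of speech and the base form. As a small sketch of how to read such a line, the helper below splits it apart. The name parse_token_line is made up for illustration, and the two sample lines follow the shape of the example in janome's documentation rather than the output for the sentence above.

```python
def parse_token_line(line):
    """Split one printed token line into its surface form and feature list."""
    surface, _, features = line.partition("\t")
    return surface, features.split(",")


# Sample lines in the same shape as janome's printed tokens (illustrative input)
sample_output = [
    "すもも\t名詞,一般,*,*,*,*,すもも,スモモ,スモモ",
    "も\t助詞,係助詞,*,*,*,*,も,モ,モ",
]

for line in sample_output:
    surface, features = parse_token_line(line)
    # features[0] is the top-level part of speech, features[6] the base form
    print(surface, features[0], features[6])
```

In real code there is no need to parse the printed text: janome's Token objects expose the same information as attributes such as surface, part_of_speech and base_form.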

Printing the tokens shows one morpheme per line together with its part of speech. Proper nouns and distinctive sentence endings do not seem to be handled well, so this kind of variation in the text will probably need to be corrected beforehand.

(Screenshot of the output: 2020-02-09.png)

Finally, make a word list.

# wakati=True returns plain surface strings instead of Token objects
words_list = list(t.tokenize(text, wakati=True))  # Make a word list
words_list

Closing thoughts

Using the Tokenizer, it was easy to break a sentence down into words. Of course, one short sentence like this is not enough for sentence generation, so in practice we need far more words. I hope I can end up generating some interesting sentences.
