I have heard stories of AI writing text, and recently a manga was produced by a model trained on Tezuka Osamu's works. That level is out of my reach, but I managed to generate sentences automatically while following along with a book, so I will summarize what I did. This will span several posts, and I plan to take it slowly.
The overall flow of sentence generation looks like the following.
- Prepare the source data
- Clean up and format the data
- Break the sentences down into words
- Generate text using a Markov chain
Roughly speaking, that is the whole process. This time I will try breaking the text down into words.
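To make the four steps above concrete, here is a minimal sketch of the whole flow in plain Python. It is only an illustration under simplifying assumptions: `str.split()` stands in for the morphological analysis step, and `build_chain` and `generate` are hypothetical helper names, not part of any library.

```python
import random
from collections import defaultdict


def build_chain(words):
    """Map each word to the list of words observed right after it."""
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return dict(chain)


def generate(chain, start, max_words=10, seed=None):
    """Walk the chain from `start`, picking a random follower at each step."""
    rng = random.Random(seed)
    word = start
    output = [word]
    for _ in range(max_words - 1):
        candidates = chain.get(word)
        if not candidates:  # no known follower: stop early
            break
        word = rng.choice(candidates)
        output.append(word)
    return output


# 1. Prepare the source data (a toy sentence here)
raw = "  the cat sat on the mat and the cat slept  "
# 2. Clean up and format the data
cleaned = raw.strip()
# 3. Break the sentence into words (assumption: split() replaces the tokenizer)
words = cleaned.split()
# 4. Generate text using a Markov chain
chain = build_chain(words)
sentence = generate(chain, start="the", max_words=8, seed=0)
print(" ".join(sentence))
```

With Japanese text, step 3 cannot rely on spaces, which is why a morphological analyzer is needed.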
Morphological analysis takes text in a natural language that carries no grammatical annotations and, using knowledge of the target language's grammar and a dictionary of word information such as parts of speech, splits it into a sequence of morphemes (roughly speaking, the smallest units of the language that carry meaning) and identifies the part of speech of each one. (Source: Wikipedia, the free encyclopedia.) So it says. For now, let's look at the code and the results!
from janome.tokenizer import Tokenizer
t = Tokenizer()
t
This "Tokenizer" from janome is what I use.
text = 'Kongo Dace was built by Vickers in the UK as a super-dreadnought battleship to introduce construction technology! Expect it!'
tokens = list(t.tokenize(text))  # Morphological analysis; list() because tokenize() may return a generator
len(tokens)  # Number of tokens
Assign the text you want to examine to a variable and analyze it. (The content of the sample sentence is arbitrary.)
for token in tokens:
    print(token)
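Each printed token is one line: the surface form, a tab, and then a comma-separated list of features from the dictionary. As a rough sketch of how to read that format, the snippet below parses one such line in plain Python. The sample line is the one used in janome's own documentation, not output from the sentence above, and `Morpheme` and `parse_token_line` are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Morpheme:
    surface: str         # the text as it appears in the sentence
    part_of_speech: str  # four-level part-of-speech tag
    base_form: str       # dictionary form of the word
    reading: str         # katakana reading


def parse_token_line(line):
    """Parse one 'surface<TAB>feature,feature,...' line into a Morpheme."""
    surface, features = line.split("\t", 1)
    fields = features.split(",")
    return Morpheme(
        surface=surface,
        part_of_speech=",".join(fields[:4]),  # first four fields form the POS tag
        base_form=fields[6],
        reading=fields[7],
    )


m = parse_token_line("すもも\t名詞,一般,*,*,*,*,すもも,スモモ,スモモ")
print(m.surface, m.part_of_speech, m.base_form, m.reading)
```

When working with janome directly, the same information is available as attributes on each token (for example `token.surface` and `token.part_of_speech`), so parsing the string is only needed for understanding the printed form.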
Looking at the displayed result, proper nouns and distinctive sentence endings do not seem to be analyzed correctly. Variations in wording like these apparently need to be corrected before analysis. Finally, let's make a word list.
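One common way to reduce such variation before tokenizing is to normalize the text at the character level. The following is a minimal sketch using only the standard library; `normalize_text` is a hypothetical helper, and the specific rules (NFKC normalization, collapsing repeated "!" and "?", squeezing whitespace) are my own assumptions about what is worth cleaning. It does not fix character-specific speech endings or unknown proper nouns, which would need custom rules or a user dictionary.

```python
import re
import unicodedata


def normalize_text(text):
    """Reduce character-level variation before tokenizing."""
    # NFKC folds full-width letters/digits/punctuation into ASCII and
    # half-width katakana into regular katakana.
    text = unicodedata.normalize("NFKC", text)
    # Collapse runs of the same "!" or "?" into a single mark.
    text = re.sub(r"([!?])\1+", r"\1", text)
    # Squeeze whitespace (including former full-width spaces) and trim.
    text = re.sub(r"\s+", " ", text).strip()
    return text


print(normalize_text("Ｋｏｎｇｏ　１２３！！！"))
print(normalize_text("ｶﾞﾝﾊﾞﾚ"))
```

Back to the sample text: the word list is built as follows.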
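words = t.tokenize(text, wakati=True)  # wakati=True yields only the surface strings
words_list = []  # Make a word list
for word in words:
    words_list.append(word)  # each item is already a single word, so no second tokenize() call is needed
words_list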
Using "tokenize", it was easy to break a sentence down into words. Of course, one short sentence like this is not enough for sentence generation, so in practice we need many more words. I hope I can generate some interesting sentences.
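As a rough sketch of how more text could be fed in, the snippet below splits a longer passage into sentences and builds one word list per sentence. `split_sentences` and `build_word_lists` are hypothetical helpers, the sentence-ending characters are an assumption, and `str.split` again stands in for the tokenizer so the example runs without janome. With janome, something like `lambda s: t.tokenize(s, wakati=True)` could be passed as `tokenize` instead.

```python
import re


def split_sentences(text):
    """Split text after Japanese or ASCII sentence-ending marks."""
    pieces = re.findall(r"[^。！？!?]+[。！？!?]*", text)
    return [p.strip() for p in pieces if p.strip()]


def build_word_lists(text, tokenize):
    """Return one list of words per sentence, using the given tokenizer."""
    return [list(tokenize(sentence)) for sentence in split_sentences(text)]


sample = "I like tea! Do you like tea? Yes!"
word_lists = build_word_lists(sample, str.split)  # str.split is a stand-in tokenizer
print(word_lists)
```

Keeping the words grouped by sentence makes it possible to treat the first word of each sentence as a starting point when generating text later.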