[PYTHON] I tried to linguistically analyze Karen Takizawa's incomprehensible sentences.

It seems that more and more people are getting hooked on the text of Karen Takizawa's Instagram posts: "Karen Takizawa's incomprehensible Instagram wwwwwwwwwwww".

(Screenshot. Source: https://www.instagram.com/takizawakarenofficial/)

What kind of sentence structure is it?

When I actually read Karen Takizawa's text, I could read it as Japanese, but its meaning did not come through. At first, I wondered whether it had been generated by a deep learning model (an RNN or something similar) trained on a corpus.

Several factors probably contribute to this impression. To find out how incomprehensible the sentences are from a language-processing standpoint, I parsed Karen Takizawa's sentences and visualized the results.

Method

- Parsing was done with CaboCha.
- Karen Takizawa's text rarely uses the Japanese full stop "。"; emoji are used in its place. I therefore split the text into sentences, treating each run of emoji as a sentence break.
- The dependency graph produced by the parser was visualized with pydotplus.
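The emoji-based sentence splitting described above could look roughly like the sketch below. It is not the author's code. It assumes that "emoji" can be approximated by a few common Unicode symbol blocks, which is not an exhaustive emoji definition. It also splits on "。" in case one does appear. The sample strings are hypothetical.

```python
import re

# Assumption: emoji are approximated by these Unicode ranges (not exhaustive).
EMOJI_RUN = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, supplemental symbols
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\uFE0F"                 # variation selector-16 (emoji presentation)
    "\u200D"                 # zero-width joiner inside emoji sequences
    "]+"
)


def split_on_emoji(text: str) -> list[str]:
    """Split text into sentence-like units at each run of emoji and at '。'."""
    sentences = []
    for piece in EMOJI_RUN.split(text):
        for part in piece.split("。"):
            part = part.strip()
            if part:  # drop empty strings left by trailing emoji
                sentences.append(part)
    return sentences


if __name__ == "__main__":
    # Hypothetical sample: two "sentences" separated by a run of emoji.
    print(split_on_emoji("今日は晴れ🌞🌞明日は雨☔"))
```

Treating a whole run of emoji as one break, rather than each emoji, keeps strings like "🌞🌞" from producing empty sentences.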

Case 1

Input sentence

*** On the day after my birthday, while remembering a year of 24 years old, I was a little confused by the baton touch to 25 years old, but I couldn't even open my eyes when I arrived at Nosonoso. It was this leaflet that came into my eyes ***

Analysis result

(Figure: dependency graph of the parse result for Case 1)

This example shows that although the syntactic structure is complicated, the dependencies are analyzed almost correctly and form a valid Japanese syntactic structure. However, each sentence is long and many of the dependencies are parallel (coordinate) relations, which seems to make the sentence hard for humans to understand.

If you follow the nodes of the graph in order of appearance while reading the text, the jumps within the graph feel large. In other words, when you follow the meaning of a sentence sequentially in your head, a long travel distance on the syntax graph may consume a lot of working memory and make the sentence hard to understand (maybe... I'm not sure).
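The "movement distance" idea can be put into numbers with dependency distance, a measure from psycholinguistics: the number of positions between a phrase and the phrase it depends on. This is my substitution; the article itself only judged the graph by eye. The sketch below assumes each chunk's head index is available as in CaboCha's output, with -1 marking the root.

```python
def dependency_distances(heads: list[int]) -> list[int]:
    """Distance in chunks between each chunk and its head (the root is skipped).

    heads[i] is the index of the chunk that chunk i depends on, or -1 for the root.
    """
    # Japanese is head-final, so heads usually lie to the right; abs() keeps it general.
    return [abs(h - i) for i, h in enumerate(heads) if h >= 0]


def mean_dependency_distance(heads: list[int]) -> float:
    """Average dependency distance; 0.0 for a sentence with a single chunk."""
    distances = dependency_distances(heads)
    return sum(distances) / len(distances) if distances else 0.0


if __name__ == "__main__":
    # Hypothetical head lists: a simple chain vs. three chunks all modifying the last one.
    print(mean_dependency_distance([1, 2, -1]))      # chain: every link has length 1
    print(mean_dependency_distance([3, 3, 3, -1]))   # long links back to one head
```

A larger mean (or maximum) distance means the reader has to hold more phrases in mind before their head appears. This matches the working-memory intuition above, though it does not prove it.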

Case 2

Input sentence

*** When I turn around, I realize that there are many allies and many kind people, and I don't bother to realize that my birthday isn't coming, but even though I'm inspiring myself, every day I get caught like a lake. It was a dazzling 25-year-old morning when I had a longing for someone who wasn't, calm, and didn't feel any sound ***

Analysis result

(Figure: dependency graph of the parse result for Case 2)

The complexity here is pretty funky. I doubt a human can build a syntax tree like this in their head, so the sentence cannot be read and understood sequentially. Because the sentence is long and the structure is complicated, a parsing error (?) also seems to have occurred. Isn't it about as hard to understand as a patent specification?

Analysis example of a news article

For comparison, here is the parse of some ordinary sentences.

Input sentence

*** Of the 47 prefectures and 20 ordinance-designated cities nationwide, 70% of local governments do not have standards for disclosing the names of missing persons in the event of a disaster. The Asahi Shimbun surveyed each local government in line with the enforcement of the revised Personal Information Protection Law on the 30th. There were also 5 local governments who answered that their names would not be announced *** Source: https://www.asahi.com/

Analysis result

(Figure: dependency graph of the parse result for the news article)

As shown above, ordinary (?) sentences that are edited for readability, such as news articles, are fairly long, but their syntactic structure has relatively little parallelism, so they are not hard to read.
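The comparison of "parallelism" above is also made by eye. The sketch below offers one hypothetical proxy that is not taken from the article: count how many chunks attach to the same head. It reports the largest such fan-in, and the share of dependents that have at least one sibling on the same head. This only approximates coordination, because several modifiers of one verb are not necessarily coordinate. The head lists use the same assumed CaboCha-style convention (-1 for the root).

```python
from collections import Counter


def fan_in(heads: list[int]) -> Counter:
    """Number of dependents attached to each head chunk (the root marker -1 is ignored)."""
    return Counter(h for h in heads if h >= 0)


def parallelism_stats(heads: list[int]) -> tuple[int, float]:
    """Return (largest fan-in, share of dependents that share their head with a sibling)."""
    counts = fan_in(heads)
    max_fan_in = max(counts.values(), default=0)
    shared = sum(c for c in counts.values() if c >= 2)  # dependents with siblings
    total = sum(counts.values())
    return max_fan_in, (shared / total if total else 0.0)


if __name__ == "__main__":
    # Hypothetical head lists: a plain chain vs. three chunks on one head.
    print(parallelism_stats([1, 2, -1]))
    print(parallelism_stats([3, 3, 3, -1]))
```

Running this on sentences split out of both sources would let the "low parallelism" impression be checked against numbers rather than graph shapes.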

Summary

- Analyzing Karen Takizawa's Instagram sentences with language processing (syntactic parsing) suggested, roughly, that they have a sentence structure that is hard for humans to read.

- That said, I think the readability (or difficulty) of a sentence depends on many factors besides syntactic structure. Her sentences are syntactically acceptable (they are not ill-formed), but they may contain breakdowns and contradictions at the semantic level, and parsing cannot detect failures at that level.

- Why are they interesting at the level of meaning? Or even addictive? I would like to analyze that too. How would one go about it?

The code I used this time is here: https://gist.github.com/naoyu822/6703e3b22eb98230ab76596918c6779e
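The gist above is the author's code. The following is an independent sketch, not the gist's contents, of how CaboCha's lattice output (`cabocha -f1`) for a single sentence could be turned into a dependency graph. It assumes the lattice format's chunk lines look like `* 0 1D 0/1 0.000000`, where the third field is the head index followed by `D` (`-1D` for the root). It also assumes token lines start with the surface form followed by a tab. The function names are hypothetical. The resulting DOT string could then be passed to `pydotplus.graph_from_dot_data` for rendering.

```python
def parse_cabocha_lattice(text: str) -> list[tuple[str, int]]:
    """Return (chunk surface, head index) pairs from CaboCha -f1 output for ONE sentence."""
    chunks: list[list] = []  # each entry: [token surfaces, head index]
    for line in text.splitlines():
        if line.startswith("* "):
            fields = line.split(" ")
            head = int(fields[2].rstrip("D"))  # "1D" -> 1, "-1D" -> -1 (root)
            chunks.append([[], head])
        elif line == "EOS" or not line.strip():
            continue
        elif chunks:
            chunks[-1][0].append(line.split("\t")[0])  # surface form precedes the tab
    return [("".join(tokens), head) for tokens, head in chunks]


def to_dot(chunks: list[tuple[str, int]]) -> str:
    """Build Graphviz DOT source with one node per chunk and an edge dependent -> head."""
    lines = ["digraph dependency {"]
    for i, (surface, _) in enumerate(chunks):
        label = surface.replace("\\", "\\\\").replace('"', '\\"')  # escape for DOT
        lines.append(f'  n{i} [label="{label}"];')
    for i, (_, head) in enumerate(chunks):
        if head >= 0:
            lines.append(f"  n{i} -> n{head};")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hand-written sample in the assumed lattice format (feature fields shortened).
    sample = (
        "* 0 1D 0/1 0.000000\n太郎\t名詞\nは\t助詞\n"
        "* 1 -1D 0/1 0.000000\n走る\t動詞\nEOS\n"
    )
    print(to_dot(parse_cabocha_lattice(sample)))
```

Keeping the parsing and the DOT generation in plain Python makes the graph structure easy to inspect, or to feed into metrics, before anything is rendered.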
