To mark the 20th anniversary of their formation, I visualized Perfume's lyrics with Word Cloud

perfume.png

This article is day 6 of the estie Advent Calendar 2019. I'm an engineer at the real estate startup estie.inc.

Introduction

Recently, this article became a hot topic: "[Python] I visualized Arashi's lyrics with WordCloud and tried to unravel what they wanted to convey to fans in the 20th year since their formation".

It makes me really happy when my favorite idols and artists stay active and loved for many years. As a fan, I completely understand the desire to look back at their words and confirm what they wanted to convey.

As it happens, there is another artist who has also just celebrated the 20th anniversary of their formation.

That's right, everyone loves Perfume.

As you know, Perfume has a strong affinity with technology and keeps delivering cutting-edge performances, such as live production using [Google's machine learning](https://cloud.google.com/blog/ja/products/gcp/nhk-perfume-technology-reframe-your-photo-google-tensorflow) and live streaming over 5G. Credit to Rhizomatiks.

So, as a fan who has been going to Perfume's live shows for about 10 years, I will try morphological analysis plus WordCloud visualization of Perfume's lyrics.

Environment

Method

Following the earlier articles, I will proceed in this order: lyrics acquisition → morphological analysis → WordCloud. For details, please see the "Reference site" section at the end.
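The last step of that flow can be sketched roughly as follows. This is only a minimal sketch, not the exact code I ran: `count_words`, `render_word_cloud`, the sample word list, and the font path argument are names made up for illustration. It assumes the `wordcloud` package is installed and that you pass a font file containing Japanese glyphs, since the default font cannot draw Japanese text.

```python
from collections import Counter


def count_words(words):
    """Count how often each extracted word appears."""
    return Counter(words)


def render_word_cloud(frequencies, font_path, out_path="perfume.png"):
    """Draw a word cloud from a word -> count mapping and save it as an image.

    font_path must point to a font with Japanese glyphs (assumption: you supply one).
    """
    # Imported here so the counting step also works without the wordcloud package.
    from wordcloud import WordCloud

    cloud = WordCloud(font_path=font_path, width=800, height=600,
                      background_color="white")
    cloud.generate_from_frequencies(frequencies)
    cloud.to_file(out_path)


# Hand-written sample standing in for the output of the morphological analysis step.
sample_words = ["ディスコ", "love", "ディスコ", "今日", "love", "ディスコ"]
frequencies = count_words(sample_words)
print(frequencies.most_common(2))
```

Passing the counts to `generate_from_frequencies` keeps the tokenizing step separate from the drawing step, so the same drawing code can be reused with whichever analyzer produced the word list.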

Morphological analysis tool

I haven't done much text mining, so I assumed morphological analysis simply meant MeCab. When I looked into it, though, there turned out to be a variety of morphological analysis tools.

From among them, this time I would like to try this trio: MeCab, Janome, and nagisa.

MeCab

This is the standard morphological analysis tool, developed by a current developer of Google Japanese Input. It works in any environment, but a separate dictionary is required for analysis. This time I used the officially recommended IPA dictionary plus a new-word dictionary.

macab_.py


import MeCab

# Read the lyrics file as UTF-8 text
with open("perfume.txt", encoding="utf-8") as f:
    text = f.read()

# Morphological analysis ("-Ochasen" selects the ChaSen output format)
mecab = MeCab.Tagger("-Ochasen")
node = mecab.parseToNode(text)

perfume_list = []
# Part-of-speech names as they appear in the IPA dictionary:
# noun, verb, adverb, adjective, adjectival verb
tags = ["名詞", "動詞", "副詞", "形容詞", "形容動詞"]

while node:
    # Surface form of the word
    word = node.surface
    # Top-level part of speech (first field of the feature string)
    word_class = node.feature.split(",")[0]

    # Keep only the selected parts of speech
    if word_class in tags:
        perfume_list.append(word)

    node = node.next

print(perfume_list)
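To make the filtering step easier to follow, here is a small sketch that runs without MeCab. The (surface, feature) pairs are hand-written in the IPA dictionary's comma-separated layout, so they are an assumption about what the nodes look like rather than real output, and `part_of_speech` is a helper name I made up.

```python
def part_of_speech(feature):
    """Return the top-level part of speech from an IPA-dictionary-style feature string."""
    return feature.split(",")[0]


# Hand-written (surface, feature) pairs; not actual MeCab output.
sample_nodes = [
    ("", "BOS/EOS,*,*,*,*,*,*,*,*"),
    ("チョコレイト", "名詞,一般,*,*,*,*,*"),
    ("を", "助詞,格助詞,一般,*,*,*,を,ヲ,ヲ"),
    ("食べ", "動詞,自立,*,*,一段,連用形,食べる,タベ,タベ"),
    ("", "BOS/EOS,*,*,*,*,*,*,*,*"),
]

tags = {"名詞", "動詞", "副詞", "形容詞", "形容動詞"}

# Keep the surface form only when the first feature field is one of the wanted tags.
kept = [surface for surface, feature in sample_nodes
        if part_of_speech(feature) in tags]
print(kept)
```

Because the comparison is against the Japanese part-of-speech names, the particle を and the sentence boundary markers are dropped while the noun and the verb remain.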

Janome

This is the next most popular analysis tool after MeCab. It runs slower than MeCab, but the dictionary is bundled and it has few dependencies, so the attraction is that installation finishes with just pip install janome. It seems to be often used for preliminary checks before moving on to MeCab.

janome_.py


from janome.tokenizer import Tokenizer

# Read the lyrics file as UTF-8 text
with open("perfume.txt", encoding="utf-8") as f:
    text = f.read()

# Morphological analysis
t = Tokenizer()
tokens = t.tokenize(text)

perfume_list = []
# Part-of-speech names as they appear in the IPA dictionary:
# noun, verb, adverb, adjective, adjectival verb
tags = ["名詞", "動詞", "副詞", "形容詞", "形容動詞"]

for token in tokens:
    # Use the base (dictionary) form; fall back to the surface form when it is unknown
    if token.base_form == '*':
        word = token.surface
    else:
        word = token.base_form

    # Top-level part of speech
    word_class = token.part_of_speech.split(',')[0]

    # Keep only the selected parts of speech
    if word_class in tags:
        perfume_list.append(word)

print(perfume_list)
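The Janome script above collects base forms instead of surface forms. As a rough illustration of why that matters for counting, here is a sketch that uses hand-written (surface, base form) pairs instead of real Janome tokens; `normalize` and the sample data are assumptions for illustration only.

```python
from collections import Counter


def normalize(surface, base_form):
    """Prefer the base form; fall back to the surface form when it is '*'."""
    return surface if base_form == "*" else base_form


# Hand-written (surface, base_form) pairs; not actual Janome output.
sample_tokens = [
    ("踊っ", "踊る"),
    ("踊ら", "踊る"),
    ("踊る", "踊る"),
    ("Perfume", "*"),
]

surface_counts = Counter(surface for surface, _ in sample_tokens)
base_counts = Counter(normalize(surface, base) for surface, base in sample_tokens)

print(surface_counts)
print(base_counts)
```

With surface forms, each inflected variant is counted separately, while base forms merge them into one entry, which makes a frequently used verb appear larger in the word cloud.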

nagisa

This is a relatively new tool. Like Janome, the environment is easy to set up: installation finishes with pip install nagisa. Since I am working with lyrics this time I can't take advantage of it, but it is said to analyze emoticons and URLs robustly. It also has a method that filters the output words by part of speech, so extraction is easy.

nagisa_.py


import nagisa

# Read the lyrics file as UTF-8 text
with open("perfume.txt", encoding="utf-8") as f:
    text = f.read()

# Morphological analysis, extracting only words with the given parts of speech
# (noun, verb, adverb, adjective, adjectival verb)
tags = ["名詞", "動詞", "副詞", "形容詞", "形容動詞"]
perfume_list = nagisa.extract(text, extract_postags=tags).words

print(perfume_list)

Result

MeCab and Janome, which use the same dictionary, gave similar results.
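I only compared the images by eye, but if you wanted to put a number on "similar", one possible approach is to compare the most frequent words from each tool. The sketch below is not something I ran on the real lyrics: the two word lists are made-up samples, and `top_words` and `jaccard` are helper names chosen for illustration.

```python
from collections import Counter


def top_words(words, n):
    """Return the set of the n most frequent words."""
    return {word for word, _ in Counter(words).most_common(n)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Made-up word lists standing in for the output of two analyzers.
mecab_words = ["love", "ディスコ", "ディスコ", "今日", "love", "きっと"]
janome_words = ["love", "ディスコ", "ディスコ", "今日", "love", "恋"]

score = jaccard(top_words(mecab_words, 4), top_words(janome_words, 4))
print(score)
```

A score close to 1 would mean the two tools surface almost the same top words, and a lower score would point to differences in how they split or normalize the text.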

Conclusion

Words such as "Pa Pa", "surely", "love", "today", and "disco" stand out. Many songs repeat their titles in the lyrics, and that influence shows up in the result!

Text mining tools are plentiful and easy to use, and I'm happy that this kind of visualization is now so simple. Why not try it with your favorite artist?


By the way, at estie, where I currently work, we offer a variety of real estate × technology services built on visualizing office data. If you are thinking of moving your office, please use estie! We also provide estie pro, a real estate data platform.

estie is also looking for web engineers on Wantedly. Please feel free to come visit us at the office!

Reference site

- Lyrics obtained from uta-net
- [Python] I visualized Arashi's lyrics with WordCloud and tried to unravel what they wanted to convey to fans in the 20th year since their formation
- [Python] I tried to visualize Night on the Galactic Railroad with WordCloud!
- nagisa: an RNN-based Japanese word segmentation and part-of-speech tagging tool
