[PYTHON] I tried to visualize the lyrics of GReeeen, a band I listened to like crazy in my youth but no longer listen to.

Motivation

I listened to GReeeen like crazy in my youth. At some point I started wondering why I don't listen to them anymore, even though I used to listen to them so much... In this article, I visualize the tendencies in the messages of GReeeen's songs and analyze the lyrics to understand why I stopped listening, in other words, why I could no longer relate to the songs.

Reference article

- [Python] Visualized Arashi's lyrics with WordCloud and tried to unravel what they wanted to convey to fans in the 20 years since their formation
https://qiita.com/yuuuusuke1997/items/122ca7597c909e73aad5

Environment

1. Collection of lyrics

The lyrics are scraped from Uta-Net (https://www.uta-net.com).

import time

import pandas as pd
import requests
from bs4 import BeautifulSoup

# Song page top address
base_url = 'https://www.uta-net.com'

# Collect the scraped lyrics in a list (DataFrame.append was removed in pandas 2.0)
lyrics = []

for page in range(1, 3):
    # Lyrics list page (pages 1 and 2 of the artist's song list)
    url = base_url + '/artist/5384/' + str(page) + '/'
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'lxml')
    links = soup.find_all('td', class_='side td1')
    for link in links:
        a = base_url + link.a.get('href')

        # Lyrics detail page
        response = requests.get(a)
        soup = BeautifulSoup(response.text, 'lxml')
        song_lyrics = soup.find('div', itemprop='lyrics')
        # Wait 1 second to avoid putting load on the server
        time.sleep(1)
        # Skip pages where the lyrics element was not found
        if song_lyrics is None:
            continue
        lyrics.append(song_lyrics.text.replace('\n', ''))

# Create a table with one song's lyrics per row
list_df = pd.DataFrame({'lyrics': lyrics})
print(list_df)

# Save as CSV (overwrite, so rerunning the script does not append duplicate rows)
list_df.to_csv('/Users/username/greeeen/list.csv', index=False, encoding='cp932')
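The two selectors in the scraper are the part most likely to break, because they depend on Uta-Net's markup. The following minimal sketch runs the same selectors against small hand-written HTML (hypothetical markup, not the real page) so the parsing can be checked offline. It also illustrates one assumption worth checking: if the lyric lines are separated by `<br>` tags, `.text` glues the end of one line to the start of the next, whereas `get_text(' ')` keeps a space between them. `song_urls` and `lyrics_text` are hypothetical helper names.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = 'https://www.uta-net.com'

# Hypothetical markup that mimics the selectors used in the scraper.
# The live Uta-Net HTML may differ and may have changed since.
LIST_HTML = """
<table>
  <tr><td class="side td1"><a href="/song/1/">Song A</a></td></tr>
  <tr><td class="side td1"><a href="/song/2/">Song B</a></td></tr>
  <tr><td class="side td2">not a song link</td></tr>
</table>
"""
SONG_HTML = '<div itemprop="lyrics">first line<br>second line<br>third line</div>'


def song_urls(list_html):
    """Return absolute song URLs from a lyrics list page."""
    soup = BeautifulSoup(list_html, 'html.parser')
    # class_='side td1' matches the exact class attribute string "side td1"
    return [urljoin(BASE_URL, td.a.get('href'))
            for td in soup.find_all('td', class_='side td1') if td.a]


def lyrics_text(song_html, separator=' '):
    """Return the lyrics with <br>-separated lines joined by `separator`, or None."""
    soup = BeautifulSoup(song_html, 'html.parser')
    div = soup.find('div', itemprop='lyrics')
    if div is None:
        return None
    return div.get_text(separator).replace('\n', '')


print(song_urls(LIST_HTML))
print(lyrics_text(SONG_HTML))  # first line second line third line
```

Passing `separator=''` reproduces what `.text` does, which makes it easy to compare the two on a real page before deciding which one to use.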

2. Splitting the lyrics into words (morphological analysis)

import pandas as pd
from janome.tokenizer import Tokenizer

# Read list.csv
df_file = pd.read_csv('/Users/username/greeeen/list.csv', encoding='cp932')

# Skip empty cells so the tokenizer always receives a string
song_lyrics = df_file['lyrics'].dropna().astype(str).tolist()

t = Tokenizer()

results = []

for s in song_lyrics:
    tokens = t.tokenize(s)

    r = []

    for tok in tokens:
        # Use the surface form when Janome has no base form ('*')
        if tok.base_form == '*':
            word = tok.surface
        else:
            word = tok.base_form

        # The first field of part_of_speech is the main part of speech
        hinshi = tok.part_of_speech.split(',')[0]

        # Keep nouns, adjectives, verbs and adverbs (Janome reports them in Japanese)
        if hinshi in ['名詞', '形容詞', '動詞', '副詞']:
            r.append(word)

    rl = (' '.join(r)).strip()
    results.append(rl)

# Remove extra full-width spaces (U+3000)
result = [i.replace('\u3000', '') for i in results]
print(result)

text_file = '/Users/username/greeeen/wakati_list.txt'
with open(text_file, 'w', encoding='utf-8') as fp:
    fp.write("\n".join(result))
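The filtering rule in the loop (fall back to the surface form when there is no base form, then keep only nouns, adjectives, verbs and adverbs) can be checked without Janome installed. This sketch uses a hypothetical `Token` namedtuple with the same three attribute names the loop reads; the sample tokens are written by hand to resemble Janome's output and are not an actual analysis result. Janome reports parts of speech in Japanese (名詞, 形容詞, 動詞, 副詞), so the comparison has to use those strings.

```python
from collections import namedtuple

# Hypothetical stand-in for a Janome token: only the attributes the loop reads
Token = namedtuple('Token', ['surface', 'base_form', 'part_of_speech'])

# Noun, adjective, verb, adverb as Janome writes them
KEEP_POS = ('名詞', '形容詞', '動詞', '副詞')


def content_words(tokens, keep=KEEP_POS):
    """Return base forms (surface forms when there is none) of the kept parts of speech."""
    words = []
    for tok in tokens:
        word = tok.surface if tok.base_form == '*' else tok.base_form
        if tok.part_of_speech.split(',')[0] in keep:
            words.append(word)
    return words


# Hand-written sample tokens (not real Janome output)
sample = [
    Token('GReeeen', '*', '名詞,固有名詞,組織,*'),
    Token('笑っ', '笑う', '動詞,自立,*,*'),
    Token('て', 'て', '助詞,接続助詞,*,*'),
    Token('進も', '進む', '動詞,自立,*,*'),
    Token('う', 'う', '助動詞,*,*,*'),
]
print(' '.join(content_words(sample)))  # GReeeen 笑う 進む
```

Using base forms means conjugated verbs such as 笑っ and 進も are counted as 笑う and 進む, so different inflections of the same word add up in the word cloud.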

3. Visualization (WordCloud)

from wordcloud import WordCloud

with open('/Users/username/greeeen/wakati_list.txt', encoding='utf-8') as text_file:
    text = text_file.read()

# Japanese font path (macOS; change it to match your environment)
fpath = '/System/Library/Fonts/Hiragino Mincho ProN.ttc'

# Remove words that seem meaningless.
# Entries must match the words in wakati_list.txt exactly, i.e. the Japanese
# base forms output by Janome; the English words below are glosses of them.
stop_words = ['so', 'Absent', 'Is', 'To do', 'As it is', 'Yo', 'Teru', 'Become', 'thing', 'Already', 'Good', 'is there', 'go', 'To be']

wordcloud = WordCloud(background_color='white', font_path=fpath, width=800,
                      height=600, stopwords=set(stop_words)).generate(text)

# Save the image as wordcloud.png in the current working directory
wordcloud.to_file('./wordcloud.png')
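`generate()` lets WordCloud tokenize the text itself. An alternative, sketched below under my own assumptions, is to count the space-separated words directly and pass the counts to `generate_from_frequencies()`, so the words in the cloud are exactly the words written to wakati_list.txt. `word_frequencies` and `render` are hypothetical helper names. Stop words have to be removed while counting, because `generate_from_frequencies()` draws the words it is given rather than applying the `stopwords` setting.

```python
from collections import Counter


def word_frequencies(lines, stopwords=frozenset()):
    """Count space-separated words over all songs, skipping stop words."""
    counts = Counter()
    for line in lines:
        counts.update(w for w in line.split() if w not in stopwords)
    return counts


def render(frequencies, font_path, out_path='wordcloud.png'):
    """Draw the cloud from precomputed counts instead of WordCloud's own tokenizer."""
    # Imported here so the counting helper can be used without wordcloud installed
    from wordcloud import WordCloud

    wc = WordCloud(background_color='white', font_path=font_path,
                   width=800, height=600)
    wc.generate_from_frequencies(frequencies)
    wc.to_file(out_path)


lines = ['僕ら 今日 進む', '今日 笑う する', '夢 今日']
print(word_frequencies(lines, stopwords={'する'}).most_common(1))  # [('今日', 3)]
```

With the variables from the script above, this would be called as `render(word_frequencies(text.splitlines(), set(stop_words)), fpath)`.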

Finished product

(Figure: word cloud generated from GReeeen's lyrics)

Many of the prominent words, such as "us" and "today", are close to oneself or to the present moment in time and space. Others, such as "go", "advance", and "change", are associated with progress and change, and "probably", which expresses uncertainty, also appears frequently. Beyond these, "laughing" and "smiling" stand out.
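Word size in the cloud is based on how often a word occurs in the whole text, so a word repeated many times in a single chorus can look as prominent as a word used across many songs. To back up a statement like "appears frequently", it may help to also count how many songs contain each word. A minimal sketch, assuming one song per line with space-separated words as in wakati_list.txt; `song_coverage` and `share_of_songs` are hypothetical helper names.

```python
from collections import Counter


def song_coverage(lines):
    """Count, for each word, how many songs (lines) contain it at least once."""
    coverage = Counter()
    for line in lines:
        # set() so that repeats within one song are counted once
        coverage.update(set(line.split()))
    return coverage


def share_of_songs(lines, word):
    """Fraction of songs whose lyrics contain `word` (0.0 when there are no songs)."""
    return song_coverage(lines)[word] / len(lines) if lines else 0.0


lines = ['僕ら 今日 今日 進む', '笑う 今日', '夢 変わる']
print(song_coverage(lines)['今日'])  # 2 (three occurrences, but only two songs)
print(share_of_songs(lines, '今日'))
```

On the real data, `lines` would come from `open('/Users/username/greeeen/wakati_list.txt', encoding='utf-8').read().splitlines()`.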

Conclusion

In a word, the message tendency of GReeeen's lyrics is:
**"The future is uncertain, but let's focus on 'now' and move forward with a smile together with friends."**

This analysis made me realize how jaded my adult heart has become.
I think my heart grew colder as I adapted to society, and it seems I lost the passionate, trusting heart I had in my youth. Based on this result, I will do my best to regain some of that youthful spirit. For the time being, I think I'll start by laughing more often...
