[Python] I visualized the lyrics of GReeeen, whom I listened to obsessively in my youth but no longer listen to

Motivation

I listened to GReeeen obsessively in my youth. One day I started wondering why I no longer listen to them, even though I used to listen so much. In this article I visualize the message tendency of GReeeen's songs and analyze the lyrics to understand why I stopped listening, that is, why I can no longer sympathize with the songs.

Reference article

- [Python] I visualized Arashi's lyrics with WordCloud and tried to unravel what they wanted to convey to fans in the 20 years since their formation
https://qiita.com/yuuuusuke1997/items/122ca7597c909e73aad5

Uta-Net
https://www.uta-net.com/

Environment

MacBook (macOS Catalina 10.15.4)
Python 3.7.6
BeautifulSoup
janome
wordcloud
Jupyter Notebook

1. Collecting the lyrics

The lyrics are scraped from Uta-Net.

import time

import pandas as pd
import requests
from bs4 import BeautifulSoup

# Site root, used to turn relative song links into absolute URLs
base_url = 'https://www.uta-net.com'

# Collect the lyrics in a plain list and build the table once at the end
# (DataFrame.append was removed in pandas 2.0)
lyrics = []

for page in range(1, 3):
    # Lyrics list page for the artist
    url = base_url + '/artist/5384/' + str(page) + '/'
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'lxml')
    links = soup.find_all('td', class_='side td1')
    for link in links:
        # Lyrics detail page
        song_url = base_url + link.a.get('href')
        response = requests.get(song_url)
        soup = BeautifulSoup(response.text, 'lxml')
        song_lyrics = soup.find('div', itemprop='lyrics')
        song_lyric = song_lyrics.text.replace('\n', '')
        lyrics.append(song_lyric)

        # Wait 1 second so as not to overload the server
        time.sleep(1)

# Build the table that stores the scraped lyrics
list_df = pd.DataFrame({'lyrics': lyrics})
print(list_df)

# Save as CSV (overwrites the file, so re-running does not duplicate rows)
list_df.to_csv('/Users/username/greeeen/list.csv', encoding='cp932')
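
The loop above fetches pages one at a time with a one-second pause between song pages. A minimal sketch of that pattern as a reusable helper might look like the following. `fetch_politely`, `list_page_urls`, and their parameters are hypothetical names that do not appear in the original script; `fetch` can be any callable that takes a URL and returns the page text, for example `lambda u: requests.get(u).text`.

```python
import time


def fetch_politely(urls, fetch, delay=1.0, sleep=time.sleep):
    """Fetch each URL in order, pausing `delay` seconds between requests.

    `fetch` is any callable mapping a URL to page text (hypothetical
    parameter). `sleep` is injectable so the helper can be tried out
    without actually waiting.
    """
    pages = []
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay)  # pause before every request except the first
        pages.append(fetch(url))
    return pages


def list_page_urls(artist_id, first_page, last_page):
    """Build the artist's lyrics-list URLs, following the URL pattern used above."""
    base_url = 'https://www.uta-net.com'
    return [f'{base_url}/artist/{artist_id}/{page}/'
            for page in range(first_page, last_page + 1)]
```

Keeping the pause in one place makes it harder to forget when more request loops are added later.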

2. Splitting the lyrics into words (morphological analysis)

from janome.tokenizer import Tokenizer
import pandas as pd

# Read list.csv
df_file = pd.read_csv('/Users/username/greeeen/list.csv', encoding='cp932')

song_lyrics = df_file['lyrics'].tolist()

t = Tokenizer()

results = []

for s in song_lyrics:
    tokens = t.tokenize(s)

    r = []

    for tok in tokens:
        if tok.base_form == '*':
            word = tok.surface
        else:
            word = tok.base_form

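        # The first field of part_of_speech is the major word class
        ps = tok.part_of_speech

        hinshi = ps.split(',')[0]

        # janome reports word classes in Japanese: noun, adjective, verb, adverb
        if hinshi in ['名詞', '形容詞', '動詞', '副詞']:
            r.append(word)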

    rl = (' '.join(r)).strip()
    results.append(rl)

# Remove full-width spaces (U+3000) once all songs have been processed
result = [i.replace('\u3000', '') for i in results]
print(result)

text_file = '/Users/username/greeeen/wakati_list.txt'
with open(text_file, 'w', encoding='utf-8') as fp:
    fp.write("\n".join(result))
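
To make the filtering rule in the loop above concrete, here is a small sketch that applies it to hand-written tuples instead of real janome tokens. The tuples imitate the `surface`, `base_form`, and `part_of_speech` attributes; the sample values and the `keep_content_words` helper are illustrative assumptions, not output captured from janome.

```python
# Major word classes to keep: noun, adjective, verb, adverb
CONTENT_POS = {'名詞', '形容詞', '動詞', '副詞'}


def keep_content_words(tokens):
    """Return the base forms of content words from (surface, base_form, pos) tuples."""
    words = []
    for surface, base_form, pos in tokens:
        # Fall back to the surface form when no base form is recorded
        word = surface if base_form == '*' else base_form
        # The first comma-separated field is the major word class
        if pos.split(',')[0] in CONTENT_POS:
            words.append(word)
    return words


# Hand-written sample in the style of janome's attributes (assumed values)
sample = [
    ('今日', '今日', '名詞,副詞可能,*,*'),
    ('も', 'も', '助詞,係助詞,*,*'),
    ('笑っ', '笑う', '動詞,自立,*,*'),
    ('て', 'て', '助詞,接続助詞,*,*'),
    ('GReeeen', '*', '名詞,固有名詞,組織,*'),
]

print(' '.join(keep_content_words(sample)))
```

In this sample the particles are dropped, the inflected verb is normalized to its dictionary form, and the token without a base form is kept as written.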

3. Visualization (WordCloud)

from wordcloud import WordCloud

# Read the space-separated lyrics, closing the file when done
with open('/Users/username/greeeen/wakati_list.txt', encoding='utf-8') as text_file:
    text = text_file.read()

# Path to a Japanese font (Hiragino Mincho ProN; the file name on macOS is in Japanese)
fpath = '/System/Library/Fonts/ヒラギノ明朝 ProN.ttc'

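# Remove words that carry little meaning. They must be given in Japanese to match the tokens
# (roughly: so, not, be, do, as it is, like, -ing, become, thing, already, good, exist, go, passive auxiliary)
stop_words = ['そう', 'ない', 'いる', 'する', 'まま', 'よう', 'てる', 'なる', 'こと', 'もう', 'いい', 'ある', 'ゆく', 'れる']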

wordcloud = WordCloud(background_color='white',
    font_path=fpath, width=800, height=600, stopwords=set(stop_words)).generate(text)

# Save the image as wordcloud.png in the same directory as this .py file
wordcloud.to_file('./wordcloud.png')
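
A word cloud shows relative size but not exact counts. As a rough cross-check, the space-separated text can also be counted with `collections.Counter`. This is a sketch rather than part of the original script: `top_words` is a hypothetical helper, and the sample string only stands in for the contents of `wakati_list.txt`. It counts single words only, so the ranking may differ somewhat from what WordCloud draws.

```python
from collections import Counter


def top_words(text, stop_words=(), n=10):
    """Count whitespace-separated words, skip stop words, return the n most common."""
    stop = set(stop_words)
    counts = Counter(w for w in text.split() if w not in stop)
    return counts.most_common(n)


# Stand-in for the wakati text: one song per line, words separated by spaces
sample_text = '今日 笑う する 僕ら\n今日 進む する 笑う\n今日 変わる'

for word, count in top_words(sample_text, stop_words=['する'], n=3):
    print(word, count)
```

To try it on the real data, pass the `text` read from the file together with the same stop-word list used for the word cloud.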

Result

Many of the prominent words, such as "us" and "today", are close to the speaker in space or to the present in time. Others are associated with progress and change, such as "go", "advance", and "change", and "probably", which expresses uncertainty, also appears frequently. Beyond these, "laughing" and "smiling" stand out.

Conclusion

In a word, the message tendency of GReeeen's lyrics is:
**"The future is uncertain, but let's concentrate on the present and move forward with our friends, smiling."**

This analysis revealed that my adult heart has become quite jaded.
I think I hardened my heart in order to adapt to society, and along the way I seem to have lost the passionate, trusting heart I had in my youth. Based on this result, I will do my best to regain some of that youthfulness. For a start, I think I'll laugh more often...