I listened to GReeeen like crazy in my youth. Recently I started wondering why I no longer listen to them, even though I used to listen so much. In this article I visualize the message tendency of GReeeen's songs and analyze the lyrics to understand why I stopped listening, that is, why I can no longer sympathize with the songs.
- [Python] Visualizing Arashi's lyrics with WordCloud to unravel what they wanted to convey to fans over the 20 years since their formation
https://qiita.com/yuuuusuke1997/items/122ca7597c909e73aad5
The lyrics are scraped from Uta-Net.
import requests
from bs4 import BeautifulSoup
import pandas as pd
import time

# Create a table to store the scraped data
list_df = pd.DataFrame(columns=['lyrics'])

for page in range(1, 3):
    # Top-level address of the site (song links are relative to it)
    base_url = 'https://www.uta-net.com'
    # Lyrics list page
    url = 'https://www.uta-net.com/artist/5384/' + str(page) + '/'
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'lxml')
    links = soup.find_all('td', class_='side td1')
    for link in links:
        a = base_url + (link.a.get('href'))
        # Lyrics detail page
        response = requests.get(a)
        soup = BeautifulSoup(response.text, 'lxml')
        song_lyrics = soup.find('div', itemprop='lyrics')
        song_lyric = song_lyrics.text
        song_lyric = song_lyric.replace('\n', '')
        # Wait 1 second so as not to overload the server
        time.sleep(1)
        # Add the acquired lyrics to the table
        # (DataFrame.append was removed in pandas 2.0, so use pd.concat)
        tmp_se = pd.DataFrame([song_lyric], index=list_df.columns).T
        list_df = pd.concat([list_df, tmp_se], ignore_index=True)

print(list_df)
# Save as CSV
list_df.to_csv('/Users/username/greeeen/list.csv', mode='a', encoding='cp932')
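One point to watch in the save step: because the CSV is written with `encoding='cp932'`, `to_csv` can raise `UnicodeEncodeError` if a lyric contains a character that cp932 cannot represent (an emoji, for example). The following is a minimal sketch of how one might check a string beforehand. The helper name `unencodable_chars` and the sample strings are hypothetical and not part of the original script.

```python
def unencodable_chars(text, encoding='cp932'):
    """Return the set of characters in `text` that `encoding` cannot represent.

    Hypothetical helper for illustration; not part of the original script.
    """
    bad = set()
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            bad.add(ch)
    return bad


# Illustrative strings, not actual scraped lyrics
print(unencodable_chars('キセキ😊'))
print(unencodable_chars('愛唄'))
```

If the check reports any characters, one option is to save and read the CSV as UTF-8 instead, as long as both `to_csv` and `read_csv` use the same encoding.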
from janome.tokenizer import Tokenizer
import pandas as pd

# Read list.csv
df_file = pd.read_csv('/Users/username/greeeen/list.csv', encoding='cp932')
song_lyrics = df_file['lyrics'].tolist()

t = Tokenizer()
results = []
for s in song_lyrics:
    tokens = t.tokenize(s)
    r = []
    for tok in tokens:
        # Use the base form when available, otherwise the surface form
        if tok.base_form == '*':
            word = tok.surface
        else:
            word = tok.base_form
        ps = tok.part_of_speech
        hinshi = ps.split(',')[0]
        # janome reports parts of speech in Japanese:
        # noun, adjective, verb, adverb
        if hinshi in ['名詞', '形容詞', '動詞', '副詞']:
            r.append(word)
    rl = (' '.join(r)).strip()
    results.append(rl)

# Remove full-width spaces (U+3000)
result = [i.replace('\u3000', '') for i in results]
print(result)

text_file = '/Users/username/greeeen/wakati_list.txt'
with open(text_file, 'w', encoding='utf-8') as fp:
    fp.write("\n".join(result))
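To make the filtering step easier to follow, here is a minimal sketch that runs without janome. It assumes, as is the case with janome's default dictionary, that `part_of_speech` is a comma-separated string whose first field is the Japanese part-of-speech name. The `Token` namedtuple is a stand-in for janome's token object, and the token values are hand-written for illustration, so real janome output may differ in detail.

```python
from collections import namedtuple

# Stand-in for janome's token object, holding only the attributes used here
Token = namedtuple('Token', ['surface', 'base_form', 'part_of_speech'])

# Parts of speech to keep: noun, adjective, verb, adverb
KEEP = {'名詞', '形容詞', '動詞', '副詞'}


def content_words(tokens, keep=KEEP):
    """Return base forms of tokens whose top-level part of speech is in `keep`."""
    words = []
    for tok in tokens:
        # Fall back to the surface form when no base form is available ('*')
        word = tok.surface if tok.base_form == '*' else tok.base_form
        if tok.part_of_speech.split(',')[0] in keep:
            words.append(word)
    return words


# Hand-written tokens for illustration only
tokens = [
    Token('僕ら', '僕ら', '名詞,代名詞,一般,*'),
    Token('は', 'は', '助詞,係助詞,*,*'),
    Token('笑っ', '笑う', '動詞,自立,*,*'),
    Token('て', 'て', '助詞,接続助詞,*,*'),
    Token('GReeeen', '*', '名詞,固有名詞,組織,*'),
]
print(' '.join(content_words(tokens)))
```

Particles such as は and て are dropped, inflected forms such as 笑っ are normalized to their base form 笑う, and the result is the space-separated format that is written to `wakati_list.txt`.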
from wordcloud import WordCloud

with open('/Users/username/greeeen/wakati_list.txt', encoding='utf-8') as text_file:
    text = text_file.read()

# Japanese font path (point this at a Japanese-capable font on your system)
fpath = '/System/Library/Fonts/Hiragino Mincho ProN.ttc'

# Remove words that seem to carry no meaning.
# Stop words must be Japanese base forms to match the tokenized lyrics.
# This list is an assumption reconstructed from the English glosses
# (so, not, be, do, as it is, ...); adjust it to your own data.
stop_words = ['そう', 'ない', 'いる', 'する', 'まま', 'よう', 'てる',
              'なる', 'こと', 'もう', 'いい', 'ある', 'ゆく', 'れる']

wordcloud = WordCloud(background_color='white',
                      font_path=fpath, width=800, height=600,
                      stopwords=set(stop_words)).generate(text)

# Save the image as wordcloud.png in the same directory as the .py file
wordcloud.to_file('./wordcloud.png')
Many of the words, such as "us" and "today", are close to the speaker in space and to the present in time. Words associated with progress and change, such as "go", "advance", and "change", also appear frequently, as does "probably", which expresses uncertainty. Beyond those, "laughing" and "smiling" stand out.
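A word cloud only gives a visual impression, so it can help to cross-check it with raw counts. The following is a minimal sketch, assuming the space-separated format of `wakati_list.txt`. The function name `top_words` and the sample string are illustrative and are not taken from the actual lyrics data.

```python
from collections import Counter


def top_words(text, stop_words=(), n=10):
    """Count whitespace-separated words, skip stop words, return the n most common."""
    stop = set(stop_words)
    words = [w for w in text.split() if w not in stop]
    return Counter(words).most_common(n)


# Illustrative sample in the same format as wakati_list.txt (not real data)
sample = "僕ら 今日 笑う\n僕ら 今日 する\n僕ら する"
print(top_words(sample, stop_words=['する'], n=2))
```

To run it on the real data, read `wakati_list.txt` into a string and pass it in together with the same stop words used for the word cloud.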
In a word, the message tendency of GReeeen's lyrics is:
**"The future is uncertain, but let's focus on the 'now' and move forward with a smile, together with our friends."**
...
This analysis revealed that my heart has grown quite jaded as an adult.
I think I let my heart grow cold in order to adapt to society, and in doing so I seem to have lost the passionate, trusting heart I had in my youth.
Based on this result, I will do my best to regain some of that youthful spirit.
For now, I think I'll start by laughing more often ...