[Python] Visualizing Arashi's lyrics with WordCloud to understand what they wanted to convey to fans in the 20th year since their formation

Motivation

Only one year is left until Arashi suspends its activities. It has been 20 years since they first appeared in those see-through costumes. What have these national idols, active in so many fields, wanted to tell their fans over the 20 years since their formation? I would love to ask them in person, but that is not possible. So I decided to visualize the lyrics, and I (~~the sixth member~~) will pass on to Arashi fans the message the group wants to convey.

Environment

- Python 3.7.3
- Windows 10

Reference material

- Uta-Net (https://www.uta-net.com)
- I tried to visualize the lyrics of Kenshi Yonezu with WordCloud.

Rough flow

  1. Collecting lyrics (scraping)
  2. Turn lyrics into words (morphological analysis)
  3. Visualization (WordCloud)

1. Collecting lyrics (scraping)

scraping_arashi.py


import requests
from bs4 import BeautifulSoup
import pandas as pd
import time

#Create a table to store scraped data
list_df = pd.DataFrame(columns=['lyrics'])

for page in range(1, 3):
	#Song page top address
	base_url = 'https://www.uta-net.com'

	#Lyrics list page
	url = 'https://www.uta-net.com/artist/3891/0/' + str(page) + '/'
	response = requests.get(url)
	soup = BeautifulSoup(response.text, 'lxml')
	links = soup.find_all('td', class_='side td1')
	for link in links:
		a = base_url + (link.a.get('href'))

		#Lyrics detail page
		response = requests.get(a)
		soup = BeautifulSoup(response.text, 'lxml')
		song_lyrics = soup.find('div', itemprop='lyrics')
		song_lyric = song_lyrics.text
		song_lyric = song_lyric.replace('\n','')
		#Wait 1 second so as not to overload the server
		time.sleep(1)

		#Add the acquired lyrics to the table
		#(DataFrame.append was removed in pandas 2.0, so pd.concat is used instead)
		tmp_se = pd.DataFrame({'lyrics': [song_lyric]})
		list_df = pd.concat([list_df, tmp_se], ignore_index=True)

print(list_df)

#Save as CSV
list_df.to_csv('list.csv', mode = 'a', encoding='cp932')
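
One thing to watch with encoding='cp932': it cannot represent every Unicode character, so to_csv raises UnicodeEncodeError if a lyric contains one it lacks. I have not checked whether the scraped lyrics actually contain such characters, so treat this as a precaution. The helper name can_encode and the sample strings below are my own, purely for illustration:

```python
def can_encode(text, encoding="cp932"):
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Ordinary Japanese text fits in cp932
print(can_encode("嵐の歌詞"))
# Hangul and emoji do not, so saving them as cp932 would fail
print(can_encode("사랑"))
print(can_encode("🌈"))
```

If this ever returns False for a lyric, one option is to save and read the CSV with encoding='utf-8' in both scripts instead.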

2. Turn lyrics into words (morphological analysis)

morphological_analysis_arashi.py


from janome.tokenizer import Tokenizer
import pandas as pd
import re

#Read the list.csv file
df_file = pd.read_csv('list.csv', encoding='cp932')

song_lyrics = df_file['lyrics'].tolist()

t = Tokenizer()

results = []

for s in song_lyrics:
	tokens = t.tokenize(s)
	
	r = []

	for tok in tokens:
		if tok.base_form == '*':
			word = tok.surface
		else:
			word = tok.base_form

		ps = tok.part_of_speech

		hinshi = ps.split(',')[0]

		#janome reports part-of-speech tags in Japanese: noun, adjective, verb, adverb
		if hinshi in ['名詞', '形容詞', '動詞', '副詞']:
			r.append(word)

	rl = (' '.join(r)).strip()
	results.append(rl)

#Remove full-width spaces (U+3000), once after all songs have been processed
result = [i.replace('\u3000','') for i in results]
print(result)

text_file = 'wakati_list.txt'
with open(text_file, 'w', encoding='utf-8') as fp:
	fp.write("\n".join(result))
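
To make the filtering step easier to follow, here is a small sketch of the same logic without janome. The sample tokens are written by hand to mimic the (surface, base_form, part_of_speech) values a tokenizer returns; they are not real janome output, and the names SAMPLE_TOKENS, KEEP_POS and content_words are just for this illustration.

```python
# Hand-written sample tokens (NOT real janome output):
# (surface, base_form, part_of_speech)
SAMPLE_TOKENS = [
    ("未来", "未来", "名詞,一般,*,*"),
    ("へ", "へ", "助詞,格助詞,一般,*"),
    ("歩い", "歩く", "動詞,自立,*,*"),
    ("て", "て", "助詞,接続助詞,*,*"),
    ("いこ", "いく", "動詞,非自立,*,*"),
    ("う", "う", "助動詞,*,*,*"),
]

# Top-level part-of-speech tags to keep: noun, adjective, verb, adverb
KEEP_POS = {"名詞", "形容詞", "動詞", "副詞"}


def content_words(tokens, keep_pos=KEEP_POS):
    """Keep the base form of tokens whose top-level tag is in keep_pos."""
    words = []
    for surface, base_form, part_of_speech in tokens:
        # Fall back to the surface form when no base form is available
        word = surface if base_form == "*" else base_form
        # The first comma-separated field is the top-level part of speech
        if part_of_speech.split(",")[0] in keep_pos:
            words.append(word)
    return words


print(" ".join(content_words(SAMPLE_TOKENS)))  # prints: 未来 歩く いく
```

Particles and auxiliary verbs drop out, and conjugated verbs are normalized to their dictionary form, which is why the word cloud later shows base forms rather than the forms that appear in the lyrics.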

3. Visualization (WordCloud)

wordcloud_arashi.py


from wordcloud import WordCloud

with open('wakati_list.txt', encoding='utf-8') as text_file:
	text = text_file.read()

#Japanese font path
fpath = 'C:/Windows/Fonts/YuGothM.ttc'

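#Remove words that carry little meaning
#NOTE: the words in wakati_list.txt are Japanese, so the stop words must be Japanese too.
#This list is reconstructed from the English glosses in the translated article; adjust as needed.
stop_words = ['そう', 'ない', 'いる', 'する', 'まま', 'よう', 'てる', 'なる', 'こと', 'もう', 'いい', 'ある', 'ゆく', 'れる']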

wordcloud = WordCloud(background_color='white',
	font_path=fpath, width=800, height=600, stopwords=set(stop_words)).generate(text)

#Save the image as wordcloud.png in the same directory as the .py file
wordcloud.to_file('./wordcloud.png')

↓ ↓ How about the result ↓ ↓

Execution result

Looks good! (Image: wordcloud_sample.png)

Summary

By visualizing the lyrics, I found that words that convey Arashi's warmth, such as "future," "us," "here," and "see," appear frequently (* ´ ▽ ` *).
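
As a rough cross-check of what the image shows, the same space-separated text can be counted directly with collections.Counter. This is only a sketch: the function name top_words and the sample string are made up for illustration, and WordCloud applies its own tokenization rules, so the ranking may not match the image exactly.

```python
from collections import Counter


def top_words(wakati_text, stop_words=(), n=10):
    """Count whitespace-separated words, skipping stop words."""
    stop = set(stop_words)
    counts = Counter(w for w in wakati_text.split() if w not in stop)
    return counts.most_common(n)


# Made-up sample standing in for the contents of wakati_list.txt
sample = "未来 僕ら 未来 ここ する 見る 未来 僕ら する"
print(top_words(sample, stop_words=["する"], n=2))
# prints: [('未来', 3), ('僕ら', 2)]
```

To try it on the real data, read wakati_list.txt with encoding='utf-8' and pass its contents in place of sample.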

Message from Arashi

Let's walk toward the future together with us. We will be by your side the whole time. With one year left until the hiatus, we will stir up an A・RA・SHI whirlwind all over Japan. (~~Message from me, the sixth member.~~)

Arashi's feelings already reach the fans without my having to say it, right?

"We" Arashi fans will support Arashi with all our might until the end. Good luck, ARASHI. And if it pops, Yea!

In conclusion

I enjoyed learning about scraping, morphological analysis, and how to use WordCloud with Arashi songs as the subject. This turned into a long post, so thank you for reading this far. If you find any mistakes, I would be very grateful if you could point them out in the comments.
