The bounties of Roger and Whitebeard were finally revealed. I wondered how big the gap is between Luffy, who is aiming to become King of the Pirates, and the great pirates, so I decided to visualize it as matplotlib practice.
Environment:
- Python 3.8.0
- Windows 10
- Jupyter Notebook

References:
- Learning logarithms with Dragon Ball power levels and matplotlib
- [2019 latest version] "One Piece" bounty ranking! Roger's bounty finally revealed
I took the One Piece bounty data from the ciatr ranking article listed above.
The scraping code is as follows.
scraping_onepiece.py
```python
import re
import time

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://ciatr.jp/topics/311415'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'lxml')

# Get all character bounties (they are in the h3 headings)
bounty_tags = soup.find_all('h3')
# Ignore the h3 headings 'Work information', 'Article information' and 'About ciatr'
# (these literals must match the page's heading text exactly; the page itself is in Japanese)
bounty_tags = [tag.text for tag in bounty_tags
               if tag.text not in ['Work information', 'Article information', 'About ciatr']]

# Wait 1 second
time.sleep(1)

# Get character names
name_tags = soup.find_all('div', class_='component ArticleComponentHead scroll-margin')
name_tags = [tag.text for tag in name_tags
             if tag.text not in ['I want to know the prize money of the powerful players who appear in "ONE PIECE"!',
                                 'There are still big names with unknown prizes! The maximum amount is likely to be updated']]

# Build a {character name: bounty} dictionary
master_table = dict()
for bounty_tag, name_tag in zip(bounty_tags, name_tags):
    # Keep only digits and the kanji units 億 (10^8) and 万 (10^4); drop everything else
    bounty = re.sub(r'[^0-9億万]', '', bounty_tag)
    # Exclude characters whose bounty does not reach 100 million (1億)
    if '億' in bounty:
        # The bounty is plotted with matplotlib later, so convert it to an integer
        oku_ind = bounty.find('億')
        man_ind = bounty.find('万')
        oku = bounty[:oku_ind]
        man = bounty[oku_ind + 1:man_ind] if man_ind != -1 else ''
        bounty = int(oku) * 100000000 + int(man or 0) * 10000
        # Remove the 【N位】 ranking prefix from the name
        name = re.sub(r'[【】0-9位]', '', name_tag)
        # Add to the dictionary
        master_table[name] = bounty

list_df = pd.DataFrame(list(master_table.items()), columns=['Character', 'Sweepstakes'])
print(list_df)
# Save as CSV, overwriting (not appending) so that re-running does not duplicate rows
list_df.to_csv('onepiece.csv', index=False, encoding='cp932')
```
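The trickiest part of the scraping is turning a string with kanji units into an integer. As an alternative to the `find()` and slicing approach, the conversion can be written with one regular expression. This is a minimal sketch that assumes the bounty headings look like `43億8800万ベリー` or `15億ベリー` and the name headings look like `【1位】name`; those example strings and the helper names `parse_bounty` and `clean_name` are hypothetical, not taken from the page or the original script.

```python
import re

OKU = 100_000_000  # 1億 = 10^8 berries
MAN = 10_000       # 1万 = 10^4 berries

# Assumed format: "<digits>億" optionally followed by "<digits>万"
BOUNTY_RE = re.compile(r'(\d+)億(?:(\d+)万)?')


def parse_bounty(text):
    """Return the bounty in berries, or None if there is no 億 part."""
    match = BOUNTY_RE.search(text.replace(',', ''))
    if match is None:
        return None  # below 1億 or not in the assumed format
    oku = int(match.group(1))
    man = int(match.group(2)) if match.group(2) else 0
    return oku * OKU + man * MAN


def clean_name(text):
    """Strip an assumed 【N位】 ranking prefix and surrounding whitespace."""
    return re.sub(r'【\d+位】', '', text).strip()


print(parse_bounty('43億8800万ベリー'))  # 4388000000
print(clean_name('【1位】ロジャー'))      # ロジャー
```

Returning `None` for anything without 億 plays the same role as the `if '億' in bounty` check in the script.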
malplotlib_onepiece.ipynb
```python
# Magic command needed to draw graphs inside Jupyter Notebook
%matplotlib inline
```
```python
import matplotlib.pyplot as plt
import pandas as pd
```
```python
# Use a Japanese font so that the (Japanese) character names are not garbled
plt.rcParams["font.family"] = "IPAexGothic"
```
For the font error (garbled Japanese characters), I referred to this article: [Eliminate garbled Japanese characters in matplotlib (Windows)](https://datumstudio.jp/blog/matplotlib%E3%81%AE%E6%97%A5%E6%9C%AC%E8%AA%9E%E6%96%87%E5%AD%97%E5%8C%96%E3%81%91%E3%82%92%E8%A7%A3%E6%B6%88%E3%81%99%E3%82%8Bwindows%E7%B7%A8)
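If `IPAexGothic` is not installed, matplotlib falls back to its default font and the Japanese names turn into empty boxes. One way to guard against that is to pick the first Japanese font that is actually available. This is a minimal sketch; the helper name `pick_japanese_font` and the fallback candidates `Yu Gothic` and `MS Gothic` (common on Japanese Windows) are assumptions, not part of the original notebook.

```python
def pick_japanese_font(candidates=("IPAexGothic", "Yu Gothic", "MS Gothic"),
                       installed=None):
    """Return the first candidate font family that is installed, else None."""
    if installed is None:
        # Only import matplotlib when the live font list is needed
        from matplotlib import font_manager
        installed = {font.name for font in font_manager.fontManager.ttflist}
    for name in candidates:
        if name in installed:
            return name
    return None
```

Usage: `font = pick_japanese_font()`, then `plt.rcParams["font.family"] = font` only if `font` is not `None`. A freshly installed font may not show up until matplotlib's font cache is refreshed.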
```python
df_file = pd.read_csv('onepiece.csv', encoding='cp932')
```
```python
df_name = df_file['Character'].tolist()
df_bounty = df_file['Sweepstakes'].tolist()
```
```python
d = []
for name, bounty in zip(df_name, df_bounty):
    data = [name, bounty]
    d.append(data)
```
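```python
# Reverse the order so the chart shows bounties in descending order from top to bottom
# (plt.barh draws the first item at the bottom)
name = list(reversed([x[0] for x in d]))
bounty = list(reversed([x[1] for x in d]))
```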
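Reversing the list only gives a descending chart if the CSV rows are already sorted by bounty. Sorting explicitly removes that dependency. This is a minimal sketch; `sort_for_barh` is a hypothetical helper, not part of the original notebook.

```python
def sort_for_barh(names, bounties):
    """Return (names, bounties) sorted by bounty in ascending order.

    plt.barh draws the first category at the bottom, so ascending order
    puts the largest bounty at the top of the chart.
    """
    pairs = sorted(zip(names, bounties), key=lambda pair: pair[1])
    sorted_names = [name for name, _ in pairs]
    sorted_bounties = [bounty for _, bounty in pairs]
    return sorted_names, sorted_bounties
```

Usage: `name, bounty = sort_for_barh(df_name, df_bounty)` can stand in for the `d` list and the two `reversed(...)` lines. Python's sort is stable, so characters with equal bounties keep their original relative order.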
```python
# Overall size of the figure (height grows with the number of characters)
plt.figure(figsize=(12, len(d) / 2))
# Horizontal bar chart
plt.barh(name, bounty)
# Title of the whole graph
plt.title('One Piece Bounty Ranking')
# x-axis label (values are in berries)
plt.xlabel('Bounty (berries)')
# Log scale for the x-axis (bounty) so smaller bounties stay visible next to the largest ones
plt.xscale('log')
plt.show()
```
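On a log axis, matplotlib labels the ticks as powers of ten (10^8, 10^9, ...). If plain numbers read better, a tick formatter can show the amounts in billions of berries. This is a minimal sketch; the function name `berries_to_billions` is hypothetical, and its `(x, pos)` signature is the one `matplotlib.ticker.FuncFormatter` expects.

```python
def berries_to_billions(x, pos=None):
    """Format a berry amount in billions, e.g. 1.5e9 -> '1.5B'.

    pos (the tick position) is part of the FuncFormatter signature
    but is not needed here.
    """
    return f"{x / 1e9:g}B"


print(berries_to_billions(1.5e9))  # 1.5B
```

To use it, import `FuncFormatter` from `matplotlib.ticker` and call `plt.gca().xaxis.set_major_formatter(FuncFormatter(berries_to_billions))` after `plt.xscale('log')`, because changing the scale resets the axis formatters. The x label could then read 'Bounty (billion berries)'.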
Visualized like this, the Four Emperors' bounties are on another level. I'm still looking forward to the undisclosed bounties of Rayleigh and Dragon. Personally, since Dragon leads an organization that is trying to overthrow the World Government directly, I'd like to see his bounty reach 5 billion.
This time, I learned scraping and matplotlib with the goal of visualizing One Piece characters. Converting the kanji numerals (億, 万) in the scraped strings into Arabic numerals was difficult and time-consuming. Next time, I would like to learn clustering (unsupervised learning) and sentiment analysis. It was a long post, so thank you for reading this far. If you find any mistakes, I would be very grateful if you could point them out in the comments.