[Python] I tried to visualize the bounties of "ONE PIECE" characters worth over 100 million with matplotlib.

Overview

The bounties of Roger and old man Whitebeard were revealed, and I wondered how large the gap is between Luffy, who is aiming to become the Pirate King, and the great pirates, so I decided to visualize it as matplotlib practice.

Environment

・Python 3.8.0
・Windows 10
・Jupyter Notebook

Reference material

・Studying logarithms with Dragon Ball power levels and matplotlib
・[2019 latest version] "One Piece" bounty ranking! Roger's bounty finally revealed

Rough flow

  1. Get the bounties and character names (scraping)
  2. Visualize (matplotlib)

1. Get the bounties and character names (scraping)

I got the One Piece bounties from the ciatr ranking article listed in the reference material.

The scraping flow is as follows.

  1. Get the bounties of all characters
  2. Get the names of all characters
  3. Keep the name and bounty only for characters with a bounty over 100 million

scraping_onepiece.py


import requests
from bs4 import BeautifulSoup
import pandas as pd
import re
import time

list_df = pd.DataFrame(columns=['Character', 'Sweepstakes'])

url = 'https://ciatr.jp/topics/311415'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'lxml')
# Get all character bounties (they are in the h3 elements)
bounty_tags = soup.find_all('h3')

# Ignore the h3 elements that are not bounties (the strings must match the page's heading text)
bounty_tags = [tag.text for tag in bounty_tags if tag.text not in ['Work information', 'Article information', 'About ciatr']]

#Wait 1 second
time.sleep(1)

#Get character name
name_tags = soup.find_all('div', class_='component ArticleComponentHead scroll-margin')

name_tags = [tag.text for tag in name_tags if tag.text not in ['I want to know the prize money of the powerful players who appear in "ONE PIECE"!',
																'There are still big names with unknown prizes! The maximum amount is likely to be updated']]

# Create a {character name: bounty} dictionary
master_table = dict()

for bounty_tag, name_tag in zip(bounty_tags, name_tags):

    # Keep only the digits and the unit characters 億 (100 million) and 万 (ten thousand)
    bounty = re.sub(r'[^\d億万]', '', bounty_tag)

    # Skip characters whose bounty is below 100 million
    if '億' in bounty:

        # Convert the bounty to an integer so that matplotlib can plot it
        oku_ind = bounty.find('億')
        man_ind = bounty.find('万')

        oku = bounty[:oku_ind]

        # Digits between 億 and 万; treated as 0 when there is no 万 part
        if man_ind != -1 and bounty[oku_ind + 1:man_ind] != '':
            man = bounty[oku_ind + 1:man_ind]
        else:
            man = 0

        bounty = int(oku) * 100000000 + int(man) * 10000

        # Strip the brackets, digits, and rank marker 位 from the heading to leave the name
        name = re.sub(r'[【】0-9位]', '', name_tag)

        # Add to the dictionary
        master_table[name] = bounty

# Build the DataFrame in one step (DataFrame.append was removed in pandas 2.0)
list_df = pd.DataFrame(list(master_table.items()), columns=list_df.columns)

print(list_df)

# Save as CSV (overwrite on each run so that reruns do not append duplicate rows)
list_df.to_csv('onepiece.csv', index=False, encoding='cp932')
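
The conversion from a bounty string to an integer can also be isolated into a small helper. The following is a minimal sketch rather than the method used in the script above: `parse_bounty` is a hypothetical name, the sample strings are illustrative inputs, and it assumes the bounty is written as digits followed by the units 億 (100 million) and 万 (ten thousand).

```python
import re


def parse_bounty(text):
    """Convert a string such as '55億6480万ベリー' to an integer number of berries.

    Returns None when neither unit is found. Assumes ASCII or full-width
    digits directly in front of 億 and 万 (an assumption about the page text).
    """
    oku = re.search(r'(\d+)億', text)   # digits in front of 億 (100 million)
    man = re.search(r'(\d+)万', text)   # digits in front of 万 (ten thousand)
    if oku is None and man is None:
        return None

    total = 0
    if oku:
        total += int(oku.group(1)) * 100_000_000
    if man:
        total += int(man.group(1)) * 10_000
    return total


# Illustrative inputs
print(parse_bounty('55億6480万ベリー'))
print(parse_bounty('15億ベリー'))
```

Because the helper searches for each unit separately, surrounding text such as labels or the word ベリー does not need to be stripped first, and a string with only a 万 part is still converted.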

2. Visualize (matplotlib)

malplotlib_onepiece.ipynb


# Magic command for drawing graphs inline in Jupyter Notebook
%matplotlib inline

python


import matplotlib.pyplot as plt
import pandas as pd

python


plt.rcParams["font.family"] = "IPAexGothic"

For the font error, I referred to [Eliminate garbled Japanese characters in matplotlib (Windows)](https://datumstudio.jp/blog/matplotlib%E3%81%AE%E6%97%A5%E6%9C%AC%E8%AA%9E%E6%96%87%E5%AD%97%E5%8C%96%E3%81%91%E3%82%92%E8%A7%A3%E6%B6%88%E3%81%99%E3%82%8Bwindows%E7%B7%A8).

python


df_file = pd.read_csv('onepiece.csv', encoding='cp932')

python


df_name = df_file['Character'].tolist()
df_bounty = df_file['Sweepstakes'].tolist()

python


d = []

for name, bounty in zip(df_name, df_bounty):
    data = [name, bounty]
    d.append(data)

python


# Reverse the rows so that the bounties are in descending order
name = list(reversed([x[0] for x in d]))
bounty = list(reversed([x[1] for x in d]))
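
Reversing the lists relies on the order in which the page lists the characters. Sorting explicitly by bounty removes that dependency. A minimal sketch, assuming hypothetical `(name, bounty)` pairs in place of the rows read from the CSV:

```python
# Hypothetical (name, bounty) pairs standing in for the rows read from the CSV
pairs = [('A', 1_500_000_000), ('B', 300_000_000), ('C', 4_000_000_000)]

# Sort ascending by bounty: barh draws the first entry at the bottom,
# so the largest bounty ends up at the top of the chart
pairs_sorted = sorted(pairs, key=lambda pair: pair[1])

names = [name for name, _ in pairs_sorted]
bounties = [bounty for _, bounty in pairs_sorted]

print(names)
print(bounties)
```

The resulting `names` and `bounties` lists can be passed to `plt.barh` in the same way as the reversed lists.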

python


# Overall size of the figure
plt.figure(figsize=(12, len(d)/2))
# Horizontal bar chart
plt.barh(name, bounty)
# Title of the whole graph
plt.title('One Piece Bounty Ranking')
# x-axis label (the values are plotted in berries)
plt.xlabel('Bounty (berries)')
# Use a logarithmic scale on the x-axis (bounty)
plt.xscale('log')
plt.show()
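
The x-axis plots raw berry values, so the tick labels span many digits. One option is to label the ticks in units of 100 million (億). A minimal sketch, assuming a hypothetical helper `format_oku`; the `pos` argument exists only so that the function matches the `(value, position)` signature that `matplotlib.ticker.FuncFormatter` expects.

```python
def format_oku(value, pos=None):
    """Format a bounty given in berries as a tick label in units of 100 million (億)."""
    return f'{value / 100_000_000:g}'


# Illustrative values
print(format_oku(100_000_000))
print(format_oku(1_500_000_000))
```

To use it with the chart, import `FuncFormatter` from `matplotlib.ticker` and call `plt.gca().xaxis.set_major_formatter(FuncFormatter(format_oku))` after `plt.xscale('log')`, since changing the scale resets the axis formatter.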

onepiece.png

Visualized like this, the strength of the Four Emperors is clearly extraordinary. I'm looking forward to the still-undisclosed bounties of Master Rayleigh and Dragon. Personally, since Dragon is the boss of an organization that is trying to overthrow the World Government directly, I would like to see a bounty of 5 billion for him.

In conclusion

This time, I learned scraping and matplotlib in order to visualize the bounties of One Piece characters. The difficult and time-consuming part was converting the kanji numerals in the scraped strings into integers. Next time, I would like to learn clustering with unsupervised learning and sentiment analysis. This post got long, but thank you for reading this far. If you find any mistakes, I would be very grateful if you could point them out in the comments.
