[Python] I tried to visualize the characteristics of information on people infected with the new coronavirus using wordcloud

Overview

- Obtain information on people in Japan infected with the new coronavirus (COVID-19)
- Run morphological analysis with MeCab
- Visualize characteristic words with wordcloud

Reference

Information on people infected with the new coronavirus (COVID-19)

config

import re
import os

### MeCab
POS_LIST = [10, 11, 31, 32, 34]
POS_LIST.extend(list(range(36,50)))
POS_LIST.extend([59, 60, 62, 67])
# Note: the stop words below appear in translation; to match MeCab output they need to be the Japanese base forms.
STOP_WORDS = ["To do", "Absent", "Become", "Already", "Shiyo", "Can", "Became", "Ku", "Finally", "is there", "May", "think", "today", "It", "this", "that", "which one", "Which", "NULL", "To be", "Nari", "Ah", "Canる", "I"]
RE_ALPHABET = re.compile("^[0-9a-zA-Z0-9 .,*<>]+$") # tokens made only of digits, ASCII letters, spaces, and . , * < >
current_dir = os.getcwd()
OUTPUT_PNG_FILE = os.path.join(current_dir, "wordcloud.png")
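
To get a feel for what this config filters, here is a minimal sketch that reuses the same pattern and POS-ID list. The helper name `is_ascii_noise` and the sample tokens are my own assumptions for illustration and are not part of the original code.

```python
import re

# Same pattern and POS-ID selection as in the config above.
RE_ALPHABET = re.compile("^[0-9a-zA-Z0-9 .,*<>]+$")
POS_LIST = [10, 11, 31, 32, 34]
POS_LIST.extend(range(36, 50))      # 36..49 inclusive; 50 is excluded
POS_LIST.extend([59, 60, 62, 67])


def is_ascii_noise(token):
    """Hypothetical helper: True when the whole token is digits, ASCII letters, spaces, or . , * < >."""
    return RE_ALPHABET.match(token) is not None


# Made-up sample tokens: the first three match the pattern, the last two do not.
for token in ["COVID19", "*", "3.5", "マスク", "20代"]:
    print(token, is_ascii_noise(token))

print(len(POS_LIST))  # number of POS IDs that pass the filter
```

Because the pattern is anchored with `^` and `$`, a token such as "20代" that mixes digits with Japanese characters is not treated as noise and stays in.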

(Omitted)

Morphological analysis

import MeCab
# POS_LIST, STOP_WORDS and RE_ALPHABET are the values defined in the config section above.

def create_mecab_list(text_list):
	mecab_list = []
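	mecab = MeCab.Tagger("-Ochasen -d /usr/local/lib/mecab/dic/mecab-ipadic-neologd") # dictionary path on macOS; adjust to your install
	mecab.parse("") # parse an empty string once; a common workaround for garbled node.surface in the Python binding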
	# encoding = text.encode('utf-8')
	for text in text_list:
		node = mecab.parseToNode(text)
		while node:
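			# feature fields: [POS, POS subcategory 1, POS subcategory 2, POS subcategory 3, conjugation form, conjugation type, base form, reading, pronunciation]
			# e.g. "busy" (continuative form): adjective, independent, *, *, adjective (i-dan), continuative te-connection, busy (base form), isogashiku, isogashiku
			# index 6 is the base form, which is what gets counted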
			morpheme = node.feature.split(",")[6]
			if RE_ALPHABET.match(morpheme):
				node = node.next
				continue
			if morpheme in STOP_WORDS:
				node = node.next
				continue
			if len(morpheme) > 1:
				if node.posid in POS_LIST:
					mecab_list.append(morpheme)
			node = node.next
	return mecab_list
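
The order of the checks in the loop above can be tried without installing MeCab. The following is only a sketch under assumptions: `keep_morpheme` is a hypothetical helper, and the stop words, POS IDs and (base form, posid) pairs are made-up stand-ins for real MeCab nodes.

```python
import re

RE_ALPHABET = re.compile("^[0-9a-zA-Z0-9 .,*<>]+$")
STOP_WORDS = {"する", "ない"}   # assumed sample stop words, not the article's full list
POS_LIST = {10, 31, 38}         # assumed subset of POS IDs for this illustration


def keep_morpheme(base_form, posid):
    """Apply the same checks, in the same order, as the loop in create_mecab_list."""
    if RE_ALPHABET.match(base_form):   # ASCII-only token such as "PCR" or "*"
        return False
    if base_form in STOP_WORDS:        # generic word that says nothing about the data
        return False
    # keep only base forms longer than one character whose POS ID is selected
    return len(base_form) > 1 and posid in POS_LIST


# Made-up (base form, posid) pairs standing in for MeCab nodes.
nodes = [("マスク", 38), ("する", 31), ("PCR", 38), ("咳", 38), ("発熱", 36), ("男性", 38)]
kept = [base for base, posid in nodes if keep_morpheme(base, posid)]
print(kept)
```

With these sample values only "マスク" and "男性" survive: "する" is a stop word, "PCR" matches the ASCII pattern, "咳" is a single character, and "発熱" has a POS ID outside the assumed subset.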

wordcloud

from wordcloud import WordCloud
import matplotlib
matplotlib.use('Agg') # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
# STOP_WORDS and OUTPUT_PNG_FILE are the values defined in the config section above.

def create_wordcloud(morphemes):
	# fpath = "/usr/share/fonts/truetype/takao-gothic/TakaoPGothic.ttf" # Ubuntu
	fpath = "/System/Library/Fonts/Hiragino Maru Go ProN W4.ttc" # Mac OS X
	wordcloud = WordCloud(
		background_color="whitesmoke",
		collocations=False,
		stopwords=set(STOP_WORDS),
		max_font_size=80,
		relative_scaling=.5,
		width=800,
		height=500,
		font_path=fpath
		).generate(morphemes) # generate() takes one text string, e.g. the morphemes joined with spaces
	plt.figure()
	plt.imshow(wordcloud)
	plt.axis("off")
	wordcloud.to_file(OUTPUT_PNG_FILE)
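
Since the omitted part of the article presumably connects the two functions, here is a hedged sketch of that glue step. `to_wordcloud_text` and `top_words` are hypothetical helpers of my own, and the sample list is made up. It joins the morpheme list into the single string that `generate()` accepts and previews the most frequent words, which are the ones the cloud will draw largest.

```python
from collections import Counter


def to_wordcloud_text(morphemes):
    """Join a list of morphemes into one space-delimited string for WordCloud.generate()."""
    return " ".join(morphemes)


def top_words(morphemes, n=3):
    """Return the n most frequent morphemes with their counts."""
    return Counter(morphemes).most_common(n)


# Made-up sample standing in for the output of create_mecab_list().
sample = ["男性", "女性", "男性", "マスク", "男性", "女性"]
print(to_wordcloud_text(sample))
print(top_words(sample, 2))
```

Checking the counts this way is a quick sanity test before rendering: if a word dominates the cloud, it should also sit at the top of this list.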

Result

[Figure: the generated word cloud, wordcloud (5).png]

- "Male" appears in a larger font than "female", meaning more of the infected people are male than female
- Age groups other than "20s" are common
- "Mask" is important
- ...

Other information related to the new coronavirus

- Ministry of Health, Labour and Welfare
  - Q&A about the new coronavirus: https://www.mhlw.go.jp/stf/seisakunitsuite/bunya/kenkou_iryou/dengue_fever_qa_00001.html
  - Guidelines for seeking consultation and medical care regarding new coronavirus infections: https://www.mhlw.go.jp/content/10900000/000596905.pdf
