This is the 10th article of the Protoout Studio Advent Calendar!
Since joining Protoout Studio, I have started publishing articles on Qiita (although there are still only a few).
So this time, as a look back at what I have written so far, I would like to visualize my output with a word cloud in Python.
A word cloud picks out the words that appear frequently in a text and draws each one at a size that reflects how often it appears. There is a Python library for this, so I will also refer to its code: http://amueller.github.io/word_cloud/index.html
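To make the "size according to frequency" idea concrete, here is a minimal sketch that counts words and scales a font size linearly with each count. The function name `font_sizes` and the size range are assumptions made for illustration; the word_cloud library uses its own sizing and layout logic, so this only shows the principle.

```python
from collections import Counter


def font_sizes(words, min_size=10, max_size=60):
    """Map each distinct word to a font size that grows with its count.

    Illustrative only: the most frequent word gets max_size and the others
    are scaled linearly relative to it.
    """
    counts = Counter(words)
    if not counts:
        return {}
    top = max(counts.values())
    return {
        word: round(min_size + (max_size - min_size) * count / top)
        for word, count in counts.items()
    }


# Hypothetical input: a handful of tags, some repeated.
words = "python iot python qiita python iot".split()
sizes = font_sizes(words)
print(sizes)
```

Words that occur more often end up with larger sizes, which is the effect visible in the generated images.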
To build the word cloud, I will scrape my Qiita page and collect text (the raw material) from my past articles.
First, I collect and visualize the tag information of the articles.
scraping.py
import urllib.request
from bs4 import BeautifulSoup
url = "https://qiita.com/sksk_go"
res = urllib.request.urlopen(url)
soup = BeautifulSoup(res, 'html.parser')
# Selector rewritten to pick up the tag links
name = soup.find_all("a", class_="u-link-unstyled TagList__label")
ret = []
for t in name:
    ret.append(t.text)  # keep only the text of each tag
print(ret)
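scraping.py only prints the list, while the word cloud script reads its input from data.txt. One possible way to bridge the two is sketched below. The helper name `save_words` and the one-word-per-line layout are assumptions, not something the article specifies, and the example writes to a temporary directory so that it does not touch a real data.txt.

```python
import tempfile
from pathlib import Path


def save_words(words, path="data.txt"):
    """Write one word per line as UTF-8 and return the path that was written."""
    target = Path(path)
    target.write_text("\n".join(words) + "\n", encoding="utf-8")
    return target


# Hypothetical tag list standing in for the `ret` list built by scraping.py.
example_tags = ["Python", "IoT", "機械学習"]

with tempfile.TemporaryDirectory() as tmp:
    saved = save_words(example_tags, Path(tmp) / "data.txt")
    print(saved.read_text(encoding="utf-8"))
```

In the scraping script itself, calling `save_words(ret)` after the loop would produce data.txt in the current directory.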
Let's turn the collected text into a word cloud.
I will refer to this article: [Python] I tried to visualize Night on the Galactic Railroad with WordCloud!
wordcloud.py
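import MeCab
from wordcloud import WordCloud
# Read the collected text as UTF-8
with open("data.txt", "rb") as f:
    text = f.read().decode('utf-8')
# -Ochasen selects the ChaSen output format (capital O; lowercase -o names an output file)
mecab = MeCab.Tagger("-Ochasen")
node = mecab.parseToNode(text)
data_text = []
while node:
    word = node.surface
    hinnsi = node.feature.split(",")[0]  # first feature field is the part of speech
    # A Japanese dictionary such as IPADIC labels parts of speech in Japanese:
    # verb, adverb, adjective, noun
    if hinnsi in ["動詞", "副詞", "形容詞", "名詞"]:
        data_text.append(word)
    else:
        print("|{0}| has part of speech {1}, so it is not added".format(word, hinnsi))
        print("-" * 35)
    node = node.next
text = ' '.join(data_text)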
# Words to exclude (Japanese function words and common fillers; they must be in
# Japanese to match the surfaces that MeCab returns)
stop_words = [u'てる', u'いる', u'なる', u'れる', u'する', u'ある', u'こと', u'これ', u'さん', u'して',
              u'くれる', u'やる', u'くださる', u'そう', u'せる', u'した', u'思う',
              u'それ', u'ここ', u'ちゃん', u'くん', u'', u'て', u'に', u'を', u'は', u'の', u'が', u'と', u'た', u'し', u'で',
              u'ない', u'も', u'な', u'い', u'か', u'ので', u'よう', u'']
wordcloud = WordCloud(font_path='/System/Library/Fonts/Hiragino Mincho ProN.ttc', width=480, height=300,
                      background_color='white', stopwords=set(stop_words))
#Generate a word cloud from text.
wordcloud.generate(text)
#Save to a file.
wordcloud.to_file('wordcloud.png')
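The filtering step in the script can be tried without MeCab installed. The sketch below applies the same two rules (keep only certain parts of speech, drop stop words) to a list of (surface, part of speech) pairs. The token list is hypothetical and hand-written, the labels assume an IPADIC-style dictionary, and `filter_tokens` is a name made up for this example.

```python
# Parts of speech to keep: noun, verb, adjective, adverb (IPADIC-style labels, assumed).
KEEP_POS = {"名詞", "動詞", "形容詞", "副詞"}


def filter_tokens(tokens, stop_words, keep_pos=KEEP_POS):
    """Keep non-empty surfaces with an allowed part of speech that are not stop words."""
    return [
        surface
        for surface, pos in tokens
        if pos in keep_pos and surface and surface not in stop_words
    ]


# Hypothetical tokens in the (surface, part-of-speech) shape read inside the MeCab loop.
tokens = [
    ("Python", "名詞"),
    ("で", "助詞"),
    ("可視", "名詞"),
    ("する", "動詞"),
    ("。", "記号"),
]
text = " ".join(filter_tokens(tokens, stop_words={"する", "こと"}))
print(text)
```

Joining the surviving words with spaces gives a string in the same shape as the `text` that the script passes to `generate`.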
I have only written a few articles, so the result is rather sparse... Still, Python-related tags clearly dominate. IoT, which I was taught in depth, also shows up.
That was only the tags, so next I take the body text of my Qiita articles and visualize it with a word cloud.
A few strange words are mixed in, but the result still reflects what I have been writing about. As expected, Python and machine learning account for a large share, which shows where my interests lie.
I tried a word cloud with Qiita as the subject this time, but it would probably be more interesting with everyday writing such as Twitter posts and blogs. Lyrics, novels, and other kinds of text would be fun to try as well.
Next up is aki_suga! Looking forward to it!