[PYTHON] I tried summarizing what I have posted on Qiita with a word cloud

This is the 10th article of Protoout Studio Advent Calendar!

Overview

After joining Protoout Studio, I started posting on Qiita (although I have only written a few articles so far).

So this time, to look back on what I have written so far, I would like to visualize my posts with Python's WordCloud.

About Word Cloud

WordCloud picks out the words that appear frequently in a text and draws each one at a size that reflects how often it appears. There is a Python library for it, so I will also refer to the code published here: http://amueller.github.io/word_cloud/index.html
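To make the idea of "size according to frequency" concrete, here is a minimal sketch using only the standard library. It is a simplified illustration, not the algorithm the wordcloud library actually uses; the function name `word_sizes` and the `min_size` / `max_size` parameters are hypothetical.

```python
from collections import Counter


def word_sizes(text, min_size=10, max_size=60):
    """Map each whitespace-separated word to a font size scaled by its frequency."""
    counts = Counter(text.split())
    if not counts:
        return {}
    top = max(counts.values())
    # Linear scaling (an assumption for illustration): the most frequent
    # word gets max_size, rarer words get proportionally smaller sizes.
    return {
        word: min_size + (max_size - min_size) * count / top
        for word, count in counts.items()
    }


sizes = word_sizes("python iot python qiita python iot")
for word, size in sorted(sizes.items(), key=lambda kv: -kv[1]):
    print(word, round(size, 1))
```

The most frequent word ("python" here) ends up largest, which is the visual effect a word cloud relies on.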

Collect sentences by scraping

To visualize it with Word Cloud, I will scrape my own Qiita page and collect the text (the raw material) from my past articles.

First, I collect the tag information of my articles and visualize it.

scraping.py


import urllib.request
from bs4 import BeautifulSoup

url = "https://qiita.com/sksk_go"
res = urllib.request.urlopen(url)
soup = BeautifulSoup(res, 'html.parser')
# Rewritten to pick up the tag links
name = soup.find_all("a",class_="u-link-unstyled TagList__label")
ret = []
for t in name:
    ret.append(t.text)

print(ret)
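scraping.py only prints the tag list, while wordcloud.py reads its input from data.txt, and the post does not show the step in between. One possible way to bridge the two, as a sketch: the helper name `save_tags` is hypothetical, and the sample list stands in for the `ret` list built above. Duplicates are kept on purpose, since the word cloud depends on how often each tag appears.

```python
import tempfile
from pathlib import Path


def save_tags(tags, path="data.txt"):
    """Write the tags to a UTF-8 text file, one per line, skipping blanks."""
    cleaned = [t.strip() for t in tags if t.strip()]
    Path(path).write_text("\n".join(cleaned), encoding="utf-8")
    return len(cleaned)


# Sample tags standing in for `ret`; written to a temporary directory
# so the example does not touch the working directory.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "data.txt"
    count = save_tags(["Python", " IoT ", "", "Python"], out)
    saved = out.read_text(encoding="utf-8")

print(count)
print(saved)
```

In the actual flow you would call `save_tags(ret)` at the end of scraping.py so that data.txt sits next to wordcloud.py.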

Now let's turn the collected text into a word cloud.

Processing with Word Cloud

I will refer to this article: [Python] I tried to visualize Night on the Galactic Railroad with WordCloud!

wordcloud.py


import MeCab
from wordcloud import WordCloud

# Read the scraped text
with open("data.txt", encoding="utf-8") as f:
    text = f.read()

mecab = MeCab.Tagger("-Ochasen")
node = mecab.parseToNode(text)

data_text = []

# Keep only verbs, adverbs, adjectives and nouns.
# MeCab's IPA dictionary reports the part of speech (hinnsi) in Japanese.
while node:
    word = node.surface
    hinnsi = node.feature.split(",")[0]
    if hinnsi in ["動詞", "副詞", "形容詞", "名詞"]:
        data_text.append(word)
    else:
        print("|{0}| is a {1}, so it is not added".format(word, hinnsi))
        print("-" * 35)
    node = node.next

text = ' '.join(data_text)
# Excluded words (Japanese function words and auxiliaries, matching the MeCab tokens)
stop_words = ['てる', 'いる', 'なる', 'れる', 'する', 'ある', 'こと', 'これ', 'さん', 'して',
              'くれる', 'やる', 'くださる', 'そう', 'せる', 'した', '思う',
              'それ', 'ここ', 'ちゃん', 'くん', '', 'て', 'に', 'を', 'は', 'の', 'が', 'と', 'た', 'し', 'で',
              'ない', 'も', 'な', 'い', 'か', 'ので', 'よう', '']
# font_path must point to a font that can render Japanese; adjust it for your environment
wordcloud = WordCloud(font_path='/System/Library/Fonts/Hiragino Mincho ProN.ttc',
                      width=480, height=300, background_color='white',
                      stopwords=set(stop_words))
# Generate a word cloud from text.
wordcloud.generate(text)
# Save to a file.
wordcloud.to_file('wordcloud.png')
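The part-of-speech filter is the core of wordcloud.py, so here it is isolated as a small function that can be checked without installing MeCab. This is a sketch under assumptions: it assumes MeCab's IPA dictionary, whose part-of-speech labels are Japanese strings such as 名詞 (noun) and 動詞 (verb); the function name `keep_content_words` is hypothetical; the sample tokens are written by hand rather than produced by MeCab; and unlike the script above, it applies the stop words inside the filter instead of passing them to WordCloud.

```python
def keep_content_words(tokens, allowed=("動詞", "副詞", "形容詞", "名詞"), stop_words=()):
    """Return surfaces whose top-level part of speech is allowed and not a stop word.

    tokens: iterable of (surface, feature) pairs, where feature is a
    comma-separated string whose first field is the part of speech.
    """
    kept = []
    for surface, feature in tokens:
        pos = feature.split(",")[0]
        if pos in allowed and surface not in stop_words:
            kept.append(surface)
    return kept


# Hand-written sample tokens (hypothetical, not real MeCab output).
sample = [
    ("Python", "名詞,固有名詞,組織,*,*,*,*"),
    ("で", "助詞,格助詞,一般,*,*,*,で,デ,デ"),
    ("書く", "動詞,自立,*,*,五段・カ行イ音便,基本形,書く,カク,カク"),
    ("こと", "名詞,非自立,一般,*,*,*,こと,コト,コト"),
]
words = keep_content_words(sample, stop_words={"こと"})
print(" ".join(words))
```

The particle で is dropped because 助詞 is not in the allowed list, and こと is dropped as a stop word even though it is a noun.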

Here is the result

wordcloud.png

I have not written many articles yet, so the cloud looks rather sparse... Python-related terms seem to dominate. IoT, which I was taught in depth, also shows up.

Bonus

Since that was only the tags, I will also take the body text of my Qiita articles and visualize it with Word Cloud.

wordcloud2.png

There are some strange words in it, but I can see what it is telling me. As expected, terms such as Python and machine learning appear a lot. That is where my interests are strongest, and the tendency shows.
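The post does not say how the article bodies were collected. One option, shown here purely as an assumption, is the Qiita API v2, whose `/api/v2/users/:user_id/items` endpoint returns a JSON list of articles with a Markdown `body` field. The function names `fetch_items` and `join_bodies` are hypothetical, and the sample data is made up so the example runs offline.

```python
import json
import urllib.request


def fetch_items(user_id, per_page=20):
    """Fetch a user's articles as parsed JSON from the Qiita API v2 (network call)."""
    url = f"https://qiita.com/api/v2/users/{user_id}/items?per_page={per_page}"
    with urllib.request.urlopen(url) as res:
        return json.load(res)


def join_bodies(items):
    """Concatenate the Markdown `body` field of every article."""
    return "\n".join(item.get("body", "") for item in items)


# Offline sample shaped like the API response (hypothetical content).
sample_items = json.loads(
    '[{"title": "a", "body": "Python と IoT"}, {"title": "b", "body": "機械学習"}]'
)
text = join_bodies(sample_items)
print(text)
```

With network access, `join_bodies(fetch_items("sksk_go"))` would produce the text to save as data.txt before running wordcloud.py.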

In closing

I tried Word Cloud with Qiita as the subject, but it would probably be more interesting to use everyday writing such as Twitter or blog posts. Lyrics, novels, and other kinds of text would also be fun to try.

Next up is aki_suga! Look forward to it!
