[PYTHON] I tried morphological analysis of the Kusoge of the Year general reviews

I will play with morphological analysis.

Preface

When I was reading the KOTY [^1] reviews to kill time the other day, it suddenly occurred to me that **it would be fun to run morphological analysis on all the KOTY general reviews so far**. I had never done morphological analysis before, so I'd like to try it as a learning exercise.

Downloading the general reviews

First, let's grab the general reviews from the KOTY home-console wiki with Nokogiri. The ones posted cover 2005 through 2018. By the way, I'm passing on the mobile and eroge divisions. [^2] The URL of each year's general review is

https://koty.wiki/(Year)GC

so the URLs follow a single pattern and are easy to fetch. One thing to note: the HTML markup of the general review differs across three periods, so the three patterns are handled separately below. Here is the code to fetch everything with Nokogiri.

KOTY_Scrape.rb


require 'nokogiri'
require 'open-uri'
if ! Dir::exist?("KOTY general comment") #Create folder for saving
    Dir::mkdir("KOTY general comment")
end
for year in 2005..2018 do #Initialize the text file in the folder
    File.open("KOTY general comment/#{year}Year.txt","w") do |text|
    end
end
for year in 2005..2009 do # 2005-2009: the review text is in p elements under div#body; no br tags between the p's
    sleep 1
    doc = Nokogiri::HTML(URI.open("https://koty.wiki/#{year}GC"))
    doc.xpath("//div[@id='body']//p").each do |paragraph|
        File.open("KOTY general comment/#{year}Year.txt","a") do |text|
            text.puts paragraph.inner_text
        end
    end
end
for year in 2010..2011 do # 2010-2011: the HTML source has no line breaks; lines are separated only by br tags
    sleep 1
    doc = Nokogiri::HTML(URI.open("https://koty.wiki/#{year}GC"))
    comment = doc.xpath("//p[@class='aapro']")
    comment.search('br').each do |br|
        br.replace("\n")
    end
    File.open("KOTY general comment/#{year}Year.txt","a") do |text|
        text.puts comment.inner_text
    end
end
for year in 2012..2018 do # 2012-2018: the review text is inside a blockquote element
    sleep 1
    doc = Nokogiri::HTML(URI.open("https://koty.wiki/#{year}GC"))
    File.open("KOTY general comment/#{year}Year.txt","a") do |text|
        text.puts doc.xpath("//blockquote").inner_text
    end
end

Got them all. The saved files look like this:

[screenshot of the saved text files]

Looking at it like this, it seems that the file size tends to increase with each passing year.

The problem here

Now, I was originally planning to do the morphological analysis with Win64 / Ruby / MeCab. But **"Win64"** plus **"Ruby"** plus **"MeCab"**... building that environment is a huge pain.

I did try. About twice. I read through various existing articles, but... **what doesn't work just doesn't work.** So...

**Use Python**

Environment

So, the environment. Setting up MeCab for Python on Win64 is surprisingly quick and easy:

  1. Download the unofficial Win64 build installer from here and install it with the UTF-8 dictionary.
  2. Install the mecab library for Python by referring to this article.

That's all. No hellish DLL rewriting required. Honestly, I'm impressed; this is disruptive, it shatters the old assumptions, it's a paradigm shift.
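Just to confirm that the setup works, parsing a single sentence is enough (the test sentence below is just something I made up):

import MeCab

# Create a tagger with the dictionary installed above
tagger = MeCab.Tagger()

# Parse an arbitrary test sentence and print one morpheme per line
print(tagger.parse("クソゲーオブザイヤーを形態素解析する"))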

Playing around

For now, let's run morphological analysis on the 2018 general review and render the result with WordCloud. The plan is to extract only the nouns.

MecabKOTY.py


import MeCab
from wordcloud import WordCloud
t = MeCab.Tagger()

with open('KOTY general comment/2018Year.txt',encoding="UTF-8") as txt_file:
    text = txt_file.read()

nodes = t.parseToNode(text)
s = []

while nodes:
    if nodes.feature[:2] == "名詞":  # IPAdic part-of-speech tags are in Japanese; "名詞" = noun
        s.append(nodes.surface)
    nodes = nodes.next

wc = WordCloud(width=720, height=480, background_color="black",
               stopwords={"this", "For", "It", "Yo", "thing"},
               font_path=r"C:\Windows\Fonts\HGRGE.TTC")
wc.generate(" ".join(s))
wc.to_file('KOTY_wc.png')

[word cloud for 2018]

Ahh, that feels good, that feels good. Now this looks like morphological analysis! For 2007, for example:

[word cloud for 2007]

**"Scenario"** stands out like this, and for 2014:

[word cloud for 2014]

the word **"Rider"** stands out. Each year's KOTY general review has a character of its own.

Next, let's analyze all the reviews together.

MecabKOTY.py


import MeCab
from wordcloud import WordCloud
t = MeCab.Tagger()
s = []

for y in range(2005, 2019):  # 2005 through 2018 inclusive
    with open(f'KOTY general comment/{y}Year.txt',encoding="UTF-8") as txt_file:
        text = txt_file.read()
    nodes = t.parseToNode(text)
    while nodes:
        if nodes.feature[:2] == "名詞":  # "名詞" = noun
            s.append(nodes.surface)
        nodes = nodes.next

wc = WordCloud(width=720, height=480, background_color="black",
               stopwords={"this", "For", "It", "Yo", "thing"},
               font_path=r"C:\Windows\Fonts\HGRGE.TTC")
wc.generate(" ".join(s))
wc.to_file('KOTY_wc.png')

It comes out like this:

[word cloud of all the reviews combined]

It's a masterpiece. **"Player"**, **"Game"**, **"Kusoge"**: it feels like the very essence of KOTY.

Now, what if we narrow this down further and try to extract only proper nouns?

MecabKOTY.py


import MeCab
from wordcloud import WordCloud
t = MeCab.Tagger()
s = []

for y in range(2005, 2019):  # 2005 through 2018 inclusive
    with open(f'KOTY general comment/{y}Year.txt',encoding="UTF-8") as txt_file:
        text = txt_file.read()
    nodes = t.parseToNode(text)
    while nodes:
        if nodes.feature[:7] == "名詞,固有名詞":  # "名詞,固有名詞" = proper noun
            s.append(nodes.surface)
        nodes = nodes.next

wc = WordCloud(width=720, height=480, background_color="black",
               font_path=r"C:\Windows\Fonts\HGRGE.TTC")
wc.generate(" ".join(s))
wc.to_file('KOTY_wc.png')

[word cloud of proper nouns across all years]

It's history. **You can feel the history.**

By the way, I wondered whether I could pull out the **"kusoge makers"** by narrowing it down even further, to **"organization names"** only.
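It's the same pattern as the scripts above, just with the part-of-speech filter narrowed one more level; roughly something like this (the output file name is arbitrary):

import MeCab
from wordcloud import WordCloud

t = MeCab.Tagger()
s = []

for y in range(2005, 2019):
    with open(f'KOTY general comment/{y}Year.txt', encoding="UTF-8") as txt_file:
        text = txt_file.read()
    nodes = t.parseToNode(text)
    while nodes:
        # IPAdic tags organization names as 名詞,固有名詞,組織 (noun, proper noun, organization)
        if nodes.feature.startswith("名詞,固有名詞,組織"):
            s.append(nodes.surface)
        nodes = nodes.next

wc = WordCloud(width=720, height=480, background_color="black",
               font_path=r"C:\Windows\Fonts\HGRGE.TTC")
wc.generate(" ".join(s))
wc.to_file('KOTY_wc_org.png')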

[word cloud of organization names]

I stopped there, because too many things that **aren't organizations at all** got mixed in. It might work better with a different dictionary.
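For reference, MeCab.Tagger accepts the same -d (dictionary directory) option as the command-line tool, so swapping dictionaries is just a matter of pointing it at another install; a sketch with a placeholder path:

import MeCab

# -d points MeCab at a different dictionary directory
# (C:\neologd is a placeholder; substitute wherever your dictionary is installed)
t = MeCab.Tagger(r"-d C:\neologd")
print(t.parse("クソゲーオブザイヤー"))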

Next, let's change tack a little and look at **how the frequency of a specific word changes from year to year**. The first word to check is... let's go with **"bug"**. I'll draw a line graph with matplotlib.

KOTYPlot.py


import MeCab
import matplotlib.pyplot as plt
t = MeCab.Tagger()
c = []
for y in range(2005, 2019):  # 2005 through 2018 inclusive
    c.append(0)
    with open(f'KOTY general comment/{y}Year.txt',encoding="UTF-8") as txt_file:
        text = txt_file.read()
    nodes = t.parseToNode(text)
    while nodes:
        if nodes.surface == "バグ":  # "バグ" is the surface form of "bug"
            c[-1] += 1
        nodes = nodes.next

plt.plot(range(2005, 2019), c, linewidth=4)
plt.xlabel("Year", fontsize=24)
plt.ylabel("Occurrences: bug", fontsize=24)
plt.grid(True)
plt.savefig("KOTYgraph.png")

[line graph of "bug" occurrences by year]

Scary. 2015 is off the charts. Probably the result of two towering bug-fests, "Ajinoko" and "Tetaru", colliding head-on. Also, overshadowed by that spike, the fact that **"bug"** never appears in 2013 is remarkable. Indeed, I think 2013's contest for crappiness ran in a direction other than bugs.

The next word to graph is... **"year-end"**.
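Only the target word changes from the previous script, so here is a sketch that pulls the counting into a small reusable helper (count_by_year is a name of my own; "年末" is the Japanese surface form for "year-end"):

import MeCab
import matplotlib.pyplot as plt

t = MeCab.Tagger()
years = range(2005, 2019)

def count_by_year(word):
    """Count how often `word` appears as a surface form in each year's review."""
    counts = []
    for y in years:
        with open(f'KOTY general comment/{y}Year.txt', encoding="UTF-8") as txt_file:
            text = txt_file.read()
        n = 0
        nodes = t.parseToNode(text)
        while nodes:
            if nodes.surface == word:
                n += 1
            nodes = nodes.next
        counts.append(n)
    return counts

plt.plot(years, count_by_year("年末"), linewidth=4)
plt.xlabel("Year", fontsize=24)
plt.ylabel("Occurrences: year-end", fontsize=24)
plt.grid(True)
plt.savefig("KOTYgraph_nenmatsu.png")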

[line graph of "year-end" occurrences by year]

!?!?!? I sense a suspicious amount of regularity!? It's a roller-coaster shape, isn't it: it shoots up in an instant and then takes its time coming back down. I wonder if the appearance of "year-end monsters" has some kind of periodicity.

Summary

It was fun.

[^1]: Short for "Kusoge of the Year", the 2channel/5channel thread that decides "the crappiest game of the year".
[^2]: The reason for skipping them is that they stray a bit from the point here, and that I am a minor.
