[PYTHON] That's why I'll look up tweets from the House of Representatives election

It has been a year since the ban on online election campaigning was lifted, and every time an election ends, the online discourse declares that democracy and Japan have lost yet another life, like the hero of Spelunker. How are you doing today?

This time, I will examine the contents of tweets posted before and after the ballot counting.

Collection contents

Tweets containing any of the following keywords were collected from 18:00 on 2014/12/14 to 07:00 the next morning (Japan time).

#election,#House of Representatives election,election

** Keep getting tweets containing specific keywords using Streaming API in Python ** http://qiita.com/mima_ita/items/ecdf7de2fe619378beee

Code used for collection and analysis

https://github.com/mima3/stream_twitter

Confirmed to work with Python 2.7 on Windows 7.

Analysis of collection results

The collected data can be downloaded from the following. http://needtec.sakura.ne.jp/doc/shuin47twitter.zip

Histogram by time

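First, let's look at the number of tweets over time, in hourly bins from 18:00 on 2014/12/14 to 07:00 the next morning (Japan time), using the code above. Judging from the table below, the script takes its start and end times in UTC, and the last argument is the bin width in seconds.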

python twitter_db_hist.py "2014/12/14 9:00" "2014/12/14 22:00" 3600

The result is as follows:

figure_1.png

Time (UTC)    Time (JST)    Number of tweets
12/14 09:00   12/14 18:00   3149
12/14 10:00   12/14 19:00   4047
12/14 11:00   12/14 20:00   11280
12/14 12:00   12/14 21:00   9755
12/14 13:00   12/14 22:00   7199
12/14 14:00   12/14 23:00   5207
12/14 15:00   12/15 00:00   3472
12/14 16:00   12/15 01:00   3801
12/14 17:00   12/15 02:00   1545
12/14 18:00   12/15 03:00   529
12/14 19:00   12/15 04:00   292
12/14 20:00   12/15 05:00   300
12/14 21:00   12/15 06:00   477
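
The binning behind a table like this can be sketched as follows. This is not the code of twitter_db_hist.py, which I am not reproducing here; `bin_counts` and the toy timestamps are hypothetical names and made-up data for illustration only. It counts UTC timestamps per fixed-width bin and attaches the Japan-time label (UTC+9) to each bin.

```python
import math
from collections import Counter
from datetime import datetime, timedelta

JST_OFFSET = timedelta(hours=9)  # Japan time is UTC+9


def bin_counts(timestamps, start, end, step_seconds):
    """Count UTC timestamps in [start, end) per fixed-width bin.

    Returns one (bin start in UTC, bin start in JST, count) row per bin.
    """
    step = timedelta(seconds=step_seconds)
    n_bins = math.ceil((end - start).total_seconds() / step_seconds)
    counts = Counter()
    for ts in timestamps:
        if start <= ts < end:
            counts[int((ts - start).total_seconds() // step_seconds)] += 1
    return [
        (start + i * step, start + i * step + JST_OFFSET, counts[i])
        for i in range(n_bins)
    ]


# Toy data (made up, not the collected tweets)
start = datetime(2014, 12, 14, 9, 0)
end = datetime(2014, 12, 14, 12, 0)
toy = [
    datetime(2014, 12, 14, 9, 5),
    datetime(2014, 12, 14, 9, 50),
    datetime(2014, 12, 14, 11, 0),
]
rows = bin_counts(toy, start, end, 3600)
for utc, jst, n in rows:
    print(utc.strftime("%m/%d %H:%M"), jst.strftime("%m/%d %H:%M"), n)
```

Passing 60 instead of 3600 as the bin width gives one-minute bins in the same way.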

The count peaks in the 20:00 hour, when ballot counting began, and then declines over time. However, it picks up again in the 1:00 hour, falls off after that, and recovers a little from around 5:00, when people start waking up.

It is understandable that the number of tweets drops late at night and rises in the morning. But why did it increase in the 1:00 hour, in the middle of the night?

To find out, let's look at the 1:00 hour in one-minute units.

python twitter_db_hist.py "2014/12/14 16:00" "2014/12/14 17:00" 60

figure_2.png

The result shows a sudden surge around 1:27.
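
Rather than eyeballing the graph, a jump like this can also be located programmatically. The following is a minimal sketch, assuming the per-minute counts are already available as a list; `largest_jump` is a hypothetical helper and the numbers are made up, not the collected data.

```python
def largest_jump(counts):
    """Return (index, increase) of the bin with the largest rise over the previous bin.

    Returns (None, 0) when no bin rises above its predecessor.
    """
    best_index, best_rise = None, 0
    for i in range(1, len(counts)):
        rise = counts[i] - counts[i - 1]
        if rise > best_rise:
            best_index, best_rise = i, rise
    return best_index, best_rise


# Toy per-minute counts (made up for illustration)
per_minute = [40, 42, 38, 41, 120, 95, 60]
index, rise = largest_jump(per_minute)
print(index, rise)
```

The returned index is the offset in bins from the start of the window, so with one-minute bins it maps directly to a minute of the hour.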

What happened at that moment? Let's check the thread of the Kaieda Democratic Party Research regulars, who love the Democratic Party.

** [Fukatsu no Jumon is different] Kaieda Democratic Party Research 802th [Tosen no Sho has been decided] ** http://anago.2ch.net/test/read.cgi/asia/1418565521/

811: Nameless place of sunrise: 2014/12/15(Mon) 01:26:44.86 ID:tG+ZZ8gB
[Banri Kaieda] Democratic Party representative Banri Kaieda did not revive in the proportional Tokyo block, and his defeat was confirmed (01:19) (c)2ch.net
    http://daily.2ch.net/test/read.cgi/newsplus/1418574054/

812: Nameless place of sunrise: 2014/12/15(Mon) 01:26:49.76 ID:4Us97nfn
Lost decision w

813: Nameless place of sunrise: 2014/12/15(Mon) 01:26:51.11 ID:pW7uplw3
Goodbye, Mari

814: Nameless place of sunrise: 2014/12/15(Mon) 01:27:01.20 ID:yIjazH47
Wow ah ah NHK also lost w

815: Nameless place of sunrise: 2014/12/15(Mon) 01:27:02.02 ID:NOhUWn58
Mari completely defeated at NHK

No, I'm going to Phoenix from here! It's definitely Fenix, so _____

816: Nameless place of sunrise: 2014/12/15(Mon) 01:27:08.70 ID:4zmUGrZE
    >>802
㌧. I haven't bought any snacks so I can open the mackerel can w

Judging from the thread, it seems that Asahi reported Kaieda's defeat in an extra edition at 1:19, and NHK reported it at 1:27.

As expected, the fate of the leader of the largest opposition party had enough impact to jolt late-night Twitter out of its drowsiness. The result also suggests that television spreads news more powerfully than a newspaper extra.

Extraction of frequent words

Next, let's look at frequent words. I ran morphological analysis with MeCab and counted the words.

This can be done with the following script.

python twitter_db_mecab.py "2014/12/14 9:00" "2014/12/14 22:00" > mecab.txt

The top 100 are shown below.

word Number of appearances
election 70626
Ward 33315
Selection 27196
House of Representatives 27152
Voting 13740
11698
Probably 8386
Liberal Democratic Party 7403
Breaking news 7120
Mr 7074
Tokyo 6864
Vote counting 6484
Winning 6456
Raw 6443
NHK 6222
5866
# 5519
Lost 5504
official 5488
kyodo 5487
Sure 5384
5352
party 5236
Extra 5229
Seat 5025
go 4811
Man 4796
BqAAr 4633
vlhS 4606
rate 4460
Proportional 4419
Liberal Democratic Party 4302
block 4208
4196
Teru 4035
Day 3912
Candidate 3811
House of Representatives 3782
seiji 3773
3745
During ~ 3726
Japan 3611
jimin 3607
koho 3603
representative 3599
Democracy 3592
3589
Person 3548
nicohou 3490
JNSC 3203
blogos 3170
ld 3125
name 3098
Abe 3068
Democratic Party 3039
To tell 3011
Special number 2959
Next generation 2889
2881
Time 2844
Nico 2817
Be 2750
Appearance 2727
Beat Takeshi 2723
To be 2721
Kanagawa 2690
Politics 2532
2511
Kaieda 2504
it can 2488
Minutes 2424
Long 2371
Year 2315
Viewing 2315
Okinawa 2231
Resurrection 2176
Up 2092
U 1997
Acquired 1977
Vote 1954
Absent 1953
Current 1926
Restoration 1905
Prime Minister 1898
Press 1888
Report 1882
pond 1831
take 1775
Communist Party 1773
Substitute 1769
necessary 1766
nMDR 1761
YidT 1761
Youth 1750
Target 1727
Paper 1712
Mari 1666
senkyost 1645
information 1628
I'd love to 1618
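
The counting step behind a table like the one above can be sketched roughly as follows. This is not the code of twitter_db_mecab.py. It assumes MeCab's default output, where each token is one line of "surface, tab, comma-separated features" and each sentence ends with "EOS"; `count_words` and the hand-written IPADIC-style sample are hypothetical and for illustration only. The sketch parses already-produced MeCab output, so it runs without MeCab installed.

```python
from collections import Counter


def count_words(mecab_output, keep_pos=None):
    """Count surface forms in MeCab default-format output.

    keep_pos: optional set of part-of-speech names (first feature field);
    when None, every token is counted.
    """
    counts = Counter()
    for line in mecab_output.splitlines():
        if line == "EOS" or "\t" not in line:
            continue  # skip sentence terminators and blank lines
        surface, feature = line.split("\t", 1)
        pos = feature.split(",")[0]
        if keep_pos is None or pos in keep_pos:
            counts[surface] += 1
    return counts


# Hand-written sample in IPADIC style (an assumption, not real tweet data)
sample = (
    "選挙\t名詞,サ変接続,*,*,*,*,選挙,センキョ,センキョ\n"
    "に\t助詞,格助詞,一般,*,*,*,に,ニ,ニ\n"
    "行く\t動詞,自立,*,*,五段・カ行促音便,基本形,行く,イク,イク\n"
    "EOS\n"
    "選挙\t名詞,サ変接続,*,*,*,*,選挙,センキョ,センキョ\n"
    "EOS\n"
)
counts = count_words(sample)
for word, n in counts.most_common(3):
    print(word, n)
```

Filtering with `keep_pos={"名詞"}` keeps only nouns, which is one way to cut down on particles and auxiliary fragments in the ranking.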

As expected, the party name extracted most often was the "Liberal Democratic Party," which won the majority. Next comes the "Democratic Party," followed by "Next Generation," "Restoration," and the "Communist Party." For the Next Generation party, there seems to be a considerable gap between the number of seats it actually won and the attention it received on the Internet.

As place names, "Tokyo" and "Okinawa" were extracted. Tokyo appears often because an article from "Tokyo Shimbun" was retweeted, and Okinawa probably drew more attention than other regions because the LDP was wiped out in its single-seat constituencies.

The people who attracted attention were "Abe," "Beat Takeshi," and "Kaieda." The prime minister and the leader of the largest opposition party are no surprise, but it was unexpected to see "Beat Takeshi." Apparently this is because he was appearing on Nico Nico Live.

Exploring the relationship between clauses

Finally, let's use CaboCha to tabulate the dependency relations between clauses. See below for how to install CaboCha on Windows.

** Put Cabocha in Windows and analyze the dependency with Python ** http://qiita.com/mima_ita/items/161cd869648edb30627b

This time I used CaboCha 0.66. I expect the latest version to give similar results.

This can be done with the following script.

python twitter_db_cabocha.py "2014/12/14 9:00" "2014/12/14 22:00"  > cabocha.txt

The top 100 are shown below.

Clause 1 Clause 2 Number of appearances
Lost Sure 1762
co/ 4nMDR4YidT#General election http://t 1557
Turnout 0% 1538
[Election] House of Representatives election, Teen 1534
Teen Turnout 1534
Youth go 1504
name write 1504
RT@whsaito:Ballot Fill out 1502
name Fill out 1502
By all means candidate name 1502
method take 1502
high Japan 1502
write go 1502
take Japan 1502
Fill out method 1502
14th go 1502
Education level high 1502
RT@kyoho_times: Teen 1460
Resurrection Sure 1288
guy Win 1208
Such guy 1208
go-denial Win 1186
Probably Report 1172
3700kei:#General election election go-denial 1141
RT@keisei 3700kei:#General election election 1107
Proportional Tokyo block Resurrection 1075
RT@kyodo_official:Democratic Party Banri Kaieda representative 964
Chairman Tetsu Katayama Lose 928
Lose Lose 928
Banri Kaieda representative Resurrection 928
Opposition Lose 928
Socialist Party Chairman Tetsu Katayama 928
1949 House of Representatives election Lose 928
defeat Sure 914
Everyone Politics http://t 885
Winning Sure 815
House of Representatives election special page →http://t 761
feel #election 755
Polling place listen 755
1 vote Disparity 755
Disparity feel 754
listen feel 754
Girls high school listen 754
RT@kurosia:acquaintance Polling place 752
The lowest after the war Last time 742
RT@ld_blogos: [Breaking news] 663
Next generation party 633
[Breaking news] Probably 561
Below http://t 551
RT@kyodo_official:Next generation party 546
House of Representatives election Turnout 535
When State 514
Kiyomi Tsujimoto, Democracy, Osaka 10th District Probably 475
Candidate information House of Representatives election 2014-Yahoo
afternoon As of 6 o'clock 424
Turnout 34 424
79 points Below 420
National average 34 420
As of 6 o'clock 34 420
98% Last time 420
By 34 420
RT@senkyost: [Acquired seats___ 386
Voting go 379
Right of collective self-defense Exercise acceptance 377
defeat Report 370
Probably Break 367
thing know 363
Japan know 360
Return to J League know 359
If this happens Run 359
gradually Return to J League 359
I Run 359
necessary To tell 357
Winner#Hope to spread___#RT Follow everyone 356
[Sad news] For anime necessary 356
Regulation necessary 356
Winner necessary 356
To tell #Election http://t 355
Follow everyone #Election http://t 355
8bu_: necessary 352
RT@K 8bu_: 352
#election#NHK#衆議院election#Ikegami election#Vote counting 344
Mr. Ishihara Make a statement 341
party Shintaro Ishihara Chief Advisor 341
House of Representatives election this time Retired from politics 341
Shintaro Ishihara Chief Advisor Lost 341
Retired from politics Make a statement 341
co/ 7LGbX1z 322
RT@mainichijpedit:Ministry of Internal Affairs and Communications By 309
Understanding obtain 304
___http ://t 303
People Understanding 303
RT@jimin_koho: /To do 301
thing Sure 278
RT@jimin_koho: / 270
Exercise acceptance To express 268
Liberal Democratic Party To express 268
To express Seiichiro Murakami 268
Opposition To express 268
2nd ward To express 268
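
The pair counting behind a table like the one above can be sketched as follows. This is not the code of twitter_db_cabocha.py. It assumes CaboCha's lattice output format, in which each clause starts with a header line such as "* 0 1D 0/1 0.000000" (clause id, then the id of the clause it depends on followed by "D", with -1 for the root), followed by MeCab-style token lines and a final "EOS". `dependency_pairs` and the hand-written sample are hypothetical and for illustration only.

```python
from collections import Counter


def dependency_pairs(lattice_text):
    """Yield (clause, head clause) pairs from CaboCha lattice-format output."""
    chunks = {}  # clause id -> [head clause id, list of token surfaces]
    current = None
    for line in lattice_text.splitlines():
        if line.startswith("* "):
            parts = line.split()
            current = int(parts[1])
            head = int(parts[2].rstrip("D"))
            chunks[current] = [head, []]
        elif line == "EOS":
            # End of sentence: emit every clause that has a head in this sentence
            for head, surfaces in chunks.values():
                if head in chunks:
                    yield "".join(surfaces), "".join(chunks[head][1])
            chunks, current = {}, None
        elif "\t" in line and current is not None:
            chunks[current][1].append(line.split("\t", 1)[0])


# Hand-written sample in lattice format (an assumption, not real tweet data)
sample = (
    "* 0 1D 0/1 0.000000\n"
    "選挙\t名詞,サ変接続,*,*,*,*,選挙,センキョ,センキョ\n"
    "に\t助詞,格助詞,一般,*,*,*,に,ニ,ニ\n"
    "* 1 -1D 0/0 0.000000\n"
    "行く\t動詞,自立,*,*,五段・カ行促音便,基本形,行く,イク,イク\n"
    "EOS\n"
)
pair_counts = Counter(dependency_pairs(sample))
for (clause, head), n in pair_counts.most_common(3):
    print(clause, head, n)
```

Because retweets repeat the same sentence, a single popular tweet contributes the same pairs many times, which is why several rows in the table share an identical count.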

Since "Lost" → "Sure" (defeat confirmed) ranks first, Twitter seems to be more interested in who loses than in who wins.

There also seem to be many references to youth turnout. However, many of these come from the story that "turnout among teens was 0%."

"High" → "Japan" seems to be the result of a large number of tweets saying "Only Japan, with its high level of education, uses the method of writing names on ballots."

Also, as I mentioned at the beginning, I looked into democracy and Japan, which supposedly lose a life at every election. Against 2 tweets saying that democracy will die, there were 11 saying that it will not, so democracy seems to have lost fewer lives than expected.

Still, a few phrases that cost democracy a life were extracted, as follows.

Clause 1 Clause 2 Number of appearances
Democracy die 2
Democracy End 2
Democracy End 2
Democracy Collapse 2
···Democracy Collapse 1

Japan's lost lives are as follows.

Clause 1 Clause 2 Number of appearances
RT@inosan08260:End of Japan confirmed 178kakapo:Japan 7
Liberal Democratic Party Japan collapse 4
Already Japan collapse 4
Laughable Japan collapse 4
Japan Crush 2
Japan End 2

From these results, it seems that democracy lost 9 lives in this election and Japan lost about 23.
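
For the record, the tallies are simply the sums of the appearance counts in the two tables above. A trivial check, with hypothetical variable names and the counts copied from those tables:

```python
# Appearance counts copied from the two tables above
lost_democracy = [2, 2, 2, 2, 1]
lost_japan = [7, 4, 4, 4, 2, 2]

print(sum(lost_democracy), sum(lost_japan))  # prints: 9 23
```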

Summary of what can be read from the data

・ Twitter gets excited even in the middle of the night when Mr. Kaieda loses his job, and the frequency of the "Lost" → "Sure" pair shows that who loses draws more attention than who wins.

・ Comparing the number of appearances of "Next Generation" with the actual results, attention on the Internet alone does not translate into seats.

・ Democracy and Japan have the image of losing lives at every election, but that does not seem to have been the case as much as expected.

Real summary

As shown above, you can do a tongue-in-cheek analysis like this. For tweets over time, as in this example, it is worth looking at where the counts rise and fall and then investigating in detail the points where something changed.

Word frequency makes it easy to see what is attracting attention. However, as this example shows, a high count does not necessarily mean a positive reaction.

Dependency parsing may be able to make up for the weakness of looking only at frequent words. Honestly, though, I did not fully realize that potential this time. This is an issue for the future.

Incidentally, tweets from protected accounts (the ones with the lock icon) could not be retrieved through the Streaming API filter.
