[PYTHON] I checked the image of Science University on Twitter with Word2Vec.

To do

--Get Tweets with Twitter API
--Add words to MeCab
--Use Word2Vec

Get Tweets with Twitter API

I used a Ruby script that I wrote myself.

# gem install twitter
require "twitter"

client = Twitter::REST::Client.new do |config|
  config.consumer_key        = ""
  config.consumer_secret     = ""
  config.access_token        = ""
  config.access_token_secret = ""
end

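# Fetch up to 10,000 tweets matching the search term
@result = client.search("Science University").take(10000)

# Write one tweet per line, with all whitespace (including newlines) removed
File.open("tus.csv", 'w') do |file|
  file.write("tweet\n")  # header row; the Python code below reads the "tweet" column
  @result.each do |tweet|
    file.write(tweet.text.gsub(/\s/, ""))
    file.write("\n")
  end
end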

This produces a CSV file, tus.csv, with one tweet per line. There are many articles on how to obtain the API tokens, so I will omit that step here.
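Because the script writes one whitespace-stripped tweet per line, the file can also be read line by line, which sidesteps CSV parsing problems when a tweet contains a comma. The following is a minimal sketch and not part of the original workflow: `load_tweets` is a hypothetical helper, and the `tweet` header line it skips is an assumption based on the column name used later with pandas.

```python
import os
import tempfile


def load_tweets(path, header="tweet"):
    """Read one tweet per line, skipping blank lines and an optional header.

    `header` is an assumed column name; pass None if the file has no header row.
    """
    tweets = []
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f):
            text = line.strip()
            if not text:
                continue  # skip empty lines
            if i == 0 and header is not None and text == header:
                continue  # skip the header row
            tweets.append(text)
    return tweets


# Small self-check with a temporary file standing in for tus.csv
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, encoding="utf-8") as tmp:
    tmp.write("tweet\nfirst,tweet\n\nsecond tweet\n")
sample_tweets = load_tweets(tmp.name)
os.remove(tmp.name)
print(sample_tweets)
```

The returned list can be passed to the MeCab loop in place of the pandas column.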

Add words to MeCab


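# Create a directory for the user dictionary
cd /usr/local/lib/mecab/dic
mkdir userdic
cd userdic
touch tus.csv
# Columns: surface form, left ID, right ID, cost, six part-of-speech fields, base form, reading, pronunciation
echo 'Science University,,,1,noun,General,*,*,*,*,Rikadai,Rikadai,Rikadai' >> tus.csv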

# Compile the user dictionary
/usr/local/Cellar/mecab/0.996/libexec/mecab/mecab-dict-index \
-d /usr/local/lib/mecab/dic/ipadic \
-u tus.dic \
-f utf-8 \
-t utf-8 tus.csv

# Press Enter to run the compilation; the output looks like this:

reading tus.csv ... 1
emitting double-array: 100% |###########################################| 

done!

# Register the dictionary path

vi /usr/local/etc/mecabrc
# Specify the location of the generated dictionary
userdic = /usr/local/lib/mecab/dic/userdic/tus.dic
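If more words need to be registered, the rows can be generated rather than typed by hand. The sketch below only reproduces the 13-column layout of the `echo` line above; `make_userdic_row` is a hypothetical helper, and its default part-of-speech values are simply copied from that line. The dictionary still has to be recompiled with `mecab-dict-index` after the CSV changes.

```python
import csv
import io


def make_userdic_row(surface, reading, base=None, cost=1,
                     pos=("noun", "General", "*", "*", "*", "*")):
    """Build one user-dictionary row in the same 13-column layout as the echo line.

    The left and right ID columns are left empty, as in the original entry.
    `base` defaults to the surface form when it is not given.
    """
    if base is None:
        base = surface
    fields = [surface, "", "", str(cost), *pos, base, reading, reading]
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(fields)
    return buf.getvalue()


# Reproduces the entry added with echo above
row = make_userdic_row("Science University", "Rikadai", base="Rikadai")
print(row, end="")
```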

Use Word2Vec

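# coding: UTF-8
import pandas as pd
import MeCab
from gensim.models import word2vec

# Load the tweets; the CSV must have a header row named "tweet"
tweets = pd.read_csv('/Users/Hiroto/git/scripts/tus.csv').tweet.dropna()

# Create a word-separated file
mt = MeCab.Tagger("-Owakati")  # build the tagger once instead of once per tweet
# parse() returns the space-separated words followed by a newline
wakati = "".join(mt.parse(tweet) for tweet in tweets)

with open('tus_wakati.txt', 'w', encoding='utf-8') as f:
    f.write(wakati)

# word2vec
data = word2vec.Text8Corpus('tus_wakati.txt')
# gensim 4.x calls this parameter vector_size; releases before 4.0 called it size
model = word2vec.Word2Vec(data, vector_size=100)

Words similar to the subject

# In gensim 4.x most_similar is accessed through model.wv
out = model.wv.most_similar(positive=['Science University'], topn=100)
for x in out:
    print(x[0], x[1])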

word	Degree of similarity
Ne	0.9801737666130066
U	0.9679325222969055
world	0.9637500643730164
inequality	0.9604602456092834
Yeah	0.9603763818740845
So	0.9602923393249512
is	0.9574853181838989
That kind of	0.9568058252334595
Lol	0.9534944295883179
darkness	0.9462004899978638
！	0.9435620307922363
？	0.9433774948120117
Raw	0.942541241645813
From	0.9420970678329468
Good	0.9348764419555664
Yo	0.9348678588867188
。	0.9291704893112183
Feeling	0.929074764251709
Me	0.9288586378097534
together	0.9273968935012817
Twitter	0.9265207052230835
Is	0.9249017238616943
Secret meeting	0.9227114915847778
Teru	0.9216452836990356
To go	0.9207674264907837
God	0.9192628264427185
Good luck	0.918117880821228
Ah ~	0.9180813431739807
Disagreeable	0.9164369106292725
reason	0.9164099097251892
Waka	0.9158462882041931
Understood	0.915264368057251
)	0.913904070854187
Is	0.9111155867576599
Delicious	0.9105844497680664
Nana	0.9098367691040039
Man	0.909660816192627
Shit	0.9095121622085571
so	0.907973051071167
If	0.906628429889679
meaning	0.9065468311309814
Sophia	0.905195415019989
Or	0.9034873247146606
Guy	0.9014643430709839
Go	0.8999437689781189
What	0.8993074893951416
Drink	0.8984052538871765
march	0.8983776569366455
Say	0.8976813554763794
Ta	0.8964160680770874
Often	0.896243691444397
eat	0.8960259556770325
want to see	0.8957585096359253
Child	0.8946411609649658
nice to meet you	0.8943185806274414
Want	0.8941484689712524
Stunning	0.893967866897583
zebra	0.8935203552246094
Too	0.8934850692749023
you	0.8934849500656128
illumination	0.8927890062332153
go	0.8927274942398071
Ichi	0.8926646709442139
Is	0.8919773697853088
arithmetic	0.8915943503379822
(	0.8915064930915833
why	0.8907312154769897
Humanities	0.8906354904174805
Hmm	0.8897289037704468
-	0.8896894454956055
Yeah	0.8896220922470093
Department	0.8895649313926697
K	0.8881763219833374
Thoughts	0.8881138563156128
I don't know	0.8880779147148132
school	0.8879990577697754
But	0.8878818154335022
Incident	0.8878498077392578
Please	0.8875197172164917
Know	0.8871732354164124
Iwa	0.8870071172714233
Personality	0.8869134187698364
Hey	0.8867558240890503
Soukei	0.8866025805473328
I'd love to	0.8860080242156982
I wonder	0.8857483267784119
But	0.8853344321250916
Stop	0.8850265145301819
age	0.8849031925201416
k	0.884624719619751
which one	0.8840593695640564
Or	0.8840340971946716
Live	0.883965253829956
Discount	0.8836942911148071
By all means	0.8836302757263184
Crying	0.8831743597984314
yumalaonvae	0.883036196231842
o	0.8830046653747559
Note	0.8829131126403809
why	0.8827589154243469

**inequality** and **darkness** do seem typical of Science University. But what are "Secret meeting" and "zebra" doing here ...?
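For reference, the scores in the table are cosine similarities between word vectors, which is what gensim's `most_similar` reports. The sketch below shows the calculation in plain Python on made-up 3-dimensional vectors; the real model uses 100 dimensions, and these numbers are illustrative only.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Made-up vectors for illustration only
vec_a = [1.0, 2.0, 3.0]
vec_b = [2.0, 4.0, 6.0]   # same direction as vec_a
vec_c = [3.0, 0.0, -1.0]  # orthogonal to vec_a

same_direction = cosine_similarity(vec_a, vec_b)
orthogonal = cosine_similarity(vec_a, vec_c)
print(same_direction, orthogonal)
```

A score near 1 means the two vectors point in almost the same direction, and a score near 0 means they are unrelated in the embedding space.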

Summary

--The results are probably poor because I did not remove the noise from the tweets.
--The number of tweets acquired is small (1,696 this time).
--Even though the Ruby code asks for 10,000, only 1,696 tweets were returned.
--I was hoping that words such as "professional" or "unit" would come up.