Sentiment analysis with Python (word2vec)

The other day, I participated in a Python study session sponsored by Team Zet Co., Ltd. The theme this time was "Text sentiment analysis using word2vec". To be honest, it was an ambitious theme for someone who had first touched Python only a week earlier, but I wanted to see how the grammar I am studying is put to practical use, so I signed up at the last minute, one day before the event.

That is enough introduction, so let's get to the main subject.

What is Word2Vec in the first place?

Word2Vec is a neural network model (machine learning) that analyzes words. Simply put, it turns each word into a vector of numbers, so that words can be compared numerically. (For more information, refer to here)

This time I used the word2vec model published by White Goat Corporation.

Instructions for installing word2vec are here

First of all, let's confirm that word2vec really vectorizes words. With word2vec installed, run this code.

sample.py


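from gensim.models.word2vec import Word2Vec

# Load the trained model. The file name is an assumption; use the path of the model you downloaded.
model = Word2Vec.load("word2vec.gensim.model")

print(len(model.wv["Love"]))  # number of dimensions
model.wv["Love"]              # the vector itself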

Running it returns:

50
array([ 0.09289702, -0.16302316, -0.08176763, -0.29827002, 0.05170078, 0.07736144, -0.06452437, 0.19822665, -0.11941547, -0.11159643, 0.03224859, 0.03042056, -0.09065174, -0.1677992 , -0.19054233, 0.10354111, 0.02630192, -0.06666993, -0.06296805, 0.00500843, 0.26934028, 0.05273635, 0.0192258 , 0.2924312 , -0.23919497, 0.02317964, -0.21278766, -0.01392282, 0.24962738, 0.11264788, 0.05772769, 0.20941015, -0.01239212, -0.1256235 , -0.19794041, 0.1267719 , -0.12306885, 0.01006295, 0.08548331, -0.08936502, -0.05429656, -0.09757583, 0.10338967, 0.13714872, 0.23966707, 0.02216845, 0.02270923, 0.32569838, -0.0311841 , -0.00150117], dtype=float32)

This shows that the word "Love" is represented as a 50-dimensional vector whose components are the values above.

Next, let's extract words that are similar to a keyword.

sample.py


# Extract words similar to the keyword
sim_do = model.wv.most_similar(positive=["Girlfriend"], topn=30)
# The result is a list of (word, score) pairs, so format it for easy viewing
print(*[" ".join([v, "{:.5f}".format(s)]) for v, s in sim_do], sep="\n")

Running this returns results such as:

Herself 0.82959
Molly 0.82547
He 0.82406
Sylvia 0.80452
Charlie 0.80336
Lover 0.80197

In this way you can extract words with similar meanings. The number to the right of each word quantifies how similar it is to the keyword.

You can also measure how similar two words are:

similarity = model.wv.similarity(w1="Apple", w2="Strawberry")
print(similarity)

similarity = model.wv.similarity(w1="Apple", w2="Aomori")
print(similarity)

similarity = model.wv.similarity(w1="Apple", w2="Anpanman")
print(similarity)

This returns:

0.79041845
0.30861858
0.45321244

Each value quantifies how similar the words w1 and w2 are. Many people would associate apples with Aomori, but the model rates Anpanman as more similar to Apple than Aomori is, which shows that this model is not perfect yet.
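The score returned by similarity is the cosine similarity between the two word vectors. As a minimal sketch, the same calculation can be written with NumPy alone. The three-dimensional vectors below are made-up toy values, not taken from the model, and cosine_similarity is a hypothetical helper name.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between a and b (close to 1.0 means same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors for illustration only (the real vectors in this article have 50 dimensions)
apple = np.array([1.0, 2.0, 0.0])
strawberry = np.array([2.0, 4.0, 0.0])  # points in the same direction as apple
aomori = np.array([0.0, 0.0, 3.0])      # perpendicular to apple

print(cosine_similarity(apple, strawberry))  # very close to 1
print(cosine_similarity(apple, aomori))      # 0 for perpendicular vectors
```

Because only the direction of the vectors matters, doubling a vector's length does not change its score.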

Now, let's examine the famous proposition:

"King" - "Man" + "Woman" = "Queen"?

sample.py


# Words in positive contribute positively to the similarity; words in negative contribute negatively
sim_do = model.wv.most_similar(positive=["King", "Woman"], negative=["Man"], topn=5)
print(*[" ".join([v, "{:.5f}".format(s)]) for v, s in sim_do], sep="\n")

The result is:

Princess 0.85313
Bride 0.83918
Beast 0.83155
Witch 0.82982
Maiden 0.82356

The answers are close in meaning, though none of them exactly matches "Queen".
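The idea behind this kind of query can be sketched as plain vector arithmetic followed by a ranking by cosine similarity. The sketch below is a simplification under stated assumptions: the three-dimensional vectors are invented so that the analogy works out, analogy is a hypothetical helper, and gensim's own implementation differs in details (for example, it works with normalized vectors).

```python
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Invented toy vectors: roughly (royalty, femaleness, person-ness)
vectors = {
    "King": np.array([0.9, 0.1, 0.5]),
    "Queen": np.array([0.9, 0.9, 0.5]),
    "Man": np.array([0.1, 0.1, 0.5]),
    "Woman": np.array([0.1, 0.9, 0.5]),
    "Princess": np.array([0.7, 0.9, 0.5]),
    "Apple": np.array([0.0, 0.1, -0.8]),
}

def analogy(vectors, positive, negative, topn=3):
    """Rank words by similarity to sum(positive) - sum(negative)."""
    target = sum(vectors[w] for w in positive) - sum(vectors[w] for w in negative)
    query = set(positive) | set(negative)  # the query words themselves are excluded
    scores = [(w, cosine_similarity(target, v))
              for w, v in vectors.items() if w not in query]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]

result = analogy(vectors, positive=["King", "Woman"], negative=["Man"])
for word, score in result:
    print(word, "{:.5f}".format(score))
```

With these toy values "Queen" ranks first. With a real model the nearest word depends entirely on the training data, which is why the result above lists "Princess" instead.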

So far we have compared only individual words, but it is also possible to quantify what kind of emotions a sentence contains.

sample.py


import numpy as np
# Assumption: Tokenizer is janome's, whose tokens expose surface and part_of_speech
from janome.tokenizer import Tokenizer

t = Tokenizer()
s = "Enter your favorite sentence here."
# One row per extracted word; columns are the similarity to happy, pleasant, sad, excitement
x = np.empty((0, 4), float)
for token in t.tokenize(s):
    # Score only nouns and adjectives (janome reports part-of-speech tags in Japanese)
    pos = token.part_of_speech.split(',')[0]
    if pos in ("名詞", "形容詞"):  # noun, adjective
        print(token.surface)
        similarity1 = model.wv.similarity(w1=token.surface, w2="happy")
        similarity2 = model.wv.similarity(w1=token.surface, w2="pleasant")
        similarity3 = model.wv.similarity(w1=token.surface, w2="sad")
        similarity4 = model.wv.similarity(w1=token.surface, w2="excitement")
        x = np.append(x, np.array([[similarity1, similarity2, similarity3, similarity4]]), axis=0)

print("-"*30)
mean = np.mean(x, axis=0)  # average each emotion's score over all extracted words
print(mean)
print("Happy:{0}".format(mean[0]))
print("Fun:{0}".format(mean[1]))
print("Sad:{0}".format(mean[2]))
print("Excitement:{0}".format(mean[3]))
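One thing the loop above does not guard against is a word that is missing from the model's vocabulary, in which case the similarity lookup raises KeyError. The following is a minimal sketch of one way to skip such words. mean_emotion_scores and toy_similarity are hypothetical names, and the similarity table is made up so that the example runs without a model.

```python
import numpy as np

EMOTIONS = ["happy", "pleasant", "sad", "excitement"]

def mean_emotion_scores(words, similarity, emotions=EMOTIONS):
    """Average each emotion's similarity over the words the model knows.

    similarity(w1, w2) is any callable that raises KeyError for unknown words.
    Returns None when no word could be scored.
    """
    rows = []
    for word in words:
        try:
            rows.append([similarity(word, emotion) for emotion in emotions])
        except KeyError:
            continue  # skip out-of-vocabulary words
    if not rows:
        return None
    return np.mean(np.array(rows), axis=0)

# Made-up similarity table standing in for a real model
_toy_table = {
    "restaurant": [0.4, 0.6, 0.2, 0.2],
    "propose": [0.2, 0.4, 0.4, 0.0],
}

def toy_similarity(w1, w2):
    return _toy_table[w1][EMOTIONS.index(w2)]  # KeyError if w1 is unknown

scores = mean_emotion_scores(["restaurant", "unknownword", "propose"], toy_similarity)
print(scores)
```

With a loaded gensim model, model.wv.similarity could be passed in place of toy_similarity.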



Put a sentence of your choice in the variable s. As an example, let's use a romantic sentence: "I proposed at a restaurant with a night view." The result is:

Night view
Restaurant
propose

[0.29473324 0.44027831 0.27123818 0.20060815]

Happy: 0.29473323623339337
Fun: 0.4402783115704854
Sad: 0.27123818174004555
Excitement: 0.20060815351704755

So this system judges "I proposed at a restaurant with a night view." to be a "fun" sentence. (The larger the number, the stronger the emotion.)

Now another example. Let's put in a sentence that reeks of negativity: "A pistol murder occurred in a prison at midnight." The result is:

Midnight
prison
Handgun
murder
Incident

[-0.00661952 0.01671012 0.12141706 0.23172273]

Happy: -0.0006619524117559195
Fun: 0.01671011543367058
Sad: 0.12141705807298422
Excitement: 0.2317227303981781

As this shows, a score can even be negative. Indeed, the sentence does not feel happy in the slightest.
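Reading off the strongest emotion by eye can also be automated. As a small sketch, the label with the largest mean score can be picked with np.argmax. dominant_emotion is a hypothetical helper, and the two score lists are the mean values reported above.

```python
import numpy as np

LABELS = ["Happy", "Fun", "Sad", "Excitement"]

def dominant_emotion(scores, labels=LABELS):
    """Return the label whose mean similarity is the largest."""
    return labels[int(np.argmax(scores))]

# Mean scores reported above for the two example sentences
proposal = [0.29473324, 0.44027831, 0.27123818, 0.20060815]
murder = [-0.00661952, 0.01671012, 0.12141706, 0.23172273]

print(dominant_emotion(proposal))  # Fun
print(dominant_emotion(murder))    # Excitement
```

Note that the second sentence comes out as "Excitement" rather than "Sad", so the largest score is only a rough indicator of the sentence's mood.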

Impressions

It is wonderful to live in a time when the sentiment of sentences can be analyzed so easily. I am deeply grateful to Team Zet for providing such a useful learning opportunity.
