With the coronavirus pandemic, the news these days is full of dark stories... → If you only see the bright news, you should feel more positive!
Opty is a Chrome extension that makes a Google search result harder to see the darker its content is. Chrome Store: Opty, GitHub
(Screenshots: a dark site and a bright site, side by side)
Just install the Chrome extension from here and search! Reference: How to install the extension
- Team: 2 university students (Tomohiro Inoue, Takeshi Watanabe)
- Production period: 1 day
- Technologies: Cloud Functions, Python 3.7, JavaScript, MeCab
Break down a sentence into words and judge whether each element is bright or dark.
MeCab is used to decompose sentences into words. For example:

morph.py

    import MeCab

    tagger = MeCab.Tagger()
    # "The new coronavirus is spreading widely around the world."
    result = tagger.parse('新型コロナウイルスが世界的に大流行しています。')
    print(result)

Running this prints:

    新型	名詞,一般,*,*,*,*,新型,シンガタ,シンガタ
    コロナ	名詞,一般,*,*,*,*,コロナ,コロナ,コロナ
    ウイルス	名詞,一般,*,*,*,*,ウイルス,ウイルス,ウイルス
    が	助詞,格助詞,一般,*,*,*,が,ガ,ガ
    世界	名詞,一般,*,*,*,*,世界,セカイ,セカイ
    的	名詞,接尾,形容動詞語幹,*,*,*,的,テキ,テキ
    に	助詞,副詞化,*,*,*,*,に,ニ,ニ
    大	接頭詞,名詞接続,*,*,*,*,大,ダイ,ダイ
    流行	名詞,サ変接続,*,*,*,*,流行,リュウコウ,リューコー
    し	動詞,自立,*,*,サ変・スル,連用形,する,シ,シ
    て	助詞,接続助詞,*,*,*,*,て,テ,テ
    い	動詞,非自立,*,*,一段,連用形,いる,イ,イ
    ます	助動詞,*,*,*,特殊・マス,基本形,ます,マス,マス
    。	記号,句点,*,*,*,*,。,。,。
    EOS

The sentence is broken down like this. Each line shows the surface form of a word followed by its features (part of speech, conjugation, base form, reading, pronunciation). The base form of each word, the seventh feature (for example, する for し), is extracted from this result and used.
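As a rough illustration of that extraction step, here is a minimal sketch that pulls the base forms out of MeCab's output text. It assumes the default IPADIC-style format, where the base form is the seventh comma-separated feature. The function name `extract_base_forms` is hypothetical and is not the helper used in the actual project; the sample string is hard-coded so the snippet runs without MeCab installed.

```python
def extract_base_forms(mecab_output):
    """Return the base form of each token in IPADIC-style MeCab output."""
    base_forms = []
    for line in mecab_output.splitlines():
        if not line or line == 'EOS':
            continue
        # Each token line is "<surface>\t<comma-separated features>"
        surface, _, features = line.partition('\t')
        fields = features.split(',')
        # Assumption: index 6 holds the base form; "*" means it is unknown,
        # in which case we fall back to the surface form.
        if len(fields) > 6 and fields[6] != '*':
            base_forms.append(fields[6])
        else:
            base_forms.append(surface)
    return base_forms


sample = (
    "流行\t名詞,サ変接続,*,*,*,*,流行,リュウコウ,リューコー\n"
    "し\t動詞,自立,*,*,サ変・スル,連用形,する,シ,シ\n"
    "て\t助詞,接続助詞,*,*,*,*,て,テ,テ\n"
    "EOS\n"
)
print(extract_base_forms(sample))  # ['流行', 'する', 'て']
```

Working with base forms means that a conjugated word such as し is looked up under する, which is the form a dictionary would list.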
To determine the brightness of each element, I used the [Japanese Sentiment Polarity Dictionary](http://www.cl.ecei.tohoku.ac.jp/index.php?Open%20Resources%2FJapanese%20Sentiment%20Polarity%20Dictionary) published by the Inui-Suzuki Laboratory at Tohoku University.
In this dictionary, Japanese words are classified into positive (bright) or negative (dark), and nouns are classified into three levels: p (positive), n (negative), and e (neither).
wago.121808.pn
Negative (experience)
Negative (experience) give up
Negative (experience) Akiruno
Negative (experience)
Negative (experience)
pn.csv.m3.120408.trim
Thank you p ~ There is / enhances (existence / nature)
Thank you p ~ There is / enhances (existence / nature)
Thank you annoyance n ~ becomes / becomes (evaluation / emotion) subjective
Being e ~ as it is (evaluation / emotion) Subjectivity
Being e ~ as it is (evaluation / emotion) Subjectivity
Each component of the sentence is replaced with 1 if it is positive, -1 if it is negative, and 0 otherwise, and the average is taken as the brightness of the sentence.
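The averaging can be sketched in a few lines. The function name `sentence_brightness` and the sample values are made up for illustration, and treating an empty input as neutral is an assumption rather than something the project specifies.

```python
def sentence_brightness(polarities):
    """Average per-element polarities (+1, -1 or 0) into one brightness value."""
    if not polarities:
        return 0.0  # assumption: treat text with no elements as neutral
    return sum(polarities) / len(polarities)


# Hypothetical sentence with five elements: two positive, one negative, two neutral
print(sentence_brightness([1, 0, -1, 1, 0]))  # 0.2
```

The result always falls between -1 (every element is dark) and 1 (every element is bright).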
Below are the points I stumbled upon during development.
I naively assumed that I could split the sentence into words and look each word up in the polarity dictionary, but it was not that simple. The polarity dictionary registers not only single words such as "good" but also elements made up of two or more words, such as "not good" (in this case, "good" + "not"). Therefore, I restructured the dictionary so that it can be searched by elements consisting of multiple words, and pickled it before use.
main.py

    # Store the entries in a dictionary
    for line in pn_noun_file:
        line = line.replace('\n', '').split('\t')
        if line[1] == 'e':  # Ignore lines that are neither positive nor negative
            continue
        # List of the base forms of the words in this polarity dictionary entry
        basic_form = convert_to_basic_form(line[0])
        # Ignore lines whose base forms cannot be obtained,
        # and lines that consist of a single one-character base form
        if not basic_form:
            continue
        elif len(basic_form) == 1 and len(basic_form[0]) == 1:
            continue
        key = basic_form[0]
        if key not in pn_dict:
            pn_dict[key] = {}
        # Store the brightness under the comma-joined sequence of base forms
        pn_dict[key][','.join(basic_form)] = 1 if line[1] == 'p' else -1
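To picture the resulting structure, here is a small sketch with made-up entries (they are not taken from the real dictionary). The outer key is the first base form, and the inner key is the comma-joined sequence of base forms. The `lookup` helper is hypothetical, and the pickle round trip is done in memory here instead of through a file.

```python
import pickle

# Hypothetical entries: first base form -> {comma-joined base forms: polarity}
pn_dict = {
    '良い': {'良い': 1, '良い,ない': -1},
    '悪い': {'悪い': -1},
}

# Serialize once, then restore when scoring (in memory for this sketch)
data = pickle.dumps(pn_dict)
restored = pickle.loads(data)


def lookup(base_forms):
    """Return the polarity for an exact sequence of base forms, or None."""
    candidates = restored.get(base_forms[0], {})
    return candidates.get(','.join(base_forms))


print(lookup(['良い']))          # 1
print(lookup(['良い', 'ない']))  # -1
print(lookup(['楽しい']))        # None
```

Grouping entries by their first word keeps the search small: only the elements that start with the current word need to be compared against the following words.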
While "bad" is registered as a negative element in the polarity dictionary, "not fun" is not registered, so it was judged positive because only the "fun" part matched. Therefore, when a matched element is immediately followed by the negation word "not", the brightness value of the preceding part is inverted.
main.py

    import pickle


    # PN judgment. Returns the average PN value of the elements in the text.
    def calc_pn(basic_form):
        with open('pn.pkl', 'rb') as f:
            pn_dict = pickle.load(f)
        pn_values = []  # PN value of each element in the text
        while basic_form:
            pn_value = 0
            del_num = 1  # Number of words to remove from the front of the list
            beginning = basic_form[0]  # Use the first word as the key
            if beginning in pn_dict:
                for index, word in enumerate(basic_form):
                    if word == "。" or word == "、":  # Stop at a sentence break
                        break
                    if index == 0:
                        joined_basic_forms = beginning
                    else:
                        joined_basic_forms += ',' + word
                    # "ない" ("not") directly after the match: invert the polarity
                    if word == "ない" and del_num == index:
                        print('reverse')
                        pn_value *= -1
                        del_num = index + 1
                    if joined_basic_forms in pn_dict[beginning]:
                        pn_value = pn_dict[beginning][joined_basic_forms]
                        del_num = index + 1
            pn_values.append(pn_value)
            del basic_form[0:del_num]
        # Guard against empty input, which would otherwise divide by zero
        return sum(pn_values) / len(pn_values) if pn_values else 0
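To see the inversion in action without the pickle file, the following sketch mirrors the loop in `calc_pn` but takes the dictionary as an argument and returns the per-element values instead of their average. The name `score_elements` and the toy dictionary are hypothetical, and the negation word is assumed to be ない.

```python
def score_elements(words, pn_dict):
    """Return one polarity per consumed element, mirroring the calc_pn loop."""
    words = list(words)  # work on a copy so the caller's list is untouched
    values = []
    while words:
        value, consumed = 0, 1
        first = words[0]
        if first in pn_dict:
            joined = first
            for index, word in enumerate(words):
                if word in ('。', '、'):  # stop at a sentence break
                    break
                if index > 0:
                    joined += ',' + word
                # Negation right after the matched part flips its polarity
                if word == 'ない' and consumed == index:
                    value *= -1
                    consumed = index + 1
                if joined in pn_dict[first]:
                    value = pn_dict[first][joined]
                    consumed = index + 1
        values.append(value)
        del words[:consumed]
    return values


toy_dict = {'楽しい': {'楽しい': 1}, '悪い': {'悪い': -1}}
print(score_elements(['楽しい', '。'], toy_dict))                    # [1, 0]
print(score_elements(['映画', 'は', '楽しい', 'ない', '。'], toy_dict))  # [0, 0, -1, 0]
```

In the second call, 楽しい and ない are consumed together as one element with value -1, so the average over the four elements is -0.25 rather than a positive number.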
- Slow: Currently, it takes about 3 seconds from when the search results are displayed until the styles are applied.
- Bright page decoration: I want to make results stand out more the brighter they are.

I had some free time while staying home because of the coronavirus, so I built this as a learning project during spring break. Other projects are under development! If you like it, please give it an LGTM and follow us on Twitter!
Extension: Opty Twitter: Tomohiro Inoue, Takeshi Watanabe