[PYTHON] I tried to analyze the whole novel "Weathering with You" ☔️


1. A brief overview

In this article, I explain how to apply natural language processing to the text of the novel **"Weathering with You"** and run sentiment analysis on it!

In general, **sentiment analysis** means detecting and quantifying the **"emotions"** contained in a text in order to judge the opinion it expresses. The field is attracting attention because it lets companies mechanically classify what users think about their products and services.

But then I wondered: "Couldn't sentiment analysis be used for something other than reviews and word-of-mouth?" So in this article I take on sentiment analysis of a **"novel"**, which is rarely attempted.

The purpose of this article is to find out **whether the rough development of a story and the personalities of its characters can be inferred by analyzing the emotions in the novel**.

For example, within a story:

- If the emotional values swing up and down sharply, the development is very dramatic.
- If the emotional values turn from positive to negative, that turning point objectively marks a transition in the story's four-part structure (kishōtenketsu).

And so on.

The subject I chose this time is Weathering with You! It was a big hit following the previous film **"Your Name."**, and many of you have probably seen the movie.

Image quoted from [Novel Weathering with You (Kadokawa Bunko) - amazon](https://www.amazon.co.jp/%E5%B0%8F%E8%AA%AC-%E5%A4%A9%E6%B0%97%E3%81%AE%E5%AD%90-%E8%A7%92%E5%B7%9D%E6%96%87%E5%BA%AB-%E6%96%B0%E6%B5%B7-%E8%AA%A0/dp/4041026407)

It should still be fresh in everyone's memory ~~(I have already forgotten it)~~, so those who saw it in the theater can enjoy reading along while recalling the movie's scenes.

The difference between a movie and a novel

- In the novel, the story progresses while the first-person viewpoint shifts between characters.
- The main viewpoint is that of the protagonist, Hodaka, but there are also chapters from the viewpoints of Hina and Natsumi.
- The story is basically the same; the novel has more detailed explanations.

It's a personal impression, but you can enjoy the story from a slightly different perspective than the movie.

2. How to analyze emotions

This time I performed the simplest kind of sentiment analysis, **"positive/negative analysis"**. It is a classification method that judges from a sequence of words whether a text expresses a positive opinion, a negative opinion, or a neutral one.

I will explain the general flow of sentiment analysis with a simple example. First, the sentence is morphologically analyzed and decomposed into morphemes (words) as follows.

**"'Yes!' Natsumi raises her hand energetically, and Suga ignores it."**

**['yes', 'to', 'Natsumi', 'san', 'ga', 'energetic', 'ni', 'hand', 'wo', 'raise', ',', 'Suga', 'san', 'wa', 'it', 'wo', 'ignore', 'do']**

Then each word is judged as positive or negative and given an emotional value.

**['yes', 0], ['to', 0], ['Natsumi', 0], ['san', 0], ['ga', 0], ['energetic', 1], ['ni', 0], ['hand', 0], ['wo', 0], ['raise', 0], [',', 0], ['Suga', 0], ['san', 0], ['wa', 0], ['it', 0], ['wo', 0], ['ignore', -1], ['do', 0]**

In this sentence, the two words that carry emotional polarity, "energetic" and "ignore", received emotional values: energetic: +1, ignore: -1.

Finally, the values are summed to obtain the emotional value of the sentence. For the sentence above, 1 + (-1) gives an emotional value of 0. In this way, an emotional value is assigned to every sentence.
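The lookup-and-sum step described above can be sketched roughly as follows. This is only a minimal illustration, not the script used in this article: the two-entry `toy_dict` and the helper name `sentence_score` are assumptions made up for the example, and the tokens are English glosses of the example sentence's morphemes.

```python
# Toy emotion dictionary (hypothetical): word -> polarity
toy_dict = {"energetic": 1, "ignore": -1}


def sentence_score(tokens, polarity_dict):
    """Return (word, polarity) pairs and their sum; unknown words score 0."""
    pairs = [(tok, polarity_dict.get(tok, 0)) for tok in tokens]
    return pairs, sum(score for _, score in pairs)


tokens = ["yes", "to", "Natsumi", "san", "ga", "energetic", "ni", "hand",
          "wo", "raise", ",", "Suga", "san", "wa", "it", "wo", "ignore", "do"]
pairs, total = sentence_score(tokens, toy_dict)
print(total)  # 1 + (-1) = 0
```

Because every word missing from the dictionary contributes 0, only the words with polarity affect the sentence score.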

An emotion dictionary is used to decide how positive or negative each word is. An emotion dictionary is a dictionary that records in advance which words are positive and which are negative, as shown below. With this dictionary, a word labeled negative (evaluation) is given an emotional value of -1, and a word labeled positive (evaluation) is given +1.

Emotional values are calculated based on this dictionary. [Screenshot: excerpt of the emotion dictionary]

In addition, the script supports positive/negative judgments based on combinations of words, as in the following examples:

- [Mouth corner + rise, +1]
- [Voice + pop, +1]
- [Energetic + not, -1]
- [Danger + drowsy, -1]
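A two-word rule of this kind could be sketched as below. This is a simplified assumption for illustration: the names `pair_dict`, `word_dict`, and `score_with_pairs` are hypothetical, and the sketch only matches directly adjacent tokens, which is stricter than what the full script at the end of the article does.

```python
# Hypothetical multi-word rules: (first word, second word) -> polarity
pair_dict = {
    ("mouth corner", "rise"): 1,
    ("energetic", "not"): -1,
}
# Hypothetical single-word dictionary
word_dict = {"energetic": 1}


def score_with_pairs(tokens):
    """Score tokens left to right, letting a two-word rule override single words."""
    total = 0
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in pair_dict:
            total += pair_dict[pair]
            i += 2  # both words are consumed by the rule
        else:
            total += word_dict.get(tokens[i], 0)
            i += 1
    return total


print(score_with_pairs(["energetic"]))         # single word: +1
print(score_with_pairs(["energetic", "not"]))  # pair rule wins: -1
```

Checking the pair before the single word is what lets "energetic + not" come out negative even though "energetic" alone is positive.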

3. Creating a corpus

First, I created my own corpus. I built it by hand like this with the help of a friend. ↓ [Screenshot: part of the corpus] **Note: for copyright reasons the corpus cannot be published, so only a part of it is shown.**

A word from Mr. S, my friend and the creator of the corpus:

"I took on the transcription this time on a hunch that "I don't really get it, but this is definitely going to be interesting." Copying and splitting the text a few sentences at a time with Kindle's "Notes and Highlights" took a whole week. To be honest, it was very hard. I recorded the page and the weather while dragging over sentences, but doing that when already worn out from copy-and-paste work felt like being hit with Rariho (the sleep spell from Dragon Quest). At delivery I was told "Thank you for the corpus!", but I confess that I secretly googled "corpus" because I didn't know what it meant."

4. Prepare an emotion dictionary

As mentioned above, I prepare an emotion dictionary for the sentiment analysis. This time I used the ["Japanese Sentiment Polarity Dictionary"](http://www.cl.ecei.tohoku.ac.jp/index.php?Open%20Resources%2FJapanese%20Sentiment%20Polarity%20Dictionary) published by the Inui-Okazaki Laboratory at Tohoku University, partially reorganized to suit the content of Weathering with You. For positive/negative judgments based on multiple words, such as the [Mouth corner + rise, +1] introduced earlier, I made my own **"multi-word dictionary"**.

5. Give emotional values for each sentence

Load the emotion dictionary and output an emotional value for each sentence. The code is given at the end of this article, but if you want quick sentiment analysis in Japanese, the library **"oseti"** is very useful. I also used it as a reference.

Sentiment Analysis library oseti for Python using Japanese evaluation polarity dictionary has been released

I wrote a script that takes text as input and returns an emotional value, and added the emotional values to the corpus data frame as shown below. [Screenshot: corpus data frame with the emotional-value columns added]

Column description

- total_word_score_pair_list_abs1: list of the morphemes that carry emotional polarity, paired with their emotional values
- sum_positive_scores: sum of the positive values
- sum_negative_scores: sum of the negative values
- new_score_sum: sum of the positive and negative values
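As a rough sketch of how the three score columns relate to the word/score pair list, consider the following. The helper name `summarize` and the sample pairs are assumptions for illustration, and the sketch assumes that negative sums are stored as negative numbers.

```python
def summarize(word_score_pairs):
    """Derive the three score columns from a list of [word, score] pairs."""
    positive = sum(score for _, score in word_score_pairs if score > 0)
    negative = sum(score for _, score in word_score_pairs if score < 0)
    return {
        "sum_positive_scores": positive,
        "sum_negative_scores": negative,  # assumed to be stored as a negative number
        "new_score_sum": positive + negative,
    }


# Hypothetical pair list for one sentence
row = summarize([["energetic", 1], ["ignore", -1], ["smile", 1]])
print(row)
```

Keeping the positive and negative sums as separate columns makes it possible to plot them separately later, instead of only their total.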

6. Graph the transition of emotional values and compare with the content of the novel

Let's graph the emotional values with the visualization library seaborn. I sum the emotional values for each page and look at their **transition in chronological order**. The x-axis is the page number and the y-axis is the emotional value. **Note: there may be spoilers from here on. Be careful.**

Changes in emotional values on a page-by-page basis


import re
from statistics import mean, median

import matplotlib.pyplot as plt
import seaborn as sns

sns.set()
%matplotlib inline

page_sum_df = df_tenki2.groupby("page_num").new_score_sum.sum().reset_index()
sns.lineplot(x="page_num", y="new_score_sum", data=page_sum_df)

Here are the results! [Figure: emotional value by page]

The emotional ups and downs are quite intense! Positive and negative alternate. What can be read from this graph:

- Positive and negative show up clearly as undulations from page to page.
- Some pages have extremely high positive or negative values.
- The emotional value (probably) changes with the development of the story.

However, the undulations in the graph are very fine, and the features are a little hard to make out...

To capture coarser features, let's change the x-axis from **page units to chapter units** and sum again.

Changes in emotional values by chapter


chapter_sum_df = df_tenki2.groupby("chapter_flag").new_score_sum.sum().reset_index()
sns.lineplot(x="chapter_flag", y="new_score_sum", data=chapter_sum_df)

[Figure: emotional value by chapter]

What can be read from the graph

- Chapter-level totals give a more global picture than page-level totals.
- The emotional value swings widely from chapter to chapter, and the chapter right after a drop tends to swing back toward positive.
- No chapter has a negative total emotional value.

The ups and downs of emotions are easier to understand and interpret than before!

But... take a look at the y-axis values in the graph. There are hardly any negative values. Didn't those of you who saw the movie or read the novel find this odd?

**"Was Weathering with You really such a peaceful work...?"**

So let's take the analysis one step further. Next, **instead of the total**, I split the emotional value into a positive value and a negative value and draw a graph of each.

chapter_sum_df = df_tenki2.groupby("chapter_flag").sum_positive_scores.sum().reset_index()
sns.lineplot(x="chapter_flag", y="sum_positive_scores", data=chapter_sum_df,color="red")
chapter_sum_df2 = df_tenki2.groupby("chapter_flag").sum_negative_scores.sum().reset_index()
sns.lineplot(x="chapter_flag", y="sum_negative_scores", data=chapter_sum_df2,color="blue")

The red line is the transition of the positive value, and the blue line is the transition of the negative value. [Figure: positive and negative values by chapter]

Both the positive and the negative features are now clearly visible! In the earlier graph, where the positive and negative values were totaled into a single line, it seems that **"when both the positive and negative values were large, they offset each other and the features became hard to see."**
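The offsetting effect can be reproduced with a small made-up example. The numbers below are not data from the novel, and the sketch assumes that negative scores are stored as negative numbers; only the column names follow the ones used in this article.

```python
import pandas as pd

# Made-up scores for two chapters (not data from the novel)
df = pd.DataFrame({
    "chapter_flag": [1, 1, 2, 2],
    "sum_positive_scores": [1, 0, 5, 4],
    "sum_negative_scores": [0, -1, -4, -5],
})
df["new_score_sum"] = df["sum_positive_scores"] + df["sum_negative_scores"]

# Aggregate all three columns per chapter
per_chapter = df.groupby("chapter_flag")[
    ["sum_positive_scores", "sum_negative_scores", "new_score_sum"]
].sum()
print(per_chapter)
```

In this toy data both chapters have a total of 0, yet chapter 2 is far more emotionally intense (+9 and -9) than chapter 1 (+1 and -1), which only becomes visible when the two signs are aggregated separately.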

Now, let's take a brief look at the content of the novel.

Looking at the overall picture, the swing in emotional values is large at the beginning and in the second half of the story. In chapter 2 and from chapter 8 onward, both the positive value and the negative value are fairly high. Are these the **"ki" (introduction)** and **"ten" (twist)** parts of kishōtenketsu?

Next, let's look at the magnitude of the value.

From this graph, we can see that chapters 8 and 10 have the largest positive values and chapter 2 has the largest negative value.

chapter8

Chapter 8 is a relatively peaceful part. Everyone plays in the park together, and **"Hodaka"** consults Hina's younger brother **"Nagi"** and goes to buy a ring in order to confess his feelings to **"Hina"**. No wonder the positive value is high.

chapter10

Chapter 10 is the climax of the story, the part where **"Hodaka"** struggles to save **"Hina"**. Not only the positive value but also the negative value is high, so you can imagine how intense Hodaka's emotions are.

chapter2

Chapter 2 is an early part of the story. **"Hodaka"**, who has run away from home, comes to Tokyo and tries to find a part-time job, but is battered by the rough waves of the city and finally ends up working at **"Suga"**'s office. Did the negative value come from being battered by the city and being scolded by **"Suga"**?

7. Compare emotional values for each character

From here, let's compare the emotional values of the characters by calculating the average emotional value per line of each character's dialogue.

How to calculate the average value

**Total emotional value of a character's lines / number of that character's lines**

This time, we will compare the four main characters ** "Hodaka" **, ** "Hina" **, ** "Suga" **, and ** "Natsumi" **.

df_tenki3=df_tenki2.groupby(['speaker_name'])['new_score_sum'].mean().reset_index()
df_tenki4 = df_tenki3.sort_values('new_score_sum', ascending=False)
df_tenki_person = df_tenki4[(df_tenki4["speaker_name"] == "suga") | (df_tenki4["speaker_name"] == "hodaka") | (df_tenki4["speaker_name"] == "natsumi") | (df_tenki4["speaker_name"] == "hina")]
sns.catplot(x="speaker_name", y="new_score_sum", data=df_tenki_person,height=6,kind="bar",palette="muted")

[Figure: average emotional value per line for each character]

Positive and negative are clearly split between the women and the men.

The women are really positive! The two men, by contrast, are quite negative.

The two men show very similar values. In fact, there is a passage in the novel where the people around the protagonist Hodaka and Suga remark

**"These two are very similar"**

and we can see that they are also similar in terms of emotional values.

Let's also look at the positive values and the negative values separately.

df_tenki3 = df_tenki2.groupby(['speaker_name'])[['sum_positive_scores', 'sum_negative_scores']].mean().reset_index()  # a list of columns is required when selecting several columns
df_tenki4 = df_tenki3.sort_values('sum_positive_scores', ascending=False).reset_index()
df_tenki_person2 = df_tenki4[(df_tenki4["speaker_name"] == "suga") | (df_tenki4["speaker_name"] == "hodaka") | (df_tenki4["speaker_name"] == "natsumi") | (df_tenki4["speaker_name"] == "hina")]
sns.catplot(x="speaker_name", y="sum_positive_scores", data=df_tenki_person2,kind="bar",palette="muted")
sns.catplot(x="speaker_name", y="sum_negative_scores", data=df_tenki_person2,kind="bar",palette="muted")

[Figure: average positive and negative values for each character]

**"Hina"** seems to say fewer negative words than the other three. What a strong girl... **"Natsumi"** has the highest positive value, but her negative value is also reasonably high. She is usually quite cheerful, but I get the impression that she also says a variety of negative things, such as self-loathing over job hunting and complaints about Suga. And the positive value of **"Hodaka"** is quite low. He certainly doesn't give a very bright impression... But if Hina's positivity pulls him along, these two may be a good match (?)

8. Is there a relationship between weather and emotional values?

Finally, let's examine the relationship between emotional values and the "weather", which is the key to this story. As with the characters, the method is to calculate the average value for each weather scene. Judging from the descriptions in the text, I classified the weather into the following six categories when creating the corpus:

"sunny" "rain" "light rain" "heavy_rain" "clear" "snow"

Since rain has three levels of intensity, we can also look at the relationship between rain intensity and emotional value. Incidentally, in the setting of the story, the stage (Tokyo) is basically in a state of "rain", except when the heroine **"Hina"** prays for sunshine and when **a certain thing happens**. Because it rains every day, everyone in the city wants it to clear up.
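The per-weather average mentioned above can be sketched with pandas as follows. The data frame `toy` and its values are made up for illustration; only the column names `weather_flag` and `new_score_sum` follow the ones used in this article.

```python
import pandas as pd

# Made-up sentences with a weather label and an emotional value
toy = pd.DataFrame({
    "weather_flag": ["sunny", "sunny", "rain", "rain", "snow"],
    "new_score_sum": [2, 1, 0, -1, -2],
})

# Average emotional value per weather category, most positive first
weather_mean = (
    toy.groupby("weather_flag")["new_score_sum"]
    .mean()
    .sort_values(ascending=False)
)
print(weather_mean)
```

Using the mean rather than the sum keeps categories comparable even when some kinds of weather appear in many more sentences than others.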

Hypothesis

**My hypothesis: sunny scenes will be positive, and the heavier the rain, the more negative the scene will be?**

That was my expectation. Let's see the result!

sns.catplot(x="weather_flag", y="new_score_sum", data=df_tenki2_edited,height=6,kind="bar",palette="muted")

[Figure: average emotional value by weather]

As expected, sunny is the most positive! Just as predicted! Rain intensity, on the other hand, doesn't seem to have much to do with it. However, while "sunny" is very positive, "clear" is not very positive. Why...? As mentioned above, the weather clears up only when Hina prays for sunshine and when **a certain thing happens**, and that **thing** is..... yes,

"Hina has disappeared"

With Hina's disappearance, the rain that had been falling on the city gives way to clear skies all at once. The general public, who do not know that **"Hina"** has been sacrificed, are simply delighted by the clear weather. Only the protagonist **"Hodaka"** knows the truth and is deeply saddened. It should be the best weather in the story, but here Hodaka's feelings run in the opposite direction to the rest of the world. That is why the scene does not become very positive even though the sky is clear.

Lastly, snow: this is the most negative. If you have seen the movie, this will make sense. It is the climax, where **"Hodaka"** and **"Hina"** flee from the police and take refuge in a love hotel. There are many negative expressions in this scene because the world and Hodaka and the others are thrown into confusion by the extremely abnormal weather of snow falling in August.

9. Conclusion

What I learned from applying sentiment analysis to a novel:

- You can read the rough development of the story and the turning points in its four-part structure (kishōtenketsu).
- You can roughly judge a character's personality from their positive and negative values.
- The chronological transition of emotional values is easier to interpret when it is split into positive and negative values and aggregated by chapter.

This time I investigated the relationship between emotional values and the weather, which is the easiest factor to grasp, but investigating the relationship with other factors also seems likely to yield interesting results. If you have an idea like "It would be interesting to analyze this!", I would appreciate a comment.

10. Challenges

Wide range of novel expression

I was prepared for it, of course, but the range of expression in the novel was wider than I expected, and there were passages where it was difficult to assign positive or negative on a word-by-word basis. In the final scene, **"Hodaka"** shouts "The weather can stay crazy!", and it is the most powerfully positive and wonderful scene in the story, but taken literally it comes out negative. There is a limit to assigning emotional values word by word, and this is a very difficult point.
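The limitation can be illustrated with a toy example. The one-entry dictionary below is a hypothetical assumption (it is not taken from the dictionary used in this article), and the line is tokenized by hand in English.

```python
# Hypothetical word-level dictionary in which "crazy" is listed as negative
toy_dict = {"crazy": -1}

line = ["the", "weather", "can", "stay", "crazy"]
score = sum(toy_dict.get(word, 0) for word in line)
print(score)  # -1, although the line is emotionally positive in context
```

A word-level lookup has no access to the context that makes this line hopeful, so the score reflects only the literal polarity of "crazy".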

11. Code

# Function that loads the emotion polarity dictionaries
def _make_dict():
    
    import pandas as pd
    
    df_word_dict = pd.read_csv('./dict/edited_target_pair_list_word_out.csv')  # nouns
    df_wago_dict = pd.read_csv('./dict/edited_target_pair_list_wago_out.csv')  # declinable words (verbs, adjectives)
    df_one_gram_dict = pd.read_csv('./dict/one_gram_dict_out.csv')  # multi-word rules

    word_dict = {}
    for pair_list in df_word_dict[['word','count']].values.tolist():
        if pair_list[1] !='0':
            word_dict[pair_list[0]] = pair_list[1]

    wago_dict = {}
    for pair_list in df_wago_dict[['word','count']].values.tolist():
        if pair_list[1] !='0':
            wago_dict[pair_list[0]] = pair_list[1]

    one_gram_dict = {}
    for pair_list in df_one_gram_dict[['word1','word2','score']].values.tolist():
        one_gram_dict[(str(pair_list[0]),str(pair_list[1]))] = pair_list[2]
    
    return word_dict,wago_dict,one_gram_dict

# Function that splits text into sentences
def _split_per_sentence(text):
    import re
    # Sentence delimiters, including the full-width variants common in Japanese text
    re_delimiter = re.compile(r"[。,.．!?！？]")
    for sentence in re_delimiter.split(text):
        if sentence and not re_delimiter.match(sentence):
            yield sentence

def _sorted_second_list(polarities_and_lemmanum):
    from operator import itemgetter

    sorted_polarities_and_lemmanum=sorted(polarities_and_lemmanum, key=itemgetter(1))
    return [i[0] for i in sorted_polarities_and_lemmanum]

def _calc_sentiment_polarity(sentence):
        import MeCab
        word_dict,wago_dict,one_gram_dict = _make_dict()

        NEGATION = ('ない', 'ず', 'ぬ', 'ん')  # negation morphemes as they appear in the MeCab output
        tagger = MeCab.Tagger('-Owakati -d /usr/local/lib/mecab/dic/mecab-ipadic-neologd')
        tagger.parse('')  # for avoiding bug
        word_polarities = []  # emotional values for words found in word_dict
        wago_polarities = []  # emotional values for words found in wago_dict
        polarities_and_lemmanum = []  # final emotional values paired with lemma_num
        lemmas = []  # lemmas (dictionary forms of the words)
        word_polarity_apeared = False
        wago_polarity_apeared = False
        
        word_nutoral_polarity_apeared = False
        wago_nutoral_polarity_apeared = False
        
        word_out_polarity_apeared = False
        wago_out_polarity_apeared = False
        
        word_polarity_word = '' #Provisional for error handling
        wago_polarity_word = '' #Provisional for error handling
        
        word_nutoral_word = ''
        wago_nutoral_word = ''
        
        word_out_polarity_word = ''
        wago_out_polarity_word = ''
        
        last_hinsi = ''
        last_word = ''
        
        node = tagger.parseToNode(sentence)
        word_score_pair_list = []
        lemma_num = 0#For noting the order of words
        lemma_dict = {}
        while node:
            if 'BOS/EOS' not in node.feature:
 
                surface = node.surface
                feature = node.feature.split(',')
                lemma = feature[6] if feature[6] != '*' else node.surface
                lemma_num += 1
                lemma_dict[lemma] = lemma_num
                # Process the lemma (dictionary form) of each segmented word
                # Reset the "polarity appeared" flags once the polarity word is no longer adjacent
                # (MeCab's IPADIC part-of-speech tags are Japanese: 助動詞 = auxiliary verb, 助詞 = particle)

                if word_polarity_apeared and (feature[0] not in ['助動詞','助詞'] and last_hinsi not in ['助動詞','助詞'] and last_word not in ['ある','おる','する']) and last_word not in word_dict:
                    word_polarity_apeared = False  
                elif wago_polarity_apeared and (feature[0] not in ['助動詞','助詞'] and last_hinsi not in ['助動詞','助詞'] and last_word not in ['ある','おる','する']) and last_word not in wago_dict:
                    wago_polarity_apeared = False
                elif word_nutoral_polarity_apeared and (feature[0] not in ['助動詞','助詞'] and last_hinsi not in ['助動詞','助詞'] and last_word not in ['ある','おる','する']) and last_word not in word_dict:
                    word_nutoral_polarity_apeared = False
                elif wago_nutoral_polarity_apeared and (feature[0] not in ['助動詞','助詞'] and last_hinsi not in ['助動詞','助詞'] and last_word not in ['ある','おる','する']) and last_word not in wago_dict:
                    wago_nutoral_polarity_apeared = False

                    
                try:
                    if word_dict[lemma] in ['p','n']:
                        polarity = 1 if word_dict[lemma] == 'p' else -1
                        word_polarities.append([polarity,lemma_dict[lemma]])
                        word_polarity_apeared = True
                        word_polarity_word = lemma
                    elif word_dict[lemma] == 'f':
                        word_polarities.append([0,lemma_dict[lemma]])
                        word_nutoral_polarity_apeared = True
                        word_nutoral_word = lemma
                        polarity = 0
                        # Entries with 0 were removed from word_dict in advance, so this else is unnecessary; kept for readability.
                    else:
                        polarity = 0
                # Case: the word is not in word_dict
                except:
                    # Case: the word is in wago_dict
                    try:
                        if wago_dict[lemma] in ['Positive (experience)','Negative (experience)','Positive (evaluation)','Negative (evaluation)']:
                            polarity = 1 if wago_dict[lemma] in ['Positive (experience)','Positive (evaluation)'] else -1
#                             print(polarity)
                            wago_polarities.append([polarity,lemma_dict[lemma]])
                            wago_polarity_apeared = True
                            wago_polarity_word = lemma
                        elif wago_dict[lemma] == 'neutral':
                            wago_polarities.append([0,lemma_dict[lemma]])
                            wago_nutoral_polarity_apeared = True
                            wago_nutoral_word = lemma
                            polarity = 0
                        else:
                            polarity = 0
                     #word_dict also wago_Processing when there is no word in dict
                    except:  
                        if word_polarity_apeared and surface in NEGATION and wago_nutoral_polarity_apeared is False:

                            if last_hinsi in ['名詞','助詞','助動詞','動詞','形容詞']:

                                word_polarities[-1][0] *= -1
                                try:
                                    reverse_num = -1*([i for i in reversed(word_score_pair_list)].index([word_polarity_word,1])+1)
                                except:
                                    reverse_num = -1*([i for i in reversed(word_score_pair_list)].index([word_polarity_word,-1])+1)
                                finally:

                                    word_score_pair_list[reverse_num]=[word_polarity_word+'+'+lemma,word_polarities[-1][0]]
                                    word_polarity_apeared = False
                                    word_polarity_word = ''
                                    polarity = 0
                            else:
                                polarity = 0
                        #"Please fix" "Please improve"-Processing to 1
                        elif lemma in ['Give me','want','Wish']:
                            
                            try:
                                if word_polarity_word or word_nutoral_word !='':
                                    last_polarities_word = [i for i in [word_polarity_word,word_nutoral_word] if i !=''][0]
                                    try:
                                        reverse_num = -1*([i for i in reversed(word_score_pair_list)].index([last_polarities_word,word_polarities[-1][0]])+1)
                                    except:
                                        reverse_num = -1*([i for i in reversed(word_score_pair_list)].index([last_polarities_word,word_polarities[-1][0]])+1)
                                    finally:
                                        word_score_pair_list[reverse_num]=[last_polarities_word+'+'+surface,-1]#Emotion value-Grant 1
                                        word_polarity_apeared = False
                                        word_polarity_word = ''
                                        word_polarities[-1][0] = -1
                                        polarity = 0
                                    
                                elif wago_polarity_word or wago_nutoral_word !='':
                                    last_polarities_word = [i for i in [wago_polarity_word,wago_nutoral_word] if i !=''][0]
                                    try:
                                        reverse_num = -1*([i for i in reversed(word_score_pair_list)].index([last_polarities_word,wago_polarities[-1][0]])+1)
                                    except:
                                        reverse_num = -1*([i for i in reversed(word_score_pair_list)].index([last_polarities_word,wago_polarities[-1][0]])+1)
                                    finally:
                                        word_score_pair_list[reverse_num]=[last_polarities_word+'+'+surface,-1]#Emotion value-Grant 1
                                        wago_polarity_apeared = False
                                        wago_polarity_word = ''
                                        wago_polarities[-1][0] = -1
                                        polarity = 0
                            except:
                                polarity = 0
                                # When the sentence ends with the particle 'か' (a question), set the value to -1
                        elif last_hinsi in ['助動詞','助詞'] and lemma == 'か':
                            
                            try:
                                if word_polarity_word or word_nutoral_word !='':
                                    last_polarities_word = [i for i in [word_polarity_word,word_nutoral_word] if i !=''][0]
                                    try:
                                        reverse_num = -1*([i for i in reversed(word_score_pair_list)].index([last_polarities_word,word_polarities[-1][0]])+1)
                                    except:
                                        reverse_num = -1*([i for i in reversed(word_score_pair_list)].index([last_polarities_word,word_polarities[-1][0]])+1)
                                    finally:
                                        word_score_pair_list[reverse_num]=[last_polarities_word+'+'+surface,-1]#Emotion value-Grant 1
                                        word_polarity_apeared = False
                                        word_polarity_word = ''
                                        word_polarities[-1][0] = -1
                                        polarity = 0
                                    
                                elif wago_polarity_word or wago_nutoral_word !='':
                                    last_polarities_word = [i for i in [wago_polarity_word,wago_nutoral_word] if i !=''][0]
                                    try:
                                        reverse_num = -1*([i for i in reversed(word_score_pair_list)].index([last_polarities_word,wago_polarities[-1][0]])+1)
                                    except:
                                        reverse_num = -1*([i for i in reversed(word_score_pair_list)].index([last_polarities_word,wago_polarities[-1][0]])+1)
                                    finally:
                                        word_score_pair_list[reverse_num]=[last_polarities_word+'+'+surface,-1]#Emotion value-Grant 1
                                        wago_polarity_apeared = False
                                        wago_polarity_word = ''
                                        wago_polarities[-1][0] = -1
                                        polarity = 0
                            except:
                                polarity = 0
                            
                        elif word_nutoral_polarity_apeared:

                            if last_hinsi in ['名詞','助詞','助動詞','動詞'] and lemma in NEGATION:
                                lemma_type = 'denial'
                                try:
                                    word_polarities[-1][0] += one_gram_dict[(word_nutoral_word,lemma_type)]
                                    reverse_num = -1*([i for i in reversed(word_score_pair_list)].index([word_nutoral_word,0])+1)
                                    # Look up the same (word, lemma_type) key that was used for the score above
                                    word_score_pair_list[reverse_num]=[word_nutoral_word+'+'+lemma,one_gram_dict[(word_nutoral_word,lemma_type)]]
                                    word_nutoral_polarity_apeared = False
                                    word_nutoral_word = ''
                                except:
                                    polarity = 0
                            elif last_hinsi in ['名詞','助詞','助動詞','動詞'] and lemma not in NEGATION:
                                try:
                                    word_polarities[-1][0] += one_gram_dict[(word_nutoral_word,lemma)]
                                    reverse_num = -1*([i for i in reversed(word_score_pair_list)].index([word_nutoral_word,0])+1)
                                    word_score_pair_list[reverse_num]=[word_nutoral_word+'+'+lemma,one_gram_dict[(word_nutoral_word,lemma)]]
                                    word_nutoral_polarity_apeared = False
                                    word_nutoral_word = ''
                                except:
                                    polarity = 0
                        # Handling when the word is not in word_dict
                        else:
                            # If the word is a negation such as 'nai', 'zu' or 'nu' (i.e. it is in NEGATION), reverse the polarity of the previous word
                            if wago_polarity_apeared and surface in NEGATION and wago_nutoral_polarity_apeared is False\
                            and word_polarity_apeared is False and word_nutoral_polarity_apeared is False:

                                if last_hinsi in ['noun','adjective','Auxiliary verb']:

                                    wago_polarities[-1][0] *= -1
                                    reverse_num = -1*([i for i in reversed(word_score_pair_list)].index([wago_polarity_word,wago_polarities[-1][0]*(-1)])+1)
                                    word_score_pair_list[reverse_num]=[wago_polarity_word+'+'+lemma,wago_polarities[-1][0]]
                                    wago_polarity_apeared = False
                                    wago_polarity_word = ''
                                    polarity = 0
                                else:
                                    polarity = 0
                            
                            elif wago_nutoral_polarity_apeared:
                                #Neutral + negation processing
                                if last_hinsi in ['verb','Particle','Auxiliary verb'] and lemma in NEGATION:
                                    lemma_type = 'denial'
                                    try:
                                        wago_polarities[-1][0] += one_gram_dict[(wago_nutoral_word,lemma_type)]
                                        #Processing to trace the list in reverse
                                        reverse_num = -1*([i for i in reversed(word_score_pair_list)].index([wago_nutoral_word,0])+1)
                                        word_score_pair_list[reverse_num]=[wago_nutoral_word+'+'+lemma,one_gram_dict[(wago_nutoral_word,lemma_type)]]
                                        wago_nutoral_polarity_apeared = False
                                        wago_nutoral_word = ''
                                    except:
                                        polarity = 0
                                #Processing other than neutral + negation
                                elif last_hinsi in ['noun','verb','adjective','Particle','Auxiliary verb'] :
                                    try:
                                        wago_polarities[-1][0] += one_gram_dict[(wago_nutoral_word,lemma)]
                                        #Processing to trace the list in reverse
                                        reverse_num = -1*([i for i in reversed(word_score_pair_list)].index([wago_nutoral_word,0])+1)
                                        word_score_pair_list[reverse_num]=[wago_nutoral_word+'+'+lemma,one_gram_dict[(wago_nutoral_word,lemma)]]
                                        wago_nutoral_polarity_apeared = False
                                        wago_nutoral_word = ''
                                    except:
                                        polarity = 0
                                else:
                                    polarity = 0
                            else:
                                polarity = 0

          
                word_score_pair = [lemma,polarity]
                word_score_pair_list.append(word_score_pair)

                last_hinsi = node.feature.split(',')[0]
                last_word = lemma
            node = node.next
        if word_polarities:
            polarities_and_lemmanum.extend(word_polarities)
        if wago_polarities:
            polarities_and_lemmanum.extend(wago_polarities)
        # Sort in ascending order by lemma_num
        try:
            polarities = _sorted_second_list(polarities_and_lemmanum)
            # Keep only nonzero polarity values; a leftover 0 causes an error in the subsequent processing.
            polarities = [i for i in polarities if i != 0]
        except:
            polarities = []
        
        try:
            if sum(polarities) / len(polarities) == 0:
                # Positive and negative values cancel out, so use the last polarity in the sentence.
                score = float(polarities[-1])
                print(sentence + ' → priority is given to the emotional value at the end of the sentence')
            else:
                score = sum(polarities) / len(polarities)
        except:
            score = 0
        
        if not polarities:
            return 0,0,0,word_score_pair_list
        return score,sum(i for i in polarities if i > 0),sum(i for i in polarities if i < 0),word_score_pair_list
    
def _analyze(text):
        scores,total_word_score_pair_list,positive_word_cnt_list,negative_word_cnt_list = [],[],[],[]
        
        for sentence in _split_per_sentence(text):
            # Replace expressions in the sentence to prepare it for sentiment analysis
            replaced_sentence = _emotion_replace_text(sentence)
            score,positive_word_cnt,negative_word_cnt,word_score_pair_list = _calc_sentiment_polarity(replaced_sentence)
            scores.append(score)
            positive_word_cnt_list.append(positive_word_cnt)
            negative_word_cnt_list.append(negative_word_cnt)
            total_word_score_pair_list.append(word_score_pair_list)

        return scores,positive_word_cnt_list,negative_word_cnt_list,total_word_score_pair_list

def _flatten_abs1(x):
    # Flatten a nested list, keeping only the pairs whose emotional value is nonzero
    return [e for inner_list in x for e in inner_list if e[1] !=0]

def score_sum_get(x):
    # Sum the nonzero emotional values (the same pairs that _flatten_abs1 keeps)
    return sum(e[1] for inner_list in x for e in inner_list if e[1] != 0)


from datetime import datetime as dt
from datetime import date


# Load the novel text data
import pandas as pd
path='./data/tenkinoko.csv'
df_tenki = pd.read_csv(path,encoding="SHIFT-JIS")
df_tenki["chapter_flag"] = df_tenki.chapter.apply(chapter_flag)

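# df_tenki2 is assumed to be a DataFrame derived from df_tenki with a 'text' column
# (its construction is not shown in this listing).
add_col_name=['scores','positive_word_cnt_list','negative_word_cnt_list','total_word_score_pair_list']
# Run the analysis once per row, then split the returned 4-tuple into columns
analyze_results = df_tenki2['text'].apply(_analyze)
for i, col_name in enumerate(add_col_name):
    df_tenki2[col_name] = analyze_results.apply(lambda r, i=i: r[i])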

df_tenki2['score_sum']=df_tenki2['scores'].apply(sum)
# Sum the positive emotional values of each text
df_tenki2['sum_positive_scores']=df_tenki2['positive_word_cnt_list'].apply(sum)
# Sum the negative emotional values of each text
df_tenki2['sum_negative_scores']=df_tenki2['negative_word_cnt_list'].apply(sum)
# Keep only the [morpheme, emotional value] pairs of words that carry an emotional polarity
df_tenki2['total_word_score_pair_list_abs1']=df_tenki2['total_word_score_pair_list'].apply(_flatten_abs1)

#Calculate the total emotional value
df_tenki2['new_score_sum']=df_tenki2['total_word_score_pair_list'].apply(lambda x:score_sum_get(x))
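
The sentence-level scoring rule at the end of `_calc_sentiment_polarity` is easy to lose in the long listing above, so here is a minimal standalone sketch of just that step. It assumes the polarities have already been collected into a flat list of numbers; `aggregate_polarities` is a hypothetical helper name used only for illustration and is not part of the code above.

```python
def aggregate_polarities(polarities):
    """Return (score, positive_sum, negative_sum) for one sentence.

    Hypothetical helper that mirrors the final step of the listing above:
    the input is assumed to be a flat list of numeric polarity values.
    """
    # Drop zeros so that neutral words do not dilute the average.
    nonzero = [p for p in polarities if p != 0]
    if not nonzero:
        return 0, 0, 0

    mean = sum(nonzero) / len(nonzero)
    # If positives and negatives cancel out exactly, fall back to the
    # polarity that appears last in the sentence.
    score = float(nonzero[-1]) if mean == 0 else mean

    positive_sum = sum(p for p in nonzero if p > 0)
    negative_sum = sum(p for p in nonzero if p < 0)
    return score, positive_sum, negative_sum


# One positive and one negative word cancel out, so the last one decides.
print(aggregate_polarities([1, 0, -1]))
# No polarity words at all: everything is 0.
print(aggregate_polarities([]))
```

Giving priority to the end of the sentence is a design choice rather than a rule: in Japanese the predicate comes last, so the final polarity word often carries the overall tone of the sentence.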
