[PYTHON] I tried to reproduce Mr. Saito who appears in "Eine Kleine Nachtmusik" as Mr. Sakurai of Mr. Children

Introduction

Do you know Kotaro Isaka's novel ["Eine Kleine Nachtmusik"](https://www.amazon.co.jp/%E3%82%A2%E3%82%A4%E3%83%8D%E3%82%AF%E3%83%A9%E3%82%A4%E3%83%8D%E3%83%8A%E3%83%8F%E3%83%88%E3%83%A0%E3%82%B8%E3%83%BC%E3%82%AF-%E5%B9%BB%E5%86%AC%E8%88%8E%E6%96%87%E5%BA%AB-%E4%BC%8A%E5%9D%82%E5%B9%B8%E5%A4%AA%E9%83%8E-ebook/dp/B0746CS4T6)? If you like novels, you may know it. It is a collection of short stories with an everyday feel, and a character called "Mr. Saito" appears in it. If you pay a hundred yen and tell him "This is how I feel right now" or "This is the situation I'm in," Mr. Saito plays, on his computer, part of a song that matches your mood. True to his name, what he plays is always part of a Kazuyoshi Saito song.

So I thought I could do the same thing with sentiment analysis, and decided to build it. Any artist would do, but since I'm a Mr. Children fan myself, I built it as a character called Sakurai-san.

This time I use the sentiment analysis of the COTOHA API. There also seems to be a campaign going on, so this doubles as my entry.

What I made

It's faster to show what I made than to explain it.

You: Every day is monotonous and boring. I wonder if there is anything good.
Sakurai: I'm in my late 20s with a smooth sail going back and forth between Tokyo and Paris. I've pushed away the wave of bubbly generational changes and sold it to my clients.[Delmo]

It can be read in many ways, but one interpretation is that a job that often takes you to Paris might bring you a smooth-sailing life. This is part of the lyrics of Mr. Children's song "Delmo".

System overview

It can be roughly divided into (1) data construction and (2) phrase search processing.

[Figure: system overview (sakuraisan-image.png)]

① Data construction: Scrape the lyrics from Uta-Net and split them into phrases (one paragraph of the lyrics is one phrase). Each phrase is sent to the sentiment analysis API, and the analysis result is saved in the DB.

② Phrase search: The customer describes their situation, and that text is sent to the sentiment analysis API to obtain the analysis result. Phrases with similar scores are then looked up in the DB. The lookup is a range search on the score, with the range widened step by step. If multiple phrases are found, one is chosen at random and returned to the customer.
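The two steps above can be sketched without the API or the DB. This is only a minimal illustration, not the script itself: `split_into_phrases` and `find_phrase` are hypothetical helper names, the sentiments and scores are made-up sample data rather than real API output, and the widths mirror the `SEARCH_SCOPE` list used in the script further down.

```python
import random

# Score widths to try, narrowest first (same values as SEARCH_SCOPE in the script).
SEARCH_SCOPE = [0.01, 0.1, 0.3]


def split_into_phrases(lyrics):
    """Split lyrics on blank lines; each paragraph becomes one phrase."""
    phrases = []
    for paragraph in lyrics.split("\n\n"):
        phrase = paragraph.replace("\u3000", " ").replace("\n", " ").strip()
        if phrase:
            phrases.append(phrase)
    return phrases


def find_phrase(records, sentiment, score, rng=random):
    """Return one (phrase, title) whose score is near `score`, or None.

    `records` is a list of (phrase, title, sentiment, score) tuples that
    stands in for the rows of the lyric table.
    """
    for scope in SEARCH_SCOPE:
        hits = [
            (phrase, title)
            for phrase, title, s, sc in records
            if s == sentiment and score - scope <= sc <= score + scope
        ]
        if hits:
            # Several phrases may match; pick one at random.
            return rng.choice(hits)
    return None


# Made-up sample data: these sentiments and scores are NOT real API output.
lyrics = "line one\nline two\n\nline three"
phrases = split_into_phrases(lyrics)
records = [
    (phrases[0], "Song A", 1, 0.80),
    (phrases[1], "Song A", -1, 0.55),
]
print(find_phrase(records, sentiment=-1, score=0.50))
```

Trying the narrow widths first means a close match is preferred whenever one exists, and the wider widths only act as a fallback.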

Execution result

Here are some of the results I found interesting.

You: I'm so happy to be praised at work today!
Sakurai: If you're frustrated, replenish your love. The flames of love towards you burn. The Shiba Inu in the opposite house also says "Hyponjuru."!Ah, the world is rosy. This is so CENTER OF UNIVERSE I am the center. Ah, the world is wonderful.[CENTER OF UNIVERSE]

Looking only at the first half you might ask, "Frustrated?", but the second half sings something really cheerful. It has a "work is going well, so I am the best in the world!" feel.

You: My exam preparation was a failure. I guess I'll try my best next year...
Sakurai: I want to take a deep breath and release this feeling into the sky I want to get out of the dull mood that I'm trapped in my own world, and always want to make Sunrise shine in my heart[SUNRISE]

It tells you to take a breath for now and get out of the dull mood. It sounds negative, but it is also quite encouraging.

You: I won't say I'm not in love anymore
Sakurai: I can hear you, the uncontrollable feelings about you are here, even if you close your ears, they are ringing[365 days]

The feeling of being irresistibly in love matches.

You: Actually, I'm thinking of breaking up with my girlfriend
Sakurai: Already good!?Let's end this!?You think so too!? [I]

It may be telling you that it's fine to end it.

Code

If you are interested, please take a look. I only made something that works for now, so please be aware that it is not well organized.

DB schema

I'm using MySQL. Mr.Children is registered as the artist information in advance.

create database sakurai;
use sakurai;

create table artist
  (artist_id smallint auto_increment not null primary key,
  artist_name varchar(100));

insert into artist (artist_name) values('Mr.Children');

create table title
  (title_id smallint auto_increment not null primary key,
  title varchar(100),
  artist_id smallint);
 
create table lyric
 (title_id smallint,
  phrase_id int auto_increment not null primary key,
  phrase varchar(1000),
  score float,
  sentiment tinyint);

Python code

sakuraisan.py


# -*- coding: utf-8 -*-

import random, requests, json, sys, time
import urllib.request
import mysql.connector as mydb
import pandas as pd

from bs4 import BeautifulSoup

ARTIST_ID = 1 #ID of the artist registered in advance in the DB
AGENT_NAME = 'Sakurai' #The name of the agent who answers

#COTOHA API class
class CotohaApi():
    def __init__(self):
        self.COTOHA_ACCESS_INFO = {
            "grantType": "client_credentials",
            "clientId": "<Your Client ID>",
            "clientSecret": "<Your own Client Secret>"
        }
        self.ACCESS_TOKEN_PUBLISH_URL = '<Your own Access Token Publish URL>'
        self.BASE_URL = '<Your API Base URL>' #Must end with '/' because API paths are appended directly

        self.get_access_token() #Stores self.access_token and self.headers

    #Obtaining an access token
    def get_access_token(self):
        headers = {
            "Content-Type": "application/json;charset=UTF-8"
        }
        access_data = json.dumps(self.COTOHA_ACCESS_INFO).encode()
        request_data = urllib.request.Request(self.ACCESS_TOKEN_PUBLISH_URL, access_data, headers)
        token_body = urllib.request.urlopen(request_data)
        token_body = json.loads(token_body.read())
        self.access_token = token_body["access_token"]
        self.headers = {
            'Content-Type': 'application/json;charset=UTF-8',
            'Authorization': 'Bearer {}'.format(self.access_token)
        }

    #Call the sentiment analysis API and return the analysis result
    def sentiment_analysis(self, text):
        request_body = {
            'sentence': text
        }
        url = self.BASE_URL + 'nlp/v1/sentiment'
        text_data = json.dumps(request_body).encode()
        request_data = urllib.request.Request(url, text_data, headers=self.headers, method='POST')
        sentiment_result = urllib.request.urlopen(request_data)
        sentiment_result = json.loads(sentiment_result.read())
        return sentiment_result

    #Convert Positive to 1, Neutral to 0, Negative to -1
    def convert_sentiment(self, sentiment_in_word):
        if sentiment_in_word == 'Positive':
            return 1
        elif sentiment_in_word == 'Neutral':
            return 0
        elif sentiment_in_word == 'Negative':
            return -1

#DB operation class
class DBHandler():
    def __init__(self):
        self.conn = mydb.connect(
            host = '<DB host name>',
            port = '<DB port number>',
            user = '<DB username>',
            password = '<DB password>',
            database = '<DB name>',
            charset='utf8'
        )

        self.conn.ping(reconnect=True)
        self.cur = self.conn.cursor()

#Data construction class
class Learn():
    def __init__(self):
        self.FILE_NAME = 'list.csv'
        self.ARTIST_NUMBER = '684' #Uta-Net Artist No.(Mr.Children 684)
        self.MAX_PAGE = 2 #Number of pages in the list of songs by Uta-Net artists (Mr.Children have 2 pages)

    #Collect lyrics from Uta-Net
    def gather_lyric(self):
        #Create a table to store scraped data
        list_df = pd.DataFrame(columns=['Song title', 'lyrics'])

        for page in range(1, self.MAX_PAGE + 1):
            #Song page top address
            base_url = 'https://www.uta-net.com'

            #Lyrics list page
            url = 'https://www.uta-net.com/artist/' + self.ARTIST_NUMBER + '/0/' + str(page) + '/'
            response = requests.get(url)
            soup = BeautifulSoup(response.text, 'lxml')
            links = soup.find_all('td', class_='side td1')

            for link in links:
                a = base_url + (link.a.get('href'))

                #Lyrics detail page
                response = requests.get(a)
                soup = BeautifulSoup(response.text, 'lxml')
                title = soup.find('h2').text
                print(title)
                song_lyrics = soup.find('div', itemprop='text')
                
                for lyric in song_lyrics.find_all("br"):
                    lyric.replace_with('\n')
                song_lyric = song_lyrics.text

                #Wait 1 second so as not to overload the server
                time.sleep(1)

                #Add the acquired lyrics to the table
                #(DataFrame.append was removed in pandas 2.0, so use pd.concat)
                tmp_se = pd.DataFrame([[title, song_lyric]], columns=list_df.columns)
                list_df = pd.concat([list_df, tmp_se], ignore_index=True)

        #csv save
        list_df.to_csv(self.FILE_NAME, mode = 'a', encoding='utf8')

    #Divide the lyrics into phrases and register the data including the sentiment analysis results in the DB
    def add_lyric(self):
        db = DBHandler()
        df_file = pd.read_csv(self.FILE_NAME, encoding='utf8')
        song_titles = df_file['Song title'].tolist()
        song_lyrics = df_file['lyrics'].tolist()
        
        #Note: If there are many songs, the upper limit of API that can be executed in COTOHA per day will be hit (about 100 songs per day is a guide).
        for i in range(len(song_titles)):

            #Add title
            title = song_titles[i]

            print("Info: Saving {}...".format(title), end="")
            db.cur.execute(
                """
                insert into title (title, artist_id)
                values (%s, %s);
                """,
                (title, ARTIST_ID)
            )
            db.conn.commit()
            db.cur.execute(
                """
                select title_id from title
                where title= %s
                and artist_id = %s;
                """,
                (title, ARTIST_ID)
            )
            title_id = db.cur.fetchall()[-1][0]

            #Register the sentiment analysis result of each lyrics phrase
            #Two consecutive line breaks mark a phrase boundary
            lyric = song_lyrics[i]
            lyric_phrases = lyric.split('\n\n')
            lyric_phrases = [p.replace('\u3000', ' ').replace('\n', ' ').strip() for p in lyric_phrases]
            lyric_phrases = [p for p in lyric_phrases if p] #Skip empty phrases
            
            #Use the sentiment analysis API for each phrase and register the sentiment analysis results in the DB
            cotoha_api= CotohaApi()
            for phrase in lyric_phrases:
                sentiment_result = cotoha_api.sentiment_analysis(phrase)['result']
                sentiment = cotoha_api.convert_sentiment(sentiment_result['sentiment'])
                score = sentiment_result['score']
                
                db.cur.execute(
                    """
                    insert into lyric (title_id, score, sentiment, phrase)
                    values (%s, %s, %s, %s);
                    """,
                    (title_id, score, sentiment, phrase)
                )
                db.conn.commit()

            print("Done")
                
        db.conn.close()
        if not db.conn.is_connected():
            print("Info: DB Disconnected")

    def execute(self):
        print("Info:Collecting lyrics...")
        self.gather_lyric()
        print("Info:Adding lyrics to DB...")
        self.add_lyric()

#Phrase search class
class Search():
    def __init__(self):
        self.SEARCH_SCOPE = [0.01, 0.1, 0.3] #Score widths to search: look within score ± width, trying the widths in list order

    def execute(self):
        print("you:", end="")
        input_data = input()
        print("{}:".format(AGENT_NAME), end="")
        
        cotoha_api= CotohaApi()
        sentiment_result = cotoha_api.sentiment_analysis(input_data)['result']
        sentiment = cotoha_api.convert_sentiment(sentiment_result['sentiment'])
        score = sentiment_result['score']
        
        db = DBHandler()

        find_flag = 0
        #Search for phrases with similar scores while gradually expanding the search range
        for scope in self.SEARCH_SCOPE:

            #Confirm that there is at least one
            db.cur.execute(
                """
                select count(phrase_id) from lyric
                join title on lyric.title_id = title.title_id
                where sentiment = %s
                and score between %s and %s
                and artist_id = %s;
                """,
                (sentiment, score-scope, score+scope, ARTIST_ID)
            )
            hit_num = db.cur.fetchall()[-1][0]
            if hit_num > 0:
                find_flag = 1
                break
        
        #If there is even one search result, get the search result and reply to the customer
        if find_flag == 1:
            db.cur.execute(
                """
                select phrase,title from lyric
                join title on lyric.title_id = title.title_id
                where sentiment = %s
                and score between %s and %s
                and artist_id = %s;
                """,
                (sentiment, score-scope, score+scope, ARTIST_ID)
            )
            search_result = db.cur.fetchall()
            phrase_chosen = random.choice(search_result)
            print("{} [{}]".format(phrase_chosen[0], phrase_chosen[1]))
        else:
            print("I couldn't find good lyrics.")
        
        db.conn.close()
        

if __name__ == "__main__":
    args = sys.argv
    if len(args) == 2:
        process = args[1] #Command line argument learn:Register lyrics information in DB, search:Extract similar emotional phrases from DB
        if process == 'search':
            searcher = Search()
            searcher.execute()
        elif process == 'learn':
            learner = Learn()
            learner.execute()
        else:
            print("Error:Specify one command line argument[learn/search]")
    else:
        print("Error:Specify one command line argument[learn/search]")

There are two ways to execute it.

  • When building data
python sakuraisan.py learn
  • When searching for a phrase
python sakuraisan.py search
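
The same learn/search dispatch could also be written with `argparse`, which prints a usage message by itself when the argument is missing or wrong. This is only a sketch: `build_parser` and `main` are hypothetical helpers, and the calls to the script's `Learn` and `Search` classes are left as comments so that the snippet runs on its own.

```python
import argparse


def build_parser():
    """Build a parser that accepts exactly one mode: learn or search."""
    parser = argparse.ArgumentParser(prog="sakuraisan.py")
    parser.add_argument(
        "process",
        choices=["learn", "search"],
        help="learn: register lyrics in the DB, search: reply with a similar phrase",
    )
    return parser


def main(argv=None):
    """Parse the arguments and report which mode would run."""
    args = build_parser().parse_args(argv)
    if args.process == "learn":
        # In the real script this would be Learn().execute()
        return "learn"
    # In the real script this would be Search().execute()
    return "search"


# Demonstration with an explicit argument list instead of sys.argv.
print(main(["search"]))
```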

In conclusion

This time I implemented a simple algorithm that fetches lyric phrases whose sentiment analysis scores are similar. The COTOHA API can also pick up the emotions attached to individual words. For example, as in the official example, the word "song" yields emotions such as "pleasing" and "safety". I think better results would come back if this information could be worked into the search.
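
One way this idea might look is sketched below. It assumes that each stored phrase and the customer's input already carry a set of emotion labels, for example taken from the word-level emotions in the API response (how they are extracted is not shown here and would need to be checked against the API reference). `emotion_overlap` and `rank_candidates` are hypothetical helpers, and the labels are made-up sample data.

```python
def emotion_overlap(a, b):
    """Jaccard similarity between two sets of emotion labels (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_candidates(input_emotions, candidates):
    """Sort candidate phrases so that those sharing more emotions come first.

    `candidates` is a list of (phrase, emotions) pairs that already passed
    the score range search.
    """
    return sorted(
        candidates,
        key=lambda item: emotion_overlap(input_emotions, item[1]),
        reverse=True,
    )


# Made-up labels for illustration only.
input_emotions = {"pleasing", "safety"}
candidates = [
    ("phrase A", {"sad"}),
    ("phrase B", {"pleasing", "safety"}),
    ("phrase C", {"pleasing"}),
]
for phrase, emotions in rank_candidates(input_emotions, candidates):
    print(phrase, sorted(emotions))
```

Instead of choosing at random among all phrases in the score range, the reply could then be drawn from the top of this ranking.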

I also think it would be interesting to turn this into a LINE Bot.

