[PYTHON] Get the song name from the title of the video you tried to sing

background

code

I usually processed it using regular expressions. (There is an improved version of the code at the bottom, so it's recommended to scroll to the bottom before copying)

get_song_titles.py



def get_song_title(raw_title):

    # ()()[]Exclude []. Sometimes the left is half-width and the right is full-width.
    title = re.sub("[【(\(\[].+?[】)\)\]]","",raw_title)

    #If there are "" and "", extract the character string in them
    if "「" in title and "」" in title:
        title = title.split("「")[1].split("」")[0]

    if "『" in title and "』" in title:
        title = title.split("『")[1].split("』")[0]

    #I tried singing X with Y,I tried to sing and erase
    title = re.sub("To.*tried singing","",title)
    title = title.replace("tried singing", "") 

    # cover, covered,Erase the character string after covered by
    title = re.sub("[cC]over(ed)?( by)?.*", "", title)
 
    #Last/Delete after that
    if "/" in title:
        title = "/".join(title.split("/")[:-1])
    
    if "/" in title:
        title = "/".join(title.split("/")[:-1])
  
    # -If there is only one, erase the back
    if len(title.split("-")) == 2: 
        title = title.split("-")[0]

    #Erase the part that represents the collaboration member with x
    title_part_list = []
    for title_part in title.split(" "):
        if "×" not in title_part:
            title_part_list.append(title_part)
    title = " ".join(title_part_list)

    title = title.strip()

    return title


#Example when reading and using the data described later
if __name__ == "__main__":
    x = []
    y = []

    with open("sing_videos.tsv") as f:    
        for line in f:
            x.append(line.strip().split("\t")[1])

    #The first line is the header, so delete it
    del x[0]

    estimated_titles = [ get_song_titles(x) for raw_title in x ]

Performance evaluation

data

I manually labeled it using the metadata of the video I tried to sing, which I collected in the previous article (https://qiita.com/miyatsuki/items/fb933bb233d2896ca644). I have posted the data on GitHub, so please refer to it if necessary for additional tests. https://github.com/miyatsuki/VTuberNayoseDataset/blob/57fe0d785b40c19fa7b249034bdfe1fa62363743/data/sing_videos.tsv

result

Number of videos: 277 Correct answer rate: 92.42% (256/277)

It's not a big deal, but it's over 90%. Since the score is tuned while looking at the result, it is unknown whether the accuracy so far will be obtained for unknown videos.

What was useless

Video title Estimated title Correct answer
Natto!! -Moon!!(Tsukino Mito/iru) Parody-I tried to sing [Kou Uzuki] Natto!! -Moon!!parody- Moon!!
Disney Medley | covered by Inui Toko Disney Medley| Disney Medley
【Virtual to LIVE(covered by #Sanbaka)] Thank you for half a year of activity [Nijisanji] ] Thank you for half a year of activity Virtual to LIVE
[I tried to sing] Farewell Itching with Kenmochi [Crotch Warrior M Zune] Farewell Itching with Kenmochi Farewell itching
[Renai Circulation] I tried to sing. Renai Circulation- Bakemonogatari Cover By Utako suzuka Love circulation
[Yukitoki] cover.Eru [listening video] Yukitoki
[I tried to sing] Anime "How many kilograms of dumbbells can I have?" OP Request Muscle [Sister Claire x Hanahata Chaika] How many kilograms of dumbbells can you carry? Please muscle
[Parody] Do not blame the youkai Gymnastics No. 1 [Singing statement] Do not blame youkai Gymnastics first Yokai Gymnastics No. 1
[Original MV] Cinderella Girl with Ryushen and Suzuka Uta/ King&Prince I tried to sing [cover] Cinderella Girl with Ryushen and Suzuka Uta Cinderella Girl
⚙*.。..Karakuri Pierrot/Mahiro Yukishiro [I tried to sing] ⚙*.。..Karakuri Pierrot Karakuri clown
[Weathering with You] Grand Escape(Movie edit) feat.Toko Miura-Covered by Chima Machida & Piropar Grand Escape feat.Toko Miura Grand escape(Movie edit) feat.Toko Miura
[LOL part] VD&I tried to sing Blessing with G [parody] VD&Blessing with G Blessing
[1st Anniversary] K-ON!!U & I I tried to sing [Meiji Warabeda] K-ON!!U & I U&I
♡ future base I tried to sing ♡ ♡future base future base
[Yorushika] Suddenly I tried to sing rain and cappuccino [Alice Mononobe] Suddenly rain and cappuccino Rain with Cappuccino
Volume note] I tried to sing the old book mansion murder case/Rion Takamiya Volume note] Old book mansion murder case Old book mansion murder case
White sun/ King Gnu (Covered by Kakeru Yumeoi)Nippon Television's "Innocence Innocent Lawyer" theme song [I tried to sing] [Cover] King Gnu Innocence guilty lawyer White sun
[I tried to sing] Haro/Hawayu [Hand-painted PV] Halo Halo/Hawayu
[JK playing talk] I played Oration and sang [Alice Mononobe] Play the oration Oration
[I tried to sing] Red trap(who loves it?) /LiSA [Kaede Higuchi cover] Red trap Red trap(who loves it?)
[Your name. ] Nothing/ RADWIMPS (cover)Utako Suzuka [Original PV in the Sanctuary] Nandemonaiya"Your Name"/Utako Suzuka Nothing/RADWIMPS Utako Suzuka Nandemonaiya"Your Name" Nothing

Retest

I increased the number of data and investigated the accuracy again

data

We increased the number of target singers (all VTubers) and collected data again (the number of target videos has increased 4.8 times). https://github.com/miyatsuki/VTuberNayoseDataset/commit/576b89b5c8a6f74744cb24c62a5d8cb77a736ea7

result

Number of videos: 1335 Correct answer rate: 75.95% (1014/1335)

Logic adjustment

Since the percentage of correct answers has dropped sharply, I tried to support a slightly special pattern. Also, since there is a pattern that other people are singing the same song and the title is acquired correctly, I tried to utilize other estimation results so that I could pick up that information.

import pandas as pd
import re

def get_song_title(raw_title):
    
    #There is a pattern called [Song title] from "Title of work", so in that case, the content of [] is used as the title.
    if "Than【" in raw_title:
        title = raw_title.split("【")[1].split("】")[0]
    else:
        title = raw_title
    
    #If there is a symbol in the header, delete it
    if title[0] == "★":
        title = title[1:]
    
    # ()()[]Exclude []. Sometimes the left is half-width and the right is full-width.
    title = re.sub("[【(《\(\[].+?[】)》\)\]]"," ",title)

    #In the case of a pattern such as the "work name" theme song, delete that part
    for keyword in ["Theme song", "OP", "CM song"]:
        if "」{}".format(keyword) in title:
            end_index = title.index("」{}".format(keyword))
            for start_index in range(end_index, -1, -1):
                if title[start_index] == "「":
                    title = title[:start_index] + title[end_index + len(keyword) + 1:]
                    break
        
    for keyword in ["Theme song", "OP", "CM song"]:
        if "』{}".format(keyword) in title:
            end_index = title.index("』{}".format(keyword))
            for start_index in range(end_index, -1, -1):
                if title[start_index] == "『":
                    title = title[:start_index] + title[end_index + len(keyword) + 1:]
                    break
        
    #If there are "" and "", extract the character string in them
    #However, in rare cases, you may put your name in "". Ignore in that case
    if "「" in title and "」" in title:
        temp_title = title = title.split("「")[1].split("」")[0]
        if "cover" not in temp_title.lower():
            title = temp_title

    if "『" in title and "』" in title:
        temp_title = title.split("『")[1].split("』")[0]
        if "cover" not in temp_title.lower():
            title = temp_title

    #Erase the character string after singing
    title = re.sub("I tried to sing.*"," ", title)
    title = re.sub("tried singing.*"," ", title)

    # cover, covered,Erase the character string after covered by
    title = re.sub("[cC]over(ed)?( by)?.*", "", title)

    # /Delete after that
    if "/" in title:
        title = title.split("/")[0]

    if "/" in title:
        title =  title.split("/")[0]

    # -If there is, erase the back
    title = title.split("-")[0]

    #Erase the part that represents the collaboration member with x
    # #Erase 012-like expressions
    title_part_list = []
    for title_part in title.split(" "):
        if "×" not in title_part and not re.fullmatch("#[0-9]+", title_part):
            title_part_list.append(title_part)
    title = " ".join(title_part_list)

    #Remove leading and trailing whitespace
    title = title.strip()

    return title


#Video title and song title(Estimated value)And return the longest of the partially matched song titles
def get_nearest_title(video_title, music_titles):
    longest = 0
    ans = ""
    
    for music_title in music_titles:
        if len(music_title) <= longest:
            continue
        
        if music_title in video_title:
            ans = music_title
            longest = len(music_title)
    
    return ans


def decide_title(row):
    return row["estimated_title"] if len(row["estimated_title"]) > 0 else row["estimated_title2"]    


if __name__ == "__main__":

    evaluate_df = pd.read_table("sing_videos.tsv")
    evaluate_df["estimated_title"] = evaluate_df["video_title"].apply(get_song_title)

    #If the regular expression estimation result is empty, look for the most likely one from the estimation results of other videos.
    evaluate_df["estimated_title2"] = evaluate_df["video_title"].apply(
        get_nearest_title, music_titles = evaluate_df["estimated_title"].unique()
    )
    evaluate_df["estimated_title"] = evaluate_df.apply(decide_title, axis=1)
    evaluate_df = evaluate_df.drop(columns=["estimated_title2"])

Result (after logic correction)

Number of videos: 1335 Correct answer rate: 85.24% (1138/1335)

Recommended Posts

Get the song name from the title of the video you tried to sing
I tried to predict the genre of music from the song title on the Recurrent Neural Network
I tried to get the RSS of the top song of the iTunes store automatically
[Python] I tried to get the type name as a string from the type function
I tried to get the location information of Odakyu Bus
I tried to get various information from the codeforces API
To get the name of the primitive etc. generated immediately before
I want to get the name of the function / method being executed
I tried to cut out a still image from the video
PhytoMine-I tried to get the genetic information of plants with Python
"The guy who predicts the number of views from the title of Jaru Jaru's video"
I tried to get the batting results of Hachinai using image processing
I tried to get the authentication code of Qiita API with Python.
I tried to visualize the text of the novel "Weathering with You" with WordCloud
I tried to get the movie information of TMDb API with Python
Web scraping beginners tried to make a command to get the movie name of next Friday Road Show
Get the contents of git diff from python
How to get the notebook name you are currently using in Google Colab
I tried to touch the API of ebay
I tried to correct the keystone of the image
Change the decimal point of logging from, to.
To get the path of the currently running python.exe
From the introduction of pyethapp to the execution of contract
The story of moving from Pipenv to Poetry
I tried to predict the price of ETF
I tried to vectorize the lyrics of Hinatazaka46!
I tried to sort out the objects from the image of the steak set meal-④ Clustering
I tried to uniquely determine the malware name from the report information obtained from Virus Total
I want to see the file name from DataLoader
Get the title of yahoo news and analyze sentiment
python beginners tried to predict the number of criminals
The wall of changing the Django service from Python 2.7 to Python 3
Get the variable name of the variable as a character string.
How to get the number of digits in Python
I tried to detect the iris from the camera image
I tried to summarize the basic form of GPLVM
[Python] Get the text of the law from the e-GOV Law API
Get the absolute path of the script you are running
Get the return code of the Python script from bat
Try to get the contents of Word with Golang
I tried to visualize the spacha information of VTuber
I tried to erase the negative part of Meros
Script to get the expiration date of the SSL certificate
[Python] I tried to get Json of squid ring 2
I tried to classify the voices of voice actors
I tried to summarize the string operations of Python
[Linux] [C / C ++] How to get the return address value of a function and the function name of the caller
I tried to get the number of days of the month holidays (Saturdays, Sundays, and holidays) with python
I tried to create a Python script to get the value of a cell in Microsoft Excel
I tried to sort out the objects from the image of the steak set meal-① Object detection
A Python script that allows you to check the status of the server from your browser
Implementation of recommendation system ~ I tried to find the similarity from the outline of the movie using TF-IDF ~
I tried to find the trend of the number of ships in Tokyo Bay from satellite images.
I tried to find the entropy of the image with python
[Horse Racing] I tried to quantify the strength of racehorses
I tried to analyze the whole novel "Weathering with You" ☔️
Try to get the function list of Python> os package
I tried to find the average of the sequence with TensorFlow
Get the package version to register with PyPI from Git
I want to get the operation information of yahoo route
How to calculate the amount of calculation learned from ABC134-D