[PYTHON] A story that made it possible to automatically create anison playlists from your music files

Introduction

I rip music (mainly anime songs) from CDs and listen to it on my Walkman, and before I knew it my library had grown past 8,000 songs. Making a playlist by hand means dropping favorite anime songs in one by one, checking tie-up information against the artist list, adding more songs, and so on; **it takes so much time** that **the holiday is over** before I finish. (This probably doesn't matter to anyone using Spotify, Apple Music, and the like.) It would be fine if I enjoyed this kind of work, but I got tired of it and concluded that **I am simply not suited to creating playlists**. Fortunately, I realized that **I can't create playlists, but I can write programs**, so I wrote a program that creates anime song playlists automatically. Plenty of things didn't go smoothly along the way, so I wrote this article to share what I learned.

What I implemented

The introduction ran long, but the program I created builds **anison playlists (m3u files)** from **existing music files (m4a, mp3, flac files)**. The processing steps are:

  1. Get anime song information from csv file
  2. Get song information from music files stored on your PC
  3. Compare the data registered in 1. and 2. and output the matching data as a playlist (m3u file).

I also made a simple GUI, shown below. (The GUI part is omitted in this article.)

[Image: apg_ui.png]

If you specify a path and press the execute button, the anime songs saved on your computer are extracted and a playlist file is written out. You can also create a playlist for a specific anime by specifying the title of the work. The source code and an executable file (for Windows) are on GitHub, so please have a look.

Program flow

The core code of the playlist generator is apg.py in the code published on GitHub. I also wrote GUI code, apg_gui.py, using PyQt5, but it only calls the class defined in apg.py, so apg.py alone is enough to create a playlist. apg.py defines the APG class (short for **A**nison **P**laylist **G**enerator), which has functions corresponding to steps 1 to 3 above. Each step is explained below.

Get anime song information from csv file

This section explains the function MakeAnisonDatabase in apg.py, which stores the anime song information in a database. I use a database because re-reading the anime song data and the local music files every time a playlist is created would be wasteful; once the database exists, the second and subsequent playlists can be created in less time. I only needed something that works, so I implemented it without thinking deeply about the database structure (to be honest, I just don't know much about databases). The library used is sqlite3.

The specific code is as follows. First, get the paths of the csv files downloaded from Anison Generation. There are three zip files to download: anison.csv covers anime, game.csv covers games, and sf.csv covers tokusatsu (special-effects shows). My original target was anime, but since the data is there, I use the game and tokusatsu files as well. If you place the unzipped folders in the data folder, the layout looks like this.


./data/
├ anison/
│   ├ anison.csv   #Anime
│   └ readme.txt
├ game/
│   ├ game.csv     #Games
│   └ readme.txt
└ sf/
    ├ sf.csv       #Tokusatsu (special effects)
    └ readme.txt

Each folder also contains a text file, but by executing the following code, you can get only the csv file even if you put the unzipped folder in ./data as it is.


import glob
import os

path_data = "./data" #Folder where the csv files downloaded from Anison Generation are saved
file_paths = [i for i in glob.glob(path_data + "/**", recursive=True) if os.path.splitext(i)[-1] == ".csv"] #Recursively collect only the paths of csv files

Next, register the information from the csv files in the database. Only the three target files, anison.csv, game.csv, and sf.csv, are read; any other file is skipped. The following code creates tables named anison, game, and sf and inserts the rows into the database 1000 at a time. (I batched the inserts after reading about it on some site, but I have the feeling it makes little difference either way.)


import sqlite3
from tqdm import tqdm

#Excerpt from a method of the APG class (hence self.path_database)
data_name = ["anison.csv", "game.csv", "sf.csv"] #Names of the csv files downloaded from Anison Generation
for file_path in file_paths:
    if os.path.basename(file_path) in data_name: #Only anison.csv, game.csv, or sf.csv
        category = os.path.splitext(os.path.basename(file_path))[0]

        #Create a table named anison, game, or sf
        with sqlite3.connect(self.path_database) as con, open(file_path, "r", encoding="utf-8") as f:
            cursor = con.cursor()
            #Create the table if it does not exist
            cursor.execute("CREATE TABLE IF NOT EXISTS '%s'('%s', '%s', '%s', '%s', '%s', '%s')" % (category, "artist", "title", "anime", "genre", "oped", "order"))

            command = "INSERT INTO " + category + " VALUES(?, ?, ?, ?, ?, ?)" #SQL statement definition

            lines = f.readlines() #Read the csv file
            buffer = []           #Rows waiting to be registered together
            buffer_size = 1000    #Register 1000 rows at once

            for line in tqdm(lines[1:]): #Read the csv file line by line, skipping the header
                keys = line.split(",")  #Split the row into fields
                #Get singer name, song title, broadcast order, OP/ED type, anime title, and genre
                artist, title, order, oped, anime, genre = trim(keys[7]), trim(keys[6]), trim(keys[4]), trim(keys[3]), trim(keys[2]), trim(keys[1])

                buffer.append([artist, title, anime, genre, oped, order])

                if len(buffer) >= buffer_size: #A full batch is ready
                    cursor.executemany(command, buffer) #SQL execution
                    buffer = []

            if buffer: #Register the rows left over after the last full batch
                cursor.executemany(command, buffer)

            #Delete duplicate rows
            cursor.executescript("""
                CREATE TEMPORARY TABLE tmp AS SELECT DISTINCT * FROM """ + category + """;
                DELETE FROM """ + category + """;
                INSERT INTO """ + category + """ SELECT * FROM tmp;
                DROP TABLE tmp;
                """)

            con.commit()
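As a side note, splitting each line on "," breaks if a field itself contains a comma. The following is a minimal sketch of the same batched registration using the standard csv module instead. It assumes the column positions used above (artist in column 7, title in column 6, and so on) and a header on the first line; the function name load_rows and the sample rows are hypothetical, and whether the real Anison Generation files quote their fields is something I have not verified.

```python
import csv
import io
import sqlite3

def load_rows(con, table, csv_file, batch_size=1000):
    """Insert csv rows into `table` in batches; returns the number of rows inserted."""
    cur = con.cursor()
    # `table` must come from a trusted list (anison, game, sf); it is not user input.
    cur.execute(f'CREATE TABLE IF NOT EXISTS "{table}"(artist, title, anime, genre, oped, "order")')
    command = f'INSERT INTO "{table}" VALUES(?, ?, ?, ?, ?, ?)'

    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row
    buffer, total = [], 0
    for row in reader:
        if len(row) < 8:  # skip malformed rows
            continue
        # Same column positions as in the article's code (assumed layout).
        buffer.append((row[7], row[6], row[2], row[1], row[3], row[4]))
        if len(buffer) >= batch_size:
            cur.executemany(command, buffer)
            total += len(buffer)
            buffer = []
    if buffer:  # rows left over after the last full batch
        cur.executemany(command, buffer)
        total += len(buffer)
    con.commit()
    return total

# Hypothetical sample data: a header plus three rows with eight columns each.
sample = io.StringIO(
    "c0,genre,anime,oped,order,c5,title,artist\n"
    '0,tv,anime a,op,1,x,"hello, world",singer a\n'
    "1,tv,anime a,ed,1,x,song b,singer b\n"
    "2,tv,anime b,op,1,x,song c,singer a\n"
)
con = sqlite3.connect(":memory:")
print(load_rows(con, "anison", sample, batch_size=2))
```

Flushing the buffer once more after the loop is what guarantees that the final, partially filled batch is registered.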

The trim function replaces or removes characters that cause trouble when building the database or the playlist, such as line breaks, spaces, and quotation marks. Differences between full-width and half-width characters also caused matching problems when creating playlists, so every character that the mojimoji library can convert is converted to half-width.


import mojimoji as moji

def trim(name):
    name = name.replace("\n", "").replace('\'', '_').replace(" ", "").replace("\x00", "").replace("\"", "")
    name = moji.zen_to_han(name)
    return name.lower()
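For readers who want to try the normalization without installing mojimoji, here is a rough stand-in built only on the standard library. This is not what apg.py does: it swaps in Unicode NFKC normalization, which folds full-width ASCII into half-width but, unlike mojimoji's zen_to_han, turns half-width katakana into full-width. The name trim_stdlib is hypothetical, and the two variants should not be mixed in one database.

```python
import unicodedata

def trim_stdlib(name: str) -> str:
    """Stand-in for trim(): NFKC normalization instead of mojimoji (assumption)."""
    # Normalize first so that full-width spaces and quotes are also caught below.
    name = unicodedata.normalize("NFKC", name)
    for ch in ("\n", " ", "\x00", '"'):
        name = name.replace(ch, "")
    name = name.replace("'", "_")
    return name.lower()

print(trim_stdlib("Ｒｅ：Ｉ　ａｍ\n"))
print(trim_stdlib("Don't"))
```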

Next, get the information of the music file saved in the PC and register it in the database.

Get song information from music files stored on your PC

The format of the m3u file, which is a playlist file, is as follows, and the song length, song title, and path to the music file are required for each song.

#EXTM3U
#EXTINF:Song length, song title
Absolute path to file
#EXTINF:Song length, song title
Absolute path to file
        :
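The format above can be sketched as a small helper that turns (length, title, path) tuples into the text of an m3u file. The function name format_m3u and the sample entries are hypothetical and only illustrate the layout.

```python
def format_m3u(entries):
    """Build m3u text from (length_in_seconds, title, absolute_path) tuples."""
    lines = ["#EXTM3U"]
    for length, title, path in entries:
        lines.append(f"#EXTINF:{int(length)},{title}")  # song length, song title
        lines.append(path)                              # absolute path to the file
    return "\n".join(lines) + "\n"

# Hypothetical entries for illustration.
print(format_m3u([
    (215.4, "song a", "/music/song_a.flac"),
    (180.0, "song b", "/music/song_b.mp3"),
]))
```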

The function MakeMusiclibrary collects the paths of the music files and registers them in the database in the same way as the anime song data. The input is now music files instead of the anime song csv, but the basic processing is much the same as above. I defined the following function to read the information from a music file.

from mutagen.easyid3 import EasyID3
from mutagen.flac import FLAC
from mutagen.mp3 import MP3
from mutagen.mp4 import MP4

def getMusicInfo(path):
    #audio stays "" when the file is not a supported music file
    length, audio, title, artist = 0, "", "", ""

    if path.endswith(".flac"):
        audio = FLAC(path)
        artist = trim(audio.get('artist', [""])[0])
        title = trim(audio.get('title', [""])[0])
        length = audio.info.length

    elif path.endswith(".mp3"):
        audio = EasyID3(path)
        artist = trim(audio.get('artist', [""])[0])
        title = trim(audio.get('title', [""])[0])
        length = MP3(path).info.length

    elif path.endswith(".m4a"):
        audio = MP4(path)
        artist = trim(audio.get('\xa9ART', [""])[0])
        title = trim(audio.get('\xa9nam', [""])[0])
        length = audio.info.length

    return audio, artist, title, length

The getMusicInfo function gets the song title, singer name, and song length from a file path; the singer name and title are passed through the trim function to convert troublesome characters. The code that registers the music library in the database is as follows.


    def makeLibrary(self, path_music):
        music_files = glob.glob(path_music + "/**", recursive=True) #Get all the files in the library

        #Create a table named library and register music files
        with sqlite3.connect(self.path_database) as con:
            cursor = con.cursor()
            #Create the library table if it does not exist
            cursor.execute("CREATE TABLE IF NOT EXISTS library(artist, title, length, path)")
            #SQL statement
            command = "INSERT INTO library VALUES(?, ?, ?, ?)"

            buffer = []
            for music_file in tqdm(music_files):
                audio, artist, title, length = getMusicInfo(music_file) #Get music file information from the path

                if audio != "": #Skip anything that is not a supported music file
                    #artist and title were already trimmed in getMusicInfo
                    buffer.append((artist, title, length, music_file))

                    if len(buffer) >= 1000: #A full batch is ready
                        cursor.executemany(command, buffer)
                        buffer = []

            if buffer: #Register the rows left over after the last full batch
                cursor.executemany(command, buffer)

            #Delete duplicate rows
            cursor.executescript("""
                CREATE TEMPORARY TABLE tmp AS SELECT DISTINCT * FROM library;
                DELETE FROM library;
                INSERT INTO library SELECT * FROM tmp;
                DROP TABLE tmp;
                """)

            con.commit()
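The recursive glob above returns every path under the library, including folders, cover images, and text files, and leaves the filtering to getMusicInfo. As a minimal sketch of an alternative, the candidates could be narrowed to the three supported extensions up front with pathlib. The names MUSIC_EXTENSIONS and find_music_files are hypothetical; note that this version matches extensions case-insensitively, which differs slightly from the endswith checks in getMusicInfo.

```python
from pathlib import Path

MUSIC_EXTENSIONS = {".flac", ".mp3", ".m4a"}  # formats handled by getMusicInfo

def find_music_files(root):
    """Return sorted paths of supported music files under root (recursive)."""
    return sorted(
        str(p)
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in MUSIC_EXTENSIONS
    )
```

For example, find_music_files("./music") would list only the flac, mp3, and m4a files below ./music, so the registration loop no longer has to open unrelated files.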

Next, I will explain the function that outputs a playlist using the data so far.

Output playlist (.m3u file)

This section covers the function generatePlaylist in apg.py (various details are omitted to keep the explanation short). First, the singer names are fetched from the anime song information registered in the database. The SQL statement and its execution are shown below. category is the table name, which is one of anison, game, or sf. The resulting list of singer names is stored in a variable. (Connecting to the database works the same way as in the code above, so it is omitted.)

cursor.execute("SELECT DISTINCT artist FROM '%s'" % category)
artist_db = cursor.fetchall()

Similarly, get the singer name in the music library.


cursor.execute('SELECT artist FROM library')
artist_lib = sorted(set([artist[0] for artist in cursor.fetchall()]))

Next, check whether each singer name in the music library appears in the singer list of the anime song database, that is, whether the singer may have sung an anime song. I think this is peculiar to anime song culture, but when music is ripped from a CD, the singer name is sometimes stored as "character name (cv. voice actor name)". The singer list in the anime song database, on the other hand, is (probably) registered under the voice actor's name, so an exact match on the singer name does not work. Instead, I compute the similarity between each singer in the music library and every singer in the anime song database, take the database singer with the highest similarity, and, if that similarity exceeds a threshold, treat the library singer as someone who may have sung an anime song. In the code (apg.py), this corresponds to the following part.

#For every artist in the library
for artist in artist_lib:
    #Calculate the similarity to each artist name in the anime song database
    similarities = [difflib.SequenceMatcher(None, artist, a[0]).ratio() for a in artist_db]

    if th_artist < max(similarities): #When the maximum similarity exceeds the threshold
        #Get all music information of the target artist
        info_list = self.getInfoDB('SELECT * FROM library WHERE artist LIKE \'' + artist + '\'', cursor)
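The "highest similarity above a threshold" idea can be tried in isolation with the following sketch. The helper name best_match, the placeholder names, and the threshold value 0.5 are assumptions for illustration, not values taken from apg.py.

```python
import difflib

def best_match(name, candidates, threshold):
    """Return (candidate, score) for the most similar candidate, or None if below threshold."""
    best_name, best_score = None, 0.0
    for candidate in candidates:
        score = difflib.SequenceMatcher(None, name, candidate).ratio()
        if score > best_score:
            best_name, best_score = candidate, score
    if best_score > threshold:
        return best_name, best_score
    return None

# Placeholder names: a library tag in "character (cv. voice actor)" form
# compared against database entries registered under plain names.
database_artists = ["uverworld", "voiceactor"]
print(best_match("charactername(cv.voiceactor)", database_artists, 0.5))
print(best_match("zzz", database_artists, 0.5))
```

Because the voice actor's name is only part of the tag, the score stays well below 1.0 even for a correct match, which is why the threshold cannot be set very high.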

After identifying the singer, fetch all of that singer's songs from the music library and look each one up among the anime songs. Just as exact matching cannot be used for singer names, it cannot be used for song titles either. For example, a title such as "song name (album ver.)" would not be found by an exact match. Therefore, the song titles are compared by similarity in the same way.


        #Continuation of the if statement above
        #Get all songs by the most similar singer from the anime song database
        cursor.execute("SELECT DISTINCT * FROM '%s' WHERE artist LIKE \'%%%s%%\'" % (category, artist_db[similarities.index(max(similarities))][0]))
        title_list = cursor.fetchall() #Songs by this artist in the anime song database

        for info in info_list: #Check every song by this artist in the music library
            artist, title, length, path = info[0], info[1], info[2], info[3]

            title_ratio = [difflib.SequenceMatcher(None, title, t[1]).ratio() for t in title_list] #Similarity to each song in the anime song database

            if th_title < max(title_ratio): #Treat as an anime song if the similarity exceeds the threshold
                t = title_list[title_ratio.index(max(title_ratio))]
                lines.append(['#EXTINF: ' + str(int(length)) + ', ' + title + "\n" + path, t[-1]]) #Add to the list used to output the .m3u file

After calculating the similarity of all artists and the similarity of songs, write the information of anime songs in the variable lines to a file.

path_playlist = "./playlist/AnimeSongs.m3u"

with open(path_playlist, 'w', encoding='utf-16') as pl:
    pl.write('#EXTM3U \n')
    pl.write("\n".join([line[0] for line in lines]))

You have now created an anime song playlist. For the sake of explanation, the code shown here differs in places from the code published on GitHub, so if you are interested, please check apg.py there.

Problems

Writing these three functions made it possible to create playlists, but I would like to touch on some problems with this implementation.

Songs that are not anime songs are also added as anime songs

Currently, songs that are not anime songs sometimes end up in the playlist. For example, if you specify BLEACH as the work, UVERworld's D-technolife is added, but so is a very similarly named song, D-technorize. Likewise, if you create a Gundam UC playlist, Aimer's Re: far and Re: pray are included alongside Re: I am. For a single work like these, raising the song-title similarity threshold takes care of it, but the right threshold differs from anime to anime, so it is currently difficult to create an anison playlist that is not limited to a specific anime. I think this is related to the next problem; if I come up with a good method, I would like to improve it.
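To see why a single threshold is hard to pick, the similarity of the titles mentioned above can be computed directly. This sketch uses the titles as spelled in this article, lowercased and with spaces removed in the manner of the trim function; the helper name title_similarity is hypothetical.

```python
import difflib

def title_similarity(a, b):
    """Similarity ratio between two normalized titles (0.0 to 1.0)."""
    return difflib.SequenceMatcher(None, a, b).ratio()

pairs = [
    ("d-technolife", "d-technorize"),
    ("re:iam", "re:far"),
    ("re:iam", "re:pray"),
]
for registered, other in pairs:
    print(registered, other, round(title_similarity(registered, other), 2))
```

The first pair scores about 0.83 and the two Re: pairs land somewhere between 0.6 and 0.7, so any threshold loose enough to accept variants such as "song name (album ver.)" is likely to let some of these through as well.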

Playlist generation is slow

Currently, the database is nothing more than a temporary place to store data, and in the end everything is processed with for loops in Python. If I could write smarter SQL, I expect playlist creation would be faster, so I would like to rewrite the database-related code in the future.
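One possible direction, sketched here under assumptions rather than taken from apg.py, is to let SQL resolve the songs whose normalized artist and title match exactly with a JOIN, and to keep the slower difflib comparison only for whatever remains. The function name exact_matches and all sample values (genre, length, paths) are hypothetical.

```python
import sqlite3

def exact_matches(con, category="anison"):
    """Return library songs whose artist and title exactly match the given table."""
    if category not in ("anison", "game", "sf"):  # table names cannot be bound as parameters
        raise ValueError("unknown category: " + category)
    cur = con.cursor()
    cur.execute(
        "SELECT DISTINCT l.title, l.length, l.path, a.anime "
        "FROM library AS l "
        'JOIN "%s" AS a ON l.artist = a.artist AND l.title = a.title' % category
    )
    return cur.fetchall()

# Hypothetical in-memory data with the same columns as the tables in this article.
con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE anison(artist, title, anime, genre, oped, "order")')
con.execute("CREATE TABLE library(artist, title, length, path)")
con.execute(
    "INSERT INTO anison VALUES(?, ?, ?, ?, ?, ?)",
    ("uverworld", "d-technolife", "bleach", "tv", "op", "1"),
)
con.executemany(
    "INSERT INTO library VALUES(?, ?, ?, ?)",
    [
        ("uverworld", "d-technolife", 215.0, "/music/d-technolife.m4a"),
        ("uverworld", "d-technorize", 230.0, "/music/d-technorize.m4a"),
    ],
)
print(exact_matches(con))
```

Exact matching alone would miss the "character name (cv. voice actor name)" and "(album ver.)" cases described earlier, so this could only complement the similarity search, not replace it.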

Summary

Creating anison playlists by hand was too painful, so I wrote software that creates them automatically. There are still some problems, but I think it covers the minimum set of features. Looking back, **the time spent coding far exceeded the time it would have taken to build the playlists by hand**, but I hope it helps people with the same problem. Last but not least, if you spot mistakes or possible improvements (especially in the database-related processing), I would appreciate a comment. Thank you for reading to the end.

Change log

2020/03/04 released
