Pulling songwriting, composition and arrangement information from the Tower Records site with Python

Introduction

This time I'm going to do some simple scraping. In the heyday of subscription services, I don't think many people still collect music files locally, let alone tag them with lyricist, composer, and arranger. But since the tagging is easy to automate, I'd like to introduce it.

Let's try it

First, the structure of the Tower Records site. When I looked up the XPath of the element that holds the important information, it looked like this:

//*[@id="RelationArtist_0_1_sub"]/div/div[3]/div[2]/a/text()

This points to the lyrics information for the first song on Disc 1. Reading the numbers from left to right, they are the disc number (starting at 0), the track number, and the credit type (3 for lyrics, 4 for composition, 5 for arrangement). The last number stays as it is.

Let's actually write the code. The libraries used are lxml (parsing and XPath), requests (fetching the page) and mutagen (reading and writing music tags).
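Before the full script, the way the XPath is put together can be sketched on its own. This is a minimal illustration, not part of the original script: the names `ROLE_INDEX` and `build_xpath` are made up here, and the mapping of 3, 4 and 5 to lyrics, composition and arrangement is taken from the page structure described above, which may change if Tower Records revises its layout.

```python
# Index of each credit's <div> inside a track's "RelationArtist" block,
# as observed on the page (assumption: the layout has not changed).
ROLE_INDEX = {"W": 3, "C": 4, "A": 5}


def build_xpath(disc, track, role):
    """Return the XPath for one credit of one track.

    disc  -- zero-based disc index (Disc 1 is 0)
    track -- one-based track number
    role  -- "W" (lyrics), "C" (composition) or "A" (arrangement)
    """
    return (
        f'//*[@id="RelationArtist_{disc}_{track}_sub"]'
        f"/div/div[{ROLE_INDEX[role]}]/div[2]/a/text()"
    )


# Lyrics of the first song on Disc 1
print(build_xpath(0, 1, "W"))
```

Keeping the mapping in one dictionary means an unknown role fails immediately with a KeyError instead of producing a malformed XPath.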

tagget.py


from mutagen.flac import FLAC
from lxml import html
import os
import requests

class Net():
    def Tower(self, no, html2, disc, item):
        #Map the credit type to the index of its div on the page:
        #"W" = lyrics, "C" = composition, "A" = arrangement.
        index = {"W": "3", "C": "4", "A": "5"}
        if item not in index:
            raise ValueError("item must be 'W', 'C' or 'A'")
        i = index[item]
        contentr = html2.xpath('//*[@id="RelationArtist_'+str(disc)+'_'+str(no)+'_sub"]/div/div['+i+']/div[2]/a/text()') #Specify location

        #Clean up every value found, however many names are listed
        content = [c.strip('\'').strip() for c in contentr]
        print(content) #Output which values were acquired
        return content

class Main():
    def Towerget(self,files,url):
        n = Net()
        r = requests.get(url) #Load the page
        html2 = html.fromstring(r.content) #Parse the page
        for f in files:
            tag = FLAC(f) #Loading tags
            no = tag['tracknumber'][0].lstrip("0") #My tags store 1-digit track numbers as 0x, so strip the leading zero to match Tower Records.
            disc = int(tag['discnumber'][0].lstrip("0")) - 1 #The disc number on the page starts from 0, so subtract 1.
            print(no)
            tag['word'] = n.Tower(no, html2, disc, item="W") #Write the lyrics tag
            tag['composer'] = n.Tower(no, html2, disc, item='C') #Write the composition tag
            tag['arranger'] = n.Tower(no, html2, disc, item="A") #Write the arrangement tag
            print(tag.pprint()) #pprint() returns a string, so print it to check the tags
            tag.save() #Save tag

os.chdir(r"E:\music\Unorganized\Uchikubigokumon Club-Prison fifteen") #The path of the folder that holds the files to tag (raw string so the backslashes are not treated as escapes)
files0 = os.listdir(os.getcwd()) #Get a list of files in a folder
files = list()

for f in files0: #The same folder also contains Google Drive management files, jacket photos, etc., so only the flac files are picked out.
    if f.endswith(".flac"):
        files.append(f)
        print(f)
    else:
        print("not "+f)

m = Main()
url = "https://tower.jp/item/4936516/Prison fifteen" #The URL of the Tower Records page
m.Towerget(files, url)

The code isn't very clean, but it gets the information for the time being.
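One spot that could be tidied is the conversion of the tag values into the numbers used in the page ids. The script strips leading zeros with lstrip("0"), which works for values like "01" but would turn a bare "0" into an empty string. The sketch below converts through int() instead. The helper name `page_index` is hypothetical, and the handling of a "1/12" style value is an assumption about how some taggers write track numbers, not something the original script deals with.

```python
def page_index(tag_value, zero_based=False):
    """Turn a tag value such as "01" or "1/12" into the number used in the page ids."""
    # Keep only the part before an optional "/total" suffix (assumed format).
    number = int(tag_value.split("/")[0])
    # Disc ids on the page start at 0, track ids at 1.
    return number - 1 if zero_based else number


print(page_index("01"))                  # track "01" -> 1
print(page_index("1", zero_based=True))  # disc "1" -> 0
```

Because the script wraps both numbers in str() when it builds the XPath, returning integers here would not require any other change.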

Improvement points

・ Tracks with no vocals and therefore no lyrics, such as an Overture, put the fields out of alignment.
・ Tower Records may not have entered the arrangement.
・ I want to get the URL of the Tower Records page automatically (this seems difficult).
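For the case where a credit such as the arrangement is missing, one possible approach is to write a tag only when the scrape actually returned something, so that an empty list does not replace a value that is already in the file. This is only a sketch: `apply_credits` is a hypothetical helper, and a plain dict stands in for the mutagen FLAC object, which accepts the same tag[key] = list assignment used in the script.

```python
def apply_credits(tag, credits):
    """Copy non-empty credit lists into tag and return the keys that were skipped."""
    skipped = []
    for key, names in credits.items():
        if names:
            tag[key] = names
        else:
            # Nothing was found on the page, so leave the existing tag alone.
            skipped.append(key)
    return skipped


tag = {"arranger": ["existing value"]}  # stand-in for a FLAC object
scraped = {"word": ["A"], "composer": ["B"], "arranger": []}
skipped = apply_credits(tag, scraped)
print(tag)
print("skipped:", skipped)
```

The returned list makes it easy to print which tracks still need to be filled in by hand.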
