[Python] I made an app that automatically downloads the audio file for each word used in my English study app

Synopsis

The English study app I built in the article below needs an audio file for each English word. https://qiita.com/Fuminori_Souma/private/0706716fdebf08572c6c

Downloading the audio files by hand is tedious and time-consuming, so I decided to automate it with web scraping.

The audio files are downloaded from weblio. (Thank you, weblio!)

Source file

get_sound_file.py


import os
import re
import time
import tkinter
import urllib.request
from tkinter import messagebox
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

class Frame(tkinter.Frame):

    def __init__(self, master=None):
        tkinter.Frame.__init__(self, master)
        self.master.title('Get the audio file')
        self.master.geometry('400x300')

        # Label settings
        text_1 = tkinter.Label(self, text='Enter the words whose audio files you want in the text box below.')
        text_1.pack(pady=7)
        text_2 = tkinter.Label(self, text='* To enter multiple words, separate them with ",".')
        text_2.pack()

        # Multi-line text box for the word list
        self.ent_words = tkinter.Text(self, height=15)
        self.ent_words.pack(padx=30)

        # Start button: command= already runs start_get_file on a left click, so no bind() is needed
        bttn_start = tkinter.Button(self, text='start', command=self.start_get_file)
        bttn_start.pack(pady=7)

    def checkAlnum(self, word):  # Check that the input contains only letters (no stray symbols)
        alnum = re.compile(r'[a-zA-Z]+')  # Compile the regular expression
        # fullmatch returns a Match object if the whole string matches, otherwise None
        return alnum.fullmatch(word) is not None

    def delete_symbols(self, word):  # Remove the allowed separators (commas and spaces)
        return word.replace(',', '').replace(' ', '')

    def get_mp3(self, word, driver):  # Search weblio for the word and download its mp3 file

        save_dir = 'C:/Users/fumin/OneDrive'  # Audio file download destination

        # Type the word into the search box and press the search button
        # (Selenium 4 removed find_element_by_xpath; use find_element(By.XPATH, ...) instead)
        search_box = driver.find_element(By.XPATH, '//*[@id="searchWord"]')
        search_box.clear()  # Clear the text box
        search_box.send_keys(word)
        driver.find_element(By.XPATH, '//*[@id="headFixBxTR"]/input').click()
        time.sleep(5)

        # The audio file exists if the "play" icon exists (find_elements returns [] otherwise)
        play_icons = driver.find_elements(By.XPATH, '//*[@id="audioDownloadPlayUrl"]/i')
        if play_icons:

            # Click the "play" icon; the mp3 file opens in a new window
            play_icons[0].click()
            time.sleep(5)

            # Switch the target window to the newly opened mp3 file
            handles = driver.window_handles
            driver.switch_to.window(handles[1])

            # Download the mp3 file, then close only the mp3 window
            urllib.request.urlretrieve(driver.current_url, os.path.join(save_dir, word + '.mp3'))
            driver.close()

            # Return to the original window
            driver.switch_to.window(handles[0])

            return 'OK'

        else:  # No audio file (the "play" icon does not exist)

            return 'NG'

    def start_get_file(self):

        # Get the entered words ('end-1c' drops the newline tkinter appends; strip() trims the rest)
        words = self.ent_words.get('1.0', 'end-1c').strip()

        if self.checkAlnum(self.delete_symbols(words)):  # Valid input: only letters, "," and spaces

            ww = [x.strip() for x in words.split(',') if x.strip()]  # Split on commas, skipping empty entries

            # Open the browser (Selenium 4 takes the chromedriver path through a Service object)
            drv = webdriver.Chrome(service=Service('C:/Users/fumin/pybraries/chromedriver_ver79/chromedriver'))
            time.sleep(10)

            # Open the page to operate on (weblio)
            drv.get('https://ejje.weblio.jp/')
            time.sleep(10)

            nglist = []  # NG words (words for which no mp3 file exists)

            for w in ww:  # Get the mp3 file for each word
                if self.get_mp3(w, drv) == 'NG':
                    nglist.append(w)

            drv.quit()  # Close the browser and end the WebDriver session

            if nglist:  # Some words had no audio file
                messagebox.showinfo('', 'Downloaded the audio files of all words except the following.\n\n' + ', '.join(nglist))
            else:
                messagebox.showinfo('', 'Downloaded the audio files of all the entered words.')

        else:  # Invalid input: something other than letters, "," and spaces was entered
            messagebox.showinfo('', 'Characters other than letters and "," were entered. Please delete them and try again.')


if __name__ == '__main__':

    # Frame settings
    root = Frame()
    root.pack()
    root.mainloop()
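The input check in start_get_file can be pulled out into a plain function so it can be tried without opening the GUI or the browser. The sketch below uses a hypothetical helper named parse_words that applies the same rule as checkAlnum and delete_symbols (only letters, commas and spaces are allowed) and returns the cleaned-up word list, or None when the input is invalid.

```python
import re

# Hypothetical helper: same rule as checkAlnum/delete_symbols in get_sound_file.py
_ALLOWED = re.compile(r'[a-zA-Z]+')


def parse_words(text):
    """Return the comma-separated words in text, or None if the input is invalid.

    Only letters, commas and spaces are accepted; surrounding whitespace
    (including the trailing newline from a tkinter Text widget) is ignored.
    """
    text = text.strip()
    # Remove the separators, then require that only letters remain
    if _ALLOWED.fullmatch(text.replace(',', '').replace(' ', '')) is None:
        return None
    # Split on commas, trimming each entry and dropping empty ones (e.g. "a,,b")
    return [w.strip() for w in text.split(',') if w.strip()]


if __name__ == '__main__':
    print(parse_words('apple, banana,cherry\n'))  # ['apple', 'banana', 'cherry']
    print(parse_words('apple; banana'))           # None
```

Keeping the parsing separate from the tkinter and Selenium code means the rules can be checked in a few seconds instead of launching Chrome.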

Remarks

To avoid putting a load on weblio's servers, I deliberately slowed the script down with generous waits. As a result, it downloads files at about the same speed as doing it by hand. (The point is that it's automated, not that it's fast.)
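Fixed time.sleep() calls are the simplest way to stay polite. A slightly more targeted alternative, assuming that a minimum gap between requests is all that matters, is a small hypothetical Throttle class that only sleeps for whatever part of the interval has not already passed, so time spent elsewhere (typing, page loads) counts toward the delay.

```python
import time


class Throttle:
    """Hypothetical helper: guarantee at least min_interval seconds between calls to wait()."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._last = None  # monotonic timestamp of the previous wait()

    def wait(self):
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)  # sleep only for the part of the gap not yet used up
        self._last = time.monotonic()
```

In get_sound_file.py, one Throttle(5) could be created before the word loop and throttle.wait() called right before each get_mp3(...) call; the 5-second value is just an example, not a figure weblio publishes.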

Open issues

  1. Every time the mp3 file opens, it starts playing automatically. I wanted to mute the mp3 only while it plays, but I couldn't operate the player's volume slider. I briefly considered muting the PC itself, but if I ran the download while listening to music, the music would be cut off too, so I reluctantly gave up.

  2. To download the mp3 files, I first planned to right-click and choose "Save audio as...", but Selenium apparently cannot interact with the context menu that a right-click opens. So I used urllib to download the mp3 file instead. I'm glad it worked in the end, but I'll be stuck the next time I really need a right-click menu.
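The urllib approach boils down to copying the bytes at the mp3 URL into a local file. The sketch below is a standalone version of that step; download_mp3 is a hypothetical helper name, and unlike the one-line urlretrieve call in get_mp3 it creates the destination folder if needed and streams the response into the file.

```python
import os
import shutil
import urllib.request


def download_mp3(url, save_dir, word):
    """Save the file at url as <save_dir>/<word>.mp3 and return the saved path."""
    os.makedirs(save_dir, exist_ok=True)  # create the destination folder if it is missing
    path = os.path.join(save_dir, word + '.mp3')
    # Stream the response straight into the file instead of loading it all into memory
    with urllib.request.urlopen(url) as response, open(path, 'wb') as f:
        shutil.copyfileobj(response, f)
    return path
```

In get_mp3, after switching to the mp3 window, it would be called as download_mp3(driver.current_url, <download folder>, word).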

Other references

Many thanks to the authors of these posts, which helped me a lot.

How to download the file: https://stackoverflow.com/questions/48736437/how-to-download-this-video-using-selenium
Checking whether an element exists: https://ja.stackoverflow.com/questions/30895/xpath%E3%81%A7%E8%A6%81%E7%B4%A0%E3%81%AE%E5%AD%98%E5%9C%A8%E3%82%92%E7%A2%BA%E8%AA%8D%E3%81%99%E3%82%8B%E6%96%B9%E6%B3%95
Right-clicking with Selenium: https://stackoverflow.com/questions/20316864/how-to-perform-right-click-using-selenium-chromedriver

Finally

If you spot anything like "this is wrong!" or "you should do it this way instead!", please point it out. I'd be moved to tears.
