[Python] I tried translating Udemy's English subtitles into Japanese

Motivation

**I found a course on Udemy that looks interesting! Wait, there are no Japanese subtitles...** ~~Because studying English is a hassle~~ **So let's translate them into Japanese!**

Introduction

I can read and write English to some extent, so it's not that I can't follow it at all. However, I can't keep up with native speaking speed, so I have to pause the video every so often and translate with a dictionary in one hand. That is inefficient and tedious, and I don't want to do it. So let's automatically translate the subtitle currently being shown into Japanese.

Get English subtitles

Many Udemy videos have English subtitles, and pressing a button at the bottom right of the video shows all the subtitles for that video. The subtitle the instructor is currently speaking is highlighted in light blue. In other words, if I can get the highlighted subtitle, I should be able to translate it.

Scraping

When I searched the web for "Python scraping", I found a module called Selenium, so I will use it.

Scraping with Selenium in Python (Basic)

According to the article above, elements can be retrieved by id, class, or name. I haven't done much HTML or CSS, so I'm not entirely sure, but if I know the id or class of the subtitle element I want, it looks like I can get it somehow.

About the elements you want

I opened a course page on Udemy and inspected the highlighted subtitle in the developer tools.

**Highlighted subtitle**

highlight.html


<span data-purpose="cue-text" class="transcript--highlight-cue--1bEgq">Highlight text</span>

**Non-highlighted subtitle**

nonhighlight.html


<span data-purpose="cue-text" class="">Non highlight text</span>

When I checked while actually playing the video, the class attribute differed between normal and highlighted subtitles. Apparently the class of the highlighted element is transcript--highlight-cue--1bEgq.
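As a rough sketch of that observation, the snippet below picks out the highlighted cue from the two HTML fragments shown above using only the standard library's html.parser, with no browser involved. The names HighlightCueParser and highlighted_cues are made up for this illustration, and the class suffix 1bEgq is simply the value observed in this post, so it may well differ on the live site.

```python
from html.parser import HTMLParser

# Class name observed in this post; the generated suffix may change on the live site.
HIGHLIGHT_CLASS = "transcript--highlight-cue--1bEgq"


class HighlightCueParser(HTMLParser):
    """Collects the text of <span> elements that carry the highlight class."""

    def __init__(self):
        super().__init__()
        self._depth = 0  # greater than 0 while inside a highlighted span
        self.cues = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self._depth:
            self._depth += 1  # nested span inside a highlighted cue
            return
        # The class attribute may be missing or empty, as in the non-highlighted cue
        classes = (dict(attrs).get("class") or "").split()
        if HIGHLIGHT_CLASS in classes:
            self._depth = 1
            self.cues.append("")

    def handle_endtag(self, tag):
        if tag == "span" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.cues[-1] += data


def highlighted_cues(html):
    """Return the text of every highlighted cue found in an HTML string."""
    parser = HighlightCueParser()
    parser.feed(html)
    parser.close()
    return parser.cues


sample = (
    '<span data-purpose="cue-text" class="">Non highlight text</span>'
    '<span data-purpose="cue-text" class="transcript--highlight-cue--1bEgq">'
    'Highlight text</span>'
)
print(highlighted_cues(sample))  # ['Highlight text']
```

Only the span whose class list contains the highlight class contributes text, which is the same condition the Selenium lookup by class name relies on.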

Source code for subtitle acquisition

I actually got the subtitles with Selenium using the following code.

scraping.py


import time
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Placeholder path to the ChromeDriver binary; adjust it for your environment (Selenium 4 style)
driver_path = 'chromedriver'
driver = webdriver.Chrome(service=Service(driver_path))
# Opens the login page; you log in by hand on every run
driver.get(r'https://www.udemy.com/join/login-popup/?next=/home/my-courses/learning/')
last_text = None

while True:
    try:
        ret = driver.find_element(By.CLASS_NAME, 'transcript--highlight-cue--1bEgq')
        # The element is fetched every 0.2 seconds, so print only when the text differs from the previous one
        if ret.text != last_text:
            last_text = ret.text
            print(last_text)
    except NoSuchElementException:
        # Raised when no highlighted subtitle is found; ignore it only in this case
        pass
    except Exception as e:
        # On any other exception, give up for now
        print(e)
        print('Finish')
        break

    # The 0.2-second interval is arbitrary
    time.sleep(0.2)

That seems to retrieve the subtitles without problems, so let's move on.
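The change-detection step in the polling loop (print only when the text differs from the last one) can be looked at in isolation. Here is a minimal sketch using a generator with the hypothetical name changed_only, fed with a made-up list of polled values instead of a live browser. Skipping empty strings is an extra assumption of this sketch, not something the scraping script itself does.

```python
def changed_only(values):
    """Yield each non-empty value only when it differs from the last yielded one."""
    last = None
    for value in values:
        # Skip empty reads (assumption of this sketch) and repeats of the current cue
        if value and value != last:
            last = value
            yield value


# Made-up polling results: the same cue is seen several times at a 0.2-second interval
polled = ["Hello.", "Hello.", "", "Hello.", "Welcome back.", "Welcome back."]
print(list(changed_only(polled)))  # ['Hello.', 'Welcome back.']
```

Because the last value is remembered across empty reads, a cue that briefly disappears and comes back unchanged is not printed twice.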

Translation to Japanese

When it comes to translation, **Google-sensei** is the one to ask. When I looked into it, it turns out Google Translate can be used from Python.

Translate using googletrans in Python

I extended the source code with reference to the article above.

Final source code

translate.py


import time
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from googletrans import Translator

# Placeholder path to the ChromeDriver binary; adjust it for your environment (Selenium 4 style)
driver_path = 'chromedriver'
driver = webdriver.Chrome(service=Service(driver_path))
# Opens the login page; you log in by hand on every run
driver.get(r'https://www.udemy.com/join/login-popup/?next=/home/my-courses/learning/')
last_text = None

# Assumes a googletrans release whose translate() is a regular (synchronous) call
translator = Translator()

while True:
    try:
        ret = driver.find_element(By.CLASS_NAME, 'transcript--highlight-cue--1bEgq')
        # The element is fetched every 0.2 seconds, so handle it only when the text is non-empty and differs from the previous one
        if ret.text and ret.text != last_text:
            last_text = ret.text
            print(last_text)
            print(translator.translate(last_text, dest='ja').text)
    except NoSuchElementException:
        # Raised when no highlighted subtitle is found; ignore it only in this case
        pass
    except Exception as e:
        # On any other exception, give up for now
        print(e)
        print('Finish')
        break

    # The 0.2-second interval is arbitrary
    time.sleep(0.2)

Finally

For now, I was able to translate the subtitles into Japanese in real time, so **all good!** I have to log in every time I run the script, it sometimes dies with a mysterious exception, and the Japanese comes out garbled when the English subtitles are wrong in the first place. Still, it is less stressful than translating with a dictionary in one hand, so I will keep using it.
