**I found a course on Udemy that looks interesting! Wait, there are no Japanese subtitles...** ~~Because studying English is a hassle~~ **So let's translate them into Japanese!**
I can read and write English to some extent, so it's not that I can't follow it at all. However, I can't keep up with native speed, so I would have to pause the video again and again and translate with a dictionary in one hand. That is tedious and inefficient, so let's translate the subtitles into Japanese automatically as they are displayed.
Many Udemy videos have English subtitles, and a button at the bottom right of the video player opens a transcript of all the subtitles for that video. The subtitle the instructor is currently speaking is highlighted in light blue. In other words, if I can grab the highlighted subtitle, I should be able to translate it.
When I searched the web for "Python scraping", I found a module called Selenium, so I will use that.
Scraping with Selenium in Python (Basic)
According to the above article, elements can be retrieved by id, class, or name. I haven't done much HTML or CSS, so I'm not sure, but if I know the id or class of the subtitle element I want, it looks like I can get it somehow.
I opened a course page on Udemy and inspected the highlighted subtitle in the developer tools.
**Highlighted subtitles**
highlight.html
<span data-purpose="cue-text" class="transcript--highlight-cue--1bEgq">Highlight text</span>
**Subtitles without highlights**
nonhighlight.html
<span data-purpose="cue-text" class="">Non highlight text</span>
When I checked while actually playing the video, the class attribute differed between normal subtitles and the highlighted subtitle.
Apparently the class of the highlighted element is transcript--highlight-cue--1bEgq.
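The trailing 1bEgq in that class name looks like a build-generated suffix, though that is only a guess from its shape. If so, matching on the stable-looking part (highlight-cue) together with the data-purpose attribute might survive a site update better than the full class name. The sketch below checks this idea against the two sample spans using only the standard library. The names HIGHLIGHT_MARKER, HIGHLIGHT_SELECTOR, and highlighted_cues are made up for illustration, and the CSS selector has not been verified against the live Udemy page.

```python
from html.parser import HTMLParser

# Assumption: only the "highlight-cue" part of the class name is stable;
# the trailing "--1bEgq" is guessed to be a build-generated suffix.
HIGHLIGHT_MARKER = "highlight-cue"

# Equivalent CSS selector that could be passed to Selenium (untested on Udemy).
HIGHLIGHT_SELECTOR = 'span[data-purpose="cue-text"][class*="highlight-cue"]'


class HighlightedCueParser(HTMLParser):
    """Collects the text of cue spans whose class contains HIGHLIGHT_MARKER."""

    def __init__(self):
        super().__init__()
        self._in_highlight = False
        self.cues = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if (tag == "span"
                and attrs.get("data-purpose") == "cue-text"
                and HIGHLIGHT_MARKER in (attrs.get("class") or "")):
            self._in_highlight = True
            self.cues.append("")

    def handle_data(self, data):
        if self._in_highlight:
            self.cues[-1] += data

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_highlight = False


def highlighted_cues(html):
    """Return the text of every highlighted cue span found in html."""
    parser = HighlightedCueParser()
    parser.feed(html)
    parser.close()
    return parser.cues


sample = (
    '<span data-purpose="cue-text" class="transcript--highlight-cue--1bEgq">Highlight text</span>'
    '<span data-purpose="cue-text" class="">Non highlight text</span>'
)
print(highlighted_cues(sample))  # ['Highlight text']
```

With Selenium, the same match would presumably be expressed by looking up HIGHLIGHT_SELECTOR as a CSS selector instead of the exact class name.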
I then fetched the subtitles with Selenium using the following code.
scraping.py
import time

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Placeholder: set this to the location of your ChromeDriver binary.
driver_path = 'chromedriver'

# Selenium 4 style: the driver path is passed through a Service object.
driver = webdriver.Chrome(service=Service(driver_path))
driver.get(r'https://www.udemy.com/join/login-popup/?next=/home/my-courses/learning/')

last_text = None
while True:
    try:
        ret = driver.find_element(By.CLASS_NAME, 'transcript--highlight-cue--1bEgq')
        # The element is polled every 0.2 seconds, so print only when the text has changed.
        if ret.text != last_text:
            last_text = ret.text
            print(last_text)
    except NoSuchElementException:
        # No subtitle is highlighted right now; ignore this and try again.
        pass
    except Exception as e:
        # Any other exception: give up for now.
        print(e)
        print('Finish')
        break
    # 0.2 seconds is an arbitrary polling interval.
    time.sleep(0.2)
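The "print only when the text has changed" part of the loop above can be tried out without a browser. Here is a minimal sketch of that logic as a generator. The function name poll_changes and its parameters are hypothetical, and get_text stands in for the Selenium lookup (returning None when no subtitle is highlighted).

```python
import time


def poll_changes(get_text, interval=0.2, max_polls=None, sleep=time.sleep):
    """Yield get_text() results, skipping None/empty values and consecutive repeats."""
    last_text = None
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        text = get_text()
        if text and text != last_text:
            last_text = text
            yield text
        sleep(interval)


# Stand-in for the Selenium lookup: canned values, None meaning "nothing highlighted".
fake_reads = iter(["Hello", "Hello", None, "Hello", "World", "World"])
lines = list(poll_changes(lambda: next(fake_reads), max_polls=6, sleep=lambda s: None))
print(lines)  # ['Hello', 'World']
```

As in the original loop, a missing element leaves the remembered text untouched, so the same subtitle is not printed again when it reappears after a gap.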
The subtitles seem to come through without problems, so on to the next step.
When it comes to translation, it has to be **Google-sensei**. I looked it up, and it turns out Google Translate can be used from Python.
Translate using googletrans in Python
I extended the source code with reference to the above article.
translate.py
import time

from googletrans import Translator
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Placeholder: set this to the location of your ChromeDriver binary.
driver_path = 'chromedriver'

# Selenium 4 style: the driver path is passed through a Service object.
driver = webdriver.Chrome(service=Service(driver_path))
driver.get(r'https://www.udemy.com/join/login-popup/?next=/home/my-courses/learning/')

last_text = None
translator = Translator()
while True:
    try:
        ret = driver.find_element(By.CLASS_NAME, 'transcript--highlight-cue--1bEgq')
        text = ret.text
        # The element is polled every 0.2 seconds, so handle it only when the text has changed.
        if text and text != last_text:
            last_text = text
            print(last_text)
            print(translator.translate(last_text, dest='ja').text)
    except NoSuchElementException:
        # No subtitle is highlighted right now; ignore this and try again.
        pass
    except Exception as e:
        # Any other exception: give up for now.
        print(e)
        print('Finish')
        break
    # 0.2 seconds is an arbitrary polling interval.
    time.sleep(0.2)
For now I can translate the subtitles into Japanese in real time, so **all right!** There are rough edges: I have to log in every time I run the script, it sometimes dies with a mysterious exception, and the Japanese comes out garbled when the English subtitles are wrong in the first place. Still, it is less stressful than translating with a dictionary in one hand, so I will keep using it.
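I have not tracked down the mysterious exception, so the following is only one possible mitigation, not a diagnosis. If the failure comes from the translation call, wrapping it so that an error skips one subtitle instead of ending the script might help, and caching avoids asking for the same line twice. The helper name make_safe_translator is hypothetical. With googletrans, the wrapped function would be something like lambda s: translator.translate(s, dest='ja').text, the same call the script already makes. A fake translator is used here so the sketch runs on its own.

```python
def make_safe_translator(translate, cache=None):
    """Wrap translate(text) -> str so repeats are cached and failures return None."""
    cache = {} if cache is None else cache

    def safe_translate(text):
        if text in cache:
            return cache[text]
        try:
            result = translate(text)
        except Exception as exc:  # broad on purpose: the failure modes are unknown
            print(f'translation failed for {text!r}: {exc}')
            return None
        cache[text] = result
        return result

    return safe_translate


# Fake translator for demonstration: upper-cases the text, fails on "boom".
calls = []


def fake_translate(text):
    calls.append(text)
    if text == "boom":
        raise RuntimeError("simulated network error")
    return text.upper()


safe = make_safe_translator(fake_translate)
results = [safe("hello"), safe("hello"), safe("boom"), safe("bye")]
print(results)  # ['HELLO', 'HELLO', None, 'BYE']
print(calls)    # ['hello', 'boom', 'bye']
```

Failed translations are deliberately not cached, so the same subtitle would be retried the next time it appears.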