This article is the day-24 entry of the Shinagawa Advent Calendar 2019.
Recently I have been taking Coursera courses with support from my company. Most courses are in English, but some have Japanese subtitles contributed by volunteers. However, even a course listed as having Japanese subtitles may lack them for some videos. For example, Deep Learning Specialization has almost no Japanese subtitles in weeks 2, 3, and 5. So I made Japanese subtitles myself with the Cloud Translation API on GCP.
I have put the scripts in a gist, and this article explains them. I wrote them only for my own use, so please excuse the file names.
Coursera is an online course service. Learning involves not only videos but also programming exercises and assignments in Jupyter Notebook, and you can discuss with other students and TAs in the forums. I will leave reviews of individual courses to other articles, but I found it a much more efficient way to learn than researching on my own.
WebVTT is a text format for displaying subtitles on videos in the HTML <video> element. It is simple and has the following structure [^1].
WEBVTT

1
00:00:00.000 --> 00:00:01.755
Hello, I'm Carolyn.

2
00:00:01.755 --> 00:00:03.795
I'd like to welcome
you to our course

3
00:00:03.795 --> 00:00:06.505
on Machine Learning for
Business Professionals.
As you can see, each cue pairs the time range during which a subtitle is displayed with the subtitle text. The full format is described in the MDN WebVTT documentation. Decorative tags can also be used, but Coursera appears to use plain text only, as above.
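To make the cue structure concrete, here is a minimal parser for the plain-text subset of WebVTT shown above, using only the Python standard library. It is a sketch, not the parser in the gist (the script itself relies on webvtt-py). `Cue` and `parse_vtt` are hypothetical names. Tags and cue settings are ignored, and NOTE/STYLE blocks are skipped.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Timing line such as "00:00:01.755 --> 00:00:03.795" (the hours field is optional in WebVTT).
# Anything after the end time (cue settings) is ignored.
TIMING = re.compile(r"^((?:\d{2,}:)?\d{2}:\d{2}\.\d{3}) --> ((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})")


@dataclass
class Cue:
    ident: Optional[str]  # optional cue identifier such as "2"
    start: str
    end: str
    text: str  # subtitle text; multiple lines are kept joined with "\n"


def parse_vtt(source: str) -> List[Cue]:
    """Parse the plain-text subset of WebVTT into a list of cues."""
    cues = []
    source = source.replace("\r\n", "\n").strip()
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", source):
        lines = block.splitlines()
        if not lines or lines[0].startswith("WEBVTT"):
            continue  # file header
        ident = None
        if not TIMING.match(lines[0]):
            ident, lines = lines[0], lines[1:]  # first line is a cue identifier
        if not lines or not TIMING.match(lines[0]):
            continue  # NOTE/STYLE blocks and anything else are skipped in this sketch
        m = TIMING.match(lines[0])
        cues.append(Cue(ident, m.group(1), m.group(2), "\n".join(lines[1:])))
    return cues
```

For cues 2 and 3 of the sample, `parse_vtt` returns two `Cue` objects whose texts are 37 and 47 characters long, counting the line break inside each cue.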
Many media players, such as VLC media player, now support WebVTT subtitles. For example, VLC automatically loads a .vtt file that has the same name as the video and is in the same folder. The same applies to the Android version of the app.
Unlike the Google Translate web service, the Cloud Translation API is paid [^2]. Its introduction page lists two APIs, AutoML Translation and the Translation API; I used the latter.
Billing is per character. As a rough guide, a 30-minute video has about 20,000 characters, which costs about $0.8. As another reference point, translating every video without Japanese subtitles in weeks 1 to 5 of Deep Learning Specialization cost about 1,500 yen [^3].
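From here on, I will explain the WebVTT translation script. The overall flow is to join the subtitles into sentences, translate each sentence, and then split each translation back across the original subtitles.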
For example, suppose you have the following subtitles [^1].
2
00:00:01.755 --> 00:00:03.795
I'd like to welcome
you to our course

3
00:00:03.795 --> 00:00:06.505
on Machine Learning for
Business Professionals.
First, join the cues into a single sentence up to the period, remembering the character count of each cue: 37 for ID 2 and 47 for ID 3 (including the line break within each cue).
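This first step can be sketched as follows, assuming the cue texts are already available as strings. `group_into_sentences` is a hypothetical helper, not necessarily how the gist does it. It joins cues until one ends with a period and keeps each cue's `len()` (line breaks included) as its weight for the later split.

```python
from typing import List, Tuple


def group_into_sentences(texts: List[str]) -> List[Tuple[str, List[int]]]:
    """Join consecutive cue texts until one ends with a period.

    Returns (sentence, lengths) pairs; lengths[i] is len() of the i-th cue
    text in that sentence, including any line break inside the cue.
    """
    groups = []
    buf: List[str] = []
    lengths: List[int] = []
    for text in texts:
        buf.append(text.replace("\n", " "))  # line breaks inside a cue become spaces
        lengths.append(len(text))
        if text.rstrip().endswith("."):
            groups.append((" ".join(buf), lengths))
            buf, lengths = [], []
    if buf:  # trailing cues that never reach a period
        groups.append((" ".join(buf), lengths))
    return groups
```

Cues that never reach a period end up in one large group, which mirrors the limitation with period-less subtitles mentioned near the end of this article.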
Then translate the sentence.
Before translation: I'd like to welcome you to our course on Machine Learning for Business Professionals.
After translation: Welcome to the Machine Learning course for business professionals.
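A sketch of the translation call is shown below. It assumes the Basic (v2) client from google-cloud-translate, `google.cloud.translate_v2.Client`, whose `translate()` accepts a list of strings together with `source_language`, `target_language` and `format_`. The `batch_size` of 50 is an assumed, conservative request size, not a documented requirement. The client is passed in as a parameter, so the function can be tried with a stand-in object.

```python
from typing import List


def translate_sentences(sentences: List[str], client, batch_size: int = 50) -> List[str]:
    """Translate English sentences into Japanese.

    `client` is assumed to behave like google.cloud.translate_v2.Client: given a
    list of strings, translate() returns one dict per string with a
    "translatedText" key.
    """
    translated: List[str] = []
    for i in range(0, len(sentences), batch_size):
        batch = sentences[i:i + batch_size]
        results = client.translate(
            batch,
            source_language="en",
            target_language="ja",
            format_="text",  # treat the input as plain text rather than HTML
        )
        translated.extend(r["translatedText"] for r in results)
    return translated


def make_client():
    """Create a real client (needs google-cloud-translate and credentials)."""
    # Imported here so translate_sentences can be tried without the library installed.
    from google.cloud import translate_v2 as translate
    # Credentials are read from the GOOGLE_APPLICATION_CREDENTIALS environment variable.
    return translate.Client()
```

With real credentials, the call would be `translate_sentences(sentences, make_client())`.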
Next, split the translation back into the original two subtitles. The character ratio of the original subtitles is 37:47, so the translation is divided according to this ratio. However, text is hard to read if it is cut in the middle of a word or right before a particle such as の (no) or を (o), so no cut is made there.
The result is as follows.

00:00:01.755 --> 00:00:03.795
Machine learning for business professionals

00:00:03.795 --> 00:00:06.505
Welcome to the course.
With ordinary subtitles, this much text would fit in a single cue, so splitting it feels a little odd. There is a lot of room for improvement here.
The next sentence is:

4
00:00:06.505 --> 00:00:08.160
I lead a team of machine learning

5
00:00:08.160 --> 00:00:10.080
engineers who have successfully

6
00:00:10.080 --> 00:00:12.450
implemented many
machine learning projects

7
00:00:12.450 --> 00:00:14.475
across various industries.

After translation and splitting, it becomes:

00:00:06.505 --> 00:00:08.160
I am in various industries

00:00:08.160 --> 00:00:10.080
Many machine learning projects

00:00:10.080 --> 00:00:12.450
A team of machine learning engineers who successfully implemented

00:00:12.450 --> 00:00:14.475
I'm leading.
Setting aside the quality of the machine translation itself, I think the subtitles are split in a well-balanced way.
- webvtt-py: Used to parse and build WebVTT files. It does not appear to support cue IDs, so they are dropped from the translated file, but this is not a problem because IDs are not required to display subtitles.
- MeCab: Used to split the Japanese translation into words and identify their parts of speech.
- google-cloud-translate: A Python library that makes the Cloud Translation API easy to use. Authentication works by setting an environment variable (GOOGLE_APPLICATION_CREDENTIALS) to the path of the API credentials file.
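The two remaining pieces of glue are sketched below as an illustration. The helper names are hypothetical. `tokenize` assumes an IPAdic-style dictionary, where each line of `MeCab.Tagger().parse()` looks like `機械\t名詞,一般,...` and the output ends with `EOS`; other dictionaries, such as UniDic, format their output differently. `write_translated_vtt` uses webvtt-py's `read`, `WebVTT`, `Caption` and `save` to copy the cue timings and swap in the translated text.

```python
from typing import List, Tuple


def tokenize(text: str, tagger) -> List[Tuple[str, str]]:
    """Turn MeCab output into (surface, part_of_speech) pairs (IPAdic format assumed)."""
    tokens = []
    for line in tagger.parse(text).splitlines():
        if line == "EOS" or "\t" not in line:
            continue  # end-of-sentence marker or blank line
        surface, features = line.split("\t", 1)
        tokens.append((surface, features.split(",")[0]))  # first feature is the POS
    return tokens


def write_translated_vtt(src_path: str, dst_path: str, new_texts: List[str]) -> None:
    """Copy cue timings from src_path, replacing each cue's text in order."""
    import webvtt  # imported lazily; requires the webvtt-py package

    out = webvtt.WebVTT()
    for caption, text in zip(webvtt.read(src_path), new_texts):
        # No identifier is passed, so cue IDs are not written to the output file.
        out.captions.append(webvtt.Caption(caption.start, caption.end, text))
    out.save(dst_path)
```

With mecab-python3 installed, `tokenize(translation, MeCab.Tagger())` yields tokens in the shape a ratio-based splitter needs.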
- Sometimes the breaks are poor
  - For various reasons, the subtitle split can become unbalanced.
  - Runs of three or more consecutive particles and punctuation marks are not split well.
  - I simply haven't written the logic for it, but in rare cases a line begins with a punctuation mark.
- Cannot handle sentences without a period
  - This happened with "Parameters vs Hyperparameters" in week 4 of Deep Learning Specialization, whose subtitles have no periods. As expected, this logic could not handle it.
- Technical terms are not always translated well
  - Google Translate handles AI jargon well, but some terms still get translated too literally.
  - The API can create and register a glossary (dictionary), but I decided not to bother because I can convert these terms in my head without much effort.
I'm not good at English listening, but I find that subtitles, even machine-translated ones, help my understanding quite a bit. Coursera has many great courses. I think it is a much more efficient way to gain systematic knowledge than reading articles or books, so why not give it a try?
[^1]: Quoted from the Introduction in week 1 of Machine Learning for Business Professionals. This course is free if you do not need a certificate.
[^2]: There is also an article, "Try to use Google Translate API for free". I haven't tried it.
[^3]: This includes usage from trial and error while writing the script.