Inflating text data by retranslation using Google Translate in Python

Introduction

I was mentally worn out and wanted some easy validation, so here is a Python script for inflating (augmenting) text data with Google Translate, which I used in SIGNATE Student Cup 2020, an NLP competition I recently took part in. There are already plenty of similar articles, so none of this is new.

Data set used

I couldn't find a particularly handy dataset, so I decided to use this one from Kaggle: Wikipedia Movie Plots.

Script

To start with, here is a script that translates English text into another language (Japanese in the run below) and then translates the result back into English.

from googletrans import Translator

def retranslator(text, lang):
    '''Translate English text into another language, then back into English,
    to inflate the data.
    '''
    translator = Translator()
    translated = translator.translate(text, src='en', dest=lang).text
    retranslated = translator.translate(translated, src=lang, dest='en').text
    return translated, retranslated

Like this.

Put simply, text is the string you want to translate, src is the language code of the source language, and dest is the language code of the target language.
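To make the roles of src and dest on the outbound and return legs concrete, here is a minimal offline sketch. It assumes the same call shape as the script (a translate(text, src=..., dest=...) method returning an object with a .text attribute). `retranslate` and `EchoTranslator` are hypothetical names made up for illustration; `EchoTranslator` is only a stand-in and does not translate anything.

```python
from types import SimpleNamespace


def retranslate(translator, text, lang, src='en'):
    """Round-trip text through lang using an already-created translator.

    translator is assumed to expose translate(text, src=..., dest=...)
    and to return an object with a .text attribute.
    """
    # Outbound leg: source language -> pivot language
    translated = translator.translate(text, src=src, dest=lang).text
    # Return leg: src and dest are simply swapped
    retranslated = translator.translate(translated, src=lang, dest=src).text
    return translated, retranslated


class EchoTranslator:
    """Hypothetical offline stand-in that tags text with the language pair."""

    def translate(self, text, src, dest):
        return SimpleNamespace(text=f'[{src}->{dest}] {text}')


translated, retranslated = retranslate(EchoTranslator(), 'The moon smiles.', 'ja')
print(translated)
print(retranslated)
```

Passing the translator in as an argument also means a single instance can be reused across many calls instead of being created each time.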

For Google Translate's language codes, refer to the Language Support page and pick whichever language you like.

Incidentally, translation quality can be expected to be better for relatively major languages, so when the goal is inflating data it is probably safer to stick with a major language. In fact, in competitions it seems common to retranslate through French, German, Spanish, Japanese, Chinese, and so on.
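Retranslating through several major languages, as described above, could be sketched roughly as follows. This is only an illustration: `back_translate_many`, `fake_translate`, and the translate(text, src, dest) callable are hypothetical names introduced here, not part of googletrans. In real use, the callable would wrap something like `Translator().translate(text, src=src, dest=dest).text`; the stand-in below just lets the sketch run offline.

```python
def back_translate_many(text, langs, translate, src='en'):
    """Round-trip text through each pivot language and collect new variants.

    translate is assumed to be a callable translate(text, src, dest) -> str.
    Variants identical to the original (or to an earlier variant) are skipped,
    since they add nothing to the dataset.
    """
    variants = []
    seen = {text}
    for lang in langs:
        pivot = translate(text, src, lang)
        restored = translate(pivot, lang, src)
        if restored not in seen:
            seen.add(restored)
            variants.append(restored)
    return variants


def fake_translate(text, src, dest):
    """Hypothetical offline stand-in: rewrites one word per pivot language."""
    if dest == 'ja':
        return text.replace('film', 'movie')
    if dest == 'fr':
        return text.replace('shows', 'displays')
    return text  # every other direction leaves the text unchanged


variants = back_translate_many('This film shows Jack.', ['ja', 'fr', 'de'], fake_translate)
print(variants)
```

With the stand-in, the 'de' round trip returns the original sentence unchanged, so only two variants are kept.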

Actually using it

Execution code

import pandas as pd
from googletrans import Translator

data = pd.read_csv('./wiki_movie_plots_deduped.csv')

def retranslator(text, lang):
    '''Translate English text into another language, then back into English,
    to inflate the data.
    '''
    translator = Translator()
    translated = translator.translate(text, src='en', dest=lang).text
    retranslated = translator.translate(translated, src=lang, dest='en').text
    return translated, retranslated

for i in range(5):
    row = data.iloc[i]

    translated, retranslated = retranslator(row['Plot'], 'ja')

    result = {
        'Original': row['Plot'],
        'translated': translated,
        'retranslated': retranslated
    }
    for key, val in result.items():
        print(key)
        print(val)
        print('')

Output

Original A bartender is working at a saloon, serving drinks to customers. After he fills a stereotypically Irish man's bucket with beer, Carrie Nation and her followers burst inside. They assault the Irish man, pulling his hat over his eyes and then dumping the beer over his head. The group then begin wrecking the bar, smashing the fixtures, mirrors, and breaking the cash register. The bartender then sprays seltzer water in Nation's face before a group of policemen appear and order everybody to leave.[1]

translated A bartender works in the salon and serves drinks to customers. Carrie Nation and her followers jumped in after he filled a typical Irish bucket with beer. They attacked the Irish, pulled his hat over his eyes, and then dumped the beer over his head. After that, the group destroys the bars, the equipment, the mirrors, and the cashiers begin to break. The bartender then sprays Selzer water on Nation's face, and then a group of police officers appear and order everyone to leave. [1]

retranslated A bartender works at the salon and serves drinks to customers. Carry Nation and her followers plunge into him after he filled a typical Irish bucket with beer. They attacked the Irish, pulled his hat over his eyes, and then threw the beer over his head. After that, the group destroys the bar, destroys equipment, mirrors, and begins to destroy the cash register. The bartender then sprays Seltzer water on Nation's face, then a group of policemen appears and orders everyone to leave. [1]


Original The moon, painted with a smiling face hangs over a park at night. A young couple walking past a fence learn on a railing and look up. The moon smiles. They embrace, and the moon's smile gets bigger. They then sit down on a bench by a tree. The moon's view is blocked, causing him to frown. In the last scene, the man fans the woman with his hat because the moon has left the sky and is perched over her shoulder to see everything better.

translated The moon drawn with a smile hangs down in the park at night. A young couple walking over the fence learns about railings and looks up. The moon smiles. They hug and the moon smiles bigger. Then they sat on a bench by the tree. The view of the moon was obstructed and he frowned. In the final scene, the moon leaves the sky and everything is clearly visible over the shoulder, so the man wears a hat and incites the woman.

retranslated The moon drawn with a smile hangs in the park at night. A young couple walking over the fence learns about the handrail and looks up. The moon smiles. They hug and make the moon smile bigger. Then they sat on a bench by the tree. The moon's view was blocked and he frowned. In the last scene, the man leaves the sky and sees everything over his shoulder, so men wear hats to incite women.


Original The film, just over a minute long, is composed of two shots. In the first, a girl sits at the base of an altar or tomb, her face hidden from the camera. At the center of the altar, a viewing portal displays the portraits of three U.S. Presidents—Abraham Lincoln, James A. Garfield, and William McKinley—each victims of assassination. In the second shot, which runs just over eight seconds long, an assassin kneels feet of Lady Justice.

translated This movie is a little over a minute long and consists of two shots. Initially, the girl sits at the foot of an altar or tomb, with her face hidden from the camera. The viewing portal in the center of the altar shows portraits of the three victims of the assassination, Abraham Lincoln, James A. Garfield, and William McKinley. The second shot takes just over 8 seconds and kneels down on the goddess of justice.

retranslated This movie is a little over a minute and consists of two shots. Initially, the girl sits at the base of the altar or grave, with her face hidden from the camera. A viewing portal in the center of the altar shows portraits of three US presidents, Abraham Lincoln, James A. Garfield and William McKinley, who are victims of assassination. The second shot is just over 8 seconds and kneels on the feet of the goddess of justice.


Original Lasting just 61 seconds and consisting of two shots, the first shot is set in a wood during winter. The actor representing then vice-president Theodore Roosevelt enthusiastically hurries down a hillside towards a tree in the foreground. He falls once, but rights himself and cocks his rifle. Two other men, bearing signs reading "His Photographer" and "His Press Agent" respectively, follow him into the shot; the photographer sets up his camera. "Teddy" aims his rifle upward at the tree and fells what appears to be a common house cat, which he then proceeds to stab. "Teddy" holds his prize aloft, and the press agent takes notes. The second shot is taken in a slightly different part of the wood, on a path. "Teddy" rides the path on his horse towards the camera and out to the left of the shot, followed closely by the press agent and photographer, still dutifully holding their signs.

translated It consists of two shots in just 61 seconds, and during the winter, the first shot is taken in the woods. The actor representing Theodore Roosevelt, then Vice President, is enthusiastically rushing down the hillside towards the tree in the foreground. He collapses once, but gives himself rights and shoots his rifle. The other two men chase after him, labeled "his photographer" and "his reporter's agent," respectively. The photographer sets up the camera. "Teddy" points the rifle at a tree, defeats and pierces what looks like a normal domestic cat. "Teddy" holds his award high, and reporters take notes. The second shot is taken on a path in a slightly different part of the forest. "Teddy" heads his horse's path towards the camera and out to the left of the shot, followed closely by the press agent and photographer, still holding the sign faithfully.

retranslated Consisting of two shots of only 61 seconds, during the winter the first shot is taken in the woods. At the time, the actor, who represented Vice President Theodore Roosevelt, enthusiastically rushed down the hill toward the trees in front. He falls once, but empowers himself and shoots his rifle. Two other men chase him, marking them with "his photographer" and "his reporter agent" respectively. The cameraman sets up the camera. The "teddy" points its rifle at a tree, defeats and sticks what looks like a normal domestic cat. "Teddy" has raised his award high and reporters take notes. The second shot is taken on a path in a slightly different part of the forest. "Teddy" heads his horse towards the camera and out to the left of the shot, closely followed by the press agent and the photographer, still faithfully holding the autograph.


Original The earliest known adaptation of the classic fairytale, this films shows Jack trading his cow for the beans, his mother forcing him to drop them in the front yard, and beig forced upstairs. As he sleeps, Jack is visited by a fairy who shows him glimpses of what will await him when he ascends the bean stalk. In this version, Jack is the son of a deposed king. When Jack wakes up, he finds the beanstalk has grown and he climbs to the top where he enters the giant's home. The giant finds Jack, who narrowly escapes. The giant chases Jack down the bean stalk, but Jack is able to cut it down before the giant can get to safety. He falls and is killed as Jack celebrates. The fairy then reveals that Jack may return home as a prince.

translated The earliest known adaptation of classic fairy tales, this movie forces Jack to exchange his cows for beans, his mother forces him to drop them in the vestibule, and upstairs. Shows a forced bean. When he is asleep, Jack is visited by a fairy. The fairy gives a glimpse of what lies ahead as he climbs the bean stalk. In this version, Jack is the son of the abdicated king. When Jack wakes up, he discovers a bean tree growing and he climbs to the top of the giant's house. The giant finds Jack to escape slightly. The giant chases Jack on the bean stalk, but Jack can cut it off before the giant is safe. When Jack celebrates, he falls and is killed. The fairy reveals that Jack is going home as a prince.

retranslated The earliest known adaptation of the classic fairy tale, this film shows Jack exchanging his cows for beans, his mother forcing him to drop them in the front yard, and upstairs. Shows forced beag. When he is asleep, Jack is visited by fairies. The fairy gives a glimpse of what he is waiting for when he climbs the bean stalk. In this version, Jack is the son of the deposed King. When Jack wakes up, he finds a bean tree growing and he climbs to the top of the giant's house. The giant finds Jack who escapes slightly. The giant chases Jack for the bean stalk, but Jack can chop it off before the giant is safe. When Jack celebrates, he falls and is killed. The fairy reveals that Jack will return home as a prince.

The output is not very easy to read, but I don't have the mental energy to care about such details right now, so please forgive me.

Could you tell which movie each plot belongs to from the translated text? If you are curious, look up the titles yourself in the Kaggle dataset.

Some parts of the Japanese translation make me go "huh?", but the retranslated one is n

With this you can use a technique often seen in NLP competitions: inflating data by rephrasing sentences with the same meaning in slightly different wording. The drawback is that the results depend on translation quality, but I think it is a relatively easy and reasonably effective method, so please give it a try.
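Since the result depends on translation quality, one possible safeguard is to discard retranslations that come back identical to the original (nothing gained) or that have drifted too far from it. The sketch below uses word-level similarity from the standard library's difflib. `similarity`, `keep_variant`, and the thresholds 0.3 and 0.95 are hypothetical choices for illustration, not something used in the competition, and would need tuning on real data.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Word-level similarity between two sentences, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


def keep_variant(original, retranslated, low=0.3, high=0.95):
    """Keep a retranslation only if it differs from the original, but not too much.

    low and high are assumed thresholds: below low the meaning has probably
    been lost, above high the text is effectively a duplicate.
    """
    return low <= similarity(original, retranslated) <= high


original = 'The moon smiles at the young couple'
candidates = [
    'The moon smiles at the young couple',  # unchanged: no new information
    'The moon smiles at a young couple',    # small rewording: useful variant
    'Reporters take notes',                 # unrelated: translation went wrong
]
kept = [c for c in candidates if keep_variant(original, c)]
print(kept)
```

Word overlap is a crude proxy for meaning, so a filter like this only catches obvious failures; it will not notice a fluent retranslation that quietly changes the sense of a sentence.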

Bonus: status update

Recently (well, about a week ago) I took part in SIGNATE Student Cup 2020, and it wore down my mental strength. My write-up of the competition is here (never forgetting to advertise): [SIGNATE Student Cup 2020 [Prediction Division] Participation (pop-ketle version)](https://pop-ketle.hatenablog.com/entry/2020/08/28/130451)

Because of that, please wait a little longer for Part 2 of my ongoing series "Let's make an app that can search similar images with Python and Flask". I am still unsure how to develop the app next and whether I should properly research and write up an explanation of Flask. Between worrying about that next step and a few other things I have to do, I don't have much time to write articles right now. (I wrote this one in about an hour because I wanted an easy sense of accomplishment.) Goodbye for a while, everyone, and please take good care of your mental health.
