[PYTHON] Create a translation tool with the Translate Toolkit

Translate Toolkit

The Translate Toolkit (http://toolkit.translatehouse.org/) is a set of libraries and command-line tools that support translation work. It is distributed under the GPLv2 license, so anyone can use it freely. It is developed by Translate House and implemented in Python. It supports the Python 2 series, and starting with the upcoming 1.11.0 release, Python 2.5 and earlier will no longer be supported. Translate House also develops the translation platform Pootle, which itself uses the Translate Toolkit. (The LibreOffice translation work introduced on Day 1 (naruoga/20131201/1385838237) and Day 3 (http://d.hatena.ne.jp/naruoga/20131203/1386063346) of the Doc-ja Advent Calendar 2013 is also done on Pootle.)

In this article, I'll use the Translate Toolkit library to create a simple translation tool, specifically a tool that automates the editing of PO files. This is useful when each edit is simple but the number of places to edit is huge. Mechanical editing is what existing Unix tools such as sed are good at, but PO is not a line-oriented format, so it is a poor fit for those tools and such edits are not straightforward. This is where the Translate Toolkit comes in.
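
As a small illustration of why line-oriented tools struggle with PO, consider a msgstr that has been wrapped across two physical lines. The sketch below uses only the Python standard library; join_po_string is a hypothetical helper that handles just the simple quoted-continuation case shown here, so it should be read as a demonstration of the problem rather than as a PO parser.

```python
import re

# One logical msgstr whose trailing "..." is split across two physical lines.
PO_LINES = [
    'msgstr "save as."',
    '".."',
]


def join_po_string(lines):
    """Concatenate the quoted pieces of one PO string (hypothetical helper)."""
    pieces = [re.search(r'"(.*)"\s*$', line).group(1) for line in lines]
    return "".join(pieces)


# A line-by-line search for the three periods finds nothing...
line_hits = [line for line in PO_LINES if "..." in line]
# ...although the logical value does end with them.
logical_value = join_po_string(PO_LINES)

print(line_hits)
print(logical_value.endswith("..."))
```

A sed-style substitution works on the physical lines, so it would miss this message; a library that exposes the logical string avoids the problem.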

This article assumes GNU gettext PO as the message catalog format, but the Translate Toolkit is not limited to PO; it supports various formats such as XLIFF. PO itself is explained in the Day 2 article and is not covered here.

Installation method

$ pip install translate-toolkit

Alternatively, if your OS provides a translate-toolkit package, you may prefer to use that. The source code is maintained on GitHub.

Case study

For example, consider the following situation. "The three periods used as a suffix in the original messages (common in GUI labels and the like) were replaced with the ellipsis character (…, the three-dot leader), which produced a large number of fuzzy entries. Fixing each translation is simple, but there are a huge number of them, so I want to process them mechanically."

PO example

The following is an excerpt of the PO to be edited. Imagine that there are so many entries like these that you would not want to edit them by hand. Some artificial cases are also included to make the effect easier to see.

# Update target
#, fuzzy
#| msgid "Open..."
msgid "Open…"
msgstr "open..."

# Update target
# The periods on the translation side are split by a line break...
#, fuzzy
#| msgid "Save As..."
msgid "Save As…"
msgstr "save as."
".."

# For some reason the translation already used the ellipsis character. No update required
msgid "Print…"
msgstr "printing…"

# For some reason, I want to leave the translation without the suffix. Drop only fuzzy
#, fuzzy
#| msgid "Print Preview..."
msgid "Print Preview…"
msgstr "Print preview"

# For some reason the original text still ends with periods (for the time being, the translation follows it)
msgid "Preference..."
msgstr "Setting..."

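What we want to do

On the translation side as well, replace the three periods with the ellipsis character to match the updated source text. The target messages are those that satisfy all four of the following conditions: the message is marked fuzzy; the previous msgid ends with three periods; the current msgid ends with the ellipsis character; and the previous and current msgid are identical apart from that suffix.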

Sample script

This script reads PO from standard input and writes the edits to standard output.

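# -*- coding: utf-8 -*-
import sys
from translate.storage.po import pofile

PREV_SUFFIX = '...'
NEW_SUFFIX = u'…'

for unit in pofile.parsefile(sys.stdin).units:
    prev_source = unit.prev_source
    new_source = unit.source
    # Four conditions: fuzzy, old suffix on the previous msgid, new suffix on
    # the current msgid, and identical text once both suffixes are cut off.
    # Slicing is used because rstrip() strips a set of characters, not a suffix.
    if unit.isfuzzy() and \
       prev_source.endswith(PREV_SUFFIX) and \
       new_source.endswith(NEW_SUFFIX) and \
       prev_source[:-len(PREV_SUFFIX)] == new_source[:-len(NEW_SUFFIX)]:
        if unit.target.endswith(PREV_SUFFIX):
            # Replace only the trailing suffix, not every '...' in the translation.
            updated_target = unit.target[:-len(PREV_SUFFIX)] + NEW_SUFFIX
            unit.settarget(updated_target)
        # Drop the fuzzy flag and the previous msgid in either case.
        unit.markfuzzy(False)
        unit.prev_source = None
    print(unit)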

Processing result

The result of processing the PO above is as follows. Note that the fuzzy entries are handled exactly as intended, no more and no less, and that the msgstr wrapped in the middle of the periods is handled correctly.

# Update target
msgid "Open…"
msgstr "open…"

# Update target
# The periods on the translation side are split by a line break...
msgid "Save As…"
msgstr "save as…"

# For some reason the translation already used the ellipsis character. No update required
msgid "Print…"
msgstr "printing…"

# For some reason, I want to leave the translation without the suffix. Drop only fuzzy
msgid "Print Preview…"
msgstr "Print preview"

# For some reason the original text still ends with periods (for the time being, the translation follows it)
msgid "Preference..."
msgstr "Setting..."

Script description

The words source and target appear throughout; these are common terms in the translation community. Source text and target text mean the original text and the translated text, respectively, and correspond to msgid and msgstr in PO.

PO-related APIs will be available by importing the translate.storage.po module. For more information on the API, see the API Reference (http://docs.translatehouse.org/projects/translate-toolkit/en/1.11.0-rc1/api/index.html).

When you parse a PO file with the parsefile method of the pofile class, the messages become available as a list named units. Each element of the list is an object of the pounit class. pounit provides an API for manipulating a message: you can access not only msgid and msgstr but also the previous msgid, the various comments, plural forms, and the other attributes defined by gettext (/manual/gettext.html#PO-Files). Through the pounit API, you can manipulate messages without being bound by the sometimes awkward physical layout of PO files.

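The four lines of the first if statement correspond to the four conditions above. A message that satisfies them is edited: if its translation ends with three periods, that suffix is replaced with the ellipsis character, and in either case the fuzzy flag and the previous msgid are removed.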
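
The conditions and the suffix replacement can also be tried out on plain strings, without the Translate Toolkit. The following is a minimal sketch using only the standard library; strip_suffix, should_update and update_target are hypothetical helper names introduced here for illustration and are not part of the toolkit. strip_suffix slices the suffix off instead of calling str.rstrip, because rstrip treats its argument as a set of characters rather than as a suffix.

```python
PREV_SUFFIX = "..."
NEW_SUFFIX = "\u2026"  # the ellipsis character


def strip_suffix(text, suffix):
    """Return text without suffix, or None if text does not end with it."""
    if not text.endswith(suffix):
        return None
    return text[:-len(suffix)]


def should_update(is_fuzzy, prev_source, new_source):
    """Check the four conditions on plain strings."""
    prev_stem = strip_suffix(prev_source, PREV_SUFFIX)
    new_stem = strip_suffix(new_source, NEW_SUFFIX)
    return (is_fuzzy and prev_stem is not None
            and new_stem is not None and prev_stem == new_stem)


def update_target(target):
    """Swap a trailing '...' for the ellipsis; leave other targets alone."""
    stem = strip_suffix(target, PREV_SUFFIX)
    return target if stem is None else stem + NEW_SUFFIX


# Two messages modeled on the PO excerpt, reduced to plain strings.
print(should_update(True, "Open...", "Open\u2026"))
print(update_target("Print preview") == "Print preview")
```

The difference from rstrip shows up with a string such as "Wait....": rstrip("...") removes all four periods, while strip_suffix removes exactly three.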

At the end of the for loop, print writes each pounit to standard output. pounit defines the __str__ method, so simply printing a unit outputs it as a validly formatted PO message.

Other

The same approach applies to many other tasks, such as mechanically checking for typos or making batch corrections after a change in translation style. Other common tools are also easy to build, for example one that takes the differences between two PO files and merges them into one.
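
The idea of taking the differences between two catalogs and merging them can be sketched on plain dictionaries. The example below assumes each catalog has already been reduced to a mapping from msgid to msgstr (with the Translate Toolkit, these values would come from each unit's source and target). diff_catalogs and merge_changes are hypothetical names, and real PO files also carry comments, flags and plural forms that this sketch ignores.

```python
def diff_catalogs(old, new):
    """Compare two {msgid: msgstr} mappings (a simplified stand-in for PO files)."""
    added = {k: v for k, v in new.items() if k not in old}
    removed = {k: v for k, v in old.items() if k not in new}
    changed = {k: (old[k], v) for k, v in new.items()
               if k in old and old[k] != v}
    return added, removed, changed


def merge_changes(base, added, changed):
    """Apply additions and changed translations to a copy of base.

    Entries that exist only in base are kept, not deleted.
    """
    merged = dict(base)
    merged.update(added)
    merged.update({k: new_value for k, (_, new_value) in changed.items()})
    return merged


old_catalog = {"Open\u2026": "open...", "Print\u2026": "printing\u2026"}
new_catalog = {"Open\u2026": "open\u2026", "Save As\u2026": "save as\u2026"}

added, removed, changed = diff_catalogs(old_catalog, new_catalog)
merged = merge_changes(old_catalog, added, changed)
print(len(added), len(removed), len(changed), len(merged))
```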

In fact, when I helped the GNOME Project Japanese translation team (https://l10n.gnome.org/teams/ja/) change its style for long vowels (for example, changing "ユーザ" to "ユーザー"), I built an editing tool with the Translate Toolkit and processed everything in one batch. The GNOME project has, by a rough estimate, more than 100,000 PO messages counted by msgid. That is not something that could be edited by hand, and the Translate Toolkit really helped.

Summary

This time, I've introduced only a small part of the Translate Toolkit library. There are many other useful features. In addition to the library, there are also ready-to-use command-line tools. For more information, see the API Reference and other documentation (http://docs.translatehouse.org/projects/translate-toolkit/en/1.11.0-rc1/). Try using the Translate Toolkit in your day-to-day translation work. Happy translating!

Tomorrow's article is by knok.
