[PYTHON] A script that converts between Django language files (PO) and tab-separated text (TSV)

A pair of scripts that convert Django language files (PO) into tab-delimited text (TSV), which is easier to edit, and back again. They are based on the code in the Stack Overflow thread "Is there any program to edit multiple po files?". The CSV output of the referenced code is changed to TSV, and a script that updates PO files from the TSV is added. Note that the TSV-to-PO conversion only overwrites the msgstr of msgids that already exist in the PO file, so a TSV row whose msgid does not match any entry in the PO file is not applied. The scripts are written for Python 2 (they use reload(sys), sys.setdefaultencoding and dict.iteritems).
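
As a rough illustration of the intermediate format, the sketch below assumes one row per msgid followed by one column per language, in the order en, ja. The sample strings and the helper names rows_to_tsv and tsv_to_rows are made up for this example. It uses only the standard library csv module and is written for Python 3, unlike the scripts in this post.

```python
import csv
import io

# Hypothetical sample data: msgid, English msgstr, Japanese msgstr
rows = [
    ["Hello", "Hello", "こんにちは"],
    ["Log in", "Log in", ""],  # empty cell = no translation yet
]


def rows_to_tsv(rows):
    """Serialize rows as tab-separated text, one row per msgid."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, dialect=csv.excel_tab)
    writer.writerows(rows)
    return buf.getvalue()


def tsv_to_rows(text):
    """Parse tab-separated text back into a list of rows."""
    return list(csv.reader(io.StringIO(text, newline=""), dialect=csv.excel_tab))


tsv_text = rows_to_tsv(rows)
print(tsv_text)
```

Tab-separated text like this opens directly in a spreadsheet, which is the main reason for choosing it over editing each PO file by hand.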

Python module installation

pip install polib unicodecsv

Convert from PO to TSV

i18n_po2tsv.py



import sys
import unicodecsv as csv
from collections import OrderedDict
import polib

#
# Set default encoding
#
reload(sys)
sys.setdefaultencoding('utf8')

#
# Params
#
tsv_path = 'locale/localization.tsv'
po_paths = 'locale/{}/LC_MESSAGES/django.po'
langs = ['en', 'ja',]

#
# Convert po file to tsv
#
rows = OrderedDict()

print('')

for lang_code in langs:
    po_path = po_paths.format(lang_code)

    print('Reading po file... ' + po_path)

    po = polib.pofile(
        po_path,
        encoding='utf-8',
    )

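    for entry in po:
        msgid = entry.msgid.encode('utf-8')
        msgstr = entry.msgstr.encode('utf-8')

        # Give every row one cell per language up front, so that a msgid
        # missing from one PO file leaves a blank cell instead of shifting
        # the other translations into the wrong column.
        if msgid not in rows:
            rows[msgid] = [msgid] + [''] * len(langs)
        rows[msgid][langs.index(lang_code) + 1] = msgstr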

print('Saving tsv file... ' + tsv_path)
print('')

with open(tsv_path, 'wb') as f:
    writer = csv.writer(f, csv.excel_tab)
    writer.writerows(rows.values())

print('All done!')
print('')
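
When several PO files are merged into one table, each language has to keep its own column even if a msgid is missing from one of the files. The idea can be sketched without polib, with plain dictionaries standing in for the PO files. The function merge_catalogs and the sample data are hypothetical, and the sketch is written for Python 3.

```python
from collections import OrderedDict


def merge_catalogs(catalogs, langs):
    """Build rows of [msgid, msgstr per language], padding gaps with ''."""
    rows = OrderedDict()
    for col, lang_code in enumerate(langs, start=1):
        for msgid, msgstr in catalogs[lang_code].items():
            if msgid not in rows:
                # One cell for the msgid plus one empty cell per language
                rows[msgid] = [msgid] + [""] * len(langs)
            rows[msgid][col] = msgstr
    return list(rows.values())


# Hypothetical catalogs: "Log in" is missing from ja, "Sign up" from en
catalogs = {
    "en": {"Hello": "Hello", "Log in": "Log in"},
    "ja": {"Hello": "こんにちは", "Sign up": "登録"},
}

table = merge_catalogs(catalogs, ["en", "ja"])
for row in table:
    print(row)
```

Pre-filling each row with empty cells means the column position depends only on the language, not on which file happened to introduce the msgid first.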

Convert from TSV to PO

i18n_tsv2po.py



import sys
import unicodecsv as csv
import codecs
import polib

#
# Set system encoding
#
reload(sys)
sys.setdefaultencoding('utf8')

#
# Params
#
tsv_path = 'locale/localization.tsv'
po_paths = 'locale/{}/LC_MESSAGES/django.po'

#
# Read tsv file
#
langs = {
    'en': {},
    'ja': {},
}

print('')
print('Reading tsv file... ' + tsv_path)

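with open(tsv_path, 'rb') as f:
    tsv_in = csv.reader(f, delimiter='\t')
    # next(tsv_in, None)  # uncomment to skip a header row, if the TSV has one
    for row in tsv_in:
        # Columns are: msgid, en, ja. unicodecsv already returns unicode
        # strings, so no re-encoding is needed. Empty or missing cells are
        # skipped so that the existing msgstr is kept.
        if len(row) > 1 and row[1]:
            langs['en'][row[0]] = row[1]
        if len(row) > 2 and row[2]:
            langs['ja'][row[0]] = row[2]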

#
# Read and save po file
#
print('')

for lang_code, lang in langs.iteritems():
    po_path = po_paths.format(lang_code)

    print('Reading po file... ' + po_path)

    po = polib.pofile(
        po_path,
        encoding='utf-8',
    )

    for entry in po:
        # polib exposes msgid as unicode and the TSV keys are unicode too,
        # so compare them directly; encoding one side to bytes can make
        # non-ASCII msgids miss the dictionary lookup.
        # Only entries that already exist in the PO file are updated.
        if entry.msgid in lang:
            entry.msgstr = lang[entry.msgid]
            
    print('Saving po file... ' + po_path)

    po.save()

    print('Done!')
    print('')

print('All done!')
print('')
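
Because only msgids already present in the PO file are updated, it can help to list the TSV rows that matched nothing. A minimal sketch of that check follows, with a plain dictionary standing in for the PO entries. The function apply_translations and the sample data are hypothetical and not part of polib, and the sketch is written for Python 3.

```python
def apply_translations(po_entries, translations):
    """Update msgstr for msgids already in po_entries and report the rest.

    po_entries:   dict of msgid -> msgstr, standing in for a PO file
    translations: dict of msgid -> msgstr, one language column of the TSV
    """
    updated = dict(po_entries)
    unmatched = []
    for msgid, msgstr in translations.items():
        if msgid in updated:
            if msgstr:  # an empty cell leaves the existing msgstr untouched
                updated[msgid] = msgstr
        else:
            unmatched.append(msgid)  # present in the TSV, absent from the PO
    return updated, unmatched


# Hypothetical data: "Sign up" exists only in the TSV
po_entries = {"Hello": "", "Log in": "ログイン"}
translations = {"Hello": "こんにちは", "Log in": "", "Sign up": "登録"}

updated, unmatched = apply_translations(po_entries, translations)
print(updated)
print(unmatched)
```

Printing the unmatched msgids after a run makes it easier to spot typos in the TSV or strings that have since been removed from the PO file.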
