Introduction

Reading papers in English is hard, isn't it? Let's have them machine-translated; they become much easier to get through.

This article talks about PDFs, but the method works for any text you can copy to the clipboard.

Method

The trouble with translating PDFs is that PDF files are hard to handle. Libraries that extract the text automatically often fail outright or scramble the order of the sentences. So this time I translate via the clipboard instead.
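
Text copied from a PDF viewer usually arrives with a line break at the end of every printed line. The sketch below shows one way to flatten it before translation. The function name `normalize_copied_text` and the de-hyphenation rule are my own assumptions, not part of the script in this article, which only joins the lines with spaces.

```python
import re


def normalize_copied_text(text: str) -> str:
    """Turn text copied from a PDF viewer into a single line (hypothetical helper)."""
    # Assumption: a hyphen directly before a line break is a word split across
    # lines, so "transla-\ntion" becomes "translation".
    text = re.sub(r"(\w)-\r?\n(\w)", r"\1\2", text)
    # Replace the remaining line breaks with spaces, as the script does.
    text = " ".join(text.splitlines())
    # Collapse runs of whitespace left over from the page layout.
    return re.sub(r"\s+", " ", text).strip()
```

In the script this would be applied to the clipboard contents, for example `normalize_copied_text(ppc.paste())`. Note that the hyphen rule also joins genuinely hyphenated words that happen to break at a line end, so treat it as optional.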

The flow is as follows:

Open the PDF file in Chrome or a similar viewer, select everything with Ctrl + A, and copy it
↓
Run the program
↓
Split the text at periods into chunks that stay within the character limit (5000 characters)
↓
Send each chunk to the translation site
↓
Get the results
↓
Output

That's all there is to it.
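
The flow above can be sketched without a browser by passing the translation step in as a function. This is only an illustration: `split_by_period`, `translate_text`, and the `translate_chunk` callback are names I made up, and the real script sends each chunk to the site through Selenium instead.

```python
import re
from typing import Callable, List


def split_by_period(text: str, limit: int = 5000) -> List[str]:
    """Group period-terminated sentences into chunks of at most `limit` characters."""
    # Each piece is a sentence including its period, plus any unterminated tail.
    pieces = re.findall(r"[^.]*\.|[^.]+$", text)
    chunks, current = [], ""
    for piece in pieces:
        # Close the current chunk when the next sentence would not fit.
        # A single sentence longer than `limit` is still passed through whole.
        if current and len(current) + len(piece) > limit:
            chunks.append(current)
            current = ""
        current += piece
    if current:
        chunks.append(current)
    return chunks


def translate_text(text: str, translate_chunk: Callable[[str], str],
                   limit: int = 5000) -> str:
    """Translate chunk by chunk and join the results in order."""
    return "".join(translate_chunk(chunk) for chunk in split_by_period(text, limit))
```

For a dry run, any function from string to string can stand in for the translator, for example `translate_text(text, str.upper)`.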

Code

The program requires Selenium and pyperclip. Install them with pip:

pip install selenium
pip install pyperclip

Also download ChromeDriver and put it in the same directory as the script.
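
If the driver is missing, Selenium fails with an error that can be hard to read. A small check before launching the browser gives a clearer message. This is a sketch using only the standard library; `locate_driver` is a hypothetical helper, and the file name `chromedriver.exe` is taken from `DRIVER_PATH` in the script.

```python
from pathlib import Path


def locate_driver(script_dir: Path, name: str = "chromedriver.exe") -> Path:
    """Return the path to the driver in `script_dir`, or raise a clear error."""
    driver = script_dir / name
    if not driver.is_file():
        raise FileNotFoundError(
            f"{name} was not found in {script_dir}; "
            "put ChromeDriver next to the script first."
        )
    return driver
```

In the script it could be called as `locate_driver(Path(__file__).resolve().parent)` before creating the driver.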

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.keys import Keys
import time
import pyperclip as ppc

DRIVER_PATH = 'chromedriver.exe'

options = Options()
options.add_argument('--disable-gpu')
options.add_argument('--disable-extensions')
options.add_argument('--proxy-server="direct://"')
options.add_argument('--proxy-bypass-list=*')
options.add_argument('--start-maximized')


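def parse_merge(text, n=4900):
    """Split text at periods into chunks of at most about n characters."""
    sentences = []
    sentence = ""
    # Join the lines copied from the PDF, then split into sentences at periods.
    parts = " ".join(text.splitlines()).split(".")
    # split() drops the periods, so put one back after every piece except the
    # last, which is whatever followed the final period (often empty).
    parts = [p + "." for p in parts[:-1]] + parts[-1:]
    for i in parts:
        # Start a new chunk when adding this sentence would exceed the limit.
        if sentence and len(sentence) + len(i) > n:
            sentences.append(sentence)
            sentence = ""
        sentence += i
    if sentence:
        sentences.append(sentence)
    return sentences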


def TranslateFromClipboard(tool, write, filename, isPrint):
    driver = webdriver.Chrome(executable_path=DRIVER_PATH,
                              chrome_options=options)
    url = 'https://www.deepl.com/ja/translator' if tool == "DeepL" else 'https://translate.google.co.jp/?hl=ja&tab=TT&authuser=0#view=home&op=translate&sl=auto&tl=ja'
    driver.get(url)
    transSentence = ""
    if tool == "DeepL":
        textarea = driver.find_element_by_css_selector(
            '.lmt__textarea.lmt__source_textarea.lmt__textarea_base_style')
    elif tool == "GT":
        textarea = driver.find_element_by_id('source')
    for sentence in parse_merge(ppc.paste()):
        cbText = ppc.paste()
        ppc.copy(sentence)
        textarea.send_keys(Keys.CONTROL, "v")
        ppc.copy(cbText)
        transtext = ""
        while transtext == "":
            if tool == "DeepL":
                transtext = driver.find_element_by_css_selector(
                    '.lmt__textarea.lmt__target_textarea.lmt__textarea_base_style'
                ).get_property("value")
            elif tool == "GT":
                try:
                    transtext = driver.find_element_by_css_selector(
                        '.tlid-translation.translation').text
                except Exception:
                    # The result element does not exist until the translation arrives.
                    pass
            time.sleep(1)
        if isPrint: print(transtext)
        transSentence += transtext
        textarea.send_keys(Keys.CONTROL, "a")
        textarea.send_keys(Keys.BACKSPACE)
    driver.quit()
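    if write:
        with open(filename, "w", encoding='UTF-8') as f:
            # Break the output after every Japanese full stop; skip empty
            # pieces so no stray "。" line is written at the end.
            for sentence in transSentence.split("。"):
                if sentence:
                    f.write(sentence + "。\n")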


if __name__ == "__main__":
    args = ["DeepL", False, "translated_text.txt", True]
    if input("1. DeepL  2. Google Translate: ") == "2": args[0] = "GT"
    # Writing to a file is off unless the user answers "y".
    if input("Write the translation result to a file? y/N: ").lower() == "y":
        args[1] = True
        filename = input(
            "Enter a name for the output file (default is 'translated_text.txt'): ")
        if filename:
            args[2] = filename
    # Progress output is on unless the user answers "n".
    if input("Show the translation progress here? Y/n: ").lower() == "n":
        args[3] = False
    input("Press Enter when ready")
    TranslateFromClipboard(*args)
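
The `while transtext == ""` loop in the code above waits forever if the site never returns a result. One possible alternative is a polling helper with a timeout. The sketch below uses only the standard library; `poll_until_nonempty` and its parameters are my own names, and the reading function would be whatever fetches the translated text from the page.

```python
import time
from typing import Callable


def poll_until_nonempty(read_value: Callable[[], str],
                        timeout: float = 60.0,
                        interval: float = 1.0) -> str:
    """Call `read_value` repeatedly until it returns a non-empty string."""
    deadline = time.monotonic() + timeout
    while True:
        value = read_value()
        if value:
            return value
        # Give up once the deadline has passed instead of looping forever.
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no result after {timeout} seconds")
        time.sleep(interval)
```

This does not solve everything: a result left over from the previous chunk would still count as non-empty, so the target area should be confirmed empty before each new chunk is pasted.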

How to use

Save the code above as a Python file under any name you like.
Copy the text of the PDF (or other source).
Run the program.
Answer the prompts to choose the settings.
When you are ready (that is, the English text has been copied to the clipboard), press Enter.
The browser opens and operates on its own, so just watch and wait.
Read the translation result.
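
If you run the program often, the prompts could be replaced by command-line options. The following is a rough sketch with `argparse`. The flag names `--tool`, `--output`, and `--quiet` are hypothetical, and `to_call_args` simply builds the same four-item list that the script passes to `TranslateFromClipboard`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Command-line options mirroring the interactive prompts (hypothetical flags)."""
    parser = argparse.ArgumentParser(description="Translate clipboard text into Japanese.")
    parser.add_argument("--tool", choices=["DeepL", "GT"], default="DeepL",
                        help="translation site to use")
    parser.add_argument("--output", default=None,
                        help="file to write the result to; omit to skip writing")
    parser.add_argument("--quiet", action="store_true",
                        help="do not print the translation progress")
    return parser


def to_call_args(ns: argparse.Namespace) -> list:
    """Build [tool, write, filename, isPrint] in the order the script expects."""
    return [ns.tool, ns.output is not None,
            ns.output or "translated_text.txt", not ns.quiet]
```

With this, `TranslateFromClipboard(*to_call_args(build_parser().parse_args()))` would take the place of the prompt section.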

When writing to a text file, the program currently just breaks lines at every Japanese full stop (。), so edit the output as needed.
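
As a starting point for that post-processing, here is a sketch of a standalone splitter that keeps each 。 attached to its sentence and adds none of its own. `split_japanese_sentences` is a hypothetical helper, not part of the script.

```python
import re
from typing import List


def split_japanese_sentences(text: str) -> List[str]:
    """Split at 。 while keeping the full stop on the sentence it ends."""
    # Either a run of text ending in 。, or a trailing run without one.
    pieces = re.findall(r"[^。]*。|[^。]+$", text)
    # Drop pieces that contain only whitespace.
    return [piece for piece in pieces if piece.strip()]
```

Writing `"\n".join(split_japanese_sentences(text))` to the file gives one sentence per line.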

Summary

DeepL appears to offer whole-document translation for a fee, and Google Translate can translate documents as well. But if you would rather stay with the free service, or you are fed up with Google Translate leaving passages untranslated or mangling the mathematical formulas, give this a try. It might help you make some progress.

[Python] Let's automatically translate English PDFs (and other text) with DeepL or Google Translate and save the result as a text file.
