[Python] Let's automatically translate English PDFs (and other documents) with DeepL or Google Translate and save the result as a text file

Update (8/6)

I improved the output to make it easier to read: Continued [Python] Let's automatically translate English PDF (but not limited to) with DeepL or Google Translate to make a text file, no HTML.

Introduction

Reading English papers is hard, isn't it? Let a machine translate them and the going gets much easier.

Method

The problem with translating PDFs is that PDF files are hard to handle. Even if you rely on a library to extract the text automatically, it often fails or scrambles the order of the sentences. So this time I will translate via the clipboard instead.
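Text copied from a PDF viewer usually keeps the PDF's line breaks, and words are sometimes hyphenated across them. The script below only joins lines with spaces. As a hedged sketch, a hypothetical `normalize_pdf_text` helper (not part of the original script) could also rejoin hyphenated words and collapse extra whitespace before the text is split. It assumes that a hyphen directly before a line break is a layout hyphen, so a genuine compound such as "well-known" that happens to break at its hyphen will lose it.

```python
import re


def normalize_pdf_text(text: str) -> str:
    """Tidy text copied from a PDF viewer (hypothetical helper)."""
    # Rejoin words hyphenated across a line break: "transla-\ntion" -> "translation".
    text = re.sub(r"(\w)-\r?\n(\w)", r"\1\2", text)
    # Turn the remaining line breaks into spaces, as the script's parse_merge does.
    text = " ".join(text.splitlines())
    # Collapse runs of whitespace left over from the page layout.
    return re.sub(r"\s+", " ", text).strip()


print(normalize_pdf_text("Machine transla-\ntion helps.\nIt saves   time."))
# -> Machine translation helps. It saves time.
```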

The flow is:

Open the PDF file in Chrome (or another viewer), select all with Ctrl + A, and copy
    ↓
Run the program
    ↓
Split the text at periods into chunks that stay under the character limit (5,000 characters)
    ↓
Send each chunk to the translation site
    ↓
Collect the results
    ↓
Output them

That's it.
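The splitting step is the only part with real logic. Here is a minimal sketch of it as a hypothetical `split_into_chunks` function (not the script's own `parse_merge`). It packs whole sentences into chunks of at most `limit` characters and, as an extra the original does not handle, cuts a single sentence that is longer than the limit. The 4,900 default leaves some headroom under the 5,000-character limit, mirroring `parse_merge`'s `n=4900`. Joining the chunks back together gives the original text exactly.

```python
def split_into_chunks(text: str, limit: int = 4900) -> list[str]:
    """Pack period-terminated sentences into chunks of at most `limit` characters.

    Hypothetical helper. Unlike parse_merge in the script, a single sentence
    longer than `limit` is cut at the limit instead of being sent whole.
    """
    # Re-attach the period that split() removes; the last piece never had one.
    sentences = [s + "." for s in text.split(".")]
    sentences[-1] = sentences[-1][:-1]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Cut an over-long sentence into limit-sized pieces.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        # Start a new chunk if this sentence would push the current one over.
        if len(current) + len(sentence) > limit:
            chunks.append(current)
            current = ""
        current += sentence
    if current:
        chunks.append(current)
    return chunks


print(split_into_chunks("First sentence. Second one. Third.", limit=20))
# -> ['First sentence.', ' Second one. Third.']
```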

Code

The script requires Selenium and pyperclip. Install them with:
pip install selenium
pip install pyperclip

Place ChromeDriver (matching your Chrome version) in the same directory as the script.

# Written for the Selenium 4 API (Service, find_element(By..., ...)).
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
import time
import pyperclip as ppc

# ChromeDriver placed in the same directory as this script
DRIVER_PATH = 'chromedriver.exe'

options = Options()
options.add_argument('--disable-gpu')
options.add_argument('--disable-extensions')
options.add_argument('--proxy-server="direct://"')
options.add_argument('--proxy-bypass-list=*')
options.add_argument('--start-maximized')


def parse_merge(text, n=4900):
    """Split text at periods into chunks of at most about n characters."""
    sentences = []
    sentence = ""
    # Join the lines first so PDF line breaks do not cut sentences apart.
    for part in " ".join(text.splitlines()).split("."):
        # Start a new chunk if adding this sentence would exceed the limit.
        if sentence and len(sentence) + len(part) + 1 > n:
            sentences.append(sentence)
            sentence = ""
        sentence += part + "."
    # The last piece had no period in the original text, so drop the added one.
    sentence = sentence[:-1]
    if sentence:
        sentences.append(sentence)
    return sentences


def TranslateFromClipboard(tool, write, filename, isPrint):
    driver = webdriver.Chrome(service=Service(executable_path=DRIVER_PATH),
                              options=options)
    url = 'https://www.deepl.com/ja/translator' if tool == "DeepL" else 'https://translate.google.co.jp/?hl=ja&tab=TT&authuser=0#view=home&op=translate&sl=auto&tl=ja'
    driver.get(url)
    transSentence = ""
    # Locate the input text area of the chosen site
    # (selectors match the page layout at the time of writing).
    if tool == "DeepL":
        textarea = driver.find_element(
            By.CSS_SELECTOR,
            '.lmt__textarea.lmt__source_textarea.lmt__textarea_base_style')
    elif tool == "GT":
        textarea = driver.find_element(By.ID, 'source')
    for sentence in parse_merge(ppc.paste()):
        # Paste the chunk via the clipboard, then restore the original clipboard.
        cbText = ppc.paste()
        ppc.copy(sentence)
        textarea.send_keys(Keys.CONTROL, "v")
        ppc.copy(cbText)
        # Poll once a second until a translation appears.
        transtext = ""
        while transtext == "":
            if tool == "DeepL":
                transtext = driver.find_element(
                    By.CSS_SELECTOR,
                    '.lmt__textarea.lmt__target_textarea.lmt__textarea_base_style'
                ).get_property("value")
            elif tool == "GT":
                try:
                    transtext = driver.find_element(
                        By.CSS_SELECTOR, '.tlid-translation.translation').text
                except NoSuchElementException:
                    pass  # the result element is not on the page yet
            time.sleep(1)
        if isPrint:
            print(transtext)
        transSentence += transtext
        # Clear the input area before sending the next chunk.
        textarea.send_keys(Keys.CONTROL, "a")
        textarea.send_keys(Keys.BACKSPACE)
    driver.quit()
    if write:
        with open(filename, "w", encoding='UTF-8') as f:
            # One line per Japanese sentence; skip empty pieces.
            for sentence in transSentence.split("。"):
                if sentence.strip():
                    f.write(sentence + "。\n")


if __name__ == "__main__":
    # [tool, write to file, output file name, print progress]
    args = ["DeepL", False, "translated_text.txt", True]
    if input("1. DeepL 2. GoogleTranslate  ").strip() == "2":
        args[0] = "GT"
    if input("Do you want to write the translation result to a file? y/N  ").strip().lower() == "y":
        args[1] = True
        filename = input(
            "Enter a name for the output file (default is 'translated_text.txt')  ")
        if filename:
            args[2] = filename
    if input("Would you like to see the translation progress here? Y/n  ").strip().lower() == "n":
        args[3] = False
    input("Press Enter when ready")
    TranslateFromClipboard(*args)
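The polling loop accepts the first non-empty result, but DeepL can fill in its output in stages, so a long chunk might be captured half-translated. One possible remedy, sketched here as a hypothetical helper, is to keep polling until the text stops changing. `get_text` is any zero-argument function; in the script it would wrap the existing result lookup for the chosen site (for Google Translate, the wrapper would catch the missing-element exception and return an empty string). The `stable_reads`, `interval` and `timeout` values are guesses, not tuned numbers.

```python
import time
from typing import Callable


def wait_for_stable_text(get_text: Callable[[], str], stable_reads: int = 2,
                         interval: float = 1.0, timeout: float = 60.0) -> str:
    """Poll get_text() until it returns the same non-empty text
    `stable_reads` times in a row, then return that text.

    Hypothetical helper, not part of the original script. Raises
    TimeoutError if the text never settles within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    last, count = "", 0
    while time.monotonic() < deadline:
        text = get_text()
        if text and text == last:
            count += 1  # same non-empty text as the previous poll
        else:
            # Text changed (or is still empty): start counting again.
            last, count = text, (1 if text else 0)
        if count >= stable_reads:
            return text
        time.sleep(interval)
    raise TimeoutError("the translation did not settle before the timeout")


# Simulated page that shows a partial result before the full one.
values = iter(["", "Trans", "Translation", "Translation"])
print(wait_for_stable_text(lambda: next(values), interval=0))
# -> Translation
```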

How to use

  1. Save the above code as a Python file with a suitable name.
  2. Copy the text of the PDF (or other document).
  3. Run the program.
  4. Answer the prompts to choose the settings.
  5. When you are ready (the English text should already be on the clipboard), press Enter.
  6. A browser window opens and operates on its own; wait while it works.
  7. Read the translation result.
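If you run the script often, answering the prompts every time gets tedious. As a sketch, a hypothetical `parse_cli` function could build the same argument list from command-line flags with `argparse`. The flag names `--tool`, `--output` and `--quiet` are made up for illustration; the defaults match the prompts.

```python
import argparse


def parse_cli(argv=None) -> list:
    """Build the argument list that the interactive prompts would produce.

    Hypothetical alternative to the input() prompts; the flag names are
    illustrative, not part of the original script.
    """
    parser = argparse.ArgumentParser(
        description="Translate the clipboard text with DeepL or Google Translate.")
    parser.add_argument("--tool", choices=["DeepL", "GT"], default="DeepL")
    parser.add_argument("--output", metavar="FILE",
                        help="write the translation to FILE")
    parser.add_argument("--quiet", action="store_true",
                        help="do not print the translation progress")
    ns = parser.parse_args(argv)
    # Same order as TranslateFromClipboard(tool, write, filename, isPrint).
    return [ns.tool, ns.output is not None,
            ns.output or "translated_text.txt", not ns.quiet]


print(parse_cli(["--tool", "GT", "--output", "out.txt"]))
# -> ['GT', True, 'out.txt', True]
```

In the `__main__` block, `TranslateFromClipboard(*parse_cli())` would then replace the prompts.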

When writing to a text file, the script simply inserts a line break after each Japanese full stop (。), so adjust the formatting as needed.
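The writing loop in the script appends 。 to every piece, including a final fragment that did not end with one. A minimal alternative, assuming one sentence per line is the layout you want, splits the text with a regular expression that keeps each 。 attached and leaves a trailing fragment as-is. `split_japanese_sentences` and `write_sentences` are hypothetical names.

```python
import re


def split_japanese_sentences(text: str) -> list[str]:
    """Split text after each 。, keeping the 。 and any trailing fragment."""
    # Either "anything up to and including 。" or "a final run with no 。".
    pieces = re.findall(r"[^。]*。|[^。]+", text)
    return [p.strip() for p in pieces if p.strip()]


def write_sentences(filename: str, text: str) -> None:
    """Write one sentence per line (hypothetical replacement for the write loop)."""
    with open(filename, "w", encoding="UTF-8") as f:
        for line in split_japanese_sentences(text):
            f.write(line + "\n")


print(split_japanese_sentences("これはペンです。あれは本です。途中"))
# -> ['これはペンです。', 'あれは本です。', '途中']
```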

Summary

DeepL appears to offer document translation as a paid feature, so this is for anyone who would rather get the benefit for free. Google Translate can translate whole documents too, but some passages are left untranslated and mathematical formulas get mangled. If that has left you at a loss, give this a try; it may help you make progress.
