Automatically translate English with DeepL using Python and Selenium

What I want to do

So far, I have collected bibliographic information and abstracts from ScienceDirect. Next, I want to feed them into DeepL and translate them. A paid plan would let me translate whole files at once, but I'll take on the challenge of doing it with Selenium and ChromeDriver instead.

Preparation

First, load the CSV file into Python.

import pandas as pd

# No header row, so columns are addressed by position; quoting=1 is csv.QUOTE_ALL
df = pd.read_csv("DB.csv", header=None, delimiter=",", quoting=1)
print(df.at[0, 1])   # title
print(df.at[0, 9])   # abstract
print(df.at[0, 10])  # keywords

for title in df[1]:
    print(title)
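Because the CSV has no header row, the code above addresses columns by number. Below is a minimal sketch that names the columns at load time instead. It reuses the column list from the final script further down and assumes your export has the same layout. It also skips empty cells, which pandas reads as NaN rather than "", so they never get sent to DeepL. The helpers load_bibliography and texts_to_translate are hypothetical names, and the two-row sample is made up.

```python
import csv
import io

import pandas as pd

# Hypothetical helpers. Column names are copied from the final script in this
# article (assumed layout of the ScienceDirect export; adjust if yours differs).
COLUMNS = ["Authors", "Title", "jTitle", "VolIssue", "Year", "Pages",
           "ISSN", "DOI", "URL", "Abst", "Keywords"]


def load_bibliography(source):
    """Read the export: no header row, fields quoted."""
    return pd.read_csv(source, header=None, names=COLUMNS,
                       delimiter=",", quoting=csv.QUOTE_ALL)


def texts_to_translate(df, column):
    """Return (row index, text) pairs, skipping empty cells (read as NaN)."""
    return [(index, text) for index, text in df[column].items()
            if isinstance(text, str) and text.strip()]


# Made-up two-row sample in the same shape as the real CSV.
sample = io.StringIO(
    '"A. Author","First title","","1(2)","2020","1-10","1234-5678",'
    '"10.1/x","https://example.org/1","An abstract.","kw1; kw2"\n'
    '"B. Author","Second title","","3(4)","2021","11-20","1234-5678",'
    '"10.1/y","https://example.org/2","","kw3"\n'
)
df = load_bibliography(sample)
for index, title in texts_to_translate(df, "Title"):
    print(index, title)
```

With named columns, df.at[0, "Title"] replaces df.at[0, 1].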

So far, so good.

Access DeepL with Selenium and ChromeDriver

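from selenium import webdriver

load_url = "https://www.deepl.com/ja/translator"
# Path to the chromedriver executable; use webdriver.Chrome() instead if chromedriver is on PATH.
# (Newer Selenium 4 releases drop executable_path in favor of
#  service=Service('c:/work/chromedriver.exe') from selenium.webdriver.chrome.service.)
driver = webdriver.Chrome(executable_path='c:/work/chromedriver.exe')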
driver.get(load_url)

This step works without trouble too.
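Each run starts a real Chrome process. If the script crashes halfway, that browser stays open. Below is a minimal sketch of a context manager that always shuts it down. It assumes you pass in a zero-argument callable that creates the driver; browser_session is my own hypothetical name, not part of Selenium. It calls quit() rather than close(), because close() only closes the current window while quit() ends the whole WebDriver session.

```python
from contextlib import contextmanager


@contextmanager
def browser_session(make_driver, url):
    """Open `url` in a new driver and always shut the browser down afterwards.

    Hypothetical helper: `make_driver` is any zero-argument callable that
    returns a WebDriver, e.g. webdriver.Chrome.
    """
    driver = make_driver()
    try:
        driver.get(url)
        yield driver
    finally:
        driver.quit()  # ends the session even if the with-block raised
```

Usage would look like `with browser_session(webdriver.Chrome, load_url) as driver:`, with the remaining steps indented inside the with block.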

Send English to DeepL

To send text and read back the translation, I used Chrome's developer tools to look up two CSS selectors: one for the textarea where the English text goes, and one for the textarea that shows the translation. The input side was straightforward:

#dl_translator > div.lmt__sides_container > div.lmt__side_container.lmt__side_container--source > div.lmt__textarea_container > div > textarea

With that selector, sending the text takes just the following:

title = df.at[0,1]
input_selector = "#dl_translator > div.lmt__sides_container > div.lmt__side_container.lmt__side_container--source > div.lmt__textarea_container > div > textarea"
driver.find_element_by_css_selector(input_selector).send_keys(title)

This sends the English text to DeepL as expected.
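Watch out when translating many titles in the same tab: send_keys() types into the textarea without removing what is already there. The previous title would then be translated again together with the new one. Below is a minimal sketch that clears the box first; send_source_text is a hypothetical helper name. It uses driver.find_element("css selector", ...), which works in both Selenium 3 and 4. The string "css selector" is the value of Selenium's By.CSS_SELECTOR. The find_element_by_css_selector style used in this article was removed in later Selenium 4 releases. I am assuming DeepL reacts to clear() the same way it reacts to typing; I have not verified this.

```python
# Selenium's By.CSS_SELECTOR constant is the plain string "css selector",
# so the locator can be written without importing selenium here.
CSS_SELECTOR = "css selector"


def send_source_text(driver, selector, text):
    """Hypothetical helper: replace the source textarea's contents with `text`."""
    textarea = driver.find_element(CSS_SELECTOR, selector)
    textarea.clear()          # send_keys() appends, so empty the box first
    textarea.send_keys(text)
    return textarea
```

For example, `send_source_text(driver, input_selector, df.at[1, 1])` would replace the first title with the second.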

Getting the Japanese translation, part 1

The output side, on the other hand, was tricky. I translated a suitable English sentence and then searched the developer tools for the output text. The translation was not in the textarea as I expected; it was displayed inside a button tag instead. I tried getting the button element and reading its text, but for some reason I couldn't. Even in the VSCode debugger, the element's text was empty. Huh? Why?? I can see the translation in Chrome, and in the developer tools it appears properly between the button tags, so why is it an empty string??

As a workaround, DeepL has a handy button that copies the translation to the clipboard. I click it with Selenium and then read the translation from the clipboard. To do that, I installed pyperclip, a package for working with the clipboard, looked up the CSS selector of the copy button, and wrote the following script.

import time
import pyperclip

title = df.at[0,1]
input_selector = "#dl_translator > div.lmt__sides_container > div.lmt__side_container.lmt__side_container--source > div.lmt__textarea_container > div > textarea"
driver.find_element_by_css_selector(input_selector).send_keys(title)

time.sleep(5)  # give DeepL time to finish translating

OutputCopyBtn = "#dl_translator > div.lmt__sides_container > div.lmt__side_container.lmt__side_container--target > div.lmt__textarea_container > div.lmt__target_toolbar.lmt__target_toolbar--visible > div.lmt__target_toolbar__copy > button"
driver.find_element_by_css_selector(OutputCopyBtn).click()

print(pyperclip.paste())

Hmm, that works.
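A fixed time.sleep(5) either wastes time on short titles or is too short for long abstracts. Below is a sketch of a small polling helper with a timeout; wait_until is my own hypothetical name, not a Selenium function. It keeps calling a function until that function returns something truthy. Exceptions, such as a missing element, are treated as "not ready yet". For simplicity it catches every exception.

```python
import time


def wait_until(condition, timeout=30.0, interval=0.5):
    """Hypothetical helper: call condition() until it returns something truthy.

    Any exception raised by condition() (for example Selenium's
    NoSuchElementException while the page is still rendering) is treated as
    "not ready yet". Raises TimeoutError after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except Exception:
            pass  # not ready yet; fall through to the timeout check
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)
```

For example, `wait_until(lambda: driver.find_element_by_css_selector(OutputCopyBtn).is_displayed())` would wait for the copy button to become visible. This assumes the button only appears once a translation exists; the lmt__target_toolbar--visible class in the selector hints at that, but I have not confirmed it. Selenium also ships this idea built in: `WebDriverWait(driver, 30).until(...)` from selenium.webdriver.support.ui, combined with the conditions in selenium.webdriver.support.expected_conditions.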

Getting the Japanese translation, part 2

That felt like admitting defeat, so I tried again to get the text directly from the button element. Googling led me to an article by @riikunn1004: apparently the text can be read with .getAttribute("textContent"). I still didn't understand why the text was unavailable even though I could see it on screen and in the developer tools, but I wrote the following script based on that article.

Output_selector = "#dl_translator > div.lmt__sides_container > div.lmt__side_container.lmt__side_container--target > div.lmt__textarea_container > div.lmt__translations_as_text > p > button.lmt__translations_as_text__text_btn"
Outputtext = driver.find_element_by_css_selector(Output_selector).get_attribute("textContent")
print(Outputtext)

I confirmed that this retrieves the translation. Phew.
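As far as I know, the reason is this. Selenium's .text only returns text that is actually rendered visibly; the WebDriver "Get Element Text" command is defined that way. textContent, by contrast, is a plain DOM property that ignores visibility. Below is a small helper that tries .text first and falls back to textContent. The name read_text is hypothetical.

```python
def read_text(element):
    """Hypothetical helper: return an element's text even if Selenium sees it as hidden.

    WebElement.text only reports visibly rendered text, so a hidden or
    collapsed element yields "". The textContent DOM property is read
    regardless of visibility, so it is used as a fallback.
    """
    visible = element.text
    if visible:
        return visible
    return element.get_attribute("textContent") or ""
```

With this, `read_text(driver.find_element_by_css_selector(Output_selector))` returns the translation whether or not the button is shown.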

Bonus

I looked into the structure of the DeepL page a bit more. Like the input textarea, the output textarea is identified by the following selector:

#dl_translator > div.lmt__sides_container > div.lmt__side_container.lmt__side_container--target > div.lmt__textarea_container > div.lmt__inner_textarea_container > textarea

However, its CSS shows :disabled, so it cannot be operated. In fact, the output area cannot be selected until you enter some English text. Once you do, the class name of this area changes and the disabled state goes away. (The CSS selector itself stays the same.)

So does this area contain the translation? No, it doesn't. On screen it looks as if the translation is there, but at this point the textarea is still empty.

If you click this area with the mouse to focus the textarea, the following element pops up and the translation is displayed. (For reference, I entered "test" as the English text, which is why the translation reads "Shiken".)

<div class="lmt__textarea_base_style" 
  style="
    position: absolute; transform: translate(-500%, -500%); 
    padding: 16px 32px 80px 24px; margin: 0px; overflow: hidden;
    font-family: &quot;Open Sans&quot;, sans-serif; font-size: 24px; font-stretch: 100%; font-weight: 400;
    line-height: 36px; height: 468.5px; width: 448.5px;">
  <span style="outline: green solid 1px;">Shiken</span>
  <span style="outline: red solid 1px; display: inline-block; position: relative; height: 1em;"></span>
  <span style="outline: blue solid 1px;"></span>
</div>

The CSS selector of this element is:

#dl_translator > div.lmt__sides_container > div.lmt__side_container.lmt__side_container--target > div.lmt__textarea_container > div.lmt__inner_textarea_container > div

Now I think I get it. When English text is entered, DeepL shows the translation on screen, but it does not actually put it into the textarea.

Instead, the translation is hidden further down in the page, like this:

<button class="lmt__translations_as_text__text_btn">Shiken</button>

When you move the cursor to what looks like the output "textarea" on screen and click, a div like the one above expands and the translation is displayed. So is this button element what drives that switch?

It's quite complicated. Why build it this way?? Still, it makes sense now. With this structure, you can't find the translation by looking around the textarea unless you first click once on the part that looks like a "textarea". Also, the button is basically disabled or invisible, so its text can't be read with .text. On top of that, suppose you look up the CSS selector of the element that appears on focus in advance, then try to access it after translating. Unless the textarea has been focused once, you get a "no such element" error. Once you focus it, the element exists, so I think you could read the translation that way too.

Really, why make it so complicated??
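If you want to check this reading of the page yourself, Selenium can report whether an element is displayed and enabled. Below is a sketch of a diagnostic helper; describe_element is my own hypothetical name. It collects is_displayed(), is_enabled(), .text and textContent for one element. For the hidden translation button I would expect displayed to be False and text to be empty, but that is an expectation; I have not checked it against the current DeepL page.

```python
def describe_element(element):
    """Hypothetical helper: what Selenium reports about an element's visibility and text."""
    return {
        "displayed": element.is_displayed(),  # False for hidden elements
        "enabled": element.is_enabled(),
        "text": element.text,                 # visible text only
        "textContent": element.get_attribute("textContent"),
    }
```

For example: `print(describe_element(driver.find_element_by_css_selector(Output_selector)))`.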

~~Final~~ Source code

For now, the part that sends English text to DeepL and translates it automatically is as follows. The while loop in the middle is there because DeepL takes some time to translate. Every second it checks whether the translation has appeared, and it breaks out of the loop once the translation is complete.

import pandas as pd
import time
from selenium import webdriver
import chromedriver_binary  # adds the bundled chromedriver to PATH, so webdriver.Chrome() needs no path

df = pd.read_csv("Reliablity EngineeringDB.csv",header=None, delimiter=",", quoting=1)

df.columns=["Authors", "Title", "jTitle", "VolIssue","Year", "Pages", "ISSN","DOI","URL","Abst","Keywords"]
print(df)

Title = df.at[0, "Title"]  # the columns now have names, so use "Title" rather than 1

load_url = "https://www.deepl.com/ja/translator"
driver = webdriver.Chrome()  #  driver = webdriver.Chrome("c:/work/chromedriver.exe")
driver.get(load_url)

input_selector = "#dl_translator > div.lmt__sides_container > div.lmt__side_container.lmt__side_container--source > div.lmt__textarea_container > div > textarea"
Output_selector = "#dl_translator > div.lmt__sides_container > div.lmt__side_container.lmt__side_container--target > div.lmt__textarea_container > div.lmt__translations_as_text > p > button.lmt__translations_as_text__text_btn"
driver.find_element_by_css_selector(input_selector).send_keys(Title)

# Translation takes a moment: check once per second and stop once the output is non-empty
while True:
    Outputtext = driver.find_element_by_css_selector(Output_selector).get_attribute("textContent")
    if Outputtext != "":
        break
    time.sleep(1)
print(Outputtext)
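The script above translates only the first title. To process the whole CSV, a loop like the one below could fill the jTitle column. Judging by its name, that column is meant for the Japanese title, but that reading is my assumption. `translate` can be any function from English text to Japanese text, such as the TranslationByDeepL function in the next section. fill_translations is a hypothetical helper name.

```python
import pandas as pd


def fill_translations(df, translate, source="Title", target="jTitle"):
    """Hypothetical helper: return a copy of df with translate(text) stored in `target`.

    Rows whose `source` cell is empty (NaN or blank) are left untouched.
    """
    result = df.copy()
    # A column that is entirely empty is read as float NaN; let it hold strings.
    result[target] = result[target].astype(object)
    for index, text in result[source].items():
        if isinstance(text, str) and text.strip():
            result.at[index, target] = translate(text)
    return result
```

Usage would be `df = fill_translations(df, TranslationByDeepL)`, followed by `df.to_csv(...)`. Each TranslationByDeepL call starts a fresh Chrome, so a large CSV will take a while.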

Turning it into a function

I meant the code above to be final. When I actually used it for my task, though, I turned it into a function and reworked the waiting logic, so I'm posting that version as well.

'''
Translate text using DeepL
Input:     English text to translate
Output:    translated Japanese text
Exception: raised when the input is not a string
'''
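import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options  # needed for the headless option below
import chromedriver_binary  # puts chromedriver on PATH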

def TranslationByDeepL(mytext):
    if not isinstance(mytext, str):
        raise TypeError("Not a string")
    if mytext == "":
        return ""

    # DeepL page URL and CSS selectors
    load_url = "https://www.deepl.com/ja/translator"
    input_selector = "#dl_translator > div.lmt__sides_container > div.lmt__side_container.lmt__side_container--source > div.lmt__textarea_container > div > textarea"
    Output_selector = "#dl_translator > div.lmt__sides_container > div.lmt__side_container.lmt__side_container--target > div.lmt__textarea_container > div.lmt__translations_as_text > p > button.lmt__translations_as_text__text_btn"

    # If a WebDriver operation fails, wait 1 second and try it again.
    # If it still fails after 10 attempts, re-raise the error and end the function.
    # The same retry logic is used everywhere WebDriver is called below.
    errCount=0
    f_succsess=False
    while not f_succsess:
        try: #Access DeepL
            options = Options()
            options.add_argument('--headless')  # run Chrome without opening a window
            driver = webdriver.Chrome(options=options)  # drop options=options to watch the browser
            driver.get(load_url)
            f_succsess = True
        except Exception as identifier:
            errCount=errCount+1
            if errCount >=10:
                raise identifier
            time.sleep(1)  # wait before retrying, as in the other loops
    
    #Send English to DeepL
    errCount=0
    f_succsess=False
    while not f_succsess:
        try: #Send English to DeepL
            driver.find_element_by_css_selector(input_selector).send_keys(mytext)
            f_succsess = True
        except Exception  as identifier:              
            errCount=errCount+1
            if errCount >=10:
                raise identifier
            time.sleep(1)

    # Previous output, used to tell when the translation has stopped changing
    Output_before = ""
    while 1:
        errCount=0
        f_succsess=False
        while not f_succsess:
            try:#Get DeepL output
                Output = driver.find_element_by_css_selector(Output_selector).get_attribute("textContent")
                f_succsess = True
            except Exception  as identifier:               
                errCount=errCount+1
                if errCount >=10:
                    raise identifier
                time.sleep(1) 
        # Empty output: the translation has not started yet, so check again after 1 second.
        # Non-empty output that differs from the previous one: DeepL is still translating,
        # so check again after 1 second.
        # Non-empty output identical to the previous one: the translation is complete.
        if Output != "":  # the translation has started appearing
            if Output_before == Output:  # unchanged since the last check, so it is complete
                break
            Output_before = Output
        time.sleep(1)

    # Close Chrome: quit() ends the whole WebDriver session (close() only closes the window)
    driver.quit()
 
    #Result output
    return Output
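The three retry loops in the function above share the same shape. The output loop also has no upper limit, so it would spin forever if DeepL never returned anything. Below is a sketch that factors both ideas into small helpers. It keeps the original's 10 attempts and 1-second interval. retry and wait_for_stable_text are my own hypothetical names, and the 60-second timeout is an arbitrary choice.

```python
import time


def retry(action, attempts=10, delay=1.0):
    """Hypothetical helper: call action() until it succeeds, at most `attempts` times.

    Sleeps `delay` seconds after each failure and re-raises the last
    exception once every attempt has failed.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise
            time.sleep(delay)


def wait_for_stable_text(read, interval=1.0, timeout=60.0):
    """Hypothetical helper: poll read() until it returns the same non-empty text twice in a row."""
    deadline = time.monotonic() + timeout
    previous = ""
    while time.monotonic() < deadline:
        current = read()
        if current:                   # the translation has started
            if current == previous:   # unchanged since the last poll: done
                return current
            previous = current
        time.sleep(interval)
    raise TimeoutError(f"output did not settle within {timeout} s")
```

Inside TranslationByDeepL, the output part could then become `Output = wait_for_stable_text(lambda: retry(lambda: driver.find_element_by_css_selector(Output_selector).get_attribute("textContent")))`.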
