For now, I have collected bibliographic information and abstracts from ScienceDirect. Next, I want to feed them into DeepL and translate them. If you sign up for a paid plan you can translate whole files at once, but as a challenge I will try doing it with selenium and chromedriver instead.
import pandas as pd
df = pd.read_csv("DB.csv",header=None, delimiter=",", quoting=1)
print(df.at[0,1]) #title
print(df.at[0,9]) #Abst
print(df.at[0,10]) #keyword
for title in df[1]:
    print(title)
Well, this part went smoothly.
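As a side note, quoting=1 in the read_csv call is the numeric value of csv.QUOTE_ALL from the standard library. The sketch below uses only the csv module and a made-up one-row sample (the row contents are my own invention, not real data) to show why quoted fields matter here: abstracts contain commas, and the positions 1, 9 and 10 only line up if those commas stay inside their column.

```python
import csv
import io

# Hypothetical one-row sample with the same 11-column layout as DB.csv.
# Every field is quoted, so commas inside the abstract do not split the column.
sample = (
    '"Doe, J.","A sample title","Some Journal","1(2)","2020","1-10",'
    '"0000-0000","10.0000/example","https://example.com",'
    '"An abstract, with commas, inside.","keyword1; keyword2"\n'
)

# The value 1 passed as quoting=1 is csv.QUOTE_ALL.
assert csv.QUOTE_ALL == 1

rows = list(csv.reader(io.StringIO(sample), quoting=csv.QUOTE_ALL))
first = rows[0]

title = first[1]      # same position as df.at[0, 1]
abstract = first[9]   # same position as df.at[0, 9]
keywords = first[10]  # same position as df.at[0, 10]
print(title)
print(abstract)
print(keywords)
```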
from selenium import webdriver
load_url = "https://www.deepl.com/ja/translator"
driver = webdriver.Chrome(executable_path='c:/work/chromedriver.exe') # driver = webdriver.Chrome()
driver.get(load_url)
This part also went smoothly.
To send the text and get the translation back, I used Chrome's developer tools to find the CSS selectors of the textarea that takes the English input and of the textarea that shows the translation. The input side was easy:
#dl_translator > div.lmt__sides_container > div.lmt__side_container.lmt__side_container--source > div.lmt__textarea_container > div > textarea
That is the selector I found. So:
title = df.at[0,1]
input_selector = "#dl_translator > div.lmt__sides_container > div.lmt__side_container.lmt__side_container--source > div.lmt__textarea_container > div > textarea"
driver.find_element_by_css_selector(input_selector).send_keys(title)
With this, the English text was sent to DeepL without any trouble.
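One caveat, as far as I know: the find_element_by_css_selector method used above was removed in Selenium 4.3, where the locator strategy is passed to find_element instead. A minimal sketch of a wrapper that works either way is below. find_css is a hypothetical helper name of my own, not part of Selenium, and the string "css selector" is the value of By.CSS_SELECTOR.

```python
def find_css(driver, selector):
    """Locate one element by CSS selector on old and new Selenium versions.

    find_css is a hypothetical helper, not part of Selenium.
    """
    legacy = getattr(driver, "find_element_by_css_selector", None)
    if legacy is not None:
        # Older Selenium: the helper method used in this article exists.
        return legacy(selector)
    # Newer Selenium: pass the locator strategy explicitly.
    # "css selector" is the string value of By.CSS_SELECTOR.
    return driver.find_element("css selector", selector)
```

With this, the line above would become find_css(driver, input_selector).send_keys(title).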
The output side, on the other hand, is where I got stuck. I first translated an arbitrary English sentence, then searched the developer tools for the output text. It turned out that the translation is not stored in the textarea as is, but appears inside a button tag. I tried getting the button element and reading its text, but for some reason I could not. Even when I inspect the variable in the VSCode debugger, the element's text is empty. Huh? Why? I can see it in Chrome, and in the developer tools the translation is displayed properly between the button tags, so why is it an empty string?
For the time being, as a workaround, DeepL has a convenient button that copies the translation to the clipboard, so I will click it with Selenium to copy it to the clipboard and get the translation from the clipboard.
So I installed the pyperclip package for handling the clipboard, found the CSS selector of the button that copies to the clipboard, and ran the following script.
import time  # needed for time.sleep below
import pyperclip
title = df.at[0,1]
input_selector = "#dl_translator > div.lmt__sides_container > div.lmt__side_container.lmt__side_container--source > div.lmt__textarea_container > div > textarea"
driver.find_element_by_css_selector(input_selector).send_keys(title)
time.sleep(5)
OutputCopyBtn = "#dl_translator > div.lmt__sides_container > div.lmt__side_container.lmt__side_container--target > div.lmt__textarea_container > div.lmt__target_toolbar.lmt__target_toolbar--visible > div.lmt__target_toolbar__copy > button"
driver.find_element_by_css_selector(OutputCopyBtn).click()
print(pyperclip.paste())
OK, I got the translation.
Still, this feels like a defeat, so I will try again to get the text directly from the button element.
When I googled it, I came across an article by @riikunn1004.
I see. So the text can be retrieved by calling .getAttribute("textContent")?
I still don't understand why the text can't be read even though it is visible both on the screen and in the developer tools, but I wrote the following script with reference to that article.
Output_selector = "#dl_translator > div.lmt__sides_container > div.lmt__side_container.lmt__side_container--target > div.lmt__textarea_container > div.lmt__translations_as_text > p > button.lmt__translations_as_text__text_btn"
Outputtext = driver.find_element_by_css_selector(Output_selector).get_attribute("textContent")
print(Outputtext)
I confirmed that the text can be retrieved this way. Huh.
I looked at the structure of the DeepL page a little more closely. Like the input text area, the output text area is a textarea, and it is selected by
#dl_translator > div.lmt__sides_container > div.lmt__side_container.lmt__side_container--target > div.lmt__textarea_container > div.lmt__inner_textarea_container > textarea
However, looking at the CSS, :disabled is applied to it, so it cannot be operated.
In fact, the output area cannot be selected before any English text has been entered.
Once English text is entered, the class name of this area changes and the disabled state goes away. (The CSS selector does not change.)
But does this area then contain the translation? It does not. The translation appears to be there on the screen, but at this point the textarea is still empty.
If you click this area with the mouse to focus the textarea, the following element pops up and shows the translation. (By the way, the English text I entered here was "test".)
<div class="lmt__textarea_base_style"
     style="position: absolute; transform: translate(-500%, -500%);
            padding: 16px 32px 80px 24px; margin: 0px; overflow: hidden;
            font-family: 'Open Sans', sans-serif; font-size: 24px; font-stretch: 100%; font-weight: 400;
            line-height: 36px; height: 468.5px; width: 448.5px;">
  <span style="outline: green solid 1px;">Shiken</span>
  <span style="outline: red solid 1px; display: inline-block; position: relative; height: 1em;"></span>
  <span style="outline: blue solid 1px;"></span>
</div>
The CSS selector of this element is
#dl_translator > div.lmt__sides_container > div.lmt__side_container.lmt__side_container--target > div.lmt__textarea_container > div.lmt__inner_textarea_container > div
Hmm, I think I understand now. In other words, on DeepL's page the translation shows up on the screen once English text is entered, but it is not actually placed in the textarea.
In fact, it is kept hidden further down the page, like this:
<button class="lmt__translations_as_text__text_btn">Shiken</button>
Then, when you move the cursor to the part of the screen that looks like the output "text area" and click, a div element like the one above appears and shows the translation. Does that mean the button element exists to perform this switch?
That is quite complicated. Why is it built like this?
However, I see. With this kind of structure, you can't find a translation by looking around the text area unless you click once on the part that looks like a "text area".
Also, the button is basically disabled or invisible, so its text cannot be read with .text.
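To make the difference concrete: Selenium's .text only reports text that is rendered as visible, while textContent is read from the DOM regardless of visibility. A minimal sketch of a fallback is below. read_text is a hypothetical helper name of my own, and it only assumes that the element behaves like a Selenium WebElement (a .text property and a get_attribute method).

```python
def read_text(element):
    """Return the visible text of an element, or its textContent if that is empty.

    read_text is a hypothetical helper. `element` is expected to behave like a
    Selenium WebElement (a .text property and a get_attribute method).
    """
    visible = element.text
    if visible:
        return visible
    # .text only reports text that is rendered as visible;
    # textContent is read from the DOM regardless of visibility.
    hidden = element.get_attribute("textContent")
    return hidden if hidden is not None else ""
```

For the hidden DeepL button, the first branch comes back empty and the second one returns the translation.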
Furthermore, even if you note the CSS selector of the element that appears on focus in advance and try to access it after a translation has been made, you are turned away with "no such element" unless the textarea has been focused once. Once it has been focused the element can be found, so I think the text could be retrieved that way.
Really, why do something this complicated?
For now, the part that sends an English sentence to DeepL and translates it automatically looks as follows. The while statement in the middle is there because translation takes some time: every second it checks whether the translation has arrived, and breaks out of the loop once it has.
import pandas as pd
import time
from selenium import webdriver
import chromedriver_binary  # adds the bundled chromedriver to PATH

df = pd.read_csv("Reliablity EngineeringDB.csv",header=None, delimiter=",", quoting=1)
df.columns=["Authors", "Title", "jTitle", "VolIssue","Year", "Pages", "ISSN","DOI","URL","Abst","Keywords"]
print(df)
# The columns now have names, so the title is looked up by label, not by position
Title = df.at[0, "Title"]

load_url = "https://www.deepl.com/ja/translator"
driver = webdriver.Chrome() # driver = webdriver.Chrome("c:/work/chromedriver.exe")
driver.get(load_url)

input_selector = "#dl_translator > div.lmt__sides_container > div.lmt__side_container.lmt__side_container--source > div.lmt__textarea_container > div > textarea"
driver.find_element_by_css_selector(input_selector).send_keys(Title)

# Check once per second until the translation is no longer an empty string
while 1:
    Output_selector = "#dl_translator > div.lmt__sides_container > div.lmt__side_container.lmt__side_container--target > div.lmt__textarea_container > div.lmt__translations_as_text > p > button.lmt__translations_as_text__text_btn"
    Outputtext = driver.find_element_by_css_selector(Output_selector).get_attribute("textContent")
    if Outputtext != "" :
        break
    time.sleep(1)
print(Outputtext)
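One thing to note about the loop above is that it never gives up: if the translation never arrives, it spins forever. A minimal sketch of the same polling idea with a time limit is below. wait_for_text and its interval and timeout defaults are illustrative names and values of my own, and the function is written against a plain callable so that it does not depend on Selenium.

```python
import time

def wait_for_text(fetch, interval=1.0, timeout=30.0, sleep=time.sleep):
    """Call fetch() until it returns a non-empty string or the timeout expires.

    wait_for_text, interval and timeout are illustrative names and values.
    """
    deadline = time.monotonic() + timeout
    while True:
        text = fetch()
        if text:
            return text
        if time.monotonic() >= deadline:
            raise TimeoutError("no translation appeared within the timeout")
        sleep(interval)
```

In the script above, fetch would be something like lambda: driver.find_element_by_css_selector(Output_selector).get_attribute("textContent").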
I meant the code above to be the final version, but when I actually put it to use I turned it into a function and reworked the waiting logic, so I am posting that version as well.
'''
Function to translate using DeepL
Input: the English text you want to translate
Output: the translated Japanese text
Exception: raised when the input is not a character string
'''
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options  # needed for the headless option below
import chromedriver_binary  # adds the bundled chromedriver to PATH

def TranslationByDeepL( mytext ):
    if mytext =="":
        return ""
    if type(mytext) is not str:
        raise Exception("Not a string")

    #DeepL page URL and CSS Selector
    load_url = "https://www.deepl.com/ja/translator"
    input_selector = "#dl_translator > div.lmt__sides_container > div.lmt__side_container.lmt__side_container--source > div.lmt__textarea_container > div > textarea"
    Output_selector = "#dl_translator > div.lmt__sides_container > div.lmt__side_container.lmt__side_container--target > div.lmt__textarea_container > div.lmt__translations_as_text > p > button.lmt__translations_as_text__text_btn"

    '''
    If the WebDriver process does not work, wait 1 second and try the WebDriver process again.
    However, if you try 10 times and it doesn't work, an error is raised and the function ends.
    Below, the same processing is performed wherever WebDriver is used
    '''
    errCount=0
    f_succsess=False
    while not f_succsess:
        try: #Access DeepL
            options = Options()
            options.add_argument('--headless')
            driver = webdriver.Chrome(options=options) # driver = webdriver.Chrome()
            driver.get(load_url)
            f_succsess = True
        except Exception as identifier:
            errCount=errCount+1
            if errCount >=10:
                raise identifier
            time.sleep(1)

    #Send English to DeepL
    errCount=0
    f_succsess=False
    while not f_succsess:
        try: #Send English to DeepL
            driver.find_element_by_css_selector(input_selector).send_keys(mytext)
            f_succsess = True
        except Exception as identifier:
            errCount=errCount+1
            if errCount >=10:
                raise identifier
            time.sleep(1)

    #Previous output, used to detect when the translation has stopped changing
    Output_before = ""
    while 1:
        errCount=0
        f_succsess=False
        while not f_succsess:
            try:#Get DeepL output
                Output = driver.find_element_by_css_selector(Output_selector).get_attribute("textContent")
                f_succsess = True
            except Exception as identifier:
                errCount=errCount+1
                if errCount >=10:
                    raise identifier
                time.sleep(1)
        '''
        If the acquired output is an empty string, the translation has not started yet, so check again after 1 second.
        If the acquired output is not an empty string but differs from the previous output,
        the translation is still in progress, so check again after 1 second.
        If the acquired output is not an empty string and matches the previous output, the translation is complete.
        '''
        if Output != "" : #If the output is not an empty string, the translation has started to appear
            if Output_before == Output:#If the output is the same as the previous output, the translation is complete
                break
            Output_before = Output
        time.sleep(1)

    #Close chrome
    driver.close()
    #Result output
    return Output
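The function above repeats the same "try up to 10 times, one second apart" loop three times, and its completion check boils down to "non-empty and unchanged since the previous reading". A minimal sketch of how those two ideas could be pulled out into small helpers is below. retry and wait_until_stable are hypothetical names of my own, max_checks is an upper bound that I added, and both helpers take plain callables so they can be tried without a browser.

```python
import time

def retry(action, attempts=10, delay=1.0, sleep=time.sleep):
    """Run action() until it succeeds; re-raise the last error after `attempts` failures."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise
            sleep(delay)

def wait_until_stable(fetch, interval=1.0, max_checks=60, sleep=time.sleep):
    """Return fetch()'s value once it is non-empty and equal to the previous reading."""
    previous = ""
    for _ in range(max_checks):
        current = fetch()
        if current != "" and current == previous:
            return current
        previous = current
        sleep(interval)
    raise TimeoutError("output did not stabilise")
```

Inside TranslationByDeepL, the last loop could then shrink to something like Output = wait_until_stable(lambda: retry(lambda: driver.find_element_by_css_selector(Output_selector).get_attribute("textContent"))).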