[PYTHON] From touching the terminal for the first time to automatically translating English papers into Japanese with "Honyaku Konjac"

Background

"Honyaku Konjac" (Translation-Gummy) is a translation program for English papers created by cabernet_rock. If you give it the URL of a paper, it outputs a PDF file in which the original text and its Japanese translation are shown in parallel, **including the figures**. The author's article summarizes how it works and how to install it in an easy-to-understand way. However, I got stuck at a few points when I installed it myself, and there are a few hurdles for people who mainly do wet-lab work and are not used to operating the terminal, so this article describes the installation for those who are opening the terminal for the first time. Let's eat "Honyaku Konjac" and read papers. Let's use a "real translation konjac".

Environment

PC: MacBook Pro 2020
OS: macOS Catalina (10.15.5)
Shell: zsh (5.7.1)

Method

Python installation

The Python that comes preinstalled on the Mac is old (2.x series), so install the latest version (3.x series). Reference: Python3 installation (Mac version).

First, download the pkg file from the official site. As of September 4, 2020, the latest version was 3.8.5. Double-click the downloaded pkg file to install it. Basically, you can just keep clicking "Continue".

Open a terminal to check that it was installed. The terminal is in Applications → Utilities. When the terminal starts, type "python3". If a message such as "Python 3.8.5" is displayed (the number varies with the version), the installation succeeded. Type "exit()" to leave Python and return to the shell.
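
The same check can also be done from inside Python. This is only a minimal sketch using the standard library; the function name is_python3 is one I made up for illustration.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the given version tuple belongs to Python 3 or newer."""
    return version_info[0] >= 3


# Print the version number of the running interpreter and the check result.
print(sys.version.split()[0], is_python3())
```

If this is saved to a file and run with the old preinstalled "python" command instead of "python3", the second value should come out as False.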

Install wkhtmltopdf

wkhtmltopdf is a program for creating PDFs from HTML. Download the installer that matches your PC from the official download site. Double-click the downloaded pkg file to install it. Basically, you can just keep clicking "Continue". After installation, open a terminal and enter the following command. If "google.pdf" is created, the installation succeeded.

#Same content as the author's article
wkhtmltopdf http://google.com google.pdf

Install ChromeDriver

ChromeDriver is the WebDriver for Google Chrome. A WebDriver is software required to operate a browser from a program, and Honyaku Konjac seems to need it when it outputs the translation result as HTML (I do not fully understand how this part works). Other browsers would probably work too, but I use Google Chrome, following the author's article (if you do not have Google Chrome, please install it). See: Python + Selenium goes through all the automatic operations of Chrome

Download ChromeDriver from the official site. Note that you must **download the same version** as the Google Chrome installed on your PC. You can check the version of Google Chrome under Settings (the icon with three vertically arranged dots in the upper right corner of the browser) → "About Chrome". A zip file is downloaded, so unzip it (double-click on a Mac).

Copy the unzipped file to **a folder that is on your PATH**. "Putting something on the PATH" may be hard to understand unless you are used to operating the terminal. Simply put, for a computer to run a piece of software, it needs to know **where that software is**. **Adding a folder to PATH means making the computer remember the route (= path) to that place.** To check which folders are on your PATH, execute the following command in the terminal.

echo $PATH

You should see something like "/usr/local/bin" or "/usr/bin" (":" separates the locations). You could copy ChromeDriver into any of these, but for now let's add the Downloads folder in your home directory to PATH. Execute the following commands in the terminal.

# zsh (the shell in this environment) reads ~/.zshrc at startup, not ~/.bash_profile
echo 'export PATH=$PATH:$HOME/Downloads' >> ~/.zshrc
source ~/.zshrc

Then run "echo $PATH" again and you will see that the Downloads folder has been added to PATH. Copy the unzipped ChromeDriver into that folder. After that, enter "chromedriver" in the terminal; if ChromeDriver starts and prints its startup message, it is on the PATH.
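
To make "being on the PATH" a little more concrete, here is a rough sketch of the kind of lookup that happens when you type a command name. It is my own illustration, not how the shell is really implemented, and the function name find_on_path is made up.

```python
import os


def find_on_path(command, path_value=None):
    """Return the full path of `command` if a PATH folder contains it, else None."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    # PATH is one string of folders joined by a separator (":" on macOS).
    for folder in path_value.split(os.pathsep):
        candidate = os.path.join(folder, command)
        # Only an existing, executable file counts as a command.
        if folder and os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


# Prints the location of chromedriver, or None if no PATH folder contains it.
print(find_on_path("chromedriver"))
```

The standard library function shutil.which performs the same kind of lookup, so in real scripts it is probably simpler to use that.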

Installation of required Python libraries

Install selenium and check if Chrome Driver works fine.

#Same content as the author's article
pip3 install selenium
python3
>>> from selenium import webdriver
>>> driver = webdriver.Chrome()
>>> driver.get("https://www.python.org")
>>> driver.save_screenshot('screenshot.png')
True

If True is displayed, selenium was installed successfully, and ChromeDriver appears to be working as well.
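
If webdriver.Chrome() fails at this step instead, one possible cause is a mismatch between the versions of Chrome and ChromeDriver. The sketch below compares two version strings. Comparing only the first three numbers is my assumption about what counts as "the same version", and the version strings are made-up examples, so please check the ChromeDriver site for the exact rule.

```python
def versions_match(chrome_version, driver_version, parts=3):
    """Compare the first `parts` dot-separated numbers of two version strings."""
    chrome = chrome_version.strip().split(".")[:parts]
    driver = driver_version.strip().split(".")[:parts]
    return chrome == driver


# Example version strings (made up for illustration).
print(versions_match("85.0.4183.83", "85.0.4183.87"))
print(versions_match("85.0.4183.83", "84.0.4147.30"))
```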

Installation of Honyaku Konjac

Finally, install Honyaku Konjac itself.

pip3 install Translation-Gummy

Perform translation

#Same content as the author's article
python3
>>> from gummy import TranslationGummy
>>> gummy = TranslationGummy(gateway="useless", translator="deepl")
>>> pdfpath = gummy.toPDF(url="https://www.nature.com/articles/ncb0800_500", path="sample.pdf", delete_html=True)
>>> print(pdfpath)

I tried to translate, but "pdfpath = gummy.toPDF(url="https://www.nature.com/articles/ncb0800_500", path="sample.pdf", delete_html=True)" raised an error. According to the error message, a package called "punkt" is missing. The message says "Please use the NLTK Downloader", so I tried running it.

import nltk
nltk.download('punkt')
 [nltk_data] Error loading Punkt: <urlopen error [SSL:
 [nltk_data]     CERTIFICATE_VERIFY_FAILED] certificate verify failed

However, I got another error. A Google search for the error message turned up a solution (NLTK download SSL: Certificate verify failed). Execute the following code in python3.

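import nltk
import ssl

# Workaround from the answer above: skip HTTPS certificate verification
try:
    _create_unverified_https_context = ssl._create_unverified_context
except AttributeError:
    # This Python has no such function, so there is nothing to change
    pass
else:
    ssl._create_default_https_context = _create_unverified_https_context

# With no argument this opens the NLTK Downloader; choose "punkt" there
nltk.download()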

I will try again.

#Same content as the author's article
python3
>>> from gummy import TranslationGummy
>>> gummy = TranslationGummy(gateway="useless", translator="deepl")
>>> pdfpath = gummy.toPDF(url="https://www.nature.com/articles/ncb0800_500", path="sample.pdf", delete_html=True)
>>> print(pdfpath)
sample.pdf

This time it went well.
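
Once one paper works, several papers could be translated in a row. The following is only a sketch that I have not run against the real library: it assumes toPDF accepts the same keyword arguments as in the call above and returns the path of the PDF, as the printed output suggests. The helper names pdf_name_for and translate_all are made up.

```python
import os
from urllib.parse import urlparse


def pdf_name_for(url):
    """Build an output file name from the last part of the URL path."""
    last = os.path.basename(urlparse(url).path.rstrip("/")) or "paper"
    return last + ".pdf"


def translate_all(gummy, urls):
    """Call gummy.toPDF for each URL and return the list of returned paths."""
    results = []
    for url in urls:
        # Same keyword arguments as the single-paper call above.
        results.append(gummy.toPDF(url=url, path=pdf_name_for(url), delete_html=True))
    return results


# Usage (needs Translation-Gummy installed):
# from gummy import TranslationGummy
# gummy = TranslationGummy(gateway="useless", translator="deepl")
# print(translate_all(gummy, ["https://www.nature.com/articles/ncb0800_500"]))
```

Naming each PDF after the last part of its URL keeps the outputs from overwriting each other, which would happen if every paper were written to "sample.pdf".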

Result

I was able to translate successfully! Wonderful!

Supplement

I installed Honyaku Konjac with pip, but mixing pip with a conda environment is apparently not a good idea (I do not understand exactly what goes wrong. Reference: conda and pip: danger of mixing, //onoz000.hatenablog.com/entry/2018/02/11/142347). Therefore, I created a virtual environment for Honyaku Konjac with a module called **venv** and run it inside that environment. venv is included with python3 from the start.

Creating a virtual environment

mkdir directory  # Create a directory for the virtual environment
cd directory  # Move into that directory
python3 -m venv virtual_environment_name

Enter the virtual environment

source virtual_environment_name/bin/activate
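
To confirm that you are really inside the virtual environment, a small check like the following can be run in python3. It is a minimal sketch that assumes the environment was created with venv as above; the function name in_virtualenv is made up.

```python
import sys


def in_virtualenv():
    """Return True when Python is running inside a venv virtual environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the Python it was created from.
    return sys.prefix != sys.base_prefix


# Show the result together with the location of the active Python.
print(in_virtualenv(), sys.prefix)
```

If the first value is False, the environment is probably not activated, and "pip3 install" would then install packages outside of it.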
