[PYTHON] [EC2] Prevent garbled characters when capturing with selenium


When I took a screenshot of a URL with Python and selenium on EC2, the Japanese text came out garbled.

▼ Program that captures the Japanese top page with Chrome

screenshot.py


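from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Run Chrome without a display; passing --headless as an argument
# works across Selenium versions
options = Options()
options.add_argument('--headless')
options.add_argument('--window-size=1280,1024')

driver = webdriver.Chrome(options=options)

# Open the page and save a screenshot of it
driver.get("https://www.google.co.jp/")
driver.save_screenshot('test.png')

# Close the browser and end the WebDriver session
driver.quit()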

Garbled characters

image.png

The Japanese text is rendered as rows of garbled characters.

Solution

shell


sudo yum install ipa-gothic-fonts ipa-mincho-fonts ipa-pgothic-fonts ipa-pmincho-fonts

Install the Japanese font packages. Partway through, yum asks whether to proceed with the installation; answer "y".

image.png

The garbled characters have been successfully resolved.
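
To confirm from Python that a Japanese-capable font is now visible, a check along the following lines may help. This is a minimal sketch that assumes the `fc-list` command from fontconfig is on the PATH; the helper names `parse_fc_list` and `japanese_fonts` are made up for this illustration.

```python
import shutil
import subprocess


def parse_fc_list(output):
    """Turn fc-list output into a sorted list of unique, non-empty lines."""
    families = set()
    for line in output.splitlines():
        line = line.strip()
        if line:
            families.add(line)
    return sorted(families)


def japanese_fonts():
    """Return the font families fontconfig reports as supporting Japanese.

    Returns an empty list when the fc-list command is not available.
    """
    if shutil.which("fc-list") is None:
        return []
    result = subprocess.run(
        ["fc-list", ":lang=ja", "family"],
        capture_output=True, text=True, check=False,
    )
    return parse_fc_list(result.stdout)


if __name__ == "__main__":
    fonts = japanese_fonts()
    if fonts:
        print("Japanese-capable fonts:", ", ".join(fonts))
    else:
        print("No Japanese-capable font found; screenshots will likely be garbled.")
```

If the list is empty before the `yum install` and contains the IPA families afterwards, the fonts are registered and the next screenshot should render Japanese text.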


## noto font

The article below says that the garbled characters can be eliminated by installing the noto fonts, but it did not work for me.

https://qiita.com/onorioriori/items/4fa271daa3621e8f6fd9

shell


# Move to the tmp directory to download the zip
cd /tmp/ 

# Download the full package of noto fonts
wget https://noto-website-2.storage.googleapis.com/pkgs/Noto-hinted.zip

# Extract the archive
unzip Noto-hinted.zip

# Create a directory under /usr/share/fonts (needs root privileges)
mkdir -p /usr/share/fonts/opentype/noto

# Copy the font files into it
cp *otf *ttf /usr/share/fonts/opentype/noto


# Update the font cache (optional)
fc-cache -f -v

I put the files in the specified directory, but the characters are still garbled.

image.png

If anyone knows how to get noto working, please let me know.
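
One way to narrow this down might be to ask fontconfig which font it would actually pick for Japanese text. If the answer is not a file under /usr/share/fonts/opentype/noto, the copied files are either not being found or do not cover Japanese. The sketch below is only a diagnostic, not a known fix; it assumes the `fc-match` command from fontconfig is installed, and the helper names `parse_match` and `matched_font` are hypothetical.

```python
import shutil
import subprocess


def parse_match(line):
    """Split a 'family|file' line from fc-match into a (family, file) tuple."""
    family, _, path = line.strip().partition("|")
    return family, path


def matched_font(lang="ja"):
    """Return (family, file) of the font fontconfig picks for `lang`.

    Returns None when the fc-match command is not available.
    """
    if shutil.which("fc-match") is None:
        return None
    result = subprocess.run(
        ["fc-match", "-f", "%{family}|%{file}", f":lang={lang}"],
        capture_output=True, text=True, check=False,
    )
    return parse_match(result.stdout)


if __name__ == "__main__":
    match = matched_font()
    if match is None:
        print("fc-match is not installed")
    else:
        print("Font chosen for Japanese: %s (%s)" % match)
```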

Supplement: What is a noto font?

A font family jointly developed by Google, Adobe, and Iwata. It aims to support all languages.

When a font has no glyph for a character, the character is displayed as □, which looks like a block of tofu. The name noto is short for "no tofu".

Official page


You can download the package for each font, or you can download it as a full package.

▼ Full package URL https://noto-website-2.storage.googleapis.com/pkgs/Noto-hinted.zip

▼ Noto Sans package URL https://noto-website-2.storage.googleapis.com/pkgs/NotoSans-hinted.zip
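
The unzip and copy steps used for these packages can also be done from Python. The following is a minimal sketch using only the standard library; the function name `extract_fonts` is made up for this illustration, and it assumes the zip has already been downloaded and that the destination directory is writable (writing under /usr/share/fonts needs root privileges).

```python
import zipfile
from pathlib import Path


def extract_fonts(zip_path, dest_dir):
    """Copy only the .otf/.ttf files from zip_path into dest_dir.

    Sub-directories inside the archive are flattened. Returns the
    sorted list of extracted file names.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            name = Path(info.filename).name
            # Skip directories and anything that is not a font file
            if info.is_dir() or not name.lower().endswith((".otf", ".ttf")):
                continue
            target = dest / name
            target.write_bytes(archive.read(info))
            extracted.append(target.name)
    return sorted(extracted)


if __name__ == "__main__":
    import sys

    if len(sys.argv) == 3:
        # e.g. python extract_fonts.py Noto-hinted.zip /usr/share/fonts/opentype/noto
        for font in extract_fonts(sys.argv[1], sys.argv[2]):
            print(font)
```

After extracting, running `fc-cache -f -v` refreshes the font cache so the new files can be picked up.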
