[PYTHON] Eliminate garbled Japanese characters in matplotlib graphs in Cloud Pak for Data Notebook

When drawing a graph with matplotlib or seaborn, Japanese characters in the graph may be garbled and rendered as tofu (□□□).
[Figure: graph whose Japanese labels are rendered as □□□]

This issue is specific to matplotlib / seaborn and occurs when the environment has no Japanese fonts or is not configured to use them. The fix is explained in other articles as well; this article covers how to apply it on Cloud Pak for Data (hereinafter CP4D).

Environment: CP4D v2.5, v3.0LA

In CP4D, the Python environment used by a Notebook is prepared in advance and starts from its initial state every time the runtime is started, so a change made once to the Python environment does not persist across restarts. As a workaround, we apply the fix (font download and configuration change) inside the Notebook itself.

Fixing garbled Japanese characters in matplotlib (CP4D version)

Run the following code at the beginning of your notebook. This example uses the IPA font (IPAex Gothic), the same font used in the other articles mentioned above.

# download and install a Japanese font
!cd /tmp; curl -O https://ipafont.ipa.go.jp/IPAexfont/ipaexg00401.zip
!unzip -jo /tmp/ipaexg00401.zip -d ~/.fonts
# register the font
!fc-cache -fv; fc-list
# reset the matplotlib cache
!rm -rf ~/.cache/matplotlib
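If `curl` or `unzip` is not available in the runtime, the download and extraction steps could also be done from Python. The following is only a minimal sketch using the standard library: the helper names `download_file` and `extract_ttf_flat` are hypothetical, and unlike `unzip -j` the extraction keeps only `.ttf` files. The `fc-cache` and matplotlib cache reset steps shown above would still follow.

```python
import os
import urllib.request
import zipfile

# URL taken from the shell commands in this article.
FONT_ZIP_URL = "https://ipafont.ipa.go.jp/IPAexfont/ipaexg00401.zip"


def download_file(url, dest_path):
    """Download url to dest_path and return dest_path (hypothetical helper)."""
    urllib.request.urlretrieve(url, dest_path)
    return dest_path


def extract_ttf_flat(zip_path, font_dir):
    """Extract every .ttf file in the archive into font_dir.

    Folder paths inside the archive are dropped, similar to `unzip -j`,
    but files other than .ttf are skipped (an assumption of this sketch).
    """
    os.makedirs(font_dir, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            if member.lower().endswith(".ttf"):
                target = os.path.join(font_dir, os.path.basename(member))
                with archive.open(member) as src, open(target, "wb") as dst:
                    dst.write(src.read())
                extracted.append(target)
    return extracted
```

In a notebook this might be called as `extract_ttf_flat(download_file(FONT_ZIP_URL, "/tmp/ipaexg00401.zip"), os.path.expanduser("~/.fonts"))`.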

(Optional) After running the install commands, you can use the code below to verify that IPAex Gothic has been added to the fonts matplotlib recognizes. The output also shows that CP4D's default Python environment originally had only DejaVu Sans.

import matplotlib.font_manager
# list the names of the fonts that fontconfig reports to matplotlib
[matplotlib.font_manager.FontProperties(fname=fname).get_name() for fname in matplotlib.font_manager.get_fontconfig_fonts()]
# -output-
#['DejaVu Sans',
# 'DejaVu Sans',
# 'DejaVu Sans',
# 'DejaVu Sans',
# 'DejaVu Sans',
# 'DejaVu Sans',
# 'DejaVu Sans',
# 'DejaVu Sans',
# 'DejaVu Sans',
# 'IPAexGothic']
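As far as I know, `get_fontconfig_fonts` was deprecated in later matplotlib releases, so the check above may not work outside the CP4D versions listed here. A possible alternative, sketched below, reads the entries that matplotlib's font manager has already loaded; the helper name `registered_fonts` is hypothetical.

```python
from matplotlib import font_manager


def registered_fonts():
    """Map each font name matplotlib has registered to one of its font files."""
    # fontManager.ttflist holds one entry per font file found by matplotlib;
    # when a name appears several times, the last file path wins here.
    return {entry.name: entry.fname for entry in font_manager.fontManager.ttflist}
```

After the install steps, `"IPAexGothic" in registered_fonts()` should be true, and the mapped path shows which file matplotlib picked up.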

Before drawing the graph, set font.family in rcParams to the installed font.

from matplotlib import pyplot as plt
from matplotlib import rcParams
plt.rcParams['font.family'] = 'IPAexGothic'
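If the font name is misspelled or the install step was skipped, matplotlib falls back to another font and the tofu comes back. One way to catch this early is to set `font.family` only when the font is actually registered. This is a minimal sketch; `use_font_if_registered` is a hypothetical helper, not part of matplotlib.

```python
import warnings

import matplotlib
from matplotlib import font_manager


def use_font_if_registered(font_name):
    """Set font.family to font_name if matplotlib knows it.

    Returns True when the setting was applied, False otherwise.
    """
    known = {entry.name for entry in font_manager.fontManager.ttflist}
    if font_name in known:
        matplotlib.rcParams["font.family"] = font_name
        return True
    warnings.warn(
        f"Font {font_name!r} is not registered; keeping the current font.family"
    )
    return False
```

In the notebook, `use_font_if_registered("IPAexGothic")` would take the place of the direct rcParams assignment.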

Execution example

# download and install a Japanese font
!cd /tmp; curl -O https://ipafont.ipa.go.jp/IPAexfont/ipaexg00401.zip
!unzip -jo /tmp/ipaexg00401.zip -d ~/.fonts
# register the font
!fc-cache -fv; fc-list
# reset the matplotlib cache
!rm -rf ~/.cache/matplotlib
# -output-
# (omitted)

Prepare sample data


import pandas as pd
# use Japanese column names so that the graph labels contain Japanese text
df = pd.DataFrame({
    'あいうえお' : [1,2,3,4,5],
    'かきくけこ' : [0.1,0.2,0.3,0.4,0.5],
    'さしすせそ' : [10,20,30,40,50],
    '漢字' : [100.1,100.2,100.3,100.4,100.5]
})
df
# -output-
#    あいうえお  かきくけこ  さしすせそ  漢字
# 0  1  0.1  10  100.1
# 1  2  0.2  20  100.2
# 2  3  0.3  30  100.3
# 3  4  0.4  40  100.4
# 4  5  0.5  50  100.5

Draw graph


%matplotlib inline
from matplotlib import pyplot as plt
from matplotlib import rcParams
import seaborn as sns

# Specify font
plt.rcParams['font.family'] = 'IPAexGothic'

sns.pairplot(df)

Result: [Figure: pair plot of the sample data with the Japanese labels displayed]
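One caveat when using seaborn: its `set` function has a `font` argument that defaults to `sans-serif`, so calling `sns.set()` after assigning `rcParams['font.family']` would overwrite the font setting. The example above avoids this by not calling `sns.set()`. If a seaborn theme is wanted, the font can be passed along with it, as in this small sketch (`set_seaborn_font` is a hypothetical helper):

```python
import matplotlib
import seaborn as sns


def set_seaborn_font(font_name):
    """Apply seaborn's default theme while keeping font_name as the font family."""
    # sns.set writes the font argument into rcParams['font.family']
    sns.set(font=font_name)
    return matplotlib.rcParams["font.family"]
```

Calling `set_seaborn_font("IPAexGothic")` before `sns.pairplot(df)` would apply the theme and the Japanese font together.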


(Addition) From CP4D v3.0.1, it seems you can create a custom image of the Python environment with the fonts already installed. Hopefully this can serve as a permanent fix. I will try it when I get the chance.
