[PYTHON] About garbled Japanese text in pandas-profiling in Jupyter Notebook

Introduction

I think the first step in analyzing data is to understand its characteristics. pandas-profiling is very convenient for this because it runs a whole EDA in one go. However, when I tried it in Jupyter Notebook, the Japanese columns of the data were garbled into "tofu" boxes (□□□), so I'd like to summarize the fix.

Environment

Cause

The garbled characters in pandas-profiling come from **matplotlib** and **seaborn**, whose default font settings cannot render Japanese. pandas-profiling draws its charts with matplotlib and seaborn, so once those two can display Japanese, pandas-profiling will too.

In this article, I explain how to configure **matplotlib** and **seaborn** to display Japanese.

Japanese localization of Matplotlib and seaborn

This article covers Japanese support for a Jupyter Notebook environment running in Docker. There may be a more efficient method, so please let me know in the comments if you know one.

1. Download Japanese fonts

Download **ipaexg00401.zip (4.0MB)** from this site and unzip it. Move **ipaexg.ttf** from the ipaexg00401 folder into the directory containing the Dockerfile.

2. Copy rcmod.py from the container to the host for seaborn's Japanese support

In this step, you copy rcmod.py, the seaborn file that has to be edited for Japanese support, from the container to the host and edit it there. The Dockerfile then overwrites rcmod.py in the container with the edited host copy when the image is built, so you don't have to edit rcmod.py again after every docker-compose up.

(I really wanted to make the edit inside the container from the Dockerfile itself, but I couldn't figure out how.)

Run **docker-compose up** while Japanese is not yet supported. Then open another terminal and check the container ID.

# Check the container ID
$ docker ps

Then copy rcmod.py from the container to the host (your local machine).

$ docker cp [Container ID]:/opt/conda/lib/python3.8/site-packages/seaborn/rcmod.py [Destination (e.g. C:\Users\....)]
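The path in the command above assumes Python 3.8 installed under /opt/conda. If your image uses another Python version, a minimal sketch like the one below, run inside the container (for example via `docker exec`), can report where a module actually lives. `module_path` is a hypothetical helper; it relies only on the standard-library `importlib.util.find_spec`, which imports the parent package when given a dotted name.

```python
import importlib.util
from typing import Optional


def module_path(module_name: str) -> Optional[str]:
    """Return the source file of an importable module, or None if it can't be found."""
    try:
        # For a dotted name like "seaborn.rcmod", this imports the parent package first
        spec = importlib.util.find_spec(module_name)
    except ImportError:
        # Raised (as ModuleNotFoundError) when the parent package is not installed
        return None
    if spec is None:
        return None
    return spec.origin
```

For example, `module_path("seaborn.rcmod")` should return the full path of rcmod.py to use as the source in `docker cp`, or `None` if seaborn is not installed.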

Copy the saved rcmod.py into the directory with the Dockerfile; the Dockerfile in step 4 expects it at settings/localize_ja/rcmod.py relative to the Dockerfile.

3. Rewrite rcmod.py

Open rcmod.py and make the following changes.

Change the font argument of **def set_theme(context="notebook", ...)** on lines 86-87 from "sans-serif" to **font="IPAexGothic"**.

def set_theme(context="notebook", style="darkgrid", palette="deep",
              font="IPAexGothic", font_scale=1, color_codes=True, rc=None):

Then change **"font.family": ["sans-serif"]** on line 205 to:

"font.family": ["IPAexGothic"]

This completes the rewriting of seaborn for Japanese support.
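Because the edits in this step are two fixed string substitutions, they could also be scripted, which is one possible way to make the change inside the container instead of by hand. The sketch below is hypothetical (`patch_source` and `patch_file` are illustrative names, not part of seaborn) and assumes rcmod.py contains seaborn's default text `font="sans-serif"` and `"font.family": ["sans-serif"]` verbatim. It raises an error if neither the original nor the patched text is found, rather than silently doing nothing.

```python
from pathlib import Path

# Hypothetical (original text, replacement) pairs for the two edits in step 3.
# Assumes rcmod.py contains seaborn's default "sans-serif" settings verbatim.
REPLACEMENTS = [
    ('font="sans-serif"', 'font="IPAexGothic"'),
    ('"font.family": ["sans-serif"]', '"font.family": ["IPAexGothic"]'),
]


def patch_source(source: str) -> str:
    """Apply every replacement; skip ones already applied, fail if neither form is found."""
    for old, new in REPLACEMENTS:
        if old in source:
            source = source.replace(old, new)
        elif new not in source:
            raise ValueError(f"could not find {old!r} in rcmod.py")
    return source


def patch_file(path: Path) -> None:
    """Rewrite the file at `path` in place with the patched source."""
    text = path.read_text(encoding="utf-8")
    path.write_text(patch_source(text), encoding="utf-8")
```

Run inside the container, `patch_file(Path("/opt/conda/lib/python3.8/site-packages/seaborn/rcmod.py"))` would apply both changes; calling it a second time leaves the file unchanged.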

4. Add the following to your Dockerfile

# Japanese localization of matplotlib and seaborn
# Copy the Japanese font into matplotlib's bundled font directory
COPY ipaexg.ttf /opt/conda/lib/python3.8/site-packages/matplotlib/mpl-data/fonts/ttf/ipaexg.ttf
# Overwrite seaborn's rcmod.py in the container with the edited copy
COPY settings/localize_ja/rcmod.py /opt/conda/lib/python3.8/site-packages/seaborn/rcmod.py
# Append "font.family : IPAexGothic" to the end of the matplotlib config file
RUN echo "font.family : IPAexGothic" >> /opt/conda/lib/python3.8/site-packages/matplotlib/mpl-data/matplotlibrc
# Clear the cache so matplotlib rebuilds its font list
RUN rm -r ./.cache
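The last two RUN lines add a default font family to matplotlibrc and delete cached data so that matplotlib rescans its fonts and finds the newly copied ipaexg.ttf. For reference, here is a minimal Python sketch of the same two steps, assuming it runs inside the container; the helper names are hypothetical. Unlike `echo >>`, `add_rc_setting` skips the line if it is already present, and only matplotlib's font-list cache files (names starting with "fontlist") are deleted instead of the whole ./.cache directory.

```python
from pathlib import Path
from typing import List

# Hypothetical sketch of the last two Dockerfile RUN steps, done from Python.
RC_LINE = "font.family : IPAexGothic"


def add_rc_setting(text: str, line: str = RC_LINE) -> str:
    """Return matplotlibrc text with `line` appended, unless that exact line is already present."""
    if line in (existing.strip() for existing in text.splitlines()):
        return text
    if text and not text.endswith("\n"):
        text += "\n"  # start the new setting on its own line
    return text + line + "\n"


def is_font_cache(name: str) -> bool:
    """True for matplotlib font-list cache files such as fontlist-v330.json or fontList.json."""
    return name.lower().startswith("fontlist")


def localize_matplotlib(rc_path: Path, cache_dir: Path) -> List[str]:
    """Append the font setting to rc_path, delete font caches in cache_dir, return deleted names."""
    rc_text = rc_path.read_text(encoding="utf-8")
    rc_path.write_text(add_rc_setting(rc_text), encoding="utf-8")
    removed = []
    if cache_dir.is_dir():
        for entry in cache_dir.iterdir():
            if entry.is_file() and is_font_cache(entry.name):
                entry.unlink()
                removed.append(entry.name)
    return sorted(removed)
```

In the container, the two paths could come from `matplotlib.matplotlib_fname()` (the matplotlibrc in use) and `matplotlib.get_cachedir()`. Since matplotlib loads its font list at import time, restart the Jupyter kernel afterwards so the rebuilt cache is used.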

matplotlib and seaborn can now display Japanese. You can check whether the characters are still garbled with the following code.

#Check if matplotlib is compatible with Japan
import matplotlib.pyplot as plt
plt.plot([1, 2, 3, 4])
plt.xlabel('Localizing into Japanese')
plt.ylabel('of matplotlib')
plt.show()
#Check if seaborn can speak Japanese
import seaborn as sns
sns.set(style="whitegrid")

# Load the example Titanic dataset
titanic = sns.load_dataset("titanic")

# Draw a nested barplot to show survival for class and sex
g = sns.catplot(x="class", y="survived", hue="sex", data=titanic,
                height=6, kind="bar", palette="muted")
g.despine(left=True)
g.set_ylabels("Japaneseization of seaborn")

Since the labels are in Japanese, the setup works if they render correctly instead of as tofu (□□□).

Once you have confirmed that matplotlib and seaborn display Japanese, pandas-profiling should display it as well.

In closing

For the time being, I think running pandas-profiling will become my standard first step in EDA.
