[Python] I want to output a beautifully customized heat map of a correlation matrix (matplotlib edition)

Overview

With Python, pandas, and matplotlib, we will create a nicely formatted **heat map** from a **correlation matrix** (a matrix of the correlation coefficients between each pair of variables).

As an example, we will create a heat map for the following correlation matrix of **grades in five subjects**.

fig2.png

Execution environment

The code was run and verified on Google Colab (Python 3.6.9). It works almost the same way in Jupyter Notebook.

!pip list 
matplotlib               3.1.2   
numpy                    1.17.4 
pandas                   0.25.3    

Preparing to use Japanese with matplotlib

First, make Japanese text usable in matplotlib figures.

!pip install japanize-matplotlib
import japanize_matplotlib

This installs and imports japanize-matplotlib 1.0.5, so Japanese text in labels and elsewhere is rendered correctly instead of as garbled "tofu" boxes.

Compute the correlation matrix and draw a first heat map

The correlation matrix is easy to compute with the pandas `DataFrame.corr()` method.

import pandas as pd

# Dummy data (grades of 10 students in 5 subjects)
national_language = [76, 62, 71, 85, 96, 71, 68, 52, 85, 91]
society = [71, 85, 64, 55, 79, 72, 73, 52, 84, 84]
math = [50, 78, 48, 64, 66, 62, 58, 50, 50, 60]
science = [37, 90, 45, 56, 59, 56, 84, 86, 51, 61]
english = [59, 97, 71, 85, 58, 82, 70, 61, 79, 70]
df = pd.DataFrame({'National language': national_language, 'Society': society,
                   'Math': math, 'Science': science, 'English': english})

# Calculate the correlation coefficients
df2 = df.corr()
display(df2)  # display() is available in Colab / Jupyter Notebook

table1.png

Each element of the matrix takes a value in the range $-1.0$ to $1.0$. The closer the value is to $1.0$, the stronger the **positive correlation**; the closer it is to $-1.0$, the stronger the **negative correlation**. Values in the range $-0.2$ to $0.2$ are treated as **uncorrelated**.

The diagonal elements are the correlation coefficients of each item with itself, so they are always $1.0$ (a perfect positive correlation).
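To make these properties concrete, here is a minimal sketch that computes the Pearson correlation coefficient by hand for two of the subjects and compares it with the pandas result. The helper `pearson` and the ASCII variable names are assumptions made for this illustration; `DataFrame.corr()` uses the Pearson method by default.

```python
import numpy as np
import pandas as pd

# Two of the dummy columns from this article (variable names are illustrative)
national_language = [76, 62, 71, 85, 96, 71, 68, 52, 85, 91]
math = [50, 78, 48, 64, 66, 62, 58, 50, 50, 60]
df = pd.DataFrame({'National language': national_language, 'Math': math})


def pearson(x, y):
    """Pearson r: covariance divided by the product of the standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


r_manual = pearson(national_language, math)
corr = df.corr()  # method='pearson' is the pandas default
r_pandas = corr.at['National language', 'Math']

print(f'manual: {r_manual:.4f}  pandas: {r_pandas:.4f}')
print('diagonal is all 1.0:', np.allclose(np.diag(corr), 1.0))
print('matrix is symmetric:', np.allclose(corr, corr.T))
```

Because the matrix is symmetric, the cells above and below the diagonal carry the same information.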

When the correlation coefficients are simply listed as numbers like this, it is hard to grasp the overall picture, so let's visualize them with a heat map.

First, let's create a heat map with the minimum necessary code, without adjusting its appearance.

%reset -f
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors

# Dummy data (grades of 10 students in 5 subjects)
national_language = [76, 62, 71, 85, 96, 71, 68, 52, 85, 91]
society = [71, 85, 64, 55, 79, 72, 73, 52, 84, 84]
math = [50, 78, 48, 64, 66, 62, 58, 50, 50, 60]
science = [37, 90, 45, 56, 59, 56, 84, 86, 51, 61]
english = [59, 97, 71, 85, 58, 82, 70, 61, 79, 70]
df = pd.DataFrame({'National language': national_language, 'Society': society,
                   'Math': math, 'Science': science, 'English': english})

# Calculate the correlation coefficients
df2 = df.corr()
display(df2)

# Output the matrix of correlation coefficients as a heat map
plt.figure(dpi=120)
plt.imshow(df2, interpolation='nearest', vmin=-1.0, vmax=1.0)
plt.colorbar()

# Show the item names (the five subjects) on both axes
n = len(df2.columns)  # Number of items
plt.gca().set_xticks(range(n))
plt.gca().set_xticklabels(df2.columns)
plt.gca().set_yticks(range(n))
plt.gca().set_yticklabels(df2.columns)

Execution result

This produces the output below. Reading it against the color bar on the right, the dark purple and blue cells indicate a **negative correlation**, and the bright yellow and green cells indicate a **positive correlation**.

fig1.png

To be honest, the default settings do not produce an easy-to-read heat map.
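One likely reason is the default colormap (viridis, as the purple-to-yellow colors above suggest): it is a sequential colormap, so a coefficient of 0 gets an ordinary mid-range color rather than a neutral one. The sketch below samples it at a few coefficients. The helper `color_for` is hypothetical, and it assumes the linear mapping that `imshow` applies when `vmin=-1.0` and `vmax=1.0` are given.

```python
import matplotlib.pyplot as plt

cmap = plt.get_cmap('viridis')  # matplotlib's default colormap since version 2.0


def color_for(r, vmin=-1.0, vmax=1.0):
    """RGB assigned to coefficient r under linear normalization to [0, 1]."""
    position = (r - vmin) / (vmax - vmin)
    return cmap(position)[:3]  # drop the alpha channel


for r in (-1.0, 0.0, 1.0):
    red, green, blue = color_for(r)
    print(f'r = {r:+.1f} -> RGB = ({red:.2f}, {green:.2f}, {blue:.2f})')
```

The color for r = 0.0 is a saturated teal, which gives the eye no cue that the cell means "no correlation".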

Making the output beautiful

We will customize the figure to obtain a beautiful and intuitive heat map. The main points are as follows.

- Make the diagonal cells white and draw a diagonal line through them.
- Customize the colormap so that the uncorrelated range is white.
- Add a grid (draw white lines between the cells).
- Print the correlation coefficient on each cell.
- Outline the printed text so that it stays legible against any background color.

When coded, it looks like this:

%reset -f
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patheffects as path_effects
import matplotlib.ticker as ticker
import matplotlib.colors

# Dummy data (grades of 10 students in 5 subjects)
national_language = [76, 62, 71, 85, 96, 71, 68, 52, 85, 91]
society = [71, 85, 64, 55, 79, 72, 73, 52, 84, 84]
math = [50, 78, 48, 64, 66, 62, 58, 50, 50, 60]
science = [37, 90, 45, 56, 59, 56, 84, 86, 51, 61]
english = [59, 97, 71, 85, 58, 82, 70, 61, 79, 70]
df = pd.DataFrame({'National language': national_language, 'Society': society,
                   'Math': math, 'Science': science, 'English': english})

# Calculate the correlation coefficients
df2 = df.corr()
# Set the diagonal to 0.0 so that those cells are drawn in white
for i in df2.index.values:
    df2.at[i, i] = 0.0

# Output the matrix of correlation coefficients as a heat map
plt.figure(dpi=120)

# Custom colormap: blue (-1.0) -> white (0.0) -> red (+1.0)
# Each entry is (position in [0, 1], RGB color)
cl = list()
cl.append((0.00, matplotlib.colors.hsv_to_rgb((0.6, 1.0, 1))))
cl.append((0.30, matplotlib.colors.hsv_to_rgb((0.6, 0.1, 1))))
cl.append((0.50, matplotlib.colors.hsv_to_rgb((0.3, 0.0, 1))))
cl.append((0.70, matplotlib.colors.hsv_to_rgb((0.0, 0.1, 1))))
cl.append((1.00, matplotlib.colors.hsv_to_rgb((0.0, 1.0, 1))))
ccm = matplotlib.colors.LinearSegmentedColormap.from_list('custom_cmap', cl)

plt.imshow(df2, interpolation='nearest', vmin=-1.0, vmax=1.0, cmap=ccm)

# Color bar displayed on the right side (tick labels always show a sign, except 0.0)
fmt = lambda p, pos=None: f'${p:+.1f}$' if p != 0 else '  $0.0$'
cb = plt.colorbar(format=ticker.FuncFormatter(fmt))
cb.set_label('Correlation coefficient', fontsize=11)

# Show the item names (the five subjects) on both axes
n = len(df2.columns)  # Number of items
plt.gca().set_xticks(range(n))
plt.gca().set_xticklabels(df2.columns)
plt.gca().set_yticks(range(n))
plt.gca().set_yticklabels(df2.columns)

# Move the x-axis labels to the top, then hide the tick marks themselves
plt.tick_params(axis='x', which='both',
                top=True, bottom=False, labeltop=True, labelbottom=False)
plt.tick_params(axis='both', which='both', top=False, left=False)

# Grid settings: minor ticks sit on the cell boundaries (-0.5, 0.5, ...)
plt.gca().set_xticks(np.arange(-0.5, n - 1), minor=True)
plt.gca().set_yticks(np.arange(-0.5, n - 1), minor=True)
plt.grid(which='minor', color='white', linewidth=1)

# Diagonal line
plt.plot([-0.5, n - 0.5], [-0.5, n - 0.5], color='black', linewidth=0.75)

# Print the correlation coefficients (text is outlined in white)
tp = dict(horizontalalignment='center', verticalalignment='center')
ep = [path_effects.Stroke(linewidth=3, foreground='white'), path_effects.Normal()]
for y, i in enumerate(df2.index.values):
    for x, c in enumerate(df2.columns.values):
        if x != y:  # skip the diagonal cells
            t = plt.text(x, y, f'{df2.at[i, c]:.2f}', **tp)
            t.set_path_effects(ep)

Execution result

fig2.png
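To check that the custom colormap really leaves the uncorrelated range almost white, the following sketch rebuilds it from the same five anchor points and samples it at several coefficients. The helper `color_for` and the short alias `mcolors` are assumptions for this illustration; the position is computed as `(r + 1) / 2`, the linear mapping that applies with `vmin=-1.0` and `vmax=1.0`.

```python
import matplotlib.colors as mcolors

# Same anchor points as in the code above: (position in [0, 1], RGB color)
anchors = [
    (0.00, mcolors.hsv_to_rgb((0.6, 1.0, 1))),  # saturated blue at r = -1.0
    (0.30, mcolors.hsv_to_rgb((0.6, 0.1, 1))),  # very pale blue at r = -0.4
    (0.50, mcolors.hsv_to_rgb((0.3, 0.0, 1))),  # white at r = 0.0
    (0.70, mcolors.hsv_to_rgb((0.0, 0.1, 1))),  # very pale red at r = +0.4
    (1.00, mcolors.hsv_to_rgb((0.0, 1.0, 1))),  # saturated red at r = +1.0
]
ccm = mcolors.LinearSegmentedColormap.from_list('custom_cmap', anchors)


def color_for(r):
    """RGB that the custom colormap assigns to correlation coefficient r."""
    position = (r + 1.0) / 2.0  # map [-1, 1] linearly onto [0, 1]
    return ccm(position)[:3]    # drop the alpha channel


for r in (-1.0, -0.4, -0.2, 0.0, 0.2, 0.4, 1.0):
    red, green, blue = color_for(r)
    print(f'r = {r:+.1f} -> RGB = ({red:.2f}, {green:.2f}, {blue:.2f})')
```

Moving the anchors at 0.30 and 0.70 closer to 0.50 narrows the pale band, which makes weak correlations stand out more.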
