Last time I implemented the K-means method; this time I implemented a hierarchical clustering method.
Hierarchical clustering builds clusters by repeatedly merging the most similar pair, starting from the individual data points.
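As a toy illustration (my own example, not from the original post), the merge order can be seen in the linkage matrix that scipy returns. Here four 2-D points form two tight pairs, so the pairs merge first:

```python
import numpy as np
from scipy.cluster import hierarchy

# Four points: two tight pairs far apart from each other
points = np.array([[0, 0], [0, 1], [5, 5], [5, 6]])

# linkage also accepts raw observations directly; single linkage
# merges whichever clusters are closest at each step
Z = hierarchy.linkage(points, method='single')

# Each row is one merge: [cluster_i, cluster_j, distance, new cluster size]
print(Z)
```

The two nearby pairs are merged first (at distance 1.0), and only then are the two resulting clusters joined.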
The Python code is below.
#Import the required libraries
import numpy as np
import pandas as pd
#Visualization
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display
%matplotlib inline
sns.set_style('whitegrid')
#Class for normalization
from sklearn.preprocessing import StandardScaler
#Import what you need for hierarchical clustering
from scipy.cluster import hierarchy
First, import the required libraries. I will implement the method using the iris data again this time.
#iris data
from sklearn.datasets import load_iris
#Data read
iris = load_iris()
iris.keys()
df_iris = iris.data
df_target = iris.target
target_names = iris.target_names
df_labels = target_names[df_target]
#Standardize the data (mean 0, standard deviation 1)
scaler = StandardScaler()
df_iris_std = scaler.fit_transform(df_iris)
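As a quick sanity check (my own addition, not part of the original post), each standardized column should now have mean ≈ 0 and standard deviation ≈ 1:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

iris = load_iris()
df_iris_std = StandardScaler().fit_transform(iris.data)

# Column-wise mean should be ~0 and std ~1 after standardization
print(df_iris_std.mean(axis=0).round(6))
print(df_iris_std.std(axis=0).round(6))
```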
That completes the preprocessing. This time I will carry out hierarchical clustering using Ward's method. There are many other ways to define the distance between clusters (e.g. single, complete, or average linkage), so you need to choose a method appropriate for your data.
#Distance calculation
dist = hierarchy.distance.pdist(df_iris_std, metric='euclidean')
#Clustering
linkage = hierarchy.linkage(dist, method='ward')
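One way (my own addition, not from the original post) to check how faithfully the resulting hierarchy preserves the original pairwise distances is the cophenetic correlation coefficient, available as `hierarchy.cophenet`:

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from scipy.cluster import hierarchy
from scipy.spatial.distance import pdist

iris = load_iris()
df_iris_std = StandardScaler().fit_transform(iris.data)
dist = pdist(df_iris_std, metric='euclidean')
linkage = hierarchy.linkage(dist, method='ward')

# Correlation between the original distances and the dendrogram
# (cophenetic) distances; values closer to 1 mean less distortion
c, coph_dists = hierarchy.cophenet(linkage, dist)
print(round(c, 3))
```

This can be used to compare different linkage methods on the same data.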
#Dendrogram
fig, ax = plt.subplots(figsize=(5, 13))
hierarchy.dendrogram(Z=linkage,
                     orientation='right',
                     labels=df_labels,
                     ax=ax)
plt.show()
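To go from the dendrogram to concrete cluster assignments, the tree can be cut into a fixed number of flat clusters with `hierarchy.fcluster`. This follow-up sketch (my own addition, not part of the original post) cuts the tree into 3 clusters and compares them with the true species labels:

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from scipy.cluster import hierarchy
from scipy.spatial.distance import pdist

iris = load_iris()
df_iris_std = StandardScaler().fit_transform(iris.data)
linkage = hierarchy.linkage(pdist(df_iris_std, metric='euclidean'),
                            method='ward')

# Cut the tree so that exactly 3 flat clusters remain
clusters = hierarchy.fcluster(linkage, t=3, criterion='maxclust')

# Cross-tabulate the clusters against the true species labels
print(pd.crosstab(iris.target_names[iris.target], clusters))
```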
Thank you for reading to the end. This time, I implemented a hierarchical clustering method.
If you notice anything that needs correcting, I would appreciate it if you could let me know.