Coursera's Machine Learning course (Stanford, Dr. Andrew Ng) is a classic first step in learning machine learning. This series reimplements the course's Matlab/Octave programming exercises in Python. This time we cover Principal Component Analysis (PCA), the second half of ex7 on unsupervised learning.
Import various libraries.
import numpy as np
import scipy.io as scio
import matplotlib.pyplot as plt
from sklearn import decomposition
Load the Matlab .mat-format data with scipy.io.loadmat(). The data consists of 5000 face images, each 32x32 pixels in 256-level grayscale, stored as a 5000x1024 2D matrix with one flattened image per row.
Let's display this as it is (only the first 100 images).
data = scio.loadmat('ex7faces.mat')
X = data['X']  # X is a 5000x1024 2D matrix
fig = plt.figure()
plt.subplots_adjust(wspace=0.05, hspace=0.05)  # tighten the spacing between tiles
for i in range(100):
    ax = fig.add_subplot(10, 10, i + 1)
    ax.axis('off')
    # the pixels are stored in column-major (Matlab) order, hence the transpose
    ax.imshow(X[i].reshape(32, 32).T, cmap=plt.get_cmap('gray'))
plt.show()
The output is a 10x10 grid of the first 100 face images.
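Incidentally, since this same display pattern comes up twice more below, it could be factored into a small helper. A sketch (show_grid is a name introduced here, not from the original exercise):
def show_grid(images, rows, cols):
    # display the first rows*cols images, each a flattened 32x32 array, as a grid
    fig = plt.figure()
    plt.subplots_adjust(wspace=0.05, hspace=0.05)
    for i in range(rows * cols):
        ax = fig.add_subplot(rows, cols, i + 1)
        ax.axis('off')
        ax.imshow(images[i].reshape(32, 32).T, cmap=plt.get_cmap('gray'))
    plt.show()
The blocks below are kept spelled out to match the original, but show_grid(X, 10, 10) reproduces the figure above.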
Applying principal component analysis to the original image data, 32x32 pixels = 1024 dimensions per image, reduces it to 100 dimensions. PCA is a one-liner with the sklearn.decomposition.PCA class; the n_components parameter specifies how many principal components to keep.
pca = decomposition.PCA(n_components=100)  # keep the top 100 principal components
pca.fit(X)  # fit on all 5000 images; sklearn centers the data internally
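As an aside, the fitted object also exposes explained_variance_ratio_, so you can check how much of the data's variance the 100 retained components capture (a quick sanity check, not part of the original exercise):
# fraction of the total variance captured by the 100 components
print(pca.explained_variance_ratio_.sum())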
The result of the principal component analysis is stored in pca.components_, a 100x1024 2D matrix with one principal component vector per row. Since each row is itself a 1024-dimensional vector, it can be displayed as a 32x32 image just like the input data. Let's display only the first 36 principal components.
fig = plt.figure()
plt.subplots_adjust(wspace=0.05, hspace=0.05)
for i in range(36):
    ax = fig.add_subplot(6, 6, i + 1)
    ax.axis('off')
    ax.imshow(pca.components_[i].reshape(32, 32).T, cmap=plt.get_cmap('gray'))
plt.show()
The result is a 6x6 grid showing the first 36 principal components rendered as 32x32 images.
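For reference, the original Octave exercise computes these principal components itself via a singular value decomposition. The numpy equivalent takes only a few lines (a sketch for comparison; up to sign flips and solver tolerance, the rows of Vt correspond to pca.components_):
Xc = X - X.mean(axis=0)  # center the data, as sklearn's PCA does internally
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
# each row of Vt is a principal component; Vt[:100] corresponds to pca.components_
print(Vt[:100].shape)  # (100, 1024)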
Principal component analysis has reduced the image information, originally a 1024-dimensional vector per image, to 100 dimensions. The dimensionally reduced dataset is obtained with pca.transform(X) (a 5000x100 2D matrix). Multiplying this by the principal component matrix, and adding back the mean that transform() subtracted (stored in pca.mean_), restores a 5000x1024 2D matrix; pca.inverse_transform() does the same thing in one call. The restored data is the original data compressed into 100 principal components and then reconstructed so it can be displayed again. Let's display the first 100 images of the reconstruction.
Xreduce = pca.transform(X)  # dimensionality reduction; the result is a 5000x100 matrix
# reconstruction; the result is a 5000x1024 matrix
# (equivalently: Xrecon = pca.inverse_transform(Xreduce))
Xrecon = np.dot(Xreduce, pca.components_) + pca.mean_
fig = plt.figure()
plt.subplots_adjust(wspace=0.05, hspace=0.05)
for i in range(100):
    ax = fig.add_subplot(10, 10, i + 1)
    ax.axis('off')
    ax.imshow(Xrecon[i].reshape(32, 32).T, cmap=plt.get_cmap('gray'))
plt.show()
The result is a 10x10 grid of reconstructed faces. Compared with the original images above, you can see that the broad features are recovered while the fine details have been lost.
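To put a rough number on that loss, you can compare X with Xrecon directly (a minimal check using only the arrays defined above):
# mean squared reconstruction error per pixel, over all 5000 images
print(np.mean((X - Xrecon) ** 2))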
This time, too, the code is simple.