Coursera Machine Learning Challenges in Python: ex7-2 (Principal Component Analysis)

Introduction

Coursera's Machine Learning course (Stanford, Dr. Andrew Ng) is a classic first step in learning machine learning. This series implements the course's Matlab / Octave programming assignments in Python. This time, we will do Principal Component Analysis (PCA), the latter half of ex7 (unsupervised learning).

Library import

Import various libraries.

import numpy as np
import scipy.io as scio
import matplotlib.pyplot as plt
from sklearn import decomposition

Data reading / display

Load the Matlab .mat format data with scipy.io.loadmat(). The data consists of 5000 grayscale images, each 32x32 pixels with 256 gray levels. It comes as a 5000x1024 2D matrix, with one flattened image per row. Let's display the images as they are (only the first 100).

data = scio.loadmat('ex7faces.mat')
X = data['X']  # X is a 5000x1024 2D matrix

fig = plt.figure()
plt.subplots_adjust(left=None, bottom=None, right=None, top=None, wspace=None, hspace=None)
for i in range(0,100):
    ax = fig.add_subplot(10,10,i+1)
    ax.axis('off')
    ax.imshow(X[i].reshape(32,32).T, cmap = plt.get_cmap('gray'))
plt.show()
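
A note on the .T in the display code: Matlab / Octave stores matrices column by column, so each 1024-element row holds its image in column-major order. The following is a minimal sketch, using a made-up 32x32 array as a stand-in for one face (the names image and row are hypothetical, not from the exercise data), showing that reshape(32,32).T recovers the original layout and is equivalent to reshaping with order='F'.

```python
import numpy as np

# Hypothetical stand-in for one image: pixel value at [r, c] is unique.
image = np.arange(32 * 32).reshape(32, 32)

# Flatten column by column, the way Matlab / Octave lays out a matrix.
row = image.flatten(order='F')

# What the article does: row-major reshape, then transpose.
via_transpose = row.reshape(32, 32).T

# Equivalent alternative: tell NumPy the data is column-major.
via_fortran = row.reshape(32, 32, order='F')

print(np.array_equal(via_transpose, image))
print(np.array_equal(via_fortran, image))
```

Either form works; the transpose version is shorter, while order='F' states the intent more directly.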

The output looks like this (figure: ex7-3.png).

Conducting principal component analysis

The original image data is expressed in 32x32 pixels = 1024 dimensions. Applying principal component analysis reduces it to 100 dimensions. With the sklearn.decomposition.PCA class, principal component analysis takes a single call to fit(). The n_components parameter specifies how many principal components to keep.

pca = decomposition.PCA(n_components=100)
pca.fit(X)
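
To make clearer what the fit step computes, here is a rough NumPy-only sketch of PCA via singular value decomposition. It runs on synthetic data (X_demo is a made-up stand-in, not the face data) and is only an illustration under those assumptions, not scikit-learn's actual implementation. In particular, the signs of the component vectors may differ from those in pca.components_.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for X: 200 samples, 50 features,
# with almost all variance concentrated in 5 directions.
latent = rng.normal(size=(200, 5))
mixing = rng.normal(size=(5, 50))
X_demo = latent @ mixing + 0.01 * rng.normal(size=(200, 50))

k = 5                                   # number of components to keep
mean = X_demo.mean(axis=0)              # per-feature mean (cf. pca.mean_)
Xc = X_demo - mean                      # PCA works on centered data

# Rows of Vt are orthonormal directions sorted by decreasing variance.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt[:k]                     # k x 50 (cf. pca.components_)

# Variance along each direction, and the share kept by the first k.
explained_variance = S ** 2 / (X_demo.shape[0] - 1)
ratio = explained_variance[:k].sum() / explained_variance.sum()

print(components.shape)
print(ratio)
```

With the real data, the analogous check is to sum pca.explained_variance_ratio_ after fitting, which tells you how much of the total variance the 100 components retain.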

Visualization of principal components

The results of the principal component analysis are stored in pca.components_, a 100x1024 two-dimensional matrix. Each row is a principal component vector of length 1024, so it can be reshaped to 32x32 and displayed as an image just like the original data. Let's display only the first 36 principal components.

fig = plt.figure()
plt.subplots_adjust(left=None, bottom=None, right=None, top=None, wspace=None, hspace=None)
for i in range(0,36):
    ax = fig.add_subplot(6,6,i+1)
    ax.axis('off')
    ax.imshow(pca.components_[i].reshape(32,32).T, cmap = plt.get_cmap('gray'))
plt.show()

Here are the results (figure: ex7-4.png).

Dimension reduction and reconstruction

Principal component analysis reduces the image information originally represented by a 1024-dimensional vector to 100 dimensions. The dimensionally reduced dataset can be obtained with pca.transform(X) (a 5000x100 matrix). Multiplying this by the principal component matrix and adding back the mean restores a 5000x1024 matrix. The mean is needed because scikit-learn's PCA subtracts the per-feature mean (pca.mean_) before projecting; pca.inverse_transform(Xreduce) performs the same reconstruction. The restored data is the original data compressed to 100 principal components and expanded again so that it can be displayed. Let's display the first 100 images of the reconstructed result.

Xreduce = pca.transform(X)  # Dimension reduction. The result is a 5000x100 matrix
Xrecon = np.dot(Xreduce, pca.components_) + pca.mean_  # Reconstruction. The result is a 5000x1024 matrix

fig = plt.figure()
plt.subplots_adjust(left=None, bottom=None, right=None, top=None, wspace=None, hspace=None)
for i in range(0,100):
    ax = fig.add_subplot(10,10,i+1)
    ax.axis('off')
    ax.imshow(Xrecon[i].reshape(32,32).T, cmap = plt.get_cmap('gray'))
plt.show()
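
As a complement to the visual check, the reconstruction can also be examined numerically. The sketch below uses synthetic data and hypothetical helper names (reconstruct, rmse), so it is an illustration rather than part of the exercise. It projects onto the first k principal directions, maps back to the original space, and measures the root-mean-square error. It also shows what happens if the mean is not added back after reconstruction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for X with a clearly nonzero mean.
X_demo = rng.normal(size=(100, 20)) + 5.0

mean = X_demo.mean(axis=0)
_, _, Vt = np.linalg.svd(X_demo - mean, full_matrices=False)

def reconstruct(X, k, add_mean=True):
    """Project X onto the first k principal directions and map back."""
    W = Vt[:k]                      # k x n_features
    Z = (X - mean) @ W.T            # reduced representation, n_samples x k
    X_hat = Z @ W                   # back in the original space, still centered
    return X_hat + mean if add_mean else X_hat

def rmse(A, B):
    """Root-mean-square difference between two arrays."""
    return float(np.sqrt(np.mean((A - B) ** 2)))

err_5 = rmse(X_demo, reconstruct(X_demo, 5))
err_15 = rmse(X_demo, reconstruct(X_demo, 15))
err_full = rmse(X_demo, reconstruct(X_demo, 20))
err_no_mean = rmse(X_demo, reconstruct(X_demo, 20, add_mean=False))

print(err_5, err_15, err_full, err_no_mean)
```

The error shrinks as more components are kept and is essentially zero when all of them are used, while omitting the mean leaves an error roughly the size of the mean itself.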

Here are the results (figure: ex7-5.png). Compared with the original images above, you can see that the rough features have been restored, although the fine details have been lost.

Conclusion

Once again, the code turned out to be simple.
