[Translation] scikit-learn 0.18 User Guide 4.4. Unsupervised dimensionality reduction

A translation of http://scikit-learn.org/0.18/modules/unsupervised_reduction.html, part of [scikit-learn 0.18 User Guide 4. Dataset transformations](http://qiita.com/nazoking@github/items/267f2371757516f8c168).


4.4. Unsupervised dimensionality reduction

If the number of features is high, it may be useful to reduce it with an unsupervised step prior to supervised steps. Many of the [unsupervised learning methods](http://qiita.com/nazoking@github/items/267f2371757516f8c168) implement a `transform` method that can be used to reduce the dimensionality. Below, two specific examples of this frequently used pattern are described.

4.4.1. PCA: Principal component analysis

decomposition.PCA looks for a combination of features that captures the variance of the original features well. See [Decomposing signals in components (matrix factorization problems)](http://scikit-learn.org/0.18/modules/decomposition.html#decompositions).
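As a minimal sketch (this example is not in the original guide), PCA can compress the 64 pixel features of the digits dataset into a small number of components via the `transform` interface mentioned above:

```python
# Reduce the 64-dimensional digits data to 2 principal components,
# as one might do before feeding the data to a supervised estimator.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, y = load_digits(return_X_y=True)  # X has shape (1797, 64)
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)     # fit, then project onto 2 components

print(X_reduced.shape)               # (1797, 2)
print(pca.explained_variance_ratio_) # variance captured by each component
```

The fitted `pca` object can then `transform` new data (e.g. a held-out test set) into the same 2-dimensional space.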

4.4.2. Random projection

The random_projection module provides several tools for data reduction by random projections. See the relevant section of the documentation: [Random Projection](http://qiita.com/nazoking@github/items/16f65bbcfda517a74df2).
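As a minimal sketch (an assumed example, not taken from the guide), a Gaussian random projection maps high-dimensional data onto a lower-dimensional random subspace:

```python
# Project 10000-dimensional random data down to 500 dimensions
# using a Gaussian random matrix.
import numpy as np
from sklearn.random_projection import GaussianRandomProjection

rng = np.random.RandomState(0)
X = rng.rand(100, 10000)             # 100 samples, 10000 features

transformer = GaussianRandomProjection(n_components=500, random_state=0)
X_new = transformer.fit_transform(X)

print(X_new.shape)                   # (100, 500)
```

When `n_components='auto'` (the default), the target dimensionality is instead chosen from the number of samples via the Johnson-Lindenstrauss lemma.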

4.4.3. Feature agglomeration

cluster.FeatureAgglomeration applies [hierarchical clustering](http://scikit-learn.org/0.18/modules/clustering.html#hierarchical-clustering) to group together features that behave similarly.

**Feature scaling**

Note that if features have very different scaling or statistical properties, cluster.FeatureAgglomeration may not be able to capture the links between related features. Using preprocessing.StandardScaler can be useful in these settings.
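The scaling caveat above can be addressed by chaining the scaler and the agglomerator. A minimal sketch (an assumed example; the cluster count of 16 is an arbitrary illustration):

```python
# Standardize features first, then merge the 64 pixel features of the
# digits data into 16 clusters of similarly-behaving features.
from sklearn.cluster import FeatureAgglomeration
from sklearn.datasets import load_digits
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)  # X has shape (1797, 64)

reducer = Pipeline([
    ("scale", StandardScaler()),           # put features on a common scale
    ("agglo", FeatureAgglomeration(n_clusters=16)),
])
X_reduced = reducer.fit_transform(X)

print(X_reduced.shape)               # (1797, 16)
```

Each output column is the mean of one cluster of (standardized) original features, so the reduction stays interpretable in terms of the input features.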


[scikit-learn 0.18 User Guide 4. Dataset transformations](http://qiita.com/nazoking@github/items/267f2371757516f8c168) © 2010-2016, scikit-learn developers (BSD license).
