[Translation] scikit-learn 0.18 User Guide 4.4. Unsupervised dimensionality reduction

A translation of http://scikit-learn.org/0.18/modules/unsupervised_reduction.html, part of [scikit-learn 0.18 User Guide 4. Dataset transformations](http://qiita.com/nazoking@github/items/267f2371757516f8c168).


4.4. Unsupervised dimensionality reduction

If the number of features is high, it may be useful to reduce it with an unsupervised step prior to supervised steps. Many of the [unsupervised learning methods](http://qiita.com/nazoking@github/items/267f2371757516f8c168) implement a `transform` method that can be used to reduce the dimensionality. Below, two specific examples of this frequently used pattern are described.

4.4.1. PCA: Principal component analysis

decomposition.PCA looks for a combination of features that captures the variance of the original features well. See [Decomposing signals in components (matrix factorization problems)](http://scikit-learn.org/0.18/modules/decomposition.html#decompositions).
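As a minimal sketch (this example is not in the original guide), PCA can compress the 64 pixel features of the digits dataset into a small number of components via the `transform` interface mentioned above:

```python
# Reduce the 64-dimensional digits data to 2 principal components,
# as one might do before feeding the data to a supervised estimator.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, y = load_digits(return_X_y=True)  # X has shape (1797, 64)
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)     # fit, then project onto 2 components

print(X_reduced.shape)               # (1797, 2)
print(pca.explained_variance_ratio_) # variance captured by each component
```

The fitted `pca` object can then `transform` new data (e.g. a held-out test set) into the same 2-dimensional space.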

4.4.2. Random projection

The random_projection module provides several tools for data reduction by random projections. See the relevant section of the documentation: [Random Projection](http://qiita.com/nazoking@github/items/16f65bbcfda517a74df2).
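As a minimal sketch (an assumed example, not taken from the guide), a Gaussian random projection maps high-dimensional data onto a lower-dimensional random subspace:

```python
# Project 10000-dimensional random data down to 500 dimensions
# using a Gaussian random matrix.
import numpy as np
from sklearn.random_projection import GaussianRandomProjection

rng = np.random.RandomState(0)
X = rng.rand(100, 10000)             # 100 samples, 10000 features

transformer = GaussianRandomProjection(n_components=500, random_state=0)
X_new = transformer.fit_transform(X)

print(X_new.shape)                   # (100, 500)
```

When `n_components='auto'` (the default), the target dimensionality is instead chosen from the number of samples via the Johnson-Lindenstrauss lemma.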

4.4.3. Feature agglomeration

cluster.FeatureAgglomeration applies [hierarchical clustering](http://scikit-learn.org/0.18/modules/clustering.html#hierarchical-clustering) to group together features that behave similarly.

**Feature scaling**

Note that if features have very different scaling or statistical properties, cluster.FeatureAgglomeration may not be able to capture the links between related features. Using preprocessing.StandardScaler can be useful in these settings.
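The scaling caveat above can be addressed by chaining the scaler and the agglomerator. A minimal sketch (an assumed example; the cluster count of 16 is an arbitrary illustration):

```python
# Standardize features first, then merge the 64 pixel features of the
# digits data into 16 clusters of similarly-behaving features.
from sklearn.cluster import FeatureAgglomeration
from sklearn.datasets import load_digits
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)  # X has shape (1797, 64)

reducer = Pipeline([
    ("scale", StandardScaler()),           # put features on a common scale
    ("agglo", FeatureAgglomeration(n_clusters=16)),
])
X_reduced = reducer.fit_transform(X)

print(X_reduced.shape)               # (1797, 16)
```

Each output column is the mean of one cluster of (standardized) original features, so the reduction stays interpretable in terms of the input features.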


[scikit-learn 0.18 User Guide 4. Dataset transformations](http://qiita.com/nazoking@github/items/267f2371757516f8c168) © 2010-2016, scikit-learn developers (BSD license).
