A brief summary of what to keep in mind for supervised and unsupervised learning

As a memorandum, this post summarizes, for "supervised learning" and "unsupervised learning", an outline of each method, the scikit-learn classes to use, example use cases, keywords, and the sites that were helpful while learning.

"Supervised learning"

In a nutshell: a prediction model is built by supplying training data that represents the features together with the corresponding answer (label) data. Prediction problems divide into classification problems and regression problems.

Each method

① Linear regression

Among all possible straight lines, find the parameters that minimize the value of the loss function (error function).

- Class to use: `sklearn.linear_model.LinearRegression`
- Example: relationship between the number of visitors and sales, etc.
- Keywords: simple regression, multiple regression, polynomial regression, non-linear regression
- Reference site: [Linear regression with scikit-learn (simple regression analysis / multiple regression analysis)](https://pythondatascience.plavox.info/scikit-learn/%E7%B7%9A%E5%BD%A2%E5%9B%9E%E5%B8%B0)
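
A minimal sketch of fitting a line with `LinearRegression` (the visitor/sales numbers here are made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: number of visitors (feature) and sales (target)
X = np.array([[10], [20], [30], [40]])
y = np.array([120, 210, 330, 410])

model = LinearRegression()
model.fit(X, y)                       # finds the line minimizing squared error
print(model.coef_, model.intercept_)  # slope and intercept of the fitted line
print(model.predict([[25]]))          # predicted sales for 25 visitors
```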

② Logistic regression

Despite the name "regression", this is a binary classification algorithm and is applied to classification problems.

- Class to use: `sklearn.linear_model.LogisticRegression`
- Example: relationship between sales visits / satisfaction and sales, etc.
- Keywords: sigmoid function, cross-entropy error function
- Reference site: Classification of iris by logistic regression with scikit-learn
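
A minimal sketch on the iris dataset, matching the reference site's example:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))      # classification accuracy
print(clf.predict_proba(X_test[:1]))  # per-class probabilities (sigmoid/softmax output)
```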

③ SVM (linear)

An algorithm that learns a decision boundary (a straight line) placed as far as possible from the data points; it can be used for both classification and regression.

- Class to use: `sklearn.svm.SVC`
- Example: text classification, digit recognition, etc.
- Keywords: hard margin, soft margin
- Reference site: What is a support vector machine (SVM)? ~ From the basics to a Python implementation ~
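
A minimal sketch of a linear SVM on the digits dataset (digit recognition is one of the example use cases above):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC(kernel="linear", C=1.0)  # smaller C = softer margin
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))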

④ SVM (kernel method)

The kernel function maps the data from the original space into a space where it can be separated by a hyperplane, and the data set is separated there.

- Class to use: `sklearn.svm.SVC`
- Example: product identification from color information, etc.
- Keywords: kernel functions (sigmoid kernel, polynomial kernel, RBF [radial basis function] kernel)
- Reference site: [Python] Implementing support vector machines using various kernel functions [iris dataset]
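
A minimal sketch of the RBF kernel on data that no straight line can separate (`make_moons` is used here purely as a stand-in example):

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Two interleaved half-circles: not linearly separable
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

clf = SVC(kernel="rbf", gamma="scale", C=1.0)  # RBF kernel: implicit map to a higher-dimensional space
clf.fit(X, y)
print(clf.score(X, y))
```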

⑤ Naive Bayes

Under the assumption that the features are mutually independent, calculate the probability that the data belongs to each label.

- Class to use: `sklearn.naive_bayes.MultinomialNB` (also `GaussianNB`, `BernoulliNB`, etc.)
- Example: junk mail detection, etc.
- Keyword: smoothing
- Reference site: Naive Bayes classifier with scikit-learn
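
A minimal sketch in the spirit of junk mail detection; the four toy mails and labels below are hypothetical:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical mails: 1 = junk, 0 = normal
docs = ["free money now", "meeting at noon", "win a free prize", "lunch tomorrow"]
labels = [1, 0, 1, 0]

vec = CountVectorizer()
X = vec.fit_transform(docs)

clf = MultinomialNB(alpha=1.0)  # alpha is the (Laplace) smoothing parameter
clf.fit(X, labels)
print(clf.predict(vec.transform(["free prize money"])))
```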

⑥ Random forest

Collect the outputs of multiple decision trees with diversity and produce the classification result by majority vote.

- Class to use: `sklearn.ensemble.RandomForestClassifier`
- Example: classification by behavior history and attributes
- Keywords: Gini impurity, bootstrap method
- Reference site: [Introduction] Decision tree analysis for beginners by beginners
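
A minimal sketch, using iris simply as readily available data:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# 100 decision trees, each trained on a bootstrap sample; prediction by majority vote
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)
print(clf.feature_importances_)  # importances derived from impurity (Gini) decrease
```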

⑦ Neural network

By inserting intermediate (hidden) layers between the input and the output, complex decision boundaries can be learned.

- Class to use: `sklearn.neural_network.MLPClassifier`
- Example: image recognition, speech recognition
- Keywords: simple perceptron, activation function, early stopping
- Reference site: Let's build a neural network by yourself
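
A minimal sketch on the digits dataset; the layer size is an arbitrary choice for illustration:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(64,),  # one intermediate (hidden) layer
                    activation="relu",         # activation function
                    early_stopping=True,       # stop when validation score stops improving
                    max_iter=500, random_state=0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```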

⑧ k-nearest neighbor method

The class is decided by a majority vote among the k training samples nearest to the input data.

- Class to use: `sklearn.neighbors.KNeighborsClassifier`
- Reference site: Machine learning ~ k-nearest neighbor method ~
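
A minimal sketch; k=5 is just a common default:

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

clf = KNeighborsClassifier(n_neighbors=5)  # majority vote among the 5 nearest samples
clf.fit(X, y)
print(clf.predict(X[:3]))
```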

Evaluation method

- **a. For classification problems**
  - a-1. Confusion matrix: `sklearn.metrics.confusion_matrix`
  - a-2. Accuracy: `sklearn.metrics.accuracy_score`
  - a-3. Precision: `sklearn.metrics.precision_score`
  - a-4. Recall: `sklearn.metrics.recall_score`
  - a-5. F1 score: `sklearn.metrics.f1_score`

Reference sites: Generate a confusion matrix and calculate precision, recall, F1 score, etc. with scikit-learn / Calculate the ROC curve and its AUC with scikit-learn
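
A minimal sketch of all five classification metrics on hypothetical labels and predictions:

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

y_true = [0, 1, 1, 0, 1, 0]  # hypothetical true labels
y_pred = [0, 1, 0, 0, 1, 1]  # hypothetical predictions

print(confusion_matrix(y_true, y_pred))  # a-1
print(accuracy_score(y_true, y_pred))    # a-2
print(precision_score(y_true, y_pred))   # a-3
print(recall_score(y_true, y_pred))      # a-4
print(f1_score(y_true, y_pred))          # a-5
```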

- **b. For regression problems**
  - b-1. Mean squared error: `sklearn.metrics.mean_squared_error`
  - b-2. Mean absolute error: `sklearn.metrics.mean_absolute_error`
  - b-3. Coefficient of determination: `sklearn.metrics.r2_score`

Reference site: [Evaluate the results of a regression model with scikit-learn](https://pythondatascience.plavox.info/scikit-learn/%E5%9B%9E%E5%B8%B0%E3%83%A2%E3%83%87%E3%83%AB%E3%81%AE%E8%A9%95%E4%BE%A1)
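
A minimal sketch of the three regression metrics on hypothetical values:

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = [3.0, 5.0, 2.5, 7.0]  # hypothetical targets
y_pred = [2.8, 5.2, 3.0, 6.5]  # hypothetical predictions

print(mean_squared_error(y_true, y_pred))   # b-1
print(mean_absolute_error(y_true, y_pred))  # b-2
print(r2_score(y_true, y_pred))             # b-3
```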

How to prevent overfitting

- **a. Hyperparameter tuning**
  - a-1. Grid search: `sklearn.model_selection.GridSearchCV`
  - a-2. Random search: `sklearn.model_selection.RandomizedSearchCV`

Reference site: Let's tune the model hyperparameters with scikit-learn!
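
A minimal grid search sketch; the candidate values for `C` and `gamma` are arbitrary:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Try every combination of the candidate hyperparameters with 5-fold CV
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```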

- **b. Splitting data into training and validation sets**
  - b-1. Holdout method: `sklearn.model_selection.train_test_split`
  - b-2. Cross-validation: `sklearn.model_selection.cross_val_score`, `sklearn.model_selection.KFold`
  - b-3. Leave-one-out method: `sklearn.model_selection.LeaveOneOut`

Reference site: [About how to split training data and test data in machine learning and deep learning](https://newtechnologylifestyle.net/%E6%A9%9F%E6%A2%B0%E5%AD%A6%E7%BF%92%E3%80%81%E3%83%87%E3%82%A3%E3%83%BC%E3%83%97%E3%83%A9%E3%83%BC%E3%83%8B%E3%83%B3%E3%82%B0%E3%81%A7%E3%81%AE%E5%AD%A6%E7%BF%92%E3%83%87%E3%83%BC%E3%82%BF%E3%81%A8/)
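
A minimal sketch of b-1 and b-2 together (the split ratio and fold count are arbitrary choices):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)

# b-1. Holdout: one split into training and validation data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# b-2. Cross-validation: average the score over 5 folds
kf = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=kf)
print(scores.mean())
```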

- **c. Regularization**
  - c-1. Ridge regression: `sklearn.linear_model.Ridge`
  - c-2. Lasso regression: `sklearn.linear_model.Lasso`

Reference site: Explanation of ridge regression and lasso regression in the shortest time (machine learning study #3)
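
A minimal sketch contrasting the two penalties; the data is synthetic, with only some features carrying signal:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Hypothetical data: only 3 of the 5 features actually matter
rng = np.random.RandomState(0)
X = rng.rand(50, 5)
y = X @ np.array([1.0, 0.0, 2.0, 0.0, 0.5]) + 0.1 * rng.randn(50)

print(Ridge(alpha=1.0).fit(X, y).coef_)  # L2 penalty: coefficients shrink
print(Lasso(alpha=0.1).fit(X, y).coef_)  # L1 penalty: some coefficients become exactly 0
```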

"Unsupervised learning"

In a nutshell: unlike supervised learning, there is no objective (target) variable. Instead, the structure of the feature data is extracted by transforming it into another representation or by finding subsets of it. Techniques include dimensionality reduction and clustering.

① Principal component analysis (PCA)

Summarize a large number of quantitative explanatory variables into a smaller number of composite indicator variables, thereby reducing the number of variables in the data.

- Class to use: `sklearn.decomposition.PCA`
- Keywords: covariance matrix, eigenvalue problem, cumulative contribution rate
- Reference site: Principal component analysis and the eigenvalue problem
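
A minimal sketch compressing the four iris features to two components:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)

pca = PCA(n_components=2)
X2 = pca.fit_transform(X)                      # 4 features reduced to 2 principal components
print(pca.explained_variance_ratio_)           # contribution rate per component
print(pca.explained_variance_ratio_.cumsum())  # cumulative contribution rate
```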

② k-means method

Classify the data into a given number of clusters, grouping similar data points together.

- Class to use: `sklearn.cluster.KMeans`
- Example: marketing data analysis, image classification
- Keywords: within-cluster sum of squares, elbow method, silhouette analysis, k-means++, k-medoids method
- Reference site: How to find the optimal number of clusters for k-means
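
A minimal sketch on synthetic blob data; in practice the number of clusters is chosen with the elbow method or silhouette analysis:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

km = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = km.fit_predict(X)
print(km.inertia_)  # within-cluster sum of squares; plot it against k for the elbow method
```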

③ Latent Semantic Analysis (LSA)

For text data, reduce the number of features from the number of words to the number of latent topics, from which the similarity between words and between sentences can be obtained.

- Class to use: `sklearn.decomposition.TruncatedSVD`
- Keywords: singular value decomposition, topic model, tf-idf
- Reference site: Machine Learning Latent Semantic Analysis
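
A minimal sketch on four toy documents (the sentences are invented for illustration):

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["the cat sat", "cats and dogs", "the dog barked", "stock prices rose"]

X = TfidfVectorizer().fit_transform(docs)  # tf-idf weighted word features
lsa = TruncatedSVD(n_components=2)         # singular value decomposition down to 2 latent topics
print(lsa.fit_transform(X))                # each row: the document in topic space
```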

④ Non-negative matrix factorization (NMF)

A dimensionality reduction method characterized by the property that all input and output data values are non-negative.

- Class to use: `sklearn.decomposition.NMF`
- Example: recommendation, text mining
- Reference site: Understanding non-negative matrix factorization (NMF) softly
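
A minimal sketch on a random non-negative matrix (imagine, hypothetically, user-by-item ratings for a recommender):

```python
import numpy as np
from sklearn.decomposition import NMF

# Hypothetical non-negative data, e.g. 6 users x 4 items
X = np.random.RandomState(0).rand(6, 4)

nmf = NMF(n_components=2, init="nndsvda", random_state=0)
W = nmf.fit_transform(X)  # X is approximated by W @ H, with W and H both non-negative
H = nmf.components_
print(W.shape, H.shape)
```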

⑤ Latent Dirichlet Allocation (LDA)

Topics are formed from the words in documents, and each document is characterized by which topics it is composed of.

- Class to use: `sklearn.decomposition.LatentDirichletAllocation`
- Example: natural language processing
- Keywords: topic model, Dirichlet distribution
- Reference site: Explanation of the points of the topic model (LDA) that are hard for beginners to understand
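
A minimal sketch with two obvious topics (fruit vs. markets) in invented toy documents:

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = ["apples and oranges", "oranges are fruit",
        "stocks fell today", "markets and stocks"]

X = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
print(lda.fit_transform(X))  # each row: the document's mixture over the 2 topics
```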

⑥ Gaussian mixture model (GMM)

Clustering is performed using a linear combination of multiple Gaussian distributions.

- Class to use: `sklearn.mixture.GaussianMixture`
- Keyword: Gaussian distribution
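
A minimal sketch on synthetic blob data:

```python
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

gmm = GaussianMixture(n_components=3, random_state=0)  # mixture of 3 Gaussians
labels = gmm.fit_predict(X)
print(gmm.means_)  # estimated center of each Gaussian component
```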

⑦ Locally Linear Embedding (LLE)

Performs dimensionality reduction on non-linearly structured data.

- Class to use: `sklearn.manifold.LocallyLinearEmbedding`
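
A minimal sketch on the swiss roll, a standard non-linear manifold example:

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, _ = make_swiss_roll(n_samples=500, random_state=0)  # a 3-D non-linear manifold

lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2)
X2 = lle.fit_transform(X)  # unrolls the manifold into 2 dimensions
print(X2.shape)
```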

⑧ t-distributed Stochastic Neighbor Embedding (t-SNE)

It is a method of reducing high-dimensional data to two or three dimensions, and is used for data visualization.

- Class to use: `sklearn.manifold.TSNE`
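
A minimal sketch embedding the 64-dimensional digits data into 2 dimensions for visualization:

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)

X2 = TSNE(n_components=2, random_state=0).fit_transform(X)  # 64 dims down to 2 dims
print(X2.shape)  # ready to scatter-plot, colored by the digit labels y
```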
