A brief summary of what to keep in mind for supervised and unsupervised learning

As a memorandum, this post summarizes, for "supervised learning" and "unsupervised learning", an outline of each method, the scikit-learn classes to use, example use cases, keywords, and the sites that were helpful while learning.

"Supervised learning"

In a nutshell: a prediction model is built by supplying training data that represents the features together with the corresponding answer (label) data. Prediction problems divide into classification problems and regression problems.

Each method

① Linear regression

Among all possible straight lines, find the parameters that minimize the value of the loss function (error function).

- Class to use: `sklearn.linear_model.LinearRegression`
- Example: relationship between the number of visitors and sales, etc.
- Keywords: simple regression, multiple regression, polynomial regression, non-linear regression
- Reference site: [Linear regression with scikit-learn (simple regression analysis / multiple regression analysis)](https://pythondatascience.plavox.info/scikit-learn/%E7%B7%9A%E5%BD%A2%E5%9B%9E%E5%B8%B0)
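
A minimal sketch of fitting a line with `LinearRegression` (the visitor/sales numbers here are made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: number of visitors (feature) and sales (target)
X = np.array([[10], [20], [30], [40]])
y = np.array([120, 210, 330, 410])

model = LinearRegression()
model.fit(X, y)                       # finds the line minimizing squared error
print(model.coef_, model.intercept_)  # slope and intercept of the fitted line
print(model.predict([[25]]))          # predicted sales for 25 visitors
```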

② Logistic regression

Despite the name "regression", this is a binary classification algorithm and is applied to classification problems.

- Class to use: `sklearn.linear_model.LogisticRegression`
- Example: relationship between sales visits / satisfaction and sales, etc.
- Keywords: sigmoid function, cross-entropy error function
- Reference site: Classification of iris by logistic regression with scikit-learn
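
A minimal sketch on the iris dataset, matching the reference site's example:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))      # classification accuracy
print(clf.predict_proba(X_test[:1]))  # per-class probabilities (sigmoid/softmax output)
```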

③ SVM (linear)

An algorithm that learns a decision boundary (a straight line) placed as far as possible from the data points; it can be used for both classification and regression.

- Class to use: `sklearn.svm.SVC`
- Example: text classification, digit recognition, etc.
- Keywords: hard margin, soft margin
- Reference site: What is a support vector machine (SVM)? ~ From the basics to a Python implementation ~
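
A minimal sketch of a linear SVM on the digits dataset (digit recognition is one of the example use cases above):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC(kernel="linear", C=1.0)  # smaller C = softer margin
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))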

④ SVM (kernel method)

The kernel function maps the data from the original space into a space where it can be separated by a hyperplane, and the data set is separated there.

- Class to use: `sklearn.svm.SVC`
- Example: product identification from color information, etc.
- Keywords: kernel functions (sigmoid kernel, polynomial kernel, RBF [radial basis function] kernel)
- Reference site: [Python] Implementing support vector machines using various kernel functions [iris dataset]
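
A minimal sketch of the RBF kernel on data that no straight line can separate (`make_moons` is used here purely as a stand-in example):

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Two interleaved half-circles: not linearly separable
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

clf = SVC(kernel="rbf", gamma="scale", C=1.0)  # RBF kernel: implicit map to a higher-dimensional space
clf.fit(X, y)
print(clf.score(X, y))
```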

⑤ Naive Bayes

Under the assumption that the features are mutually independent, calculate the probability that the data belongs to each label.

- Class to use: `sklearn.naive_bayes.MultinomialNB` (also `GaussianNB`, `BernoulliNB`, etc.)
- Example: junk mail detection, etc.
- Keyword: smoothing
- Reference site: Naive Bayes classifier with scikit-learn
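
A minimal sketch in the spirit of junk mail detection; the four toy mails and labels below are hypothetical:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical mails: 1 = junk, 0 = normal
docs = ["free money now", "meeting at noon", "win a free prize", "lunch tomorrow"]
labels = [1, 0, 1, 0]

vec = CountVectorizer()
X = vec.fit_transform(docs)

clf = MultinomialNB(alpha=1.0)  # alpha is the (Laplace) smoothing parameter
clf.fit(X, labels)
print(clf.predict(vec.transform(["free prize money"])))
```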

⑥ Random forest

Collect the outputs of multiple decision trees with diversity and produce the classification result by majority vote.

- Class to use: `sklearn.ensemble.RandomForestClassifier`
- Example: classification by behavior history and attributes
- Keywords: Gini impurity, bootstrap method
- Reference site: [Introduction] Decision tree analysis for beginners by beginners
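
A minimal sketch, using iris simply as readily available data:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# 100 decision trees, each trained on a bootstrap sample; prediction by majority vote
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)
print(clf.feature_importances_)  # importances derived from impurity (Gini) decrease
```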

⑦ Neural network

By inserting intermediate (hidden) layers between the input and the output, complex decision boundaries can be learned.

- Class to use: `sklearn.neural_network.MLPClassifier`
- Example: image recognition, speech recognition
- Keywords: simple perceptron, activation function, early stopping
- Reference site: Let's build a neural network by yourself
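
A minimal sketch on the digits dataset; the layer size is an arbitrary choice for illustration:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(64,),  # one intermediate (hidden) layer
                    activation="relu",         # activation function
                    early_stopping=True,       # stop when validation score stops improving
                    max_iter=500, random_state=0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```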

⑧ k-nearest neighbor method

The class is decided by a majority vote among the k training samples nearest to the input data.

- Class to use: `sklearn.neighbors.KNeighborsClassifier`
- Reference site: Machine learning ~ k-nearest neighbor method ~
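
A minimal sketch; k=5 is just a common default:

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

clf = KNeighborsClassifier(n_neighbors=5)  # majority vote among the 5 nearest samples
clf.fit(X, y)
print(clf.predict(X[:3]))
```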

Evaluation method

- **a. For classification problems**
  - a-1. Confusion matrix: `sklearn.metrics.confusion_matrix`
  - a-2. Accuracy: `sklearn.metrics.accuracy_score`
  - a-3. Precision: `sklearn.metrics.precision_score`
  - a-4. Recall: `sklearn.metrics.recall_score`
  - a-5. F1 score: `sklearn.metrics.f1_score`

Reference sites: Generate a confusion matrix and calculate precision, recall, F1 score, etc. with scikit-learn / Calculate the ROC curve and its AUC with scikit-learn
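
A minimal sketch of all five classification metrics on hypothetical labels and predictions:

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

y_true = [0, 1, 1, 0, 1, 0]  # hypothetical true labels
y_pred = [0, 1, 0, 0, 1, 1]  # hypothetical predictions

print(confusion_matrix(y_true, y_pred))  # a-1
print(accuracy_score(y_true, y_pred))    # a-2
print(precision_score(y_true, y_pred))   # a-3
print(recall_score(y_true, y_pred))      # a-4
print(f1_score(y_true, y_pred))          # a-5
```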

- **b. For regression problems**
  - b-1. Mean squared error: `sklearn.metrics.mean_squared_error`
  - b-2. Mean absolute error: `sklearn.metrics.mean_absolute_error`
  - b-3. Coefficient of determination: `sklearn.metrics.r2_score`

Reference site: [Evaluate the results of a regression model with scikit-learn](https://pythondatascience.plavox.info/scikit-learn/%E5%9B%9E%E5%B8%B0%E3%83%A2%E3%83%87%E3%83%AB%E3%81%AE%E8%A9%95%E4%BE%A1)
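
A minimal sketch of the three regression metrics on hypothetical values:

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = [3.0, 5.0, 2.5, 7.0]  # hypothetical targets
y_pred = [2.8, 5.2, 3.0, 6.5]  # hypothetical predictions

print(mean_squared_error(y_true, y_pred))   # b-1
print(mean_absolute_error(y_true, y_pred))  # b-2
print(r2_score(y_true, y_pred))             # b-3
```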

How to prevent overfitting

- **a. Hyperparameter tuning**
  - a-1. Grid search: `sklearn.model_selection.GridSearchCV`
  - a-2. Random search: `sklearn.model_selection.RandomizedSearchCV`

Reference site: Let's tune the model hyperparameters with scikit-learn!
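
A minimal grid search sketch; the candidate values for `C` and `gamma` are arbitrary:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Try every combination of the candidate hyperparameters with 5-fold CV
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```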

- **b. Splitting data into training and validation sets**
  - b-1. Holdout method: `sklearn.model_selection.train_test_split`
  - b-2. Cross-validation: `sklearn.model_selection.cross_val_score`, `sklearn.model_selection.KFold`
  - b-3. Leave-one-out method: `sklearn.model_selection.LeaveOneOut`

Reference site: [About how to split training data and test data in machine learning and deep learning](https://newtechnologylifestyle.net/%E6%A9%9F%E6%A2%B0%E5%AD%A6%E7%BF%92%E3%80%81%E3%83%87%E3%82%A3%E3%83%BC%E3%83%97%E3%83%A9%E3%83%BC%E3%83%8B%E3%83%B3%E3%82%B0%E3%81%A7%E3%81%AE%E5%AD%A6%E7%BF%92%E3%83%87%E3%83%BC%E3%82%BF%E3%81%A8/)
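
A minimal sketch of b-1 and b-2 together (the split ratio and fold count are arbitrary choices):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)

# b-1. Holdout: one split into training and validation data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# b-2. Cross-validation: average the score over 5 folds
kf = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=kf)
print(scores.mean())
```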

- **c. Regularization**
  - c-1. Ridge regression: `sklearn.linear_model.Ridge`
  - c-2. Lasso regression: `sklearn.linear_model.Lasso`

Reference site: Explanation of ridge regression and lasso regression in the shortest time (machine learning study #3)
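
A minimal sketch contrasting the two penalties; the data is synthetic, with only some features carrying signal:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Hypothetical data: only 3 of the 5 features actually matter
rng = np.random.RandomState(0)
X = rng.rand(50, 5)
y = X @ np.array([1.0, 0.0, 2.0, 0.0, 0.5]) + 0.1 * rng.randn(50)

print(Ridge(alpha=1.0).fit(X, y).coef_)  # L2 penalty: coefficients shrink
print(Lasso(alpha=0.1).fit(X, y).coef_)  # L1 penalty: some coefficients become exactly 0
```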

"Unsupervised learning"

In a nutshell: unlike supervised learning, there is no objective (target) variable. Instead, the structure of the feature data is extracted by transforming it into another representation or by finding subsets of it. Techniques include dimensionality reduction and clustering.

① Principal component analysis (PCA)

Summarize a large number of quantitative explanatory variables into a smaller number of composite indicator variables, thereby reducing the number of variables in the data.

- Class to use: `sklearn.decomposition.PCA`
- Keywords: covariance matrix, eigenvalue problem, cumulative contribution rate
- Reference site: Principal component analysis and the eigenvalue problem
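
A minimal sketch compressing the four iris features to two components:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)

pca = PCA(n_components=2)
X2 = pca.fit_transform(X)                      # 4 features reduced to 2 principal components
print(pca.explained_variance_ratio_)           # contribution rate per component
print(pca.explained_variance_ratio_.cumsum())  # cumulative contribution rate
```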

② k-means method

Classify the data into a given number of clusters, grouping similar data points together.

- Class to use: `sklearn.cluster.KMeans`
- Example: marketing data analysis, image classification
- Keywords: within-cluster sum of squares, elbow method, silhouette analysis, k-means++, k-medoids method
- Reference site: How to find the optimal number of clusters for k-means
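
A minimal sketch on synthetic blob data; in practice the number of clusters is chosen with the elbow method or silhouette analysis:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

km = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = km.fit_predict(X)
print(km.inertia_)  # within-cluster sum of squares; plot it against k for the elbow method
```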

③ Latent Semantic Analysis (LSA)

For text data, reduce the number of features from the number of words to the number of latent topics, from which the similarity between words and between sentences can be obtained.

- Class to use: `sklearn.decomposition.TruncatedSVD`
- Keywords: singular value decomposition, topic model, tf-idf
- Reference site: Machine Learning Latent Semantic Analysis
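
A minimal sketch on four toy documents (the sentences are invented for illustration):

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["the cat sat", "cats and dogs", "the dog barked", "stock prices rose"]

X = TfidfVectorizer().fit_transform(docs)  # tf-idf weighted word features
lsa = TruncatedSVD(n_components=2)         # singular value decomposition down to 2 latent topics
print(lsa.fit_transform(X))                # each row: the document in topic space
```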

④ Non-negative matrix factorization (NMF)

A dimensionality reduction method characterized by the property that all input and output data values are non-negative.

- Class to use: `sklearn.decomposition.NMF`
- Example: recommendation, text mining
- Reference site: Understanding non-negative matrix factorization (NMF) softly
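
A minimal sketch on a random non-negative matrix (imagine, hypothetically, user-by-item ratings for a recommender):

```python
import numpy as np
from sklearn.decomposition import NMF

# Hypothetical non-negative data, e.g. 6 users x 4 items
X = np.random.RandomState(0).rand(6, 4)

nmf = NMF(n_components=2, init="nndsvda", random_state=0)
W = nmf.fit_transform(X)  # X is approximated by W @ H, with W and H both non-negative
H = nmf.components_
print(W.shape, H.shape)
```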

⑤ Latent Dirichlet Allocation (LDA)

Topics are formed from the words in documents, and each document is characterized by which topics it is composed of.

- Class to use: `sklearn.decomposition.LatentDirichletAllocation`
- Example: natural language processing
- Keywords: topic model, Dirichlet distribution
- Reference site: Explanation of the points of the topic model (LDA) that are hard for beginners to understand
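
A minimal sketch with two obvious topics (fruit vs. markets) in invented toy documents:

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = ["apples and oranges", "oranges are fruit",
        "stocks fell today", "markets and stocks"]

X = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
print(lda.fit_transform(X))  # each row: the document's mixture over the 2 topics
```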

⑥ Gaussian mixture model (GMM)

Clustering is performed using a linear combination of multiple Gaussian distributions.

- Class to use: `sklearn.mixture.GaussianMixture`
- Keyword: Gaussian distribution
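
A minimal sketch on synthetic blob data:

```python
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

gmm = GaussianMixture(n_components=3, random_state=0)  # mixture of 3 Gaussians
labels = gmm.fit_predict(X)
print(gmm.means_)  # estimated center of each Gaussian component
```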

⑦ Locally Linear Embedding (LLE)

Performs dimensionality reduction on non-linearly structured data.

- Class to use: `sklearn.manifold.LocallyLinearEmbedding`
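
A minimal sketch on the swiss roll, a standard non-linear manifold example:

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, _ = make_swiss_roll(n_samples=500, random_state=0)  # a 3-D non-linear manifold

lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2)
X2 = lle.fit_transform(X)  # unrolls the manifold into 2 dimensions
print(X2.shape)
```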

⑧ t-distributed Stochastic Neighbor Embedding (t-SNE)

It is a method of reducing high-dimensional data to two or three dimensions, and is used for data visualization.

- Class to use: `sklearn.manifold.TSNE`
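
A minimal sketch embedding the 64-dimensional digits data into 2 dimensions for visualization:

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)

X2 = TSNE(n_components=2, random_state=0).fit_transform(X)  # 64 dims down to 2 dims
print(X2.shape)  # ready to scatter-plot, colored by the digit labels y
```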
