Get a glimpse of machine learning in Python

I asked myself what machine learning actually covers, and what I could come up with off the top of my head was this:

- Classification
- Regression
- Discrimination

There may be others, but I can't think of any. When I recommend a tutorial on this topic in R, I use the iris data, so I wondered whether Python has it too. → It seems it does.

Machine learning with R

Example of classification & discrimination

First, R. Building a classifier in R with SVM, a neural network, naive Bayes, random forest, and so on takes almost no thought. With each classification method, you build a classifier from a training set and then evaluate on a test set whether it classifies the samples correctly.

In R, running data(iris) makes the iris data available, whichever of the flower's several names you prefer to call it by.

20160903001.png

This is a data.frame of 150 samples in total, with sepal length, sepal width, petal length, and petal width as explanatory variables and the iris species (setosa, versicolor, virginica) as the response variable.

First, use sample() to randomly pick half of the rows as the training set and use the rest as the test set.

## Load the iris data
data(iris)
## Randomly choose half of the rows for the training set
train_ids <- sample(nrow(iris), nrow(iris)*0.5)
## Create the training set
iris.train <- iris[train_ids,]
## Use the other half as the test set
iris.test  <- iris[-train_ids,]

By the way, the three iris species, setosa, versicolor, and virginica, have flowers like the following. They all look the same! Telling these apart by the shape of their petals feels like a penalty game.

20160903004.png

### Run SVM
library(kernlab)
iris.svm <- ksvm(Species~., data=iris.train)
svm.predict <- predict(iris.svm, iris.test)
### Show the result
table(svm.predict, iris.test$Species)

### Run a neural network
library(nnet)
iris.nnet <- nnet(Species ~ ., data = iris.train, size = 3)
nnet.predict <- predict(iris.nnet, iris.test, type="class")
### Show the result
table(nnet.predict, iris.test$Species)

### Run naive Bayes
library(e1071)
iris.nb <- naiveBayes(Species~., iris.train)
nbayes.predict <- predict(iris.nb, iris.test)
### Show the result
table(nbayes.predict, iris.test$Species)

### Run random forest
library(randomForest)
iris.rf <- randomForest(Species~., iris.train)
rf.predict <- predict(iris.rf, iris.test)
### Show the result
table(rf.predict, iris.test$Species)

When you try it, every method classifies the test set correctly at a rate of about 73/75. That is not a perfect score, but it may (or may not) be reachable by tuning parameters.

Now, to try the same thing in Python, use scikit-learn. More details can be found in the scikit-learn tutorial.

Classification with Python

Getting SVM to work in Python

In the first place, why redo in unfamiliar Python what could be done so easily in R, as shown above? All I can say is that I wanted to try it in Python. If there is a mountain there, I climb it; if there is a puddle, I step in it; if a meal is set before me, I eat it.

If Anaconda is installed, scikit-learn comes with it, so just import it. Note that the library name to import is sklearn. The iris data can also be loaded from this library through datasets.

from sklearn import svm, datasets
iris = datasets.load_iris()
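
As a quick check of what load_iris() returns, the following sketch prints the shapes and names stored in the object. The attribute names (data, target, feature_names, target_names) are the ones scikit-learn provides; the choice of what to print is mine.

```python
from sklearn import datasets

iris = datasets.load_iris()

# iris.data holds the explanatory variables: 150 samples x 4 measurements
print(iris.data.shape)
# iris.target holds the species as integer labels 0, 1, 2
print(iris.target.shape)
# Names of the four measurement columns
print(iris.feature_names)
# Species names corresponding to labels 0, 1, 2
print(iris.target_names)
```

The integer labels in iris.target index into iris.target_names, so label 0 is setosa.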

As for the contents, you can check them with print(iris.data) or print(iris.target): iris.data holds the explanatory variables and iris.target holds the response variable. Split these into a training dataset and a test dataset. In R I used the sample() function; scikit-learn has a function called train_test_split(), which was in sklearn.cross_validation when I wrote this and is found in sklearn.model_selection in current releases. As I did in R, I put half into a training set (iris_data_train, iris_target_train) and half into a test set (iris_data_test, iris_target_test).

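from sklearn import svm, datasets
# train_test_split lives in sklearn.model_selection in current scikit-learn
# (it was in sklearn.cross_validation in old releases, which has been removed)
from sklearn.model_selection import train_test_split
import numpy as np

iris = datasets.load_iris()
# Hold out half of the samples as the test set
iris_data_train, iris_data_test, iris_target_train, iris_target_test = train_test_split(iris.data, iris.target, test_size=0.5)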
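
The split above is random, so every run gives a different training set. As a sketch of how to make it repeatable, the example below passes random_state and stratify to train_test_split(). Both are real parameters of the function, but using them here is my own addition, and the short variable names (X_train and so on) are only for illustration.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()

# random_state fixes the shuffle so repeated runs give the same split.
# stratify keeps the three species in equal proportion in both halves.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.5, random_state=0, stratify=iris.target
)

# Each half has 75 samples with 4 measurements
print(X_train.shape, X_test.shape)
```

Without stratify, one species can end up over-represented in the training half purely by chance.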

Build a classifier from this training set. There are many kinds of SVM (linear? non-linear?) and various parameters that should be set, but here everything is left at its default.

svc = svm.SVC()
svc.fit(iris_data_train, iris_target_train)
svc.predict(iris_data_test)

I wrote fit() and predict() on separate lines, but since all I do is initialize, feed in the training data, and feed in the test data, the calls can be chained on one line.

iris_predict = svm.SVC().fit(iris_data_train, iris_target_train).predict(iris_data_test)

20160903005.png

Now check how well the result of svc.predict() (iris_predict) matches iris_target_test. In R the result is printed as a table, as in table(svm.predict, iris.test$Species), so do the same here. This table is called a confusion matrix, and confusion_matrix() produces it. accuracy_score() gives the accuracy directly.

from sklearn.metrics import confusion_matrix, accuracy_score

print(confusion_matrix(iris_target_test, iris_predict))
print(accuracy_score(iris_target_test, iris_predict))

According to the official documentation, confusion_matrix() takes the true values as its first argument and the values predicted by the classifier as its second, so pass them in that order.
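
To see what the argument order means for the layout of the table, here is a small sketch with made-up labels (y_true and y_pred are invented for illustration and are not the iris results). Rows correspond to true labels and columns to predicted labels.

```python
from sklearn.metrics import confusion_matrix, accuracy_score

# Made-up labels for illustration; 0, 1, 2 stand for the three species
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]

# First argument: true values. Second argument: predicted values.
cm = confusion_matrix(y_true, y_pred)
print(cm)

# The one class-1 sample that was predicted as class 2 is counted
# in row 1 (true label), column 2 (predicted label).
print(cm[1, 2])

# 5 of the 6 predictions match the true labels
print(accuracy_score(y_true, y_pred))
```

Swapping the two arguments transposes the matrix, which is why the order matters when reading off which species was mistaken for which.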

20160903006.png

It takes more typing than in R, but I now get similar results.

According to the same documentation, this can also be drawn as a figure. First, normalize each row of the confusion matrix so that it sums to 1. I could not have come up with this expression myself.

cm = confusion_matrix(iris_target_test, iris_predict)
cm_normalized = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
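
To unpack what that expression does, here is the same normalization applied step by step to a made-up 3x3 matrix (the numbers are invented for illustration, and row_totals is a name I introduce here).

```python
import numpy as np

# Made-up confusion matrix for illustration (rows: true, columns: predicted)
cm = np.array([[25, 0, 0],
               [0, 22, 3],
               [0, 1, 24]])

# Sum along axis=1 gives the number of samples in each true class, shape (3,)
row_totals = cm.sum(axis=1)

# [:, np.newaxis] reshapes the totals to (3, 1), so broadcasting divides
# every row of the matrix by that row's own total
cm_normalized = cm.astype(float) / row_totals[:, np.newaxis]

print(cm_normalized)
# Every row now sums to 1
print(cm_normalized.sum(axis=1))
```

Without the reshape, the (3,) totals would broadcast across columns instead of rows, and each column would be divided by a row total.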

Draw this normalized confusion matrix as a heatmap with matplotlib. By the way, I re-ran the code several times, and the random train/test split gives a slightly different result each time, so the contents of the confusion matrix differ slightly between the figure above and the one below.

import matplotlib.pyplot as plt

def plot_confusion_matrix(cm, title='Confusion matrix', cmap=plt.cm.Blues):
    '''Display a confusion matrix as a heatmap.

    Keyword arguments:
        cm -- confusion matrix
        title -- figure title
        cmap -- color map to use
    '''
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    # Label both axes with the species names
    tick_marks = np.arange(len(iris.target_names))
    plt.xticks(tick_marks, iris.target_names, rotation=45)
    plt.yticks(tick_marks, iris.target_names)
    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')

# Draw the normalized confusion matrix computed earlier
plot_confusion_matrix(cm_normalized, title='Normalized confusion matrix')
plt.show()

20160903007.png

This time everything was left at the defaults, as in svc = svm.SVC(), but of course various kernels can be used for classification. The Examples page of the official scikit-learn site has many demos and tutorials, so you can probably find hints there when you need them.
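
As a rough sketch of what trying other kernels could look like, the loop below fits svm.SVC with each of four built-in kernel names and records the accuracy on the test half. The kernel names are the ones SVC accepts; the fixed random_state, the scores dictionary, and the short variable names are my own choices for illustration, not something from the original experiment.

```python
from sklearn import svm, datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

iris = datasets.load_iris()
# Fixed random_state so every kernel is compared on the same split
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.5, random_state=0
)

scores = {}
for kernel in ("linear", "rbf", "poly", "sigmoid"):
    # All other parameters stay at their defaults
    clf = svm.SVC(kernel=kernel).fit(X_train, y_train)
    scores[kernel] = accuracy_score(y_test, clf.predict(X_test))

for kernel, score in scores.items():
    print(kernel, round(score, 3))
```

The numbers depend on the split, so a single run like this only hints at which kernel suits the data.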

As I did in R, I considered also trying a neural network, naive Bayes, and random forest, but it is probably much the same (I haven't checked).

Conclusion

I feel like I've touched a part of machine learning, but in the end I still don't know which of its several names the iris should go by.
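
For reference, here is a sketch of how the other three methods might look in scikit-learn, assuming GaussianNB, RandomForestClassifier, and MLPClassifier as rough counterparts of the R functions used earlier. That pairing, the hyperparameters, and the fixed random_state values are my assumptions, and the results are not from the original post.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.5, random_state=0
)

# Assumed counterparts of naiveBayes, randomForest, and nnet in R;
# the hyperparameter values are arbitrary choices for this sketch
models = {
    "naive Bayes": GaussianNB(),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "neural network": MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000,
                                    random_state=0),
}

scores = {}
for name, model in models.items():
    # Same fit / predict pattern as with svm.SVC
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

for name, score in scores.items():
    print(name, round(score, 3))
```

Every scikit-learn classifier exposes the same fit() and predict() methods, so swapping methods only changes the line that constructs the model.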

Postscript

Initially, the title was "Trying to touch machine learning with Python", but I was told that the word I had used for "touch" properly means the most exciting part, so the title did not fit, and I corrected it.

I personally think that "building a classifier with SVM in Python" is an exciting part, but apparently that was just my own view. I apologize for talking about machine learning when all I did was an SVM. I will start over and come back.

This code
