[PYTHON] Super introduction to machine learning

Overview

This article was prepared for an in-house study session on the theme "What is machine learning, and how do you use it?" I hope its content is useful to others as well.

Machine learning = AI?

Machine learning is a subfield of artificial intelligence, and deep learning is a subfield of machine learning.


Rule base

A program that handles many patterns with multiple if statements and search, so that it produces appropriate output even under complicated conditions.

Machine learning

Learns patterns and features from data and, based on them, outputs predictions for unseen data.

Deep learning

A family of machine learning methods that can automatically extract the features that characterize the data.

Reinforcement learning

In a given environment, an agent repeatedly takes actions while observing the situation, and learns the decision making that best achieves its goal.
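The sketch below illustrates this idea in its simplest special case, a multi-armed bandit, where the situation does not change between actions. The three slot machines, their win probabilities, and the epsilon-greedy strategy are all assumptions chosen for the example and do not come from any particular library. Most of the time the agent takes the action that has paid off best so far. Occasionally it tries a random action instead, and it updates its estimates from the rewards it receives.

```python
import random

# Hypothetical environment: 3 slot machines ("arms") whose win
# probabilities are unknown to the agent
WIN_PROB = [0.2, 0.5, 0.8]


def pull(arm, rng):
    """Environment: return reward 1 with the arm's win probability, else 0."""
    return 1 if rng.random() < WIN_PROB[arm] else 0


def epsilon_greedy(steps=2000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    counts = [0] * len(WIN_PROB)    # how often each arm was tried
    values = [0.0] * len(WIN_PROB)  # estimated average reward of each arm
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action
            arm = rng.randrange(len(WIN_PROB))
        else:
            # Exploit: take the action with the best estimate so far
            arm = max(range(len(WIN_PROB)), key=lambda a: values[a])
        reward = pull(arm, rng)
        counts[arm] += 1
        # Incremental update of the running average reward for this arm
        values[arm] += (reward - values[arm]) / counts[arm]
    return values, counts


values, counts = epsilon_greedy()
best = max(range(len(values)), key=lambda a: values[a])
print("estimated values:", values, "best arm:", best)
```

Full reinforcement learning adds states that change in response to actions, as in Q-learning. The core loop of act, observe the reward, and update is the same.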

Point! With a rule base, someone has to rewrite the rules by hand whenever an exception occurs. This makes it hard to keep up as the data keeps growing. → With machine learning, let the computer do it!
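The sketch below makes the contrast concrete with a hypothetical "is it hot?" task. The rule-based version hard-codes a threshold that a person must maintain. The machine learning version lets scikit-learn's DecisionTreeClassifier learn the threshold from labeled examples. The temperatures and labels are invented for illustration.

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical task: decide "hot" (1) or "not hot" (0) from a temperature in °C


# Rule base: a person writes (and must later maintain) the threshold by hand
def rule_based(temp):
    if temp >= 30:
        return 1
    return 0


# Machine learning: the threshold is learned from labeled examples instead
temps = [[18], [22], [25], [27], [29], [31], [33], [35]]
labels = [0, 0, 0, 1, 1, 1, 1, 1]  # suppose people feel "hot" from about 27 °C
model = DecisionTreeClassifier(max_depth=1, random_state=0).fit(temps, labels)

print(rule_based(28), model.predict([[28]])[0])
```

If people's sense of "hot" changes, the rule base needs someone to edit the threshold. The model only needs to be refit on new data.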

Types of machine learning

Supervised learning can be broadly divided into regression and classification.

Regression: the prediction result is a numerical value. Example: What is Japan's GDP in 2018? → Regression
Classification: the prediction result is a class. Example: Which iris species is this flower? → Classification
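The sketch below shows the difference in scikit-learn. The regression part uses toy data, and the classification part uses the iris dataset bundled with scikit-learn (load_iris). The regression model returns a number, and the classifier returns a class label.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeClassifier

# Regression: the target is a number (toy data: y = 3x + 2, no noise)
x = np.arange(10, dtype=float).reshape(-1, 1)
y = 3 * x.ravel() + 2
reg = LinearRegression().fit(x, y)
reg_pred = reg.predict([[12.0]])[0]  # a numerical prediction
print("regression output:", reg_pred)

# Classification: the target is a class label (iris species 0, 1 or 2)
iris = load_iris()
clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)
clf_pred = clf.predict(iris.data[:1])[0]  # a class label
print("classification output:", iris.target_names[clf_pred])
```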

What does machine learning do?


1. Decide what to do

Decide what you want to judge and how accurate you want it to be.

2. Collect data

Collect the data needed for prediction and judgment. You can use data already stored in a database or obtain it from the web. The collected data is divided into "training data" and "test data".
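If the data is already in a database, pandas can read a query result directly into a DataFrame. The sketch below uses a hypothetical sales table in an in-memory SQLite database. It then splits the rows into training and test data by simple row order. The table, columns, and values are invented for the example.

```python
import sqlite3

import pandas as pd

# Hypothetical in-memory database standing in for an existing DB
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day INTEGER, temperature REAL, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, 20.5, 120), (2, 25.0, 150), (3, 30.2, 210), (4, 18.1, 100)],
)
conn.commit()

# Read the query result straight into a DataFrame
df = pd.read_sql_query("SELECT * FROM sales", conn)
conn.close()

# Divide into training data (first 3 rows) and test data (last row)
train_df = df.iloc[:3]
test_df = df.iloc[3:]
print(train_df)
print(test_df)
```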

Crawling

A technique for downloading web page data given its URL. A crawling example using Python's requests library:

crawling.py


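import requests

# Download the HTML of the page at the given URL
r = requests.get('https://ja.wikipedia.org/wiki/Python', timeout=10)
r.raise_for_status()  # stop here if the request failed
print(r.text[:500])   # show the beginning of the HTML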


Scraping

A technique for extracting and processing the necessary information from downloaded web pages. An example of scraping with Python's BeautifulSoup:

scraping.py


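from bs4 import BeautifulSoup

# Parse the HTML downloaded in crawling.py (r is that response)
soup = BeautifulSoup(r.content, 'html.parser')
# Get the text of the first element with the class "mw-redirect"
soup.find(class_='mw-redirect').string
>>> 'Multi-paradigm'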


Getting data via an API

Data is obtained through the RESTful APIs that services publish. An example of fetching data from the GitHub API and processing it with pandas:

get_github_data.py


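import requests
import pandas as pd

# Search for up to 3 Python repositories created on 2017-07-28
git_res = requests.get('https://api.github.com/search/repositories?q=language:python+created:2017-07-28&per_page=3', timeout=10)
git_res.raise_for_status()
# Turn the "items" list in the JSON response into a DataFrame and keep selected columns
pd.DataFrame(git_res.json()['items'])[['language', 'stargazers_count', 'git_url', 'updated_at', 'created_at']]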


3. Format the data

Format the collected data. How to format it depends on the type of data and the subsequent modeling work.

Missing value imputation

Missing feature values are filled in, for example with the mean or with 0. This step also includes processing such as replacing categorical string values with 0/1 flags (dummy variable conversion).
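The pandas sketch below performs both steps on a small hypothetical table. fillna fills missing values, using the mean for one column and 0 for another. get_dummies performs the dummy variable conversion. pandas also offers ffill, which fills a gap with the previous value.

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values (NaN) and a categorical column
df = pd.DataFrame({
    "age": [25, np.nan, 40, 35],
    "income": [300, 450, np.nan, 500],
    "city": ["Tokyo", "Osaka", "Tokyo", np.nan],
})

# Fill missing "age" with the column mean and missing "income" with 0
df["age"] = df["age"].fillna(df["age"].mean())
df["income"] = df["income"].fillna(0)

# Dummy variable conversion: one 0/1 column per category of "city"
# (a row whose city is missing gets 0 in every city column)
df = pd.get_dummies(df, columns=["city"], dtype=int)
print(df)
```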


Trimming

If you want to identify a specific character in image data, crop (trim) the relevant region or create annotation data.
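Image data is usually handled as an array, so trimming amounts to slicing out the region of interest. The sketch below uses a synthetic grayscale image and a hypothetical bounding box. For image files, Pillow's Image.crop((left, upper, right, lower)) does the same thing.

```python
import numpy as np

# Hypothetical 100x200 grayscale image (height x width) as a NumPy array
image = np.zeros((100, 200), dtype=np.uint8)
image[30:60, 50:120] = 255  # a bright rectangle standing in for the target

# Hypothetical bounding box of the target: (top, left, bottom, right)
top, left, bottom, right = 30, 50, 60, 120

# Trimming = keeping only the rows and columns inside the box
cropped = image[top:bottom, left:right]
print(cropped.shape)
```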

Morphological analysis

Morphological analysis splits text such as sentences into separate words. This is needed for languages like Japanese, which do not put spaces between words. The words are then converted into vectors so that they can be handled as numerical values.

Example with janome

Installation

pip install janome

Word segmentation

janome_test.py


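from janome.tokenizer import Tokenizer

t = Tokenizer()

# Japanese sample sentence meaning "This is test data"
document = 'これはテストデータです'
tokens = t.tokenize(document)
for token in tokens:
    # surface is the token exactly as it appears in the text
    print(token.surface)

output (English glosses added in parentheses)

これ (this)
は (topic particle)
テスト (test)
データ (data)
です (copula, "is")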
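Word segmentation alone does not produce numbers yet. A simple way to vectorize segmented text is a bag-of-words count vector. The sketch below uses hypothetical, already-segmented English sentences and builds the vectors in plain Python. scikit-learn's CountVectorizer offers a ready-made version.

```python
from collections import Counter

# Hypothetical word-segmented sentences (e.g. the output of a tokenizer)
docs = [
    ["this", "is", "test", "data"],
    ["this", "is", "training", "data"],
]

# Vocabulary: every distinct word gets a fixed column index
vocab = sorted({word for doc in docs for word in doc})


def to_bow_vector(tokens, vocab):
    """Count how often each vocabulary word occurs in one document."""
    counts = Counter(tokens)
    return [counts[word] for word in vocab]


vectors = [to_bow_vector(doc, vocab) for doc in docs]
print(vocab)
for v in vectors:
    print(v)
```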

4. Make a model and learn

A model converts input data (the factors behind a prediction or judgment) into output data (the prediction or judgment result). Roughly speaking, it is a function.

Procedure

From the collected data, organize and analyze the structure and correlations of the variables that are likely to influence the prediction. Then design a model with some degrees of freedom (adjustable parameters). Determine the parameters from the training data to obtain a prediction model.
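As an example of "a model with some degrees of freedom", take the model y = a·x + b. Its two parameters a and b are determined from toy training data by least squares with np.polyfit. The data here is noiseless, so the recovered parameters are easy to check.

```python
import numpy as np

# Toy training data generated from y = 2x + 1 (no noise), for illustration only
x_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_train = 2 * x_train + 1

# Model with two degrees of freedom: y = a * x + b
# np.polyfit chooses a and b by least squares on the training data
a, b = np.polyfit(x_train, y_train, deg=1)


def model(x):
    """The trained prediction model: a function from input to output."""
    return a * x + b


print(a, b, model(10.0))
```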


Library

This step requires specialized knowledge and experience, but there are libraries that make it relatively easy to build a model.

An example using scikit-learn: choose a model suited to the problem (linear regression this time) and fit it to the training data to create a trained model.

liner_reg_sample.py


import numpy as np
from sklearn import linear_model

# Synthetic data standing in for collected data: y = x/2 + Gaussian noise
x_data = np.arange(-3, 10, 0.1).reshape(-1, 1)
y_data = (1/2) * x_data + np.random.normal(0.0, 0.5, len(x_data)).reshape(-1, 1)

# Use the samples from index 70 onward as training data
x_train = x_data[70:]
y_train = y_data[70:]

# Fit the linear regression model to the training data
reg = linear_model.LinearRegression()
reg.fit(x_train, y_train)

scikit-learn provides train_test_split for splitting data into training data and test data. (I did not use it here so that each chapter's code is easier to follow.)
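The sketch below applies train_test_split to data like the data above. test_size=0.3 and random_state=0 are arbitrary choices for the example. By default, the data is shuffled before splitting.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Same kind of synthetic data as in liner_reg_sample.py (seeded for repeatability)
rng = np.random.default_rng(0)
x_data = np.arange(-3, 10, 0.1).reshape(-1, 1)
y_data = 0.5 * x_data + rng.normal(0.0, 0.5, len(x_data)).reshape(-1, 1)

# Hold out 30% of the samples as test data
x_train, x_test, y_train, y_test = train_test_split(
    x_data, y_data, test_size=0.3, random_state=0
)
print(len(x_train), len(x_test))
```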

Hyperparameters

Some models have hyperparameters. These are not determined by training and must be set manually (for example, the number of layers in a deep learning model or the number of training iterations).

How to determine hyperparameters
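One common approach is grid search with cross-validation. You try each candidate value, score it on held-out folds, and keep the best one. The sketch below uses scikit-learn's Ridge regression, whose regularization strength alpha is a hyperparameter, with an arbitrary list of candidate values.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic data in the spirit of liner_reg_sample.py: y = x/2 + noise
rng = np.random.default_rng(0)
x = np.arange(-3, 10, 0.1).reshape(-1, 1)
y = 0.5 * x.ravel() + rng.normal(0.0, 0.5, len(x))

# alpha is not learned by fit(); it is chosen from these candidates
param_grid = {"alpha": [0.01, 0.1, 1.0, 10.0, 100.0]}

# 5-fold cross-validation with shuffled folds; regressors are scored by R^2
cv = KFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(Ridge(), param_grid, cv=cv)
search.fit(x, y)
print(search.best_params_, search.best_score_)
```

After fitting, search.best_estimator_ is the model refit on all the data with the best alpha.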

Other

I previously posted an article giving an overview of machine learning techniques. I hope it offers some hints for model building: "Roughly organize Qiita machine learning information centered on methods"

5. Predict with the test data

Use the created model to make predictions on the test data.

An example using scikit-learn (continuing the code above):

liner_reg_sample.py


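# Test data: the first 70 samples, which do not overlap the training data
x_test = x_data[:70]
y_test = y_data[:70]

# Predict
pred = reg.predict(x_test)

# Coefficient of determination (R^2); the value varies between runs because the noise is random
print('score:', reg.score(x_test, y_test))
>>> score: 0.714080213722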

6. Verification

Check how accurate the predictions on the test data are, using an evaluation metric suited to each model.

Evaluation metrics

Classification

Classification metrics can be output with scikit-learn's accuracy_score and classification_report.
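The sketch below runs both functions on hypothetical true labels and predictions. accuracy_score gives the fraction of correct predictions. classification_report lists precision, recall, F1 score, and support per class. With output_dict=True it returns the same numbers as a dict.

```python
from sklearn.metrics import accuracy_score, classification_report

# Hypothetical true labels and predictions from some classifier
y_true = ["cat", "dog", "dog", "cat", "dog"]
y_pred = ["cat", "dog", "cat", "cat", "dog"]

acc = accuracy_score(y_true, y_pred)  # fraction of correct predictions
print("accuracy:", acc)

# Per-class precision, recall, F1 score and support
print(classification_report(y_true, y_pred))
report = classification_report(y_true, y_pred, output_dict=True)
```

Here 4 of the 5 predictions are correct, so the accuracy is 0.8. Every true "cat" was found (recall 1.0), but only 2 of the 3 true "dog" samples were (recall 2/3).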

Regression

Regression metrics can be output with scikit-learn's mean_absolute_error and mean_squared_error.
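The definitions are simple enough to check by hand. The sketch below uses hypothetical actual and predicted values. MAE averages the absolute errors and MSE averages the squared errors. RMSE is the square root of MSE, which puts it back in the units of y.

```python
import math

from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical actual values and predictions
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.5, 5.0, 3.0, 8.0]

mae = mean_absolute_error(y_true, y_pred)  # mean of |actual - predicted|
mse = mean_squared_error(y_true, y_pred)   # mean of (actual - predicted)^2
rmse = math.sqrt(mse)                      # same units as y

# The same MAE computed by hand, as a cross-check
errors = [t - p for t, p in zip(y_true, y_pred)]
mae_manual = sum(abs(e) for e in errors) / len(errors)

print("MAE:", mae, "MSE:", mse, "RMSE:", rmse)
```

For these values the absolute errors are 0.5, 0, 1 and 1, so MAE is 0.625, MSE is 0.5625 and RMSE is 0.75.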

Verification method

An example using scikit-learn (continuing the code above):

liner_reg_sample.py


from sklearn.metrics import mean_squared_error
from math import sqrt

#Correlation coefficient
print('corr:', np.corrcoef(y_test.reshape(1, -1), pred.reshape(1, -1))[0, 1])

# RMSE
print('RMSE:', sqrt(mean_squared_error(y_test, pred)))
>>> corr: 0.895912443712
>>> RMSE: 0.6605862235679646

If the results can be visualized, plot them and check them visually.

liner_reg_sample.py


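import matplotlib.pyplot as plt

# Actual test data (blue points) and the model's predictions (red line)
plt.scatter(x_test, y_test, color='blue')
plt.plot(x_test, pred, color='red')
plt.show()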


Reporting

Record what kind of model was built, what test data was used, and how accurate the results were. Since I write my code in Python, I write these records in Markdown in Jupyter Notebook.

Qiita: Various summary to use Jupyter Notebook more conveniently

7. Go back

If verification does not reach the required accuracy, work out what went wrong and return to "2. Collect data", "3. Format the data", or "4. Make a model and learn". Repeat this cycle.

Keywords

Overfitting

The model fits the training data too closely, so its prediction accuracy on unseen data becomes low.
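The small sketch below typically shows overfitting, using toy data from a straight line plus noise. A degree-9 polynomial passes through all 10 training points, so its training error is essentially zero. On new points between them, it usually does worse than a straight line. The data, noise level, and degrees are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a straight line plus noise, with separate unseen test points
x_train = np.linspace(0, 1, 10)
y_train = 2 * x_train + rng.normal(0, 0.2, x_train.size)
x_test = np.linspace(0.05, 0.95, 10)
y_test = 2 * x_test + rng.normal(0, 0.2, x_test.size)


def mse(coefs, x, y):
    """Mean squared error of the polynomial with the given coefficients on (x, y)."""
    return float(np.mean((np.polyval(coefs, x) - y) ** 2))


results = {}
for degree in (1, 9):
    coefs = np.polyfit(x_train, y_train, degree)
    results[degree] = (mse(coefs, x_train, y_train), mse(coefs, x_test, y_test))
    print(f"degree {degree}: train MSE = {results[degree][0]:.4f}, "
          f"test MSE = {results[degree][1]:.4f}")
```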

Specialized AI / general-purpose AI

Specialized AI is AI that can be used in one specific field. General-purpose AI is AI that can be used across many different fields (think of something like Tetsuwan Atom, known abroad as Astro Boy). Most AI today is specialized AI.

End

The study session only covered an overview, so I posted this on Qiita with a little more information added in the hope of sharing the knowledge. I would be grateful if you could point out any mistakes.
