[Python] [Machine learning] Regression analysis using scikit-learn

Hello. This is Hayashi @ Ienter.

Last time, I wrote a blog post about regression analysis from a mathematical approach.

Recently, Python has been used more and more in the fields of statistical analysis and machine learning. I think one of the reasons is that Python has powerful libraries for mathematical operations and data visualization.

This time, I will introduce "scikit-learn", a very powerful Python library that is often used in machine learning. Let's actually use it to run a regression analysis on the sample data from the previous blog post.

■ Environment setup

Install the Python distribution "Anaconda". It installs Python itself together with libraries that are often used in science, technology, mathematics, and data analysis, all at once. Packages are available for Windows / macOS / Linux. This time I will install the Python 3.5 version.


■ Starting Jupyter Notebook / preparing to edit

Installing Anaconda also installs a tool called "Jupyter Notebook". This time we will use it to proceed.

By the way, "Jupyter Notebook" extends "IPython", which runs Python interactively in a command line environment, to a browser environment.

First, create a working directory "jupyter_work" and start "Jupyter Notebook" from there.

$ mkdir jupyter_work
$ cd jupyter_work
$ jupyter notebook

The browser will start and you will see a screen like the one below.

[Screenshot: Jupyter Notebook start screen in the browser]


To create a new Notebook, select the item "Python [Root]" from the "New" drop-down menu on the right side of the screen.

[Screenshot: the "New" menu with "Python [Root]" selected]

Then the following interactive input screen will be displayed. Now you are ready to code.

[Screenshot: empty notebook with an input cell]


■ Basic operation of Jupyter Notebook

Enter Python code in the "In []:" input field. You can enter multiple lines by pressing the Return key.

[Screenshot: input cell containing several lines of code]

To execute the code you entered, click the run button on the toolbar, or hold down the Shift key and press the Return key.

[Screenshot: cell after execution, with its output]


■ Try regression analysis

First, let's perform regression analysis using the simple data from the previous blog post. The data was as follows.

[Figure: table of the sample X and Y data]

The resulting regression line was the following equation.

y = 1.4x + 2.0

This time, we will check this result with scikit-learn.


First, import the required libraries.

[Screenshot: notebook cell with the import statements]

numpy is a numerical calculation library. matplotlib is a library for drawing graphs, and pyplot provides a procedural interface to matplotlib's object-oriented library. pandas is a library that supports spreadsheet-style data analysis; this time we use its two-dimensional data structure called DataFrame (something like an Excel table). sklearn is the machine learning library, and from it we use the linear_model module, which provides linear regression models. The last line, "%matplotlib inline", is a command that draws graphs inside the browser.
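The import cell only survives as a screenshot, so the following is a minimal sketch of what such a cell could look like. The aliases np, plt and pd are the customary ones and are an assumption here. "%matplotlib inline" only works inside Jupyter / IPython, so it is shown as a comment.

```python
import numpy as np                 # numerical calculation
import matplotlib.pyplot as plt    # procedural interface for drawing graphs
import pandas as pd                # DataFrame support for data analysis
from sklearn import linear_model   # linear regression models

# Inside a Jupyter Notebook cell, also add this magic command
# so that graphs are drawn in the browser:
# %matplotlib inline
```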


Prepare the X and Y data from last time as DataFrames. In statistical terms, X is the explanatory variable and Y is the objective variable.

[Screenshot: cell that creates the X and Y DataFrames]
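As a rough sketch of this step: the original data table is only available as an image, so the values below are stand-ins (an assumption), chosen so that a least-squares fit gives the line y = 1.4x + 2.0 mentioned in this post.

```python
import pandas as pd

# Stand-in sample data (assumption, not the original table).
# Each DataFrame has one column, i.e. shape (6, 1).
X = pd.DataFrame([0, 1, 2, 3, 4, 5])   # explanatory variable
Y = pd.DataFrame([2, 3, 6, 5, 8, 9])   # objective variable

print(X.shape, Y.shape)
```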


Next, create an instance of the linear regression model and run the training process with the fit function.

[Screenshot: cell that creates the model and calls fit]
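This step might look like the sketch below. It assumes the model is linear_model.LinearRegression with default settings and reuses stand-in data (an assumption, not the original table).

```python
import pandas as pd
from sklearn import linear_model

# Stand-in sample data (assumption).
X = pd.DataFrame([0, 1, 2, 3, 4, 5])
Y = pd.DataFrame([2, 3, 6, 5, 8, 9])

# Create the linear regression model and train it on X and Y.
model = linear_model.LinearRegression()
model.fit(X, Y)
```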


In addition, prepare the prediction source data (px). px is an array of values running from the minimum (0) to the maximum (5) of the X data in steps of 0.01.

[Screenshot: cell that creates px]

However, because of the function's specification, the px passed to the model's prediction function (predict) must be a two-dimensional array like the following.

[[0.00],[0.01],[0.02],[0.03].....]

Here, [:, np.newaxis] is used to convert a one-dimensional array to a two-dimensional array.
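The conversion can be sketched as follows. It assumes px is built with np.arange from the bounds 0 and 5 given above; the name px_2d is only used here for illustration.

```python
import numpy as np

# 1-D array: 0.00, 0.01, 0.02, ...
px = np.arange(0, 5, 0.01)

# Add a second axis so that every value becomes its own row:
# [[0.00], [0.01], [0.02], ...]
px_2d = px[:, np.newaxis]

print(px.shape, px_2d.shape)
```

predict expects one row per sample and one column per feature, which is why the column shape is needed.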


Pass the prediction source data px to the prediction function (predict) and store the result in py. Then plot the data on a graph. plt.scatter() plots the original X and Y data as red dots, and plt.plot() draws the predicted result (px, py) as a blue straight line.

[Screenshot: prediction and plotting cell with the resulting graph]
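Put together, the prediction and plotting step could look like this sketch. The data are stand-ins (an assumption), and the color keyword arguments are one possible way to get the red dots and blue line described above.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import linear_model

# Stand-in sample data (assumption).
X = pd.DataFrame([0, 1, 2, 3, 4, 5])
Y = pd.DataFrame([2, 3, 6, 5, 8, 9])
model = linear_model.LinearRegression().fit(X, Y)

# Prediction source data as a 2-D column, then the predicted values.
px = np.arange(0, 5, 0.01)[:, np.newaxis]
py = model.predict(px)

# Original data as red dots, fitted line in blue.
plt.scatter(X[0], Y[0], color="red")
plt.plot(px, py, color="blue")
```

In a notebook with "%matplotlib inline" enabled, the graph appears under the cell.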


Check that the slope a of the drawn line and its Y-axis intercept b are the expected values 1.4 and 2.0, respectively. These values are stored in model.coef_ and model.intercept_.

[Screenshot: cell that prints model.coef_ and model.intercept_]

You can confirm that the expected values are output.
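A sketch of this check follows. The data are stand-ins (an assumption) picked so that the least-squares solution is slope 1.4 and intercept 2.0; the names a and b follow the text above.

```python
import pandas as pd
from sklearn import linear_model

# Stand-in sample data (assumption).
X = pd.DataFrame([0, 1, 2, 3, 4, 5])
Y = pd.DataFrame([2, 3, 6, 5, 8, 9])
model = linear_model.LinearRegression().fit(X, Y)

# Y is two-dimensional here, so coef_ has shape (1, 1)
# and intercept_ has shape (1,).
a = model.coef_[0][0]     # slope
b = model.intercept_[0]   # Y-axis intercept
print(a, b)
```

Because of floating-point rounding, the printed values may differ from 1.4 and 2.0 in the last digits.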


■ Perform regression analysis with scikit-learn sample data

scikit-learn also comes with sample data for machine learning. One of the samples is the Boston house price data. The original data appears to come from an external site. The code below takes the number of rooms and the house price from this data and runs a regression analysis on the relationship between them.

As expected, we can see that the house price tends to increase as the number of rooms increases.

[Screenshots: code and resulting plot for the regression on the Boston house price data]
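The original code only survives as screenshots, so here is a hedged sketch of the same idea. At the time of this post the data could be loaded with sklearn.datasets.load_boston, but that loader was removed in scikit-learn 1.2. The sketch therefore uses a few hand-typed stand-in rows (an assumption, not the full dataset). The column names RM (rooms per dwelling) and MEDV (house price in $1000s) follow the dataset's naming.

```python
import pandas as pd
from sklearn import linear_model

# A few stand-in rows (assumption); treat the values as placeholders.
df = pd.DataFrame({
    "RM":   [6.575, 6.421, 7.185, 6.998, 7.147],  # number of rooms
    "MEDV": [24.0, 21.6, 34.7, 33.4, 36.2],       # house price
})

# Explanatory variable: number of rooms. Objective variable: price.
model = linear_model.LinearRegression()
model.fit(df[["RM"]], df["MEDV"])

slope = model.coef_[0]
intercept = model.intercept_
print("slope:", slope, "intercept:", intercept)
```

A positive slope corresponds to the tendency described above: more rooms, higher price.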


That's all for this story!
