University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the task (1)

University of Tsukuba: Machine Learning Course

https://ocw.tsukuba.ac.jp/course/systeminformation/machine_learning/ This course is very easy to follow because it is a public release of lectures that were actually given to students at the University of Tsukuba. The assignments frequently involve analysis in Python, but programming is outside the scope of the course, so the code is not explained and the programs themselves are not provided. The source code can be glimpsed in the videos, but that is unsatisfying because I cannot run it myself. So I decided to write programs that give the same results.

Writer's level

I know Python and pandas, but I am starting from the level of "what is scikit-learn?". I used the least squares method at university, so the mathematical background is not a problem (although I have forgotten the matrix calculations).

Environment

Everything runs on Docker so that it can be used anytime, anywhere. Since matplotlib cannot display a window in this setup, the plots are saved as PNG files. If you pull the code from this GitHub repository, the environment will be ready: https://github.com/legacyworld/sklearn-basic A .devcontainer is required for Remote Development with VS Code.
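Saving a plot to PNG without a display can be sketched as follows. This is a minimal example, not taken from the repository: explicitly selecting the `Agg` backend and the file name `example.png` are my own assumptions for illustration.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: renders to files, needs no display
import matplotlib.pyplot as plt

# Draw a small made-up curve and write it to a PNG file
fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])
ax.set_xlabel("x")
ax.set_ylabel("y")

out = Path("example.png")  # hypothetical output name
fig.savefig(out)
plt.close(fig)  # release the figure's memory once it is saved
```

The same `fig.savefig(...)` call is what the script in this article uses to write one PNG per feature.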

Exercise 1.4 Regression of wine data

Exercise 1.4 is the first program. The exercise is covered from about the 49-minute mark of the 2nd lecture, part (1), multiple regression. The explanation starts at around 43 minutes, but the results do not match unless scaling is done in the program.
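Since scaling is what makes the results match, here is a minimal sketch of what `StandardScaler` does to a single column. The values are made up and only stand in for one feature of the wine data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up values standing in for one feature column (2D: samples x features)
X = np.array([[9.4], [9.8], [9.8], [11.0], [10.2]])

X_scaled = StandardScaler().fit_transform(X)

# The same thing by hand: subtract the column mean and
# divide by the population standard deviation (ddof=0)
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(X_scaled, X_manual))
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))
```

After scaling, every column has mean 0 and standard deviation 1, so coefficients fitted on different features become comparable with each other.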

python:Homework_1.4.py


import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model
from sklearn.metrics import mean_squared_error
from sklearn import preprocessing

# Load the red wine quality data (semicolon-separated CSV)
df = pd.read_csv('winequality-red.csv', sep=';')
# quality is the target value, so create a dataframe with it dropped
df1 = df.drop(columns='quality')
y = df['quality'].values.reshape(-1, 1)
scaler = preprocessing.StandardScaler()
# Simple regression performed for each column
for column in df1:
    x = df[column]
    fig = plt.figure()
    plt.xlabel(column)
    plt.ylabel('quality')
    plt.scatter(x,y)
    #Convert to matrix
    X = x.values.reshape(-1,1)
    #scaling
    X_fit = scaler.fit_transform(X)
    model = linear_model.LinearRegression()
    model.fit(X_fit,y)
    plt.plot(x,model.predict(X_fit))
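    # mean_squared_error takes (y_true, y_pred)
    mse = mean_squared_error(y, model.predict(X_fit))
    print(f"quality = {model.coef_[0][0]} * {column} + {model.intercept_[0]}")
    print(f"MSE: {mse}")
    # Save the plot under the feature name (no trailing space in the extension)
    filename = f"{column}.png"
    fig.savefig(filename)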

#Multiple regression
X = df1.values
X_fit = scaler.fit_transform(X)
model = linear_model.LinearRegression()
model.fit(X_fit,y)
print(model.coef_,model.intercept_)

The place where you are sure to get stuck when using sklearn is building the feature matrix. You will almost certainly see this error:

ValueError: Expected 2D array, got 1D array instead:
array=[7.4 7.8 7.8 ... 6.3 5.9 6. ].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

Let's see what the problem is.

X = x.values.reshape(-1,1)
print(f"Before conversion{x.values}")
print(f"After conversion{X}")

Then, the output will be as follows.

Before conversion[ 9.4  9.8  9.8 ... 11.  10.2 11. ]
After conversion[[ 9.4]
 [ 9.8]
 [ 9.8]
 ...
 [11. ]
 [10.2]
 [11. ]]

The data is now a two-dimensional array with a single column. After that, the actual calculation is performed and the results are obtained. Each graph is saved as a PNG named after the feature (fixed acidity.png, etc.).
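The shape change can also be checked directly, and there is more than one way to get a 2D input. The sketch below uses a tiny made-up dataframe in place of the wine data; the alternatives with double brackets and `to_frame()` are not used in the script above and are shown only for comparison.

```python
import pandas as pd

# Hypothetical stand-in for one column of the wine data
df = pd.DataFrame({"alcohol": [9.4, 9.8, 9.8, 11.0]})

x = df["alcohol"]                    # a Series: one-dimensional
shape_1d = x.values.shape            # (4,)  -> what triggers the ValueError

# Three ways to get the (n_samples, 1) shape that sklearn expects
X_reshaped = x.values.reshape(-1, 1)  # the approach used in this article
X_brackets = df[["alcohol"]].values   # double brackets select a DataFrame
X_to_frame = x.to_frame().values      # convert the Series to a DataFrame

print(shape_1d, X_reshaped.shape, X_brackets.shape, X_to_frame.shape)
```

In `reshape(-1, 1)`, the `-1` tells NumPy to infer the number of rows, so the same line works for a column of any length.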

The multiple regression part at the end was added for my own study.

The following is the execution result:

[root@316e28b88f45 workspace]# python test.py
quality = 0.10014898994431619 * fixed acidity + 5.6360225140712945
MSE: 0.6417307196439609
quality = -0.3153038874367112 * volatile acidity + 5.6360225140712945
MSE: 0.5523439983981253
quality = 0.18275435128971876 * citric acid + 5.6360225140712945
MSE: 0.6183613869155018
quality = 0.0110857825729839 * residual sugar + 5.6360225140712945
MSE: 0.6516376452555722
quality = -0.10406844138289646 * chlorides + 5.6360225140712945
MSE: 0.6409302993389623
quality = -0.04089548993375638 * free sulfur dioxide + 5.6360225140712945
MSE: 0.6500880987339057
quality = -0.14943458718129748 * total sulfur dioxide + 5.6360225140712945
MSE: 0.6294298439847829
quality = -0.14121524469500035 * density + 5.636022514071298
MSE: 0.6318187944965589
quality = -0.046607526450713255 * pH + 5.6360225140712945
MSE: 0.6495882783089737
quality = 0.20295710475205553 * sulphates + 5.6360225140712945
MSE: 0.6105689534614908
quality = 0.3844171096080022 * alcohol + 5.6360225140712945
MSE: 0.503984025671457
[[ 0.04349735 -0.19396667 -0.03555254  0.02301871 -0.08818339  0.04560596
  -0.10735582 -0.03373717 -0.06384247  0.1552765   0.29424288]] [5.63602251]

The intercept is the same regardless of which feature is used. Because every feature is standardized to mean zero, the intercept is the predicted value at the feature's mean, and for least squares that is simply the average of quality. From this result:

- Model that gives the best prediction = model with the lowest MSE = alcohol
- Most positive effect on quality prediction = alcohol (0.384)
  - A higher alcohol content is not always better, though ...
- Most negative effect on quality prediction = volatile acidity (-0.315)
  - Volatile acidity refers to volatile acids that adversely affect wine.
  - http://www.worldfinewines.com/winefaults2.html
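The observation about the intercept can be verified on synthetic data. This is a minimal sketch with made-up numbers, not the wine dataset: it fits the same line on raw and standardized inputs and compares the results.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler

# Made-up feature and target, loosely shaped like "alcohol" and "quality"
rng = np.random.default_rng(0)
x = rng.normal(loc=10.0, scale=1.0, size=200)
y = 0.4 * x + 1.5 + rng.normal(scale=0.5, size=200)

X = x.reshape(-1, 1)
X_scaled = StandardScaler().fit_transform(X)

raw = LinearRegression().fit(X, y)
scaled = LinearRegression().fit(X_scaled, y)

# With a mean-zero feature, the intercept equals the mean of the target
intercept_is_mean = np.isclose(scaled.intercept_, y.mean())
# The scaled coefficient is the raw slope times the feature's std (ddof=0)
slope_is_rescaled = np.isclose(scaled.coef_[0], raw.coef_[0] * x.std())
# Scaling changes the coefficients but not the quality of the fit
same_mse = np.isclose(
    mean_squared_error(y, raw.predict(X)),
    mean_squared_error(y, scaled.predict(X_scaled)),
)
print(intercept_is_mean, slope_is_rescaled, same_mse)
```

So scaling does not change the MSE of each simple regression. It only changes the units of the coefficients, which is what makes values such as 0.384 and -0.315 comparable across features.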

It turned out to be a small exercise that also taught me a little about wine.
