Python scikit-learn: a collection of predictive-model tips often used in practice

Conditions

1. Data and features

・ Uses one year (2019) of stock price data for a certain entertainment stock
・ Uses the Nikkei 225 inverse index for the same period
・ Does not discuss how to validate whether this is the optimal combination of features

2. Model

・ Because the purpose is to show how to implement the models, this article does not pursue parameter tuning against evaluation criteria such as underfitting, overfitting, or the accuracy of predicted values.


Support vector regression

1. Linear regression

See the correlation between volume and stock price.

・ Check the slope of the regression line and of the SVR line
・ Check the distribution within the margin
・ Check the mean squared error of linear regression and SVR

import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Load the CSV, skipping the header row
npArray = np.loadtxt("stock.csv", delimiter=",", dtype="float", skiprows=1)

# Feature (volume), kept 2-D as scikit-learn expects
x = npArray[:, 2:3]

# Target to predict (stock price), flattened to 1-D
y = npArray[:, 3:4].ravel()

# Split into training data and evaluation data
# (random_state is left unset, so every run produces a different split)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

# Standardization of features
sc = StandardScaler()

# Fit the scaler on the training data and standardize it
x_train_std = sc.fit_transform(x_train)
# Standardize the test data with the scaler fitted on the training data
x_test_std = sc.transform(x_test)

#Create a linear regression model
mod = LinearRegression()
#Create SVR model
mod2 = SVR(kernel='linear', C=10000.0, epsilon=250.0)

#Linear regression model learning
mod.fit(x_train_std, y_train)
# SVR
mod2.fit(x_train_std, y_train)

# Create the figure for the plot
plt.figure(figsize=(8,5))
# Volume grid from the minimum to the maximum standardized value in steps of 0.1
x_ndar = np.arange(x_train_std.min(), x_train_std.max(), 0.1)[:, np.newaxis]

# Linear regression prediction over the volume grid
y_ndar_prd = mod.predict(x_ndar)
# SVR prediction over the volume grid
y_ndar_svr = mod2.predict(x_ndar)

# Predictions used to compute the MSE (mean squared error)
pred_train_lin = mod.predict(x_train_std)
pred_test_lin = mod.predict(x_test_std)
pred_train_svr = mod2.predict(x_train_std)
pred_test_svr = mod2.predict(x_test_std)
# Linear regression MSE
print('Linear regression MSE training= %.1f,test= %.1f' % (mean_squared_error(y_train, pred_train_lin), mean_squared_error(y_test, pred_test_lin)))
# SVR MSE
print('SVR MSE training= %.1f,test= %.1f' % (mean_squared_error(y_train, pred_train_svr), mean_squared_error(y_test, pred_test_svr)))

Running this several times without specifying random_state, the MSE of linear regression is consistently smaller than that of SVR. This is to be expected: least-squares linear regression minimizes the squared error on the training data directly, whereas SVR ignores errors that fall inside the epsilon margin.

1st run
Linear regression MSE training = 38153.4, test = 33161.9
SVR MSE training = 52439.9, test = 56707.7

2nd run
Linear regression MSE training = 37836.4, test = 33841.3
SVR MSE training = 54044.5, test = 51083.7

3rd run
Linear regression MSE training = 37381.3, test = 35616.6
SVR MSE training = 53499.2, test = 53619.4
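Because the split is random, a single run says little about either model. The following is a minimal sketch of repeating the comparison with several fixed seeds and summarizing the test MSE. It assumes synthetic data in place of stock.csv (which is not reproduced here), and the helper name `compare_mse` is hypothetical, not part of the original code.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def compare_mse(x, y, seeds=range(5)):
    """Return (linear, svr) arrays of test MSE, one entry per seed."""
    lin_scores, svr_scores = [], []
    for seed in seeds:
        # A fixed random_state makes each split reproducible
        x_tr, x_te, y_tr, y_te = train_test_split(
            x, y, test_size=0.2, random_state=seed)
        # Fit the scaler on the training part only
        sc = StandardScaler()
        x_tr_std = sc.fit_transform(x_tr)
        x_te_std = sc.transform(x_te)
        lin = LinearRegression().fit(x_tr_std, y_tr)
        svr = SVR(kernel='linear', C=10000.0, epsilon=250.0).fit(x_tr_std, y_tr)
        lin_scores.append(mean_squared_error(y_te, lin.predict(x_te_std)))
        svr_scores.append(mean_squared_error(y_te, svr.predict(x_te_std)))
    return np.array(lin_scores), np.array(svr_scores)


# Synthetic stand-in for the volume and closing-price columns (assumption)
rng = np.random.default_rng(0)
x_demo = rng.uniform(1e5, 1e6, size=(240, 1))
y_demo = 3000.0 + 0.001 * x_demo.ravel() + rng.normal(0.0, 200.0, size=240)

lin_mse, svr_mse = compare_mse(x_demo, y_demo)
print('Linear regression test MSE: mean= %.1f, std= %.1f' % (lin_mse.mean(), lin_mse.std()))
print('SVR test MSE: mean= %.1f, std= %.1f' % (svr_mse.mean(), svr_mse.std()))
```

With the real data, `x` and `y` from the code above would be passed instead of `x_demo` and `y_demo`. The spread across seeds gives a rough idea of how much of the difference between runs is due to the split alone.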

Let's show this as a scatter plot.

# Scatter plot of (standardized) volume and stock price
plt.scatter(x_train_std, y_train, color='blue', label='data')
# Linear regression line
plt.plot(x_ndar, y_ndar_prd, color='green', linestyle='-', label='LinearRegression')
# SVR regression line
plt.plot(x_ndar, y_ndar_svr, color='red', linestyle='-', label='SVR')
# Margin lines (SVR prediction +/- epsilon)
plt.plot(x_ndar, y_ndar_svr + mod2.epsilon, color='orange', linestyle='-.', label='margin')
plt.plot(x_ndar, y_ndar_svr - mod2.epsilon, color='orange', linestyle='-.')
# Axis labels and title
plt.ylabel('Closing price')
plt.xlabel('Volume (standardized)')
plt.title('SVR Regression')
# Legend
plt.legend(loc='lower right')

plt.show()

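(Figure: scatter plot of closing price against standardized volume, with the linear regression line, the SVR line, and the margin lines at plus and minus epsilon.)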

The SVR line has a gentler slope than the linear regression line. I set epsilon, the margin, to 250 yen. It seems fair to say that the stock price does not swing noticeably with trading volume and is generally on an upward trend.
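The slope and the margin can also be checked numerically rather than by eye. The following is a minimal sketch, assuming synthetic data in place of stock.csv: it compares the two slopes through `coef_` (which SVR exposes only when kernel='linear') and counts the share of training points that fall within epsilon of the SVR line.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the volume and closing-price columns (assumption)
rng = np.random.default_rng(0)
x_demo = rng.uniform(1e5, 1e6, size=(200, 1))
y_demo = 3000.0 + 0.001 * x_demo.ravel() + rng.normal(0.0, 200.0, size=200)

x_std = StandardScaler().fit_transform(x_demo)

lin = LinearRegression().fit(x_std, y_demo)
svr = SVR(kernel='linear', C=10000.0, epsilon=250.0).fit(x_std, y_demo)

# Slopes, in price units per one standard deviation of volume
slope_lin = lin.coef_.ravel()[0]
slope_svr = svr.coef_.ravel()[0]

# A point is inside the margin when its absolute residual is at most epsilon
residuals = np.abs(y_demo - svr.predict(x_std))
inside = residuals <= svr.epsilon

print('Slope: linear= %.1f, SVR= %.1f' % (slope_lin, slope_svr))
print('Inside margin: %d of %d (%.1f%%)' % (inside.sum(), inside.size, 100.0 * inside.mean()))
```

With the real data, `x_train_std` and `y_train` would take the place of `x_std` and `y_demo`. If almost every point falls inside the margin, epsilon is probably too wide for the data; if very few do, most points incur a penalty and the margin does little.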
