- This article is a memo by a beginner who is self-studying Python, machine learning, and related topics.
- The approach is extremely simple: "study by copying code that interests me."
- Constructive comments are welcome (LGTM & stock if you like it).
Today's topic is a YouTube video called **Build a Stock Prediction Program**.
YouTube: Build a Stock Prediction Program
The analysis uses Google Colaboratory, as in the YouTube video.
Let's get started.
First, install the library. This time we will use a library called quandl. quandl seems to be a library for fetching stock prices and other data (I didn't know about it...).
pip install quandl
Since quandl is not preinstalled in Google Colab, install it with pip.
Next, import the library to be used this time.
import quandl
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
In addition to the quandl library installed earlier, I imported numpy and several scikit-learn modules.
Then get the data. This time we use Facebook's stock price.
df = quandl.get('WIKI/FB')
print(df.head())
The data has now been obtained. Only `Adj. Close` (the adjusted closing price) is used from the acquired data, so overwrite df with just that column.
df = df[['Adj. Close']]
print(df.head())
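If the download is not available (for example, when working offline), the remaining steps can still be tried on a stand-in table. The following is a minimal sketch that builds a DataFrame with the same `Adj. Close` column; the helper name `make_fake_prices` and the random-walk prices are assumptions for illustration and are not real Facebook data.

```python
import numpy as np
import pandas as pd


def make_fake_prices(n_days=200, start_price=100.0, seed=0):
    """Hypothetical helper: return a DataFrame shaped like the article's df."""
    rng = np.random.default_rng(seed)
    # Random walk: each day's price is the previous price plus small noise.
    steps = rng.normal(loc=0.0, scale=1.0, size=n_days)
    prices = start_price + np.cumsum(steps)
    # Business-day index, named "Date" to resemble a price table.
    dates = pd.bdate_range("2020-01-01", periods=n_days, name="Date")
    return pd.DataFrame({"Adj. Close": prices}, index=dates)


df = make_fake_prices()
print(df.head())
```

Because the column name matches, the later steps in this article should run on this df unchanged, although any scores obtained from it say nothing about real stocks.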
Shift the data in df by a fixed number of days to create another column (Prediction). The number of days to shift is stored in a variable.
forecast_out = 30
# Each row's Prediction is the Adj. Close value forecast_out rows later
df['Prediction'] = df['Adj. Close'].shift(-forecast_out)
print(df.tail())
If you look at the end of df, you can see that Prediction is NaN for the last forecast_out rows, that is, for as many rows as the data was shifted.
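To see what the negative shift does, here is a small sketch on a five-row toy table. The name `toy` and the value `shift_days = 2` are made up for illustration; they stand in for df and forecast_out.

```python
import pandas as pd

# Toy prices for five consecutive days (made-up numbers).
toy = pd.DataFrame({"Adj. Close": [10.0, 11.0, 12.0, 13.0, 14.0]})

shift_days = 2  # stands in for forecast_out

# shift(-2) moves every value up by two rows,
# so each row sees the price from two rows later.
toy["Prediction"] = toy["Adj. Close"].shift(-shift_days)
print(toy)
```

The first three rows get the prices 12.0, 13.0 and 14.0, and the last two rows have no later price to borrow, so they become NaN.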
Next, create the feature data (X) by dropping the Prediction column from df. Only the rows that have a shifted value are used, so the last rows (30 this time), where Prediction is NaN, are excluded. The model learns from these rows to predict the price 30 days ahead.
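# Keep every column except Prediction as the features
X = np.array(df.drop(columns=['Prediction']))
# Remove the last forecast_out rows, which have no Prediction value
X = X[:-forecast_out]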
print(X)
Next, create the target data (y) from the Prediction column. The method is the same as for X.
y = np.array(df['Prediction'])
y = y[:-forecast_out]
print(y)
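The pairing between X and y can be checked on a tiny example. This is only a sketch: `toy`, `X_toy`, `y_toy` and `shift_days = 3` are illustrative names and values, not part of the original code.

```python
import numpy as np
import pandas as pd

# Toy prices 1.0 .. 8.0 so the pairing is easy to read.
toy = pd.DataFrame({"Adj. Close": np.arange(1.0, 9.0)})
shift_days = 3  # stands in for forecast_out
toy["Prediction"] = toy["Adj. Close"].shift(-shift_days)

# Same slicing as in the article: drop the rows whose Prediction is NaN.
X_toy = np.array(toy.drop(columns=["Prediction"]))[:-shift_days]
y_toy = np.array(toy["Prediction"])[:-shift_days]

print(X_toy.shape, y_toy.shape)
for features, target in zip(X_toy, y_toy):
    # Each feature row is paired with the price shift_days rows later.
    print(features[0], "->", target)
```

X is two-dimensional (rows by one feature column) while y is one-dimensional, which is the shape scikit-learn expects for fitting.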
The data is now processed. Next, we will move on to analysis using scikit-learn.
Split the features (X) and targets (y) into training and test sets with sklearn's train_test_split. With test_size=0.2, 20% of the rows are held out for testing.
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
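One thing worth knowing about train_test_split is that it shuffles the rows by default. For time-ordered data, a chronological split is sometimes preferred so that the test rows come after the training rows. The sketch below compares the two on toy arrays; the variable names are illustrative, and whether to shuffle is a design choice rather than something the video prescribes.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X_toy = np.arange(10).reshape(-1, 1)  # 10 rows, one feature
y_toy = np.arange(10) * 2

# Default behaviour: rows are shuffled before splitting.
# random_state only makes the shuffle repeatable.
xs_train, xs_test, ys_train, ys_test = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=0
)
print("shuffled test rows:", xs_test.ravel())

# shuffle=False keeps the original order: the last 20% becomes the test set.
xc_train, xc_test, yc_train, yc_test = train_test_split(
    X_toy, y_toy, test_size=0.2, shuffle=False
)
print("chronological test rows:", xc_test.ravel())
```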
This time, we will use SVM and Linear Regression for prediction.
# SVR with an RBF kernel (nonlinear regression)
svr_rbf = SVR(kernel='rbf', C=1e3, gamma=0.1)
svr_rbf.fit(x_train, y_train)
svm_confidence = svr_rbf.score(x_test, y_test)
print('svm confidence:', svm_confidence)
# Linear regression
lr = LinearRegression()
lr.fit(x_train, y_train)
lr_confidence = lr.score(x_test, y_test)
print('lr confidence:', lr_confidence)
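The value printed as "confidence" comes from score, which for scikit-learn regressors such as LinearRegression and SVR is the coefficient of determination (R^2), where 1.0 is the best possible value. A small sketch on synthetic data (the data and names here are made up) confirms that score matches r2_score:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic data: y is roughly 3 * x plus noise (made up for illustration).
rng = np.random.default_rng(0)
X_toy = rng.uniform(0, 10, size=(50, 1))
y_toy = 3.0 * X_toy.ravel() + rng.normal(0, 1, size=50)

model = LinearRegression().fit(X_toy, y_toy)

# score() and r2_score() compute the same quantity for a regressor.
score = model.score(X_toy, y_toy)
r2 = r2_score(y_toy, model.predict(X_toy))
print(score, r2)
```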
You have now created a trained model using the training data.
Let's use the trained models to make predictions.
# Use the Adj. Close column (everything except Prediction) as model input
x_forecast = np.array(df.drop(columns=['Prediction']))
print(x_forecast)
lr_prediction = lr.predict(x_forecast)
print(lr_prediction)
svm_prediction = svr_rbf.predict(x_forecast)
print(svm_prediction)
This completes the prediction with the created model.
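The code above predicts for every row of df. If only the days beyond the known data are of interest, one possible variation is to feed the model just the last forecast_out rows, whose Prediction is NaN. The following is a sketch of that idea on synthetic random-walk prices (an assumption, not real data), using linear regression only.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

forecast_out = 30

# Synthetic random-walk prices standing in for the real df (made up).
rng = np.random.default_rng(0)
prices = 100.0 + np.cumsum(rng.normal(0, 1, size=300))
df = pd.DataFrame({"Adj. Close": prices})
df["Prediction"] = df["Adj. Close"].shift(-forecast_out)

features = np.array(df.drop(columns=["Prediction"]))
X = features[:-forecast_out]                     # rows with a known target
y = np.array(df["Prediction"])[:-forecast_out]
x_forecast = features[-forecast_out:]            # rows with no known target

lr = LinearRegression().fit(X, y)

# One predicted price per row in x_forecast, i.e. 30 values.
lr_prediction = lr.predict(x_forecast)
print(lr_prediction.shape)
```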
- As you can probably tell, the prediction above is not meaningful in itself. I see the value of this code-copying exercise as learning how to use sklearn.
- Many papers have been published on forecasting stock and commodity prices, and time series analysis is a deep field, so I would like to keep learning.
That's all.
(Learning so far)