Introduction

This is a continuation of the following articles:
① https://qiita.com/yohiro/items/04984927d0b455700cd1
② https://qiita.com/yohiro/items/5aab5d28aef57ccbb19c
③ https://qiita.com/yohiro/items/cc9bc2631c0306f813b5
④ https://qiita.com/yohiro/items/d376f44fe66831599d0b
⑤ https://qiita.com/yohiro/items/3abaf7b610fbcaa01b9c
⑥ https://qiita.com/yohiro/items/e9e80183e635e0ac4894

- Reference materials: Udemy Everyone's AI course Artificial intelligence and machine learning learned from scratch with Python
- Library used: scikit-learn

Problem setting

Given the rates of increase in a stock's price over the previous four days, predict whether the price will rise or fall on the current day.

Sample data

`stock_price.txt`

Each line stores the closing price of the stock for one day.

Source code

import

from sklearn import svm

File reading

Read the sample data and store it in stock_data.

# Read the file: one closing price per line
stock_data = []
with open("stock_price.txt", "r") as stock_data_file:
    for line in stock_data_file:
        line = line.strip()
        if line:  # skip blank lines, such as a trailing empty line at the end
            stock_data.append(float(line))

Creating training data

Creation of increase rate data for each day

The rate of increase is calculated as follows:

$$\text{Rate of increase on day } i = \frac{\text{Stock price on day } i - \text{Stock price on day } (i-1)}{\text{Stock price on day } (i-1)}$$

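Store the values calculated above in modified_data. Note that the code also multiplies each rate by a constant factor of 20, which rescales the values but does not change their sign.

count_s = len(stock_data)
modified_data = []
for i in range(1, count_s):
    # Relative change from the previous day, scaled by a constant factor of 20
    modified_data.append((stock_data[i] - stock_data[i-1]) / stock_data[i-1] * 20)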
count_m = len(modified_data)
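
As a quick check of the formula, here is a minimal sketch that computes the rates for a short, made-up list of closing prices. The function name `increase_rates` and the `scale` parameter are illustrative assumptions, not part of the course code; passing `scale=20` would apply the same constant factor as the code above.

```python
def increase_rates(prices, scale=1.0):
    """Return the day-over-day rate of increase for a list of closing prices.

    `scale` is a hypothetical parameter; the code in this article uses 20.
    """
    rates = []
    for i in range(1, len(prices)):
        previous = prices[i - 1]
        rates.append((prices[i] - previous) / previous * scale)
    return rates


# Made-up closing prices, for illustration only.
sample_prices = [100.0, 110.0, 99.0]
sample_rates = increase_rates(sample_prices)
print(sample_rates)  # roughly +0.1 (a 10% rise), then roughly -0.1 (a 10% fall)
```

A list of N prices yields N-1 rates, because the first day has no previous day to compare against.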

Creating the 4-day increase rate data and the up/down data (= correct answer) for each day

For each day, store the rates of increase for the previous 4 days in `successive_data`. Whether the price went up or down on that day is stored in `answers`.

# Rates of increase for the 4 days up to the previous day
successive_data = []
# Correct answers: 1 = price increase, 0 = price decrease or no change
answers = []
for i in range(4, count_m):
    successive_data.append([modified_data[i-4], modified_data[i-3], modified_data[i-2], modified_data[i-1]])
    if modified_data[i] > 0:
        answers.append(1)
    else:
        answers.append(0)
n = len(successive_data)
m = len(answers)
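
To make the windowing concrete, the following sketch wraps the same loop in a function and runs it on a tiny, made-up list of rates. The name `make_windows` and the `window` parameter are assumptions introduced for illustration.

```python
def make_windows(rates, window=4):
    """Build (features, labels) from a list of daily rates.

    Each feature row holds the `window` rates before day i; the label is
    1 if the rate on day i is positive, otherwise 0.
    """
    features = []
    labels = []
    for i in range(window, len(rates)):
        features.append(rates[i - window:i])
        labels.append(1 if rates[i] > 0 else 0)
    return features, labels


# Made-up rates, for illustration only.
toy_rates = [0.1, -0.2, 0.3, 0.4, -0.5, 0.6]
toy_features, toy_labels = make_windows(toy_rates)
print(toy_features)  # [[0.1, -0.2, 0.3, 0.4], [-0.2, 0.3, 0.4, -0.5]]
print(toy_labels)    # [0, 1]
```

Six rates give only two samples, since the first four days have no complete 4-day history.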

Training and forecasting

Train on 75% of the data.

#Linear support vector machine
clf = svm.LinearSVC()
# Train the support vector machine (75% of the data is used for training)
clf.fit(successive_data[:int(n*75/100)], answers[:int(n*75/100)])
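
The training slice above takes the first 75% of the samples in time order. A minimal sketch of a helper that makes the split point explicit and returns everything after it as the test part; `chronological_split` and `train_percent` are hypothetical names, and the toy samples are made up.

```python
def chronological_split(features, labels, train_percent=75):
    """Split time-ordered samples into an earlier training part and a later test part."""
    split = len(features) * train_percent // 100
    return features[:split], labels[:split], features[split:], labels[split:]


# Ten made-up samples, numbered so the split point is easy to see.
toy_features = [[i] for i in range(10)]
toy_labels = [i % 2 for i in range(10)]
train_x, train_y, test_x, test_y = chronological_split(toy_features, toy_labels)
print(len(train_x), len(test_x))  # 7 3
```

Slicing both parts at the same index puts every sample in exactly one of the two sets, even when the number of samples is not divisible by 4. Taking the test set with a separately rounded negative index can instead leave a sample unused between the two parts.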

Confirmation of training results

Make predictions on the remaining 25% of the data.

#Test data
#Correct answer
expected = answers[int(-n*25/100):]
#Forecast
predicted = clf.predict(successive_data[int(-n*25/100):])

#Compare the last 10
print(expected[-10:])
print(list(predicted[-10:]))

#Calculation of correct answer rate
correct = 0.0
wrong = 0.0
for i in range(int(n*25/100)):
    if expected[i] == predicted[i]:
        correct += 1
    else:
        wrong += 1
print("Correct answer rate:" + str(correct/(correct+wrong) * 100) + "%")
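
A correct answer rate is easier to judge next to a simple baseline. The sketch below computes the rate with a helper and compares it with always predicting the most common label. The function names are assumptions for illustration; the two lists are the last 10 correct answers and predictions printed in the result section of this article.

```python
from collections import Counter


def accuracy(expected, predicted):
    """Fraction of positions where the prediction matches the correct answer."""
    matches = sum(1 for e, p in zip(expected, predicted) if e == p)
    return matches / len(expected)


def majority_baseline(expected):
    """Accuracy obtained by always predicting the most common label."""
    most_common_count = Counter(expected).most_common(1)[0][1]
    return most_common_count / len(expected)


# The last 10 correct answers and predictions from the run in this article.
last_expected = [0, 0, 0, 1, 0, 0, 0, 1, 1, 0]
last_predicted = [0, 0, 0, 0, 1, 0, 0, 0, 1, 1]
print(accuracy(last_expected, last_predicted))  # 0.6
print(majority_baseline(last_expected))         # 0.7
```

Ten samples are far too few to draw conclusions from; the point is only that the same comparison can be made on the full test set.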

Result

In the course video the correct answer rate is about 61%, but when I actually ran the code a warning appeared and the correct answer rate was only around 50% (55.6% in the run below), which is almost no different from guessing...

Are the results different because the versions of Python and the libraries are different?

C:\Anaconda3\lib\site-packages\sklearn\svm\_base.py:947: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
  "the number of iterations.", ConvergenceWarning)
[0, 0, 0, 1, 0, 0, 0, 1, 1, 0]
[0, 0, 0, 0, 1, 0, 0, 0, 1, 1]
Correct answer rate: 55.62248995983936%
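
The ConvergenceWarning above itself suggests increasing the number of iterations. A minimal sketch of one way to try that, using scikit-learn's `LinearSVC(max_iter=...)` together with `StandardScaler` in a pipeline. The feature scaling, the value 10000, the helper name `build_classifier`, and the random stand-in data are all assumptions for illustration, not settings from the course. This is not guaranteed to silence the warning or to raise the correct answer rate.

```python
import random

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC


def build_classifier(max_iter=10000):
    """Return a LinearSVC with feature scaling and a larger iteration limit."""
    return make_pipeline(StandardScaler(), LinearSVC(max_iter=max_iter))


# Hypothetical stand-in data: 200 windows of 4 random values each.
rng = random.Random(0)
features = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(200)]
labels = [1 if sum(row) > 0 else 0 for row in features]

# Same 75% / 25% chronological split idea as in the article.
split = int(len(features) * 75 / 100)
clf = build_classifier()
clf.fit(features[:split], labels[:split])
predicted = clf.predict(features[split:])
print(len(predicted))  # 50
```

With the real data, `features` and `labels` would be replaced by `successive_data` and `answers`.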

Bonus

Visualize sample data

Python & Machine Learning Study Memo ⑦: Stock Price Forecast