Continued from: ① https://qiita.com/yohiro/items/04984927d0b455700cd1 ② https://qiita.com/yohiro/items/5aab5d28aef57ccbb19c ③ https://qiita.com/yohiro/items/cc9bc2631c0306f813b5 ④ https://qiita.com/yohiro/items/d376f44fe66831599d0b ⑤ https://qiita.com/yohiro/items/3abaf7b610fbcaa01b9c ⑥ https://qiita.com/yohiro/items/e9e80183e635e0ac4894
- Reference material: Udemy "Everyone's AI course: Artificial intelligence and machine learning learned from scratch with Python"
- Library used: scikit-learn
Given the rates of increase in the stock price over the previous four days, we try to predict whether the price will rise or fall on the current day.
stock_price.txt
10000
9993
10259
...
Each line holds the closing price for one day.
from sklearn import svm
Read the sample data and store it in `stock_data`.
# Read the file, one closing price per line
stock_data = []
with open("stock_price.txt", "r") as stock_data_file:
    for line in stock_data_file:
        line = line.rstrip()
        stock_data.append(float(line))
The rate of increase is calculated as follows:
\text{Rate of increase on day } i = \frac{\text{stock price on day } i - \text{stock price on day } (i-1)}{\text{stock price on day } (i-1)}
Store the values calculated above in `modified_data`. Note that the code also multiplies each rate by 20.
count_s = len(stock_data)
modified_data = []
for i in range(1, count_s):
    modified_data.append(float(stock_data[i] - stock_data[i-1]) / float(stock_data[i-1]) * 20)
count_m = len(modified_data)
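As a quick check of the formula, here is a minimal worked example using only the first three closing prices from the `stock_price.txt` excerpt. The name `SCALE` is introduced here for illustration; it stands for the factor of 20 that appears in the code above.

```python
# First three closing prices from the stock_price.txt excerpt
prices = [10000.0, 9993.0, 10259.0]

# Scaling factor used in the article's code (named here for illustration)
SCALE = 20

rates = []
for i in range(1, len(prices)):
    # (price on day i - price on day i-1) / price on day i-1
    change = (prices[i] - prices[i - 1]) / prices[i - 1]
    rates.append(change * SCALE)

for day, rate in enumerate(rates, start=1):
    print(f"day {day}: scaled rate of increase = {rate:.4f}")
```

The first value is negative because the price fell from 10000 to 9993, and the second is positive because it rose to 10259.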
For each day, store the rates of increase for the previous 4 days in `successive_data`.
Whether the price went up or down on that day is stored in `answers`.
# Rates for the 4 days up to the previous day
successive_data = []
# Correct answers: price increase = 1, price decrease = 0
answers = []
for i in range(4, count_m):
    successive_data.append([modified_data[i-4], modified_data[i-3], modified_data[i-2], modified_data[i-1]])
    if modified_data[i] > 0:
        answers.append(1)
    else:
        answers.append(0)
n = len(successive_data)
m = len(answers)
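To make the windowing step concrete, the sketch below applies the same idea to a short toy series. The values in `toy_rates` are made up for illustration and are not taken from the stock data; `WINDOW` is a name introduced here for the 4-day lookback.

```python
# Toy series of daily rates (made-up values, for illustration only)
toy_rates = [0.1, -0.2, 0.3, 0.4, -0.5, 0.6]

WINDOW = 4  # number of previous days used as features

features = []
labels = []
for i in range(WINDOW, len(toy_rates)):
    # Features: the rates for days i-4 .. i-1
    features.append(toy_rates[i - WINDOW:i])
    # Label: 1 if the rate on day i is positive, otherwise 0
    labels.append(1 if toy_rates[i] > 0 else 0)

for x, y in zip(features, labels):
    print(x, "->", y)
```

A series of 6 rates yields only 2 samples, because the first 4 days have no complete window before them.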
Train on 75% of the data.
# Linear support vector machine
clf = svm.LinearSVC()
# Train the support vector machine (75% of the data is used for training)
clf.fit(successive_data[:int(n*75/100)], answers[:int(n*75/100)])
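If liblinear reports that it failed to converge (as in the output further down), one thing to try is giving the solver more iterations through the `max_iter` parameter of `LinearSVC`. The sketch below shows this on synthetic stand-in data, not on the article's stock data, so it only demonstrates the call itself. The data, the value `max_iter=10000`, and `random_state=0` are assumptions chosen for illustration.

```python
import random

from sklearn import svm

random.seed(0)

# Synthetic stand-in data (NOT the article's stock data): 200 samples with
# 4 features each, labeled 1 when the sum of the features is positive.
X = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(200)]
y = [1 if sum(row) > 0 else 0 for row in X]

# Same 75% / 25% chronological-style split as in the article
split = int(len(X) * 75 / 100)

# LinearSVC's max_iter defaults to 1000; a larger value gives liblinear
# more iterations before it stops and emits ConvergenceWarning.
clf = svm.LinearSVC(max_iter=10000, random_state=0)
clf.fit(X[:split], y[:split])

predicted = clf.predict(X[split:])
accuracy = sum(int(p == e) for p, e in zip(predicted, y[split:])) / len(predicted)
print("accuracy on synthetic data:", accuracy)
```

Whether a larger `max_iter` changes the accuracy on the stock data is not something this sketch can show; it only removes one possible source of run-to-run differences.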
Make predictions on the remaining 25% of the data.
# Test data
# Correct answers
expected = answers[int(-n*25/100):]
# Predictions
predicted = clf.predict(successive_data[int(-n*25/100):])
# Compare the last 10
print(expected[-10:])
print(list(predicted[-10:]))
# Calculate the correct answer rate
correct = 0.0
wrong = 0.0
for i in range(int(n*25/100)):
    if expected[i] == predicted[i]:
        correct += 1
    else:
        wrong += 1
print("Correct answer rate:" + str(correct/(correct+wrong) * 100) + "%")
In the course video the correct answer rate is about 61%, but when I actually ran the code a warning appeared and the correct answer rate was only about 50% (55.6% in the run below), which is almost no better than guessing.
Could the results differ because the versions of Python etc. are different?
C:\Anaconda3\lib\site-packages\sklearn\svm\_base.py:947: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
"the number of iterations.", ConvergenceWarning)
[0, 0, 0, 1, 0, 0, 0, 1, 1, 0]
[0, 0, 0, 0, 1, 0, 0, 0, 1, 1]
Correct answer rate: 55.62248995983936%
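One way to judge whether a rate like this is better than guessing is to compare it with baselines that ignore the input features. The sketch below does this for the ten labels and predictions printed above; the helper `accuracy` and the two constant baselines are introduced here for illustration and are not part of the course material.

```python
# Last ten labels and predictions, copied from the output above
expected_tail = [0, 0, 0, 1, 0, 0, 0, 1, 1, 0]
predicted_tail = [0, 0, 0, 0, 1, 0, 0, 0, 1, 1]


def accuracy(expected, predicted):
    """Fraction of positions where the prediction equals the label."""
    matches = sum(1 for e, p in zip(expected, predicted) if e == p)
    return matches / len(expected)


model_acc = accuracy(expected_tail, predicted_tail)

# Baselines that ignore the features entirely
always_down = accuracy(expected_tail, [0] * len(expected_tail))
always_up = accuracy(expected_tail, [1] * len(expected_tail))

print("model:", model_acc)
print("always predict 0:", always_down)
print("always predict 1:", always_up)
```

Ten samples are far too few to draw a conclusion, but running the same comparison over the whole test set would show how much of the correct answer rate comes simply from the balance of up and down days.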
Visualize sample data