[PYTHON] People store learned knowledge in the brain; how does machine learning store what it has learned?

Last time, I talked about forecasting stock prices with a decision tree algorithm as an example of predicting the future with machine learning. When you simply predict the next numerical value from a sequence of numbers representing the most recent portfolio changes, a simple and clear method like a decision tree can achieve reasonable accuracy at low cost, without relying on complicated algorithms.

Mechanical forecasts like this can be useful in, for example, short-term trading. Shorter time frames probably work better: days rather than weeks, and hours or minutes rather than days. For that reason, margin trading may suit it better than spot trading. If you invest over the medium to long term, I think the basic stance should be to invest in stocks with excellent fundamentals, a low PER, and a good ROE.

As you can see from the [list of technical indicators](http://en.wikipedia.org/wiki/%E3%83%86%E3%82%AF%E3%83%8B%E3%82%AB%E3%83%AB%E6%8C%87%E6%A8%99%E4%B8%80%E8%A6%A7), these long-established formulas are by no means complicated. Nor are the criteria for judging a signal a particularly difficult algorithm. With this in mind, I think that applying today's machine learning may make it possible to devise algorithms with a higher hit rate, or to use them for system trading. For example, a new generation of investors armed with machine learning algorithms may well enter the market one after another within five years.

Save the trained classifier as serialized binary data

By the way, in supervised machine learning a classifier learns from labeled training data, but how is what it has learned stored? In humans, learned memories are stored in brain cells; a machine likewise needs somewhere to store the knowledge it has learned.

Fitting the entire training data every time is expensive, so if possible I would like to reuse the trained instance from the next run onward. The pickle module makes this possible.

The Python pickle module serializes objects. It is the counterpart of Ruby's Marshal module, which behaves similarly. With pickle, you can save a trained instance as serialized data.

Supervised learning can be diagrammed as follows.

[Figure: 10.png]

As explained in the previous article, with the machine learning library scikit-learn you create an instance of a machine learning class and call its .fit method to fit it to (= learn from) the training data.

# Create a classifier instance (in this example, a decision tree classifier)
from sklearn import tree
clf = tree.DecisionTreeClassifier()
# features and labels are the training data, prepared beforehand
clf.fit(features, labels)  # Learn from the training data

The pickle module can handle most Python objects and convert (= serialize) them into byte data.
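
As a minimal sketch of what serialization means here, the following round-trips a plain Python object through pickle.dumps and pickle.loads. The dictionary `knowledge` is a made-up stand-in for a trained classifier and is not part of the original example.

```python
import pickle

# Hypothetical stand-in for "learned knowledge"; any picklable object works.
knowledge = {"thresholds": [0.5, 1.25], "labels": ["down", "up"]}

# Serialize: Python object -> bytes
data = pickle.dumps(knowledge)

# Deserialize: bytes -> an equal, but separate, Python object
restored = pickle.loads(data)

print(type(data).__name__)    # bytes
print(restored == knowledge)  # True
print(restored is knowledge)  # False
```

The restored object is a copy rebuilt from the bytes, which is why the same approach can carry a trained instance from one program run to the next.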

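The cPickle module is a C implementation of pickle that is available in Python 2. Unlike in pickle, its Pickler and Unpickler are functions rather than classes, so they cannot be subclassed. It is much faster than pickle, so on Python 2 I basically recommend using it. Python 3 has no cPickle module; the standard pickle module uses its C implementation automatically when one is available. A common technique that works on both is to try the C implementation first and fall back to the regular pickle if the import fails.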

try:
    import cPickle as pickle
except ImportError:
    import pickle

From the next classification onward, if a saved object exists (= there is a memory), it is enough to load the stored knowledge and classify with it. To make that possible, first write the trained instance to a file.

# Write the instance to a file (filename is the path to save to)
with open(filename, 'wb') as f:
    pickle.dump(clf, f)

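For example, if you have data that is updated daily and want to use it as training data, you can load the saved instance and fit it with that day's data. Note, however, that calling .fit on a scikit-learn estimator such as the decision tree discards the previous fit and learns from scratch on whatever data is passed in. Only estimators that provide partial_fit (for example SGDClassifier) actually accumulate knowledge from one batch to the next.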
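
One caveat for the daily-update scenario is that, in scikit-learn, calling .fit again retrains from scratch, so adding only the new day's data needs an estimator that supports partial_fit. The following is a rough sketch under that assumption. It swaps the decision tree for SGDClassifier, and the two "daily" batches are made up for illustration.

```python
import pickle

import numpy as np
from sklearn.linear_model import SGDClassifier

# Made-up "daily" batches: one feature, label 1 when the feature is positive.
day1_X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
day1_y = np.array([0, 0, 1, 1])
day2_X = np.array([[-3.0], [-0.5], [0.5], [3.0]])
day2_y = np.array([0, 0, 1, 1])

clf = SGDClassifier(random_state=0)
# All possible classes must be declared on the first partial_fit call.
clf.partial_fit(day1_X, day1_y, classes=np.array([0, 1]))

# Simulate saving at the end of day 1 and loading at the start of day 2.
clf = pickle.loads(pickle.dumps(clf))

# Update the loaded instance with only the new day's data.
clf.partial_fit(day2_X, day2_y)

print(clf.classes_)
print(clf.predict(np.array([[-5.0], [5.0]])))
```

With an estimator that has no partial_fit, such as DecisionTreeClassifier, the equivalent would be to refit on the accumulated data rather than on the new day alone.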

It is also a good idea to create a new instance and retrain on the entire training data only when no stored knowledge exists.

import os

# Load the instance only if the file exists
if os.path.exists(filename):
    with open(filename, 'rb') as g:
        clf = pickle.load(g)
else:
    # If there is no file, create a new instance and relearn
    clf = tree.DecisionTreeClassifier()
    clf.fit(features, labels)
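
The save and load steps above can be combined into a single helper. This is only a sketch: the function name `load_or_train`, the toy training data, and the temporary file path are my own assumptions for illustration, not something the original code defines.

```python
import os
import pickle
import tempfile

from sklearn import tree


def load_or_train(filename, features, labels):
    """Return a fitted classifier, reusing the pickle at filename if present."""
    if os.path.exists(filename):
        with open(filename, "rb") as f:
            return pickle.load(f), "loaded"
    # No stored knowledge yet: train from scratch and save the result.
    clf = tree.DecisionTreeClassifier(random_state=0)
    clf.fit(features, labels)
    with open(filename, "wb") as f:
        pickle.dump(clf, f)
    return clf, "trained"


# Toy training data (made up): the label equals the first feature.
features = [[0, 0], [0, 1], [1, 0], [1, 1]]
labels = [0, 0, 1, 1]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "clf.pkl")
    clf1, how1 = load_or_train(path, features, labels)  # first call trains
    clf2, how2 = load_or_train(path, features, labels)  # second call loads
    same = list(clf1.predict(features)) == list(clf2.predict(features))

print(how1, how2, same)  # trained loaded True
```

The second call skips fitting entirely, which is the cost saving this section is after.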

Note that a serialized instance may not work correctly if you try to load it on another host with a different architecture, so take care when building an analysis platform that spans multiple computers. Also, a pickle written by one version of scikit-learn is not guaranteed to load correctly in another, so when you change the version of the underlying scikit-learn library, it is safer to discard the knowledge accumulated so far and retrain from scratch.
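
One way to guard against these mismatches is to store a little environment metadata in front of the pickled object and check it before loading. The sketch below is an assumption of mine rather than a scikit-learn feature. The helper names are hypothetical, a plain dict stands in for the trained classifier, and the version strings are made up; in real use you might pass `sklearn.__version__`.

```python
import io
import pickle
import platform


def current_env(library_version):
    """Describe the environment a pickle is written in or read from."""
    return {"machine": platform.machine(), "library_version": library_version}


def dump_with_meta(obj, library_version):
    """Serialize the metadata first, then the object, into one byte string."""
    buf = io.BytesIO()
    pickle.dump(current_env(library_version), buf)
    pickle.dump(obj, buf)
    return buf.getvalue()


def load_if_compatible(data, library_version):
    """Return the stored object, or None if the environment has changed."""
    buf = io.BytesIO(data)
    meta = pickle.load(buf)  # read only the metadata at first
    if meta != current_env(library_version):
        return None  # caller should discard the file and retrain
    return pickle.load(buf)


# A plain dict stands in for a trained classifier.
data = dump_with_meta({"weights": [1, 2, 3]}, library_version="1.0")
print(load_if_compatible(data, library_version="1.0"))  # {'weights': [1, 2, 3]}
print(load_if_compatible(data, library_version="1.1"))  # None
```

Because the metadata is unpickled before the object itself, an incompatible model is never deserialized at all.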

Improve generalization ability by preparing a separate classifier for each data trend

Furthermore, by applying the clustering method described earlier, you can create multiple classifier instances and have the instance best suited to each data trend make the prediction.

Let's diagram this as well.

[Figure: 11.png]

With unsupervised learning such as K-means clustering, financial data, for example, can be grouped by similarity regardless of industry. Creating a classifier instance for each cluster and fitting it on that cluster's data is intended to yield classifiers with higher generalization ability.

When the data is clustered by K-means and a classifier is trained per cluster, the number of stored bodies of machine knowledge (classifier instances) equals k.

One possible application is to cluster stocks with similar price movements across industry boundaries and predict the next price movement from that experience.

Even in such cases, you can still use the pickle module to save and load the multiple instances.
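
The per-cluster idea can be sketched roughly as follows. This is an illustration under my own assumptions, not the author's implementation: the data is made up, the `predict` helper is hypothetical, and the cluster model and all k classifiers are pickled together as one dictionary.

```python
import pickle

import numpy as np
from sklearn import tree
from sklearn.cluster import KMeans

# Made-up data: two well-separated groups of samples, two features each.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0], [11.0, 11.0]])
y = np.array([0, 1, 0, 1, 1, 0, 1, 0])  # the label rule differs per group

k = 2
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)

# One classifier per cluster, trained only on that cluster's samples.
classifiers = {}
for c in range(k):
    mask = kmeans.labels_ == c
    clf = tree.DecisionTreeClassifier(random_state=0)
    classifiers[c] = clf.fit(X[mask], y[mask])

# Save and restore the cluster model and all k classifiers as one object.
blob = pickle.dumps({"kmeans": kmeans, "classifiers": classifiers})
bundle = pickle.loads(blob)


def predict(bundle, samples):
    """Route each sample to the classifier of its nearest cluster."""
    clusters = bundle["kmeans"].predict(samples)
    return np.array([bundle["classifiers"][c].predict(s.reshape(1, -1))[0]
                     for c, s in zip(clusters, samples)])


print(len(bundle["classifiers"]))  # 2
print(predict(bundle, X))
```

Keeping the K-means model in the same pickle matters, because new samples have to be assigned to clusters the same way the training samples were.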

Summary

This time, I explained how to use pickle to serialize and save the state a model has learned from training data (= the equivalent of a human storing knowledge in the hippocampus of the brain). This is just one example, but I think it is an easy technique to apply with scikit-learn.
