[Python] Can Machine Learning Predict Parallelograms? (2) What happens when extrapolation looks like interpolation?

Previous article: Can Machine Learning Predict Parallelograms? (1) Can it be extrapolated?

Rereading the previous article, I realize I didn't write much about programming techniques. Still, since the calculations are done in Python, I've kept the python tag. Plenty of people do machine learning in Python, so I hope it's fine to post this on Qiita. The calculations here use scikit-learn.

Now, in the previous article I found that extrapolation was not possible. So what about the following problem?

Question 3: If a model is trained on small and large values, can it predict the values in between?

[Figure: sihenkei.JPG — parallelogram with base a, side b, and angle c]

| | Base a | Side b | Angle c |
| --- | --- | --- | --- |
| Training | 0~50 and 1000~1100 | 0~100 | 0~90 |
| Interpolation? Extrapolation? | 150~900 | 50 | 45 |

It's a simple question: if the model is trained with the base length a in the ranges 0~50 and 1000~1100, can it predict the in-between range of 150~900?
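Below is a minimal sketch of how such a dataset could be generated, assuming (as in the previous article) that the target is the parallelogram area a·b·sin(c). The sample counts, random seed, and variable names are my own placeholder choices, not taken from the article.

```python
import numpy as np

rng = np.random.default_rng(0)   # placeholder seed
n = 1000                         # assumed samples per training range of a

# Base a: only small (0~50) and large (1000~1100) values appear in training
a_train = np.concatenate([rng.uniform(0, 50, n), rng.uniform(1000, 1100, n)])
# Side b and angle c are drawn uniformly over their full ranges
b_train = rng.uniform(0, 100, 2 * n)
c_train = rng.uniform(0, 90, 2 * n)   # degrees

X_train_all = np.column_stack([a_train, b_train, c_train])
y_train_all = a_train * b_train * np.sin(np.radians(c_train))   # area

# The "looks like interpolation" set: a in the untrained middle range,
# with b = 50 and c = 45 degrees fixed, as in the table above
a_mid = np.linspace(150, 900, 200)
X_mid = np.column_stack([a_mid, np.full_like(a_mid, 50.0), np.full_like(a_mid, 45.0)])
y_mid = a_mid * 50.0 * np.sin(np.radians(45.0))
```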

Lasso, being a linear regression, should probably manage. But what about a random forest or a neural network? The question with this dataset is whether the models can predict across the whole range of a from 0 to 1100. Real data often has gaps like this in the middle of the measured range.
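As a rough sketch of the comparison, the three scikit-learn models could be set up as below; the article does not give the exact hyperparameters, so the settings here are placeholders.

```python
from sklearn.linear_model import Lasso
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

# Split the small/large-a data into training and test sets
X_tr, X_te, y_tr, y_te = train_test_split(
    X_train_all, y_train_all, test_size=0.2, random_state=0
)

# Placeholder hyperparameters; the article does not state the actual settings
models = {
    "Lasso regression": Lasso(alpha=1.0),
    "Random forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "Neural network": MLPRegressor(hidden_layer_sizes=(100, 100),
                                   max_iter=2000, random_state=0),
}

for name, model in models.items():
    model.fit(X_tr, y_tr)
    # score() returns the coefficient of determination R^2
    print(f"{name}: train R2 = {model.score(X_tr, y_tr):.3f}, "
          f"test R2 = {model.score(X_te, y_te):.3f}")
```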

Here are the results. First, the coefficients of determination and the graphs for the training data.

| Coefficient of determination | Training | Test |
| --- | --- | --- |
| Lasso regression | 0.686 | 0.661 |
| Random forest | 0.999 | 0.975 |
| Neural network | 0.997 | 0.997 |

[Figures: lasso.png (Lasso), random.png (Random forest), mlp.png (Neural network)]

The base a takes only small and large values, but because b and the angle c are generated randomly, the resulting areas look continuously distributed.

Looking at the graphs, I don't know why the random forest produced those two outliers, but judging from these graphs and the coefficients of determination alone, there seems to be no reason to pick Lasso: the random forest looks great, the neural network looks great.

So what happens when we predict values of the base a that lie between the training ranges? Let's look at the graphs.

[Figures: lasso_gaisou.png (Lasso), random_gaisou.png (Random forest), mlp_gaisou.png (Neural network)]
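These plots could be reproduced roughly as follows, reusing the a_mid grid and the models dictionary from the sketches above (again, hypothetical names, not the article's code).

```python
import matplotlib.pyplot as plt

# Predict the untrained middle range of the base a (150~900) with each model
# and compare against the true area
plt.figure()
plt.plot(a_mid, y_mid, "k--", label="true area")
for name, model in models.items():
    plt.plot(a_mid, model.predict(X_mid), label=name)
plt.xlabel("base a")
plt.ylabel("area")
plt.legend()
plt.show()
```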

When I feed in values of the base a that lie between the training ranges, the neural network is completely useless. Seeing this, regression with neural networks is dangerous unless you make sure the prediction really is an interpolation. The same goes for the random forest. You need to clearly distinguish interpolation from extrapolation.

Lasso, on the other hand, stays close to the true values. Presumably this is because, with b and c fixed, the area is linear in a, so a linear model can follow the trend even outside the training ranges. If there is any risk of extrapolation, linear regression may be the safer choice.
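A quick way to check this intuition, reusing the hypothetical models dictionary from above, is to compare the coefficient Lasso learned for a with the true slope 50·sin(45°) ≈ 35.4 at the evaluation point.

```python
import numpy as np

# With b = 50 and c = 45 degrees fixed, the true area is linear in a
lasso = models["Lasso regression"]
print("Lasso coefficient for a:", lasso.coef_[0])
print("True slope at b=50, c=45:", 50 * np.sin(np.radians(45)))
```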

Conclusion

Don't use machine learning to predict by extrapolation. Pay particular attention to extrapolation that merely looks like interpolation. If there is any chance that your prediction involves extrapolation, it may be better to assume the relationship is linear and do what you can with linear regression.

Next time, I'd like to consider descriptor design.
