[PYTHON] Predictive statistics (theory)

Please feel free to comment as it may be wrong as an output of study. This time I will write about predicting.

Contents

・ What is prediction? ・ Variable type ・ Prediction method ・ About data

What to predict

It is to predict whether the graph after a certain time will rise or fall from the information of the graph before a certain time, or to predict how to classify it given a certain label.

variable

There are two main variables used in forecasting. ・ Objective variable ・ Explanatory variables

Objective variable

The objective variable is the target of the data to be handled. For example, when looking at product sales, the number of products itself is the objective variable.

Explanatory variable

An explanatory variable is an event that acts on the objective variable. For example, apple sales can go down or up due to weather, consumer demand, and modern social issues. Explanatory variables are those events.

Predicting method

There are two methods for making predictions. ・ Regression ・ Classification

Regression

This makes graph predictions for numerical values. For example, the demand forecast for the products mentioned in the previous example.

Classification

This is a method of dividing by category. For example, we predict that we will divide lions and cheetahs into the category of cats.

About data

Data is commonly used on a daily basis, but it is well-structured in the world of data science. These are the following two. ・ Training ・ Evaluation (test)

Learning

Creating a model based on the data

Evaluation

Comparing the learned model with the original evaluation data

* Notes </ b> One thing to keep in mind when building a model is to avoid overfitting </ b>. Overfitting means that you can't predict flexibly because you put too much specific data. For example, if you enter data that everyone loves apples in a cram school with 100 students, the machine predicts that other people will also love apples with a 100% probability. However, in another cram school, there are only half or less of the students who love apples. At this point, the forecast is not 100%. This is overfitting. Therefore, the amount of data to be learned to avoid this needs to be adjusted.

Recommended Posts