[PYTHON] I tried to predict next year with AI

Introduction

Only a few hours of 2019 remain, and interest in next year is growing. I used AI to predict what year next year will be.

Method

Training data

I trained on 2019 data points, one for each year from year 1 to 2019.

[Figure: the training data]
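The contents of years.csv are not listed in this post. As a minimal sketch, assuming the file holds one row per year from 1 to 2019 with the year itself as the target, it could be built as follows. The two-column layout matches the `read_csv` call used for training, but the `result` values and the helper names `make_years_frame` and `write_years_csv` are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def make_years_frame(last_year=2019):
    """Build one row per year from 1 to last_year (assumed layout)."""
    years = np.arange(1, last_year + 1)
    # Assumption: the target column simply repeats the year itself.
    return pd.DataFrame({"years": years, "result": years})


def write_years_csv(path="years.csv", last_year=2019):
    """Write the frame without header or index, so read_csv(names=...) can load it."""
    make_years_frame(last_year).to_csv(path, header=False, index=False)


df_sketch = make_years_frame()
print(df_sketch.shape)    # (2019, 2)
print(df_sketch.tail(1))  # the last row, for the year 2019
```

Calling `write_years_csv()` would then create a file that the training code can read.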

Learning

I trained a Kernel Ridge regression model with an rbf kernel.


import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import GridSearchCV, train_test_split

# names= supplies the column names, so the CSV is read as having no header row
df = pd.read_csv('years.csv', names=("years", "result"))
features = df.drop(["result"], axis=1)
target = df["result"]

# Hold out 20% of the rows as test data
train_x, test_x, train_y, test_y = train_test_split(features, target, test_size=0.2, random_state=0)

# Candidate hyperparameters: 6 values of alpha and 12 values of gamma
param_grid = {'alpha': [i*10**j for i in [1,3] for j in [-9,-8,-7]],
              'gamma': [i*10**j for i in [1,2,4,7] for j in [-6,-5,-4]]}
# 5-fold cross-validated grid search over the rbf Kernel Ridge model
gs = GridSearchCV(KernelRidge(kernel='rbf'), param_grid, cv=5, n_jobs=3)
gs.fit(train_x, train_y)
# Best model, refit on all of train_x with the best parameters
rgr = gs.best_estimator_

The data was randomly split into training data and test data, and the model was trained on the training data only. Kernel Ridge has the hyperparameters `alpha` and `gamma`, so I optimized them by grid search.
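To make the size of the search concrete, the grid can be expanded with scikit-learn's `ParameterGrid`. This is only an illustrative sketch; the names `grid`, `n_candidates` and `n_folds` are introduced here and do not appear in the original code.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in the training code
param_grid = {'alpha': [i*10**j for i in [1,3] for j in [-9,-8,-7]],
              'gamma': [i*10**j for i in [1,2,4,7] for j in [-6,-5,-4]]}

grid = ParameterGrid(param_grid)   # every (alpha, gamma) combination
n_candidates = len(grid)
n_folds = 5                        # cv=5 in GridSearchCV

print(n_candidates)            # 72
print(n_candidates * n_folds)  # 360 fits during the search, plus one final refit
```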

Result

Cross-validation

GridSearchCV further splits the data it is given (here into 5 folds) and searches for the parameters that maximize generalization performance. Below I check the performance of the predictor with the optimal parameters.

print(gs.best_estimator_)
print(gs.best_score_)
KernelRidge(alpha=1e-09, coef0=1, degree=3, gamma=2e-05, kernel='rbf',
      kernel_params=None)
0.9999999999996596

The generalization score (the mean cross-validated R², which is the default score for regressors) was sufficiently high, at almost exactly 1.
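For reference, the score reported by `best_score_` for a regressor is the coefficient of determination R². The following sketch computes it by hand and compares it with `sklearn.metrics.r2_score`. The helper `r2_manual` and the sample values are made up for illustration and are not taken from the post's data.

```python
import numpy as np
from sklearn.metrics import r2_score


def r2_manual(y_true, y_pred):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Hypothetical values, chosen only to exercise the formula
y_true = np.array([2015.0, 2016.0, 2017.0, 2018.0, 2019.0])
y_pred = np.array([2015.1, 2015.9, 2017.0, 2018.2, 2018.9])

print(r2_manual(y_true, y_pred))  # approximately 0.993
print(r2_score(y_true, y_pred))   # same value from scikit-learn
```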

yyplot

plt.scatter(rgr.predict(train_x), train_y, marker='.', label='train')
plt.scatter(rgr.predict(test_x), test_y, marker='.', label='test')
plt.legend()
plt.show()

A yyplot (predicted values plotted against actual values) was drawn to check whether the predictions for the training and test data are valid.

[Figure: yyplot of predicted against actual values for the training and test data]

The plot shows that the predictions are correct for most of the existing data.
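As a complement to the visual check, the agreement shown in a yyplot can also be summarized numerically. This is a minimal sketch; `yy_summary` is a hypothetical helper, and the sample values are made up rather than taken from the post's data.

```python
import numpy as np


def yy_summary(y_true, y_pred):
    """Return the largest absolute error and the RMSE between two arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    return {"max_abs_error": float(np.max(np.abs(err))),
            "rmse": float(np.sqrt(np.mean(err ** 2)))}


# Hypothetical actual and predicted values
summary = yy_summary([2017, 2018, 2019], [2017.0, 2018.5, 2018.0])
print(summary["max_abs_error"])  # 1.0
print(summary["rmse"])
```

With the variables from this post, the call would look like `yy_summary(test_y, rgr.predict(test_x))`.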

Learning curve

A learning curve was drawn to check whether the model is overfitting.

from sklearn.model_selection import (learning_curve,ShuffleSplit)

def plot_learning_curve(estimator, title, X, y, ylim=None, cv=None,
                        n_jobs=None, train_sizes=np.linspace(.1, 1.0, 5), verbose=0):
    plt.figure()
    plt.title(title)
    if ylim is not None:
        plt.ylim(*ylim)
    plt.xlabel("Training examples")
    plt.ylabel("Score")
    train_sizes, train_scores, test_scores = learning_curve(
        estimator, X, y, cv=cv, n_jobs=n_jobs, train_sizes=train_sizes, verbose=verbose)
    train_scores_mean = np.mean(train_scores, axis=1)
    train_scores_std = np.std(train_scores, axis=1)
    test_scores_mean = np.mean(test_scores, axis=1)
    test_scores_std = np.std(test_scores, axis=1)
    plt.grid()

    plt.fill_between(train_sizes, train_scores_mean - train_scores_std,
                     train_scores_mean + train_scores_std, alpha=0.1,
                     color="r")
    plt.fill_between(train_sizes, test_scores_mean - test_scores_std,
                     test_scores_mean + test_scores_std, alpha=0.1, color="g")
    plt.plot(train_sizes, train_scores_mean, 'o-', color="r",
             label="Training score")
    plt.plot(train_sizes, test_scores_mean, 'o-', color="g",
             label="Cross-validation score")

    plt.legend(loc="best")
    return plt

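# Import the estimator used for the learning curve
from sklearn.ensemble import RandomForestRegressor

cv = ShuffleSplit(n_splits=5, test_size=0.2, random_state=0)
t_size = np.linspace(0.01, 1.00, 20)
# Note: this call evaluates a RandomForestRegressor, not the Kernel Ridge model rgr fitted above
plot_learning_curve(RandomForestRegressor(n_estimators=50),
                    "Learning Curve", features, target, cv=cv, ylim=[0.98,1.005], train_sizes=t_size, verbose=10)
plt.show()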

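[Figure: learning curve showing the training score and the cross-validation score against the number of training examples]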

Since both the training score and the cross-validation score converge to high values, overfitting is unlikely.
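The learning-curve code above passes a RandomForestRegressor to the plotting function. As a hedged sketch, the same check could be run numerically on the Kernel Ridge model itself. This assumes one sample per year from 1 to 2019 with the year as the target (the CSV is not shown in the post) and reuses the `alpha` and `gamma` printed for the best estimator. Because `alpha` is tiny, the linear system is badly conditioned and scikit-learn may emit warnings; the resulting scores are not reproduced here.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import ShuffleSplit, learning_curve

# Assumption: one sample per year from 1 to 2019, with the year as the target
years = np.arange(1, 2020, dtype=float)
X = years.reshape(-1, 1)
y = years

# Hyperparameters copied from the printed best estimator
model = KernelRidge(kernel='rbf', alpha=1e-09, gamma=2e-05)
cv = ShuffleSplit(n_splits=3, test_size=0.2, random_state=0)

# Rows correspond to training-set sizes, columns to the CV splits
sizes, train_scores, cv_scores = learning_curve(
    model, X, y, cv=cv, train_sizes=[0.25, 0.5, 1.0])

for n, tr, te in zip(sizes, train_scores.mean(axis=1), cv_scores.mean(axis=1)):
    print(n, tr, te)  # number of samples, mean training score, mean CV score
```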

Forecast for next year


print(rgr.predict([[2019+1]]))

I fed the year after this year (2019 + 1) into the model and predicted next year.

[2019.99488853]

The result was $2.020 \times 10^3$ years. Therefore, next year is expected to be 2020.
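The step from the raw output to the stated result is just rounding. A small sketch, where the variable names `prediction`, `sci` and `next_year` are introduced here for illustration:

```python
prediction = 2019.99488853     # value printed by rgr.predict above

sci = f"{prediction:.3e}"      # scientific notation with four significant figures
next_year = round(prediction)  # nearest integer year

print(sci)        # 2.020e+03
print(next_year)  # 2020
```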

Discussion

Kernel Ridge regression with an rbf kernel finds the function that minimizes the loss function within the infinite-dimensional function space defined by the Gaussian kernel, and it shows high generalization performance on problems where no explicit functional form can be assumed. It is highly probable that next year will be 2020.
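For a squared loss with a ridge penalty, this minimization has a closed-form solution: solve (K + alpha I) c = y for the dual coefficients c, then predict with k(x, X) c. The sketch below implements that with NumPy and compares it with scikit-learn's `KernelRidge` on toy data. The sine data, the hyperparameter values, and the helper names `rbf_kernel_matrix` and `fit_predict_kernel_ridge` are assumptions chosen for illustration, not the setup used in this post.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge


def rbf_kernel_matrix(A, B, gamma):
    """K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)


def fit_predict_kernel_ridge(X_train, y_train, X_new, alpha, gamma):
    """Solve (K + alpha*I) c = y, then predict with k(X_new, X_train) @ c."""
    K = rbf_kernel_matrix(X_train, X_train, gamma)
    dual_coef = np.linalg.solve(K + alpha * np.eye(len(X_train)), y_train)
    return rbf_kernel_matrix(X_new, X_train, gamma) @ dual_coef


# Toy data (hypothetical): 20 samples of a sine wave on [0, 1]
X_train = np.linspace(0.0, 1.0, 20).reshape(-1, 1)
y_train = np.sin(2 * np.pi * X_train).ravel()
X_new = np.array([[0.25], [0.5], [0.75]])

manual = fit_predict_kernel_ridge(X_train, y_train, X_new, alpha=1e-3, gamma=10.0)
reference = KernelRidge(kernel='rbf', alpha=1e-3, gamma=10.0).fit(X_train, y_train).predict(X_new)

print(manual)
print(reference)  # should agree with the manual result up to numerical error
```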
