[PYTHON] Gaussian process with pymc3

I will try to estimate by Gaussian Process using the data on shampoo sales found on this site.

https://machinelearningmastery.com/time-series-datasets-for-machine-learning/

The Gaussian process is a typical nonparametric regression model. It has the advantages of being strong against non-linear data and being able to confirm estimation uncertainty.

import os
import pandas as pd
shampoo_df = pd.read_csv('../data/shampoo.csv')
shampoo_df
shampoo_df.plot.line(x='Month', y='Sales',c="k");

image.png

image.png

In the following, the marginal likelihoods are calculated to obtain the appropriate likelihood and prior distribution pairs.


with pm.Model() as shampoo_model:
    ρ=pm.HalfCauchy('ρ', 1) #The larger the Length scale, the closer the movements appear to be.<= Autocorrelation
    η = pm.HalfCauchy('η', 1) #The larger the Length scale, the closer the movements appear to be.<= Autocorrelation
    
    M = pm.gp.mean.Linear(coeffs=(shampoo_df.index/shampoo_df.Sales).mean()) #initial value
    K= (η**2) * pm.gp.cov.ExpQuad(1, ρ)

    σ = pm.HalfCauchy('σ', 1)
    gp = pm.gp.Marginal(mean_func=M, cov_func=K) 
    gp.marginal_likelihood("recruits", X=shampoo_df.index.values.reshape(-1,1), 
                           y=shampoo_df.Sales.values, noise=σ)
    trace = pm.sample(1000)

pm.traceplot(trace);

image.png

Maximum likelihood estimation creates a good model by maximizing the likelihood function p (X | θ) with respect to the parameter θ, but does not use the prior distribution p (θ). On the other hand, it seems that the case where there is a prior distribution is called maximum posteriori estimation.

Please refer to the following books for details. Atsushi Suyama. Machine Learning Startup Series Introduction to Machine Learning by Bayesian Inference (Japanese Edition) (Kindle Locations 2154-2156). Kindle Edition.

Obtain the optimum parameters by maximal posteriori estimation.


with shampoo_model:
    marginal_post = pm.find_MAP()

{'ρ_log__': array(2.88298779), 'η_log__': array(5.63830367), 'σ_log__': array(4.13876033), 'ρ': array(17.86757799), 'η': array(280.98567051), 'σ': array(62.72501496)}


X_pred=np.array([r for r in range(X_pred.max()+10)]).reshape(-1,1)

with shampoo_model:
    shampoo_pred = gp.conditional("shampoo_", X_pred)
    samples=pm.sample_ppc([marginal_post], vars=[shampoo_pred] ,samples=100)



fig = plt.figure(figsize=(10,5)); 
ax = fig.gca()

from pymc3.gp.util import plot_gp_dist
plot_gp_dist(ax, samples["shampoo_"], X_pred);

ax.plot(shampoo_df.index, shampoo_df.Sales, 'dodgerblue', ms=5, alpha=0.5);
ax.plot(shampoo_df.index, shampoo_df.Sales, 'ok', ms=5, alpha=0.5);
plt.show()

In pm.sample_ppc, sampling is performed from the posterior prediction distribution based on the parameters in marginal_post.

image.png

Recommended Posts

Gaussian process with pymc3
Gaussian process regression with PyMC3 Personal notes
Regression using Gaussian process
Gaussian process regression using GPy
Kill the process with sudo kill -9
ShinobiLayer: Process monitoring with Advanced Monitoring
Monitor Tomcat process with Zabbix-agent
Process feedly xml with Python.
Implement Gaussian process in Pyro
"Gaussian process and machine learning" Gaussian process regression implemented only with Python numpy
Process Pubmed .xml data with python
Process Pubmed .xml data with python [Part 2]
PRML Chapter 6 Gaussian Process Regression Python Implementation
Automat WEB process with Heroku + Selenium + Chrome
selenium.common.exceptions.WebDriverException: Message: Process unexpectedly closed with status 1
Process multiple lists with for in Python
Using gensim with R (Hierarchical Dirichlet Process)
Add Gaussian noise to images with python2.7
Gaussian process regression Numpy implementation and GPy
Image processing with Python 100 knocks # 9 Gaussian filter