[PYTHON] I want to do machine learning even without a server - Time Series Edition -

This article is a work in progress and will be updated from time to time.

* Update

In January 2017, the prototype implementation was completed. I talked about this implementation at AWS Premier Night #3, so please see [this article](http://blog.serverworks.co.jp/tech/2017/01/13/awspremier3-starting-serverless-ml/).

Basically, this article is just me wanting to try the tasty new things that have come out recently. It is an attempt to get easy operation and machine learning at the same time by running ML logic on "as a Service" platforms.

In this article, I describe my examination of how to run "time series analysis" logic on serverless infrastructure such as FaaS or, failing that, on managed infrastructure such as PaaS. Please note that the details of the analysis method itself are not discussed.

Goal

For the time being, the goal is met if the place where the analysis implementation runs is (probably) easier to operate than a conventional physical or virtual server. The basic candidates are FaaS and PaaS.

Introduction

The analysis I am trying to do is "time series analysis". In the context of infrastructure monitoring, outlier detection and change point detection fall into this category. Recently, Datadog implemented this as a beta feature, which I think is very nice. => http://docs.datadoghq.com/ja/guides/anomalies/

Outlier detection and change point detection use past data to judge whether the **current data** is a value that deserves attention in light of past trends. They are methods for judging the current value and cannot predict the future. In contrast, what I use this time is "future value prediction logic" based on methods such as _AR (autoregression)_ and _MA (moving average)_.

The forecast target is the "monthly sales amount". I import the aggregated monthly sales results from the CRM (Salesforce in my case) and use a time series analysis framework to forecast sales for the upcoming N months.
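To make the idea of "future value prediction" concrete, here is a minimal sketch of an AR(1) forecast written from scratch in plain Python. It is not the implementation used in this article, which relies on a time series analysis framework. The function names `fit_ar1` and `forecast` and the sales figures are made up for illustration.

```python
from statistics import mean


def fit_ar1(series):
    """Fit y[t] = c + phi * y[t-1] by ordinary least squares.

    Assumes the series has at least three points and is not constant
    (a constant series would make the denominator zero).
    """
    x = series[:-1]  # previous values
    y = series[1:]   # current values
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    phi = sxy / sxx
    c = my - phi * mx
    return c, phi


def forecast(series, steps):
    """Forecast `steps` future values by applying the fitted AR(1) recursively."""
    c, phi = fit_ar1(series)
    last = series[-1]
    predictions = []
    for _ in range(steps):
        last = c + phi * last
        predictions.append(last)
    return predictions


if __name__ == "__main__":
    # Made-up monthly sales amounts, oldest first.
    monthly_sales = [100.0, 110.0, 121.0, 133.1, 146.41]
    print(forecast(monthly_sales, 3))
```

A real forecast would normally use a higher-order model and a library that also handles the MA part; the sketch only shows how past values feed the prediction of future ones.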

(Reference) Source of inspiration

The ML as a Service workshop, a spin-off event of ServerlessConf. => [Special Project] Serverless Machine Learning Workshop => SlideShare

Premise

The language used is basically Python. Its machine learning libraries are extensive, many environments support it, and I like it.

In this implementation, the following libraries are used.

Candidates considered

I have listed the candidates I could think of. There are many that I have not examined or have never used, so please let me know of anything useful I have missed.

Top candidates

First, the "machine learning x cloud" services. After all, the three major ones are Amazon ML, Azure Machine Learning, and GCP's machine learning service.

Amazon ML does not support time series analysis, so it is out. I was hoping for a release at re:Invent, but the service evolved in a different direction (Rekognition and the like). Those were very hot announcements, but the direction differs from what I was hoping for this time.

Azure Machine Learning's built-in time series analysis covers only anomaly detection, so it cannot be used as is. However, you can embed Python / R code and use that to implement anything. The execution environment supports Anaconda, so library support is not a problem.

I have not investigated GCP's machine learning service yet. Since it is described as being based on TensorFlow, the answer is probably "you can implement anything", but I still lack understanding of it, including the peripheral services, so it remains unexamined for now.

Runner-up

Services known as Function as a Service. Here too there are three major players: AWS Lambda, Azure Functions, and the GCP equivalent.

Azure and GCP are still under-researched. On Lambda, it works if you bundle the Python packages (and shared libraries) that are not included in the Lambda runtime environment. For the method, I referred to this (nice) hack. The difference is that I also need statsmodels, but the bundle still fits within Lambda's deployment package size limit (50MB). => Using Scikit-Learn in AWS Lambda - Serverless Code

I suspect other FaaS platforms basically have the same constraint. As for Azure Functions, the execution platform is based on App Service, so the problem may be solvable with App Service Editor. According to the official document "Develop Node.js apps with App Service Editor", you can add packages with npm, so my reasoning is that the same should be possible for Python. This is still unverified, but if it is true, I am likely to become a big fan of Azure.
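Since the size limit decides whether this approach works at all, it can help to check the bundle size before uploading. The following is a rough sketch using only the standard library; `build_bundle` and `fits_in_lambda` are hypothetical helper names, and the limit constant simply mirrors the 50MB figure quoted above.

```python
import os
import zipfile

# 50 MB limit for the zipped bundle, as quoted in the text above.
LAMBDA_ZIP_LIMIT_BYTES = 50 * 1024 * 1024


def build_bundle(src_dir, zip_path):
    """Zip everything under src_dir into zip_path and return the zip size in bytes.

    zip_path should live outside src_dir so the archive does not include itself.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full_path = os.path.join(root, name)
                # Store paths relative to src_dir so the handler sits at the zip root.
                zf.write(full_path, os.path.relpath(full_path, src_dir))
    return os.path.getsize(zip_path)


def fits_in_lambda(size_bytes, limit=LAMBDA_ZIP_LIMIT_BYTES):
    """Return True if a bundle of size_bytes is within the given limit."""
    return size_bytes <= limit
```

For example, `fits_in_lambda(build_bundle("build", "/tmp/bundle.zip"))` would tell you whether a `build` directory (an assumed name) can still be uploaded as a single zip.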

Further runner-up

PaaS and managed container environments. Since "I want to go serverless" is one of my starting points, I am basically reluctant to adopt these.

Anaconda has all the libraries I need. A PaaS that supports Anaconda in the execution environment would be an attractive option.

I would like Heroku to offer Anaconda in its execution environment, but that does not seem to be the case. Looking into buildpacks, I found that there is a Conda buildpack. It comes very close, but statsmodels does not seem to be included. I would have to build the environment myself, and if I am going to go to that trouble anyway, I would rather put the effort into FaaS.

Docker Cloud is virtually the same as ECS when AWS is used as the backend, so I am excluding it for now. Its multi-platform support, similar to Terraform, is interesting, though. I would like to try it on another occasion.

GAE can use a user-defined, Dockerfile-based runtime in the Flexible Environment, which is in beta. There seems to be some potential here.

Implementation

Adopt AWS Lambda

Please refer to the company blog for the materials. The implementation is covered from around slide 30.

Impressions after implementation

The implementation is still too rough, so the configuration and the functions need a lot more refactoring. I want to separate the pre-processing and post-processing from the Function that drives the prediction logic and chain them together as a pipeline. Since the data source is on Salesforce, I would also like to work on the integration there.
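As a rough illustration of the separation described above, the sketch below splits the work into three small functions that are chained together. It is not the article's code: the CSV column names `month` and `amount` are assumptions, and `predict` is a placeholder that just repeats a recent average instead of running a real time series model.

```python
import csv
import io


def preprocess(csv_text):
    """Parse 'month,amount' CSV text into a list of amounts ordered by month."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.sort(key=lambda row: row["month"])
    return [float(row["amount"]) for row in rows]


def predict(series, steps):
    """Placeholder prediction logic: repeat the mean of the last three values."""
    window = series[-3:]
    baseline = sum(window) / len(window)
    return [baseline] * steps


def postprocess(forecast_values):
    """Shape the forecast into a JSON-friendly dictionary."""
    return {"forecast": [round(value, 2) for value in forecast_values]}


def run_pipeline(csv_text, steps):
    """Chain the three stages; each one could later become its own Function."""
    return postprocess(predict(preprocess(csv_text), steps))
```

Because each stage only takes and returns plain data, any of them could be moved into a separate Function later without changing the others.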

Miscellaneous notes on Lambda

It is a pity that the only way to add a package is to "push it into the zip archive yourself". For something called serverless, having such a hard time with the build feels like it defeats the purpose.

That said, deployment tools targeting FaaS are coming out these days, and I think much of this can be covered by tooling. It is a hassle, but my current view is that it is not a serious concern.

As for deployment tools, Lamvery looks useful at the moment. It appears to collect and bundle not only Python packages but also shared libraries, which should make the build work go much more smoothly. It is also convenient that the unit of management is the Function. I did not use any deployment tool this time, so I will use Lamvery for future development.

Other options I did not adopt (or rather, Azure)

I chose AWS Lambda because of my company's background, but to be honest, I would prefer to use Azure as things stand.

The execution environment of Azure Machine Learning's _"Execute Python Script"_ covered all the packages I needed. Azure Machine Learning is also designed with integration with data stores and applications in mind. If you want to tinker with the logic (of the analysis part) and also link with external services, I think Azure Machine Learning is the best choice.

As a caveat, you need to make sure that the required packages are included in the Execute Python Script execution environment (Anaconda). It seems you can manage even if they are not, but doing so takes about as much trouble as the Lambda approach above, so the advantage of ease of use is lost. (Official Reference)
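One simple way to confirm that an execution environment already provides what you need is to ask Python whether each package can be found. The sketch below uses only the standard library. `missing_packages` is a hypothetical helper, and the `REQUIRED` list contains only statsmodels because that is the one package this article names; extend it with whatever your logic imports.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed list: statsmodels is mentioned in this article; add your own packages.
REQUIRED = ["statsmodels"]

if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Not available here:", ", ".join(missing))
    else:
        print("All required packages are available.")
```

The check only looks packages up and does not import them, so it stays fast even when the packages are large.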

Summary

I analyzed the "monthly sales information" uploaded to S3 on the AWS Lambda execution platform and tried to forecast upcoming sales.

To run the prediction logic, I needed packages that are not included in the Lambda function execution environment. I cleared this problem by bundling them in the deployment package and adding a step that loads the .so files during startup at runtime.
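The article does not show its startup code, so the following is only a sketch of one common way to load bundled shared libraries before importing the packages that depend on them. The `lib` directory name and the helper `preload_shared_libs` are assumptions. `LAMBDA_TASK_ROOT` is the environment variable Lambda sets to the directory where the deployment package is unpacked, with the current directory used as a fallback for local runs.

```python
import ctypes
import glob
import os


def preload_shared_libs(lib_dir):
    """Try to load every shared library in lib_dir.

    Returns (loaded, failed) lists of file names. A missing or empty
    directory simply yields two empty lists.
    """
    loaded, failed = [], []
    for path in sorted(glob.glob(os.path.join(lib_dir, "*.so*"))):
        try:
            # RTLD_GLOBAL makes the symbols visible to libraries loaded later.
            ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL)
            loaded.append(os.path.basename(path))
        except OSError:
            failed.append(os.path.basename(path))
    return loaded, failed


# Assumed layout: shared libraries are bundled under "lib" in the package root.
LIB_DIR = os.path.join(os.environ.get("LAMBDA_TASK_ROOT", os.getcwd()), "lib")
LOADED, FAILED = preload_shared_libs(LIB_DIR)


def handler(event, context):
    # Packages that need the shared libraries would be imported here,
    # after the preload above has run.
    return {"loaded": LOADED, "failed": FAILED}
```

Loading in sorted order is a simplification; if one library depends on another that sorts later, an explicit load order would be needed.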

Since the implementation is still rough, there are many points for improvement. As of January 14, 2017, the things left unfinished and future prospects are as follows.

- Labor saving for build and deploy work
  - Use of deployment tools. Consider using Lamvery
