An attempt to include a machine learning model in a Python package

Overview

Someone at Mercari has published an excellent collection of knowledge about machine learning systems, from development through operation. https://mercari.github.io/ml-system-design-pattern/README_ja.html This post expands a little on the model management part.

First, the big picture. https://github.com/arc279/model-in-package-sample That repository is really all there is to it.

The rest of this post explains the main points. I will not explain setuptools and related tools, so please look them up as needed.

The samples were run in the following environment:

(.venv) $ python -V
Python 3.8.1

Non-source data can be included in python packages

For a while, the setuptools options for this (package_data, data_files and so on) were confusing, but recently practice seems to have converged on MANIFEST.in plus importlib.resources (see the importlib.resources section of the Python standard library documentation).

Note that importlib.resources was added in Python 3.7; older versions require something like pkg_resources instead. To be honest, the latter is not easy to use, so if possible use importlib.resources on 3.7 or later.

See the following files for how it is used.

https://github.com/arc279/model-in-package-sample/blob/master/MANIFEST.in
https://github.com/arc279/model-in-package-sample/blob/master/src/mymodel/__init__.py#L5
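As a rough sketch of the version split described above, a small helper could prefer importlib.resources and fall back to pkg_resources only on interpreters that lack it. The helper name read_package_bytes is hypothetical and is not taken from the sample repository.

```python
def read_package_bytes(package_name, resource):
    """Return the raw bytes of a data file bundled inside a package.

    `package_name` is a dotted package name given as a string, and
    `resource` is a file name located directly inside that package.
    """
    try:
        # Python 3.7+: standard library API.
        import importlib.resources as resources
    except ImportError:
        # Older interpreters: fall back to setuptools' pkg_resources.
        import pkg_resources
        return pkg_resources.resource_string(package_name, resource)
    return resources.read_binary(package_name, resource)
```

With the sample package installed, a call such as read_package_bytes("mymodel.titanic_sample.models.LogisticRegression", "model.pkl") would be expected to return the pickled bytes.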

When built into a wheel, it looks like this

The wheel contains the *.pkl files.

$ python setup.py bdist_wheel

(..snip..)

$ zipinfo -1 dist/mymodel-1.1.1_titanic.from_kaggle-py3-none-any.whl
mymodel/__init__.py
mymodel/version.py
mymodel/titanic_sample/__init__.py
mymodel/titanic_sample/models/__init__.py
mymodel/titanic_sample/models/LogisticRegression/__init__.py
mymodel/titanic_sample/models/LogisticRegression/model.pkl
mymodel/titanic_sample/models/RandomForestClassifier/__init__.py
mymodel/titanic_sample/models/RandomForestClassifier/model.pkl
mymodel/titanic_sample/models/SVC/__init__.py
mymodel/titanic_sample/models/SVC/model.pkl
mymodel/titanic_sample/models/SVC/__pycache__/__init__.cpython-38.pyc
mymodel/titanic_sample/models/__pycache__/__init__.cpython-38.pyc
mymodel-1.1.1_titanic.from_kaggle.dist-info/METADATA
mymodel-1.1.1_titanic.from_kaggle.dist-info/WHEEL
mymodel-1.1.1_titanic.from_kaggle.dist-info/top_level.txt
mymodel-1.1.1_titanic.from_kaggle.dist-info/RECORD
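Because a wheel is a zip archive, the same check can be done from Python with the standard zipfile module. The following is a minimal sketch; the function name list_members and the throwaway archive it builds are assumptions for illustration, not part of the sample repository.

```python
import os
import tempfile
import zipfile


def list_members(wheel_path, suffix=""):
    """Return the archive member names in a wheel that end with `suffix`."""
    with zipfile.ZipFile(wheel_path) as zf:
        return [name for name in zf.namelist() if name.endswith(suffix)]


# Demonstration on a throwaway archive with hypothetical file names,
# so the sketch runs without building the real package.
demo_dir = tempfile.mkdtemp()
demo_wheel = os.path.join(demo_dir, "demo-0.0.0-py3-none-any.whl")
with zipfile.ZipFile(demo_wheel, "w") as zf:
    zf.writestr("demo/__init__.py", "")
    zf.writestr("demo/models/model.pkl", b"placeholder")

pkl_files = list_members(demo_wheel, ".pkl")
print(pkl_files)
```

Pointing list_members at the wheel under dist/ would list the model.pkl entries shown in the zipinfo output.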

User side

Once it is packaged as a wheel, you can install it with pip.

(.venv) $ pip install dist/mymodel-1.1.1_titanic.from_kaggle-py3-none-any.whl

(..snip..)

(.venv) $ pip list
Package         Version
--------------- -------------------------
joblib          0.15.1
mymodel         1.1.1-titanic.from-kaggle
numpy           1.18.5
pandas          1.0.4
pip             19.2.3
python-dateutil 2.8.1
pytz            2020.1
scikit-learn    0.23.1
scipy           1.4.1
setuptools      41.2.0
six             1.15.0
threadpoolctl   2.1.0
wheel           0.34.2

Calling it

(.venv) $ ipython
In [1]: import mymodel

In [2]: mymodel.__version__
Out[2]: '1.1.1-titanic.from-kaggle'

Read the data in the package

This continues the ipython session above.

In [3]: import importlib.resources

In [4]: import pickle

In [5]: import mymodel.titanic_sample.models.LogisticRegression

In [6]: b = importlib.resources.read_binary(mymodel.titanic_sample.models.LogisticRegression, "model.pkl")

In [9]: len(b)
Out[9]: 739

In [10]: c = pickle.loads(b)

In [11]: c.__class__
Out[11]: sklearn.linear_model._logistic.LogisticRegression

As shown, the model bundled in the package can be read and unpickled. See the links above for details.
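The steps from the ipython session could be wrapped in a small helper. This is only a sketch: the function name load_model and the throwaway package demo_models are hypothetical, and the demo pickles a plain dict rather than a scikit-learn model so that it runs without extra dependencies.

```python
import importlib
import importlib.resources
import os
import pickle
import sys
import tempfile


def load_model(package_name, resource="model.pkl"):
    """Import `package_name` and unpickle the resource stored inside it."""
    package = importlib.import_module(package_name)
    raw = importlib.resources.read_binary(package, resource)
    # pickle.loads can execute arbitrary code, so only load trusted packages.
    return pickle.loads(raw)


# Demonstration with a throwaway package (hypothetical name `demo_models`).
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_models")
os.makedirs(pkg_dir)
open(os.path.join(pkg_dir, "__init__.py"), "w").close()
with open(os.path.join(pkg_dir, "model.pkl"), "wb") as f:
    pickle.dump({"coef": [0.5, -1.0]}, f)

sys.path.insert(0, root)
importlib.invalidate_caches()
model = load_model("demo_models")
print(model)
```

With the sample wheel installed, load_model("mymodel.titanic_sample.models.LogisticRegression") would be expected to return the same LogisticRegression object as in the session above, provided scikit-learn is installed.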

By the way

Once you have grasped the points above, it follows that a Python package containing only data is also possible. How much to put into a package depends on the project, so it is worth weighing the options for your own case.

Finally, about versioning

The version conventions for Python packages are fairly loose, so semantic versioning can be adopted. That means you can use Mercari's example as it is. https://mercari.github.io/ml-system-design-pattern/Operation-patterns/Data-model-versioning-pattern/design_ja.html

Like this:

https://github.com/arc279/model-in-package-sample/blob/master/setup.cfg#L3
https://github.com/arc279/model-in-package-sample/blob/master/src/mymodel/version.py
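On the consuming side, a version string in this style can be split back into its numeric part and its model label. The sketch below assumes the "<major>.<minor>.<patch>-<label>" shape seen in the pip list output above; the function name split_version is hypothetical and the sample repository's version.py may be organized differently.

```python
def split_version(version):
    """Split '<major>.<minor>.<patch>-<label>' into ((major, minor, patch), label)."""
    # Everything before the first '-' is the numeric part, the rest is the label.
    semver, _, label = version.partition("-")
    major, minor, patch = (int(part) for part in semver.split("."))
    return (major, minor, patch), label


numbers, label = split_version("1.1.1-titanic.from-kaggle")
print(numbers, label)
```

This relies on the loose version handling described above. Packaging tools that validate versions against PEP 440 would expect such a label to be written as a local version segment after a "+".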

That is all. See the sample repository on GitHub for the big picture.
