Overview
Someone at Mercari has published a wonderful body of knowledge on machine learning systems, from development through operation. https://mercari.github.io/ml-system-design-pattern/README_ja.html Let me expand a little on the model management part.
First, the big picture. https://github.com/arc279/model-in-package-sample Honestly, this repository is all there is to it.
Below I explain the main points. I will not explain setuptools and the like, so please look them up as needed.
The sample was run in the following environment:
(.venv) $ python -V
Python 3.8.1
Until a while ago, bundling data files with setuptools was confusing, with options such as package_data and data_files, but recently things seem to have converged on MANIFEST.in and importlib.resources (.python.org/ja/3/library/importlib.html#module-importlib.resources).
Note that importlib.resources was added in Python 3.7, so older versions have to use something like pkg_resources instead. To be honest, pkg_resources is not easy to use, so if possible, use importlib.resources on 3.7 or later.
See the following for how they are used in the sample:
https://github.com/arc279/model-in-package-sample/blob/master/MANIFEST.in
https://github.com/arc279/model-in-package-sample/blob/master/src/mymodel/init.py#L5
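As a rough illustration of the version split described above, the sketch below prefers importlib.resources and only falls back to pkg_resources on interpreters older than 3.7. MANIFEST.in takes care of getting the file into the package; this is only the reading side. The throwaway package demo_pkg and its data.bin file are made up for the demonstration and are not part of the sample repository.

```python
import importlib
import sys
import tempfile
from pathlib import Path

try:
    # Available in the standard library since Python 3.7.
    from importlib.resources import read_binary
except ImportError:
    # Older interpreters: fall back to pkg_resources (ships with setuptools).
    import pkg_resources

    def read_binary(package, resource):
        # pkg_resources wants the package name, so accept a module object too.
        name = getattr(package, "__name__", package)
        return pkg_resources.resource_string(name, resource)


def demo():
    """Build a throwaway package with one data file and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg_dir = Path(tmp) / "demo_pkg"  # hypothetical package name
        pkg_dir.mkdir()
        (pkg_dir / "__init__.py").write_text("")
        (pkg_dir / "data.bin").write_bytes(b"\x00\x01\x02")
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            package = importlib.import_module("demo_pkg")
            return read_binary(package, "data.bin")
        finally:
            # Leave the interpreter state as we found it.
            sys.path.remove(tmp)
            sys.modules.pop("demo_pkg", None)


payload = demo()
```

The calling code stays the same on either branch, which is the main reason to hide the fallback behind a single read_binary name.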
The built wheel contains the *.pkl files.
$ python setup.py bdist_wheel
(..snip..)
$ zipinfo -1 dist/mymodel-1.1.1_titanic.from_kaggle-py3-none-any.whl
mymodel/__init__.py
mymodel/version.py
mymodel/titanic_sample/__init__.py
mymodel/titanic_sample/models/__init__.py
mymodel/titanic_sample/models/LogisticRegression/__init__.py
mymodel/titanic_sample/models/LogisticRegression/model.pkl
mymodel/titanic_sample/models/RandomForestClassifier/__init__.py
mymodel/titanic_sample/models/RandomForestClassifier/model.pkl
mymodel/titanic_sample/models/SVC/__init__.py
mymodel/titanic_sample/models/SVC/model.pkl
mymodel/titanic_sample/models/SVC/__pycache__/__init__.cpython-38.pyc
mymodel/titanic_sample/models/__pycache__/__init__.cpython-38.pyc
mymodel-1.1.1_titanic.from_kaggle.dist-info/METADATA
mymodel-1.1.1_titanic.from_kaggle.dist-info/WHEEL
mymodel-1.1.1_titanic.from_kaggle.dist-info/top_level.txt
mymodel-1.1.1_titanic.from_kaggle.dist-info/RECORD
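Since a wheel is a zip archive, the same check can be done from Python, for example in a CI step. This is only a sketch: list_pickles is a hypothetical helper name, and the archive built in demo() is a stand-in created on the fly, not the real mymodel wheel.

```python
import tempfile
import zipfile
from pathlib import Path


def list_pickles(wheel_path):
    """Return the names of all *.pkl entries inside a wheel (a zip archive)."""
    with zipfile.ZipFile(wheel_path) as wheel:
        return sorted(n for n in wheel.namelist() if n.endswith(".pkl"))


def demo():
    """Create a small stand-in archive and list the pickles it contains."""
    with tempfile.TemporaryDirectory() as tmp:
        fake_wheel = Path(tmp) / "demo-0.0.1-py3-none-any.whl"  # stand-in file
        with zipfile.ZipFile(fake_wheel, "w") as archive:
            archive.writestr("demo/__init__.py", "")
            archive.writestr("demo/models/SVC/__init__.py", "")
            archive.writestr("demo/models/SVC/model.pkl", b"placeholder")
        return list_pickles(fake_wheel)


found = demo()
```

Pointing list_pickles at the wheel under dist/ should give the same *.pkl entries that zipinfo printed above.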
Once it is packaged as a wheel, you can install it with pip.
(.venv) $ pip install dist/mymodel-1.1.1_titanic.from_kaggle-py3-none-any.whl
(..snip..)
(.venv) $ pip list
Package Version
--------------- -------------------------
joblib 0.15.1
mymodel 1.1.1-titanic.from-kaggle
numpy 1.18.5
pandas 1.0.4
pip 19.2.3
python-dateutil 2.8.1
pytz 2020.1
scikit-learn 0.23.1
scipy 1.4.1
setuptools 41.2.0
six 1.15.0
threadpoolctl 2.1.0
wheel 0.34.2
(.venv) $ ipython
In [1]: import mymodel
In [2]: mymodel.__version__
Out[2]: '1.1.1-titanic.from-kaggle'
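The installed version can also be read without importing the package itself, which may be handy when a serving process wants to log which model package it is running. A minimal sketch, assuming Python 3.8 or later (importlib.metadata joined the standard library in 3.8); installed_version is a hypothetical helper name.

```python
from importlib import metadata  # standard library since Python 3.8


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# "mymodel" is the distribution installed above; the result is None
# wherever that wheel has not been installed.
model_version = installed_version("mymodel")
missing = installed_version("no-such-distribution-for-this-sketch")
```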
Continuing in the ipython session started earlier:
In [3]: import importlib.resources
In [4]: import pickle
In [5]: import mymodel.titanic_sample.models.LogisticRegression
In [6]: b = importlib.resources.read_binary(mymodel.titanic_sample.models.LogisticRegression, "model.pkl")
In [9]: len(b)
Out[9]: 739
In [10]: c = pickle.loads(b)
In [11]: c.__class__
Out[11]: sklearn.linear_model._logistic.LogisticRegression
And it works: the pickled model is read straight from the installed package. See this area for details.
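The wheel ships one subpackage per model, so the same read_binary call can be looped over all of them. The following is a sketch under assumptions: load_all_models is a hypothetical helper, and the demo builds a stand-in package tree (demo_models) holding plain dicts instead of the scikit-learn models in the sample.

```python
import importlib
import importlib.resources
import pickle
import pkgutil
import sys
import tempfile
from pathlib import Path


def load_all_models(models_package, filename="model.pkl"):
    """Load `filename` from every subpackage of `models_package`."""
    root = importlib.import_module(models_package)
    models = {}
    for info in pkgutil.iter_modules(root.__path__):
        if not info.ispkg:
            continue  # plain modules carry no bundled model file
        sub = importlib.import_module(f"{models_package}.{info.name}")
        raw = importlib.resources.read_binary(sub, filename)
        models[info.name] = pickle.loads(raw)  # only unpickle trusted data
    return models


def demo():
    """Build a stand-in package tree and load every model found in it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "demo_models"  # hypothetical stand-in package
        for name in ("LogisticRegression", "SVC"):
            sub = root / name
            sub.mkdir(parents=True)
            (sub / "__init__.py").write_text("")
            (sub / "model.pkl").write_bytes(pickle.dumps({"kind": name}))
        (root / "__init__.py").write_text("")
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            return load_all_models("demo_models")
        finally:
            # Remove the stand-in package from the interpreter again.
            sys.path.remove(tmp)
            for mod in [m for m in sys.modules if m.split(".")[0] == "demo_models"]:
                del sys.modules[mod]


loaded = demo()
```

With the real wheel installed, the call would presumably be load_all_models("mymodel.titanic_sample.models"), with scikit-learn available so the pickles can be restored.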
Once you have a grip on the points above, it follows that a Python package containing only data is also possible. How much to include depends on the project, so it is worth weighing the options for your own case.
The version conventions for Python packages are fairly loose, so semantic versioning can be adopted. That means this Mercari example can be used as is. https://mercari.github.io/ml-system-design-pattern/Operation-patterns/Data-model-versioning-pattern/design_ja.html
It looks like this: https://github.com/arc279/model-in-package-sample/blob/master/setup.cfg#L3 https://github.com/arc279/model-in-package-sample/blob/master/src/mymodel/version.py
That is the whole story. See the sample repository on GitHub for the full picture.
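A minimal sketch of how such a version string might be taken apart at runtime, for example to log which data set a deployed model came from. Reading the suffix as a data-set label is an assumption based on the sample's version string '1.1.1-titanic.from-kaggle'; split_version and VERSION_RE are hypothetical names, not part of the sample repository.

```python
import re

# MAJOR.MINOR.PATCH, optionally followed by "-" and a free-form label.
VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-(.+))?$")


def split_version(version):
    """Split a version string into ((major, minor, patch), label)."""
    match = VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    major, minor, patch, label = match.groups()
    return (int(major), int(minor), int(patch)), label


# The version string reported by mymodel.__version__ in the session above.
core, label = split_version("1.1.1-titanic.from-kaggle")
```

Keeping the numeric part as a tuple of integers means versions compare in the expected order, while the label stays a plain string that can go into logs or metadata.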