This article is the 13th entry in the Puri Puri Appliance Advent Calendar 2020.
I usually use Python for MLOps work. Originally, I developed in Java and Kotlin with build systems such as Maven and Gradle. Python, which I use these days, is a very powerful tool for one-off scripts and machine learning. However, once you start operating at the product level, you can be worn down by dependency management, the difficulty of standardization, and weaknesses around the build process. When I reproduced the multi-project layout familiar from Java and Kotlin in Python, development became much easier, so this article shares that knowledge.
As mentioned at the beginning, I usually work on products related to AI with a focus on MLOps. The following issues arose during the development process.
- The same preprocessing must run during both training and inference → code gets copied and pasted
- I want a mono-repo, but pip dependency resolution causes too many problems → new modules and repositories keep multiplying
After organizing these issues, the following requirements emerged.
- **Each module should define its own independent dependencies**
- **Common processing should be reusable**
- **The list of dependencies should be visible at a glance**
To solve these problems, we considered setting up a private pip server or putting pickled objects in Storage, but rejected both because of the maintenance cost. That led to the idea of reproducing a multi-project configuration, as used with Maven and Gradle, in Python with Poetry.
I won't explain the general outline and advantages of multi-project builds with Maven, Gradle, etc., since there is already plenty of good material on the subject. Here I will focus on the benefits for MLOps and Python.
- **Preprocessing can be standardized**
- **Dependencies can be managed individually for each model**
We also use Poetry to realize the multi-project setup, which brings the following additional benefits.
- **You don't have to prepare setup.py or requirements.txt.**
- **The virtual environment and dependencies can be resolved with a single command.**
- **A dependency resolution algorithm more sophisticated than pipenv's is available.**
The directory structure is very simple.
```
.
├── README.md
├── libs
│   ├── lib-one
│   └── logger
└── projects
    ├── __init__.py
    ├── project-one
    └── project-two
```
Each has the following roles.
- `./libs` → common modules that you want to reuse across multiple projects
- `./projects` → main modules that execute the use cases
The modules under projects depend on the modules under libs.
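For example, a preprocessing function that must behave identically at training time and at inference time can live in a lib module. The snippet below is a minimal sketch; the module and function names (`lib_one.preprocess`, `normalize`) are hypothetical and not part of the actual repository.

```python
# libs/lib-one/lib_one/preprocess.py (hypothetical module in the shared lib)
from typing import List


def normalize(values: List[float]) -> List[float]:
    """Scale values into the 0-1 range so that training and inference
    apply exactly the same preprocessing."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```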
When adding a module, the procedure is the same whether it goes under libs or projects: generate it with poetry's `new` command.
```bash
# When adding a common module
cd libs/
poetry new lib-two

# When adding a main module
cd projects/
poetry new project-three
```
That's it. Poetry generates a project template for you; after that, implement it however you like!
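For reference, `poetry new lib-two` produces a skeleton roughly like the following (the exact files, e.g. README.rst versus README.md, vary by Poetry version):

```
lib-two
├── pyproject.toml
├── README.rst
├── lib_two
│   └── __init__.py
└── tests
    ├── __init__.py
    └── test_lib_two.py
```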
Now let's see how to actually add a common module dependency to the main module. That said, it's very easy.
```bash
cd projects/project-one
poetry add ../../libs/lib-one
```
All it takes is `poetry add {path of the common module you want to add}`.
If you look at pyproject.toml, you can see that the dependencies have been added properly.
```toml
[tool.poetry]
name = "project-one"
version = "0.1.0"
description = ""
authors = ["ya-mori <[email protected]>"]

[tool.poetry.dependencies]
python = "^3.7"
lib-one = {path = "../../libs/lib-one"}

[tool.poetry.dev-dependencies]
pytest = "^5.2"

[build-system]
requires = ["poetry>=0.12"]
build-backend = "poetry.masonry.api"
```
The official documentation also describes how to add this dependency. (https://python-poetry.org/docs/cli/#add)
The great thing about this method is that changes in the common module are reflected immediately. In other words, you don't have to rebuild it each time.
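Inside the main module you can then import the shared code directly. Continuing the hypothetical `normalize` sketch from earlier (the file path and names below are illustrative, not part of the actual repository):

```python
# projects/project-one/project_one/train.py (hypothetical entry point)
from lib_one.preprocess import normalize  # shared preprocessing from libs/lib-one


def main() -> None:
    # The same function can be reused by the inference project,
    # so training and serving stay in sync.
    features = normalize([3.0, 1.0, 2.0])
    print(features)


if __name__ == "__main__":
    main()
```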
You can now add any shared module as a dependency of any main module :smile:
This time I've summarized how to set up a multi-project configuration in Python. In actual work I also had a hard time turning this setup into a Dockerfile, but I'd like to cover that another time. I hope this helps you even a little.