[ML Ops] I want to do multi-project with Python

This article is the 13th day article of Puri Puri Appliance Advent Calendar 2020.

Introduction

I usually use Python to work on MLOps. Originally, I was developing with build systems such as Maven and Gradle such as Java and Kotlin. I think Python, which I'm using these days, is a very powerful tool in the context of one-shot scripts and machine learning. However, when you actually start operating at the product level, you may be exhausted by the dependencies, the difficulty of standardization, and the weaknesses around the build. So, if I reproduced the multi-project configuration used in Java, Kotlin, etc. in Python, it became very easy to develop. This time, I will write about that knowledge.

Why I needed a multi-project

As mentioned at the beginning, I usually work on products related to AI with a focus on MLOps. The following issues arose during the development process.

--The same preprocessing must be done during learning and inference → Code copy and paste occurs ――I want to make it a mono-repo, but I can't because there are many problems with pip dependency resolution → More new modules and repositories

As a result of organizing these issues, the following requirements came out.

-** Each module wants to define an independent dependency ** -** I want to reuse common processing ** -** I want to be able to see the list of dependents at a glance **

To solve these problems, we proposed to set up a private pip server and put the Pickle Object in Storage, but we rejected them from the viewpoint of maintenance cost. Then, I arrived at the idea of ​​reproducing a multi-project configuration using Maven, Gradle, etc. with Python & Poetry.

Benefits of multi-project

I will not explain the outline and advantages of the actual multi-project by Maven, Gradle, etc. because there are many cohesive things in the world. This time, I will raise the benefits of ML Ops and Python.

-** Preprocessing can be standardized ** -** Dependencies can be managed individually for each model **

We also use Poetry to realize multi-projects. Then, you can enjoy the following additional benefits.

-** You don't have to prepare setup.py or requirements.txt. ** ** -** The virtual environment and dependencies can be resolved with a single command. ** ** -** Dependency resolution algorithms that are more sophisticated than pipenv are available. ** **

Let's do it

Directory structure

The directory structure is very simple.

.
├── README.md
├── libs
│   ├── lib-one
│   └── logger
└── projects
    ├── __init__.py
    ├── project-one
    └── project-two

Each has the following roles.

./lib-> Common module that you want to reuse in multiple projects ./projects-> Main module to execute use cases

The modules under projects depend on the modules under libs.

Add module

When adding a module, it is the same whether it is libs or projects. It will be generated by the new command of poetry.


#When adding a common module
cd libs/
poetry new lib-two

#When adding the main module
cd projects/
poetry new project-three

That's OK. Poetry will generate a template for you. After that, let's implement it as you like!

Add a dependency to a module

Now let's see how to actually add a common module dependency to the main module. That said, it's very easy.


cd projects/project-one
poetry add ../../libs/lib-one

You can do it with just poetry add {path of the common module you want to put}. If you look at pyproject.toml, you can see that the dependencies have been added properly.


[tool.poetry]
name = "project-one"
version = "0.1.0"
description = ""
authors = ["ya-mori <[email protected]>"]

[tool.poetry.dependencies]
python = "^3.7"
lib-one = {path = "../../libs/lib-one"}

[tool.poetry.dev-dependencies]
pytest = "^5.2"

[build-system]
requires = ["poetry>=0.12"]
build-backend = "poetry.masonry.api"

The official documentation also describes how to add this dependency. (https://python-poetry.org/docs/cli/#add)

The great thing about this method is that the changes in the common module are reflected immediately. In other words, you don't have to rebuild each time.

You have now added your favorite shared module dependencies to your favorite main module: smile:

at the end

This time I've summarized how to implement a multi-project in Python. I had another hard time because I had to convert these to Dockerfile in actual work, but I would like to introduce it at another time. I hope it helps you even a little.

Recommended Posts

[ML Ops] I want to do multi-project with Python
I want to do ○○ with Pandas
I want to debug with Python
I want to analyze logs with Python
I want to play with aws with python
I want to do a full text search with elasticsearch + python
I want to do Dunnett's test in Python
I want to use MATLAB feval with python
I want to make a game with Python
I want to use Temporary Directory with Python2
#Unresolved I want to compile gobject-introspection with Python3
I want to solve APG4b with Python (Chapter 2)
I want to write to a file with Python
I want to do it with Python lambda Django, but I will stop
I want to handle optimization with python and cplex
I want to inherit to the back with python dataclass
I want to AWS Lambda with Python on Mac!
I want to run a quantum computer with Python
I want to do something in Python when I finish
To do tail recursion with Python2
What to do with PYTHON release?
I want to be able to analyze data with Python (Part 3)
I want to specify another version of Python with pyvenv
I want to be able to analyze data with Python (Part 1)
I want to do something like sort uniq in Python
I want to be able to analyze data with Python (Part 4)
I want to automatically attend online classes with Python + Selenium!
[Python] I want to use the -h option with argparse
I want to detect objects with OpenCV
I want to know the weather with LINE bot feat.Heroku + Python
I want to monitor UNIQLO + J page updates [Scraping with python]
I want to solve APG4b with Python (only 4.01 and 4.04 in Chapter 4)
I want to blog with Jupyter Notebook
I want to output the beginning of the next month with Python
I want to use jar from python
I wanted to solve ABC160 with Python
I want to build a Python environment
I want to pip install with PythonAnywhere
How to do portmanteau test with python
[Introduction] I want to make a Mastodon Bot with Python! 【Beginners】
I wanted to solve ABC172 with Python
What skills do I need to program with the FBX SDK Python?
I want to tweet on Twitter with Python, but I'm addicted to it
Note: I want to do home automation with Home Assistant + Raspberry Pi + sensor # 1
It's more recent, but I wanted to do BMI calculation with python.
Environment maintenance made with Docker (I want to post-process GrADS in Python
I want to do a monkey patch only partially safely in Python
How to do multi-core parallel processing with python
I want to analyze songs with Spotify API 2
I wanted to solve NOMURA Contest 2020 with Python
I want to memoize including Python keyword arguments
[Python] What I did to do Unit Test
I want to create a window in Python
I want to email from Gmail using Python.
[Python] I want to manage 7DaysToDie from Discord! 1/3
I want to mock datetime.datetime.now () even with pytest!
I want to display multiple images with matplotlib.
I want to knock 100 data sciences with Colaboratory
I wanted to install Python 3.4.3 with Homebrew + pyenv
I tried to get CloudWatch data with Python
I want to analyze songs with Spotify API 1