Roadmap for publishing Python packages

We have summarized the steps required to create and publish a Python package. We plan to add links to reference materials at a later date or create another article for specific usage and details of the tool.

I will release the package around February 2020! I thought, but I had a hard time not knowing what to start with. We have summarized the knowledge gained during the development process, so we hope it will be helpful for those who are thinking of releasing it in the future!

Target packages:

--I want you to publish it as a Python package and use it for the general public. ――I want to publish the code on GitHub etc. and proceed with development in consultation with users

Overview

  1. [Examination of contents](# 0-Examination of contents)
  2. [Development environment](# 1-Development environment)
  3. [Coding](# 2-Coding)
  4. [Test operation](# 3-Test operation)
  5. [Create Document](# 4-Create Document)

0. Examination of contents

Matters to consider:

--User demographic: How much background knowledge do you need, how much do you need to know about Python and other packages? --Purpose of use: What can users get?

tool:

--Mind map --Outline creation tool: WorkFlowy

1. Development environment

While building a development environment on a local PC, it is necessary to arrange a mechanism to publish the created code. Also, by using Git to record the change history at any time, you will be able to efficiently grasp the development status.

Repository (code storage / record of change history)

Create a Remote repository for development on GitHub and clone it to your local environment for development. When you're done, commit in your local environment and push to the Remote repository.

-Basics of Git and Github

Python execution environment

If you are using Windows 10, we recommend using WSL (Windows Subsystem for Linux) or WSL2 if possible.

-Install WSL

Introduction of Editor

Prepare an Editor or IDE (Integrated Development Environment) in your local environment.

Visual Studio Code (VScode) is recommended. A large number of plugins are available, making it easier to handle Git operations as well.

Manage dependent packages

pipenv and poetry seem promising. I'm using pipenv, but I'm considering migrating to poetry because I get frequent errors & slow. (It has not been verified whether poetry is faster ...)

Development workflow

It seems that there are various ways of thinking about how to use branches and when to merge, such as Git Flow and GitHub Flow. If you are developing individually or at the level of several people, we recommend using the simple Gitlab Flow.

Also, it is convenient to use the issue function of GitHub as a place for todo-list & discussion. Since individual numbers are assigned to issues and pull requests such as # 1, raise a problem with issue # 1, work on branch ʻissue1` based on the issue number, and default by pull request. The procedure is to merge it into a branch.

-Explanation of Gitlab-flow -How to use GitHub Issue-From creation / writing to closing [Cannot be deleted]

Versioning As a major flow, I think it would be convenient to develop in two parts, the "development version (managed by GitHub)" and the "stable version (managed by PyPI)". The development version is created according to the development flow described above.

There seemed to be little fixed naming of the development version. In my case, I'm giving one development version right after the merge occurs and the issue is closed.

(Example) When issue # 2 is closed due to a bug fix: 2.0.1-alpha.new.1 → 2.0.1-beta.new.1.fix.2

Also, if a large number of issues are closed or if you need to urgently fix a bug in the stable version, you can change the stable version according to Semantic Versioning. Please raise the version.

2. Coding

We have summarized the items that need attention when coding, such as the folder structure.

Package name

If you create a one-line description of the package in English and select an alphabet from it to make it the package name, you will not have to worry too much. I don't care about the order of alphabet selection (laughs)

Example: Python package for ** COV ** ID-19 anal ** y ** sis with ** ph ** ase-dependent ** SIR **-derived ODE models = ** CovsirPhy **

You may also want to use Google search to avoid overlapping with other service names.

I think it will be difficult to change it later because it will cause a lot of trouble such as changing the installation method.

Package identification information

If you use poetry, you will need to write it in a separate file, but if you are using pipenv to manage dependent packages, create files setup.py and setup.cfg at the top of the repository. please.

setup.py


from setuptools import setup
setup()

setup.cfg


[metadata]
name =package name
version = attr:package name.__version__.__version__
url =URLs such as repositories
author =Author name
author_email =mail address
license = Apache License 2.0
license_file = LICENSE
description =One line description of the package
long_description = file: README.rst
keywords =keyword
classifiers =
    Development Status :: 5 - Production/Stable
    License :: OSI Approved :: Apache Software License
    Programming Language :: Python :: 3.7
    Programming Language :: Python :: 3.8


[options]
packages = find:
install_requires =
    #Dependent package name
    numpy
    matplotlib

Please replace the part written in Japanese and the license name as appropriate! Also, if you add a dependent package, you will need to add it to the "install_requires" field at any time.

Folder structure

Create a folder with the package name (or src folder) at the top of the Repository. First, create empty files __version__.py and __init__.py in it.

And let's decide the folder structure (module structure). Be careful of circular import [^ 1]. It happens that src / A / a1.py is imported by src / B / b1.py and src / B / b2.py is imported by src / A / a2.py. Let's not. It doesn't cause an error in the development environment, but it seems that an error may occur when pip installing the stable version, so I have had a hard time. I've raised the stable batch number twice just for this fix.

[^ 1]: What happens if you import cyclically with Python

Managing with src / __ version__.py instead of setup.cfg is convenient because you can get and display the version number even in the package. In the above example of setup.cfg, this file is called by version = attr: package name .__ version __.__ version __.

__version__.py


__version__ = "0.1.0"

The following is the case of creating a function called line_plot in src / util / plotting.py. By writing from src.util.plotting import line_plot, you can call from src import line_plot instead of from src.util.plotting import line_plot when pip install.

__init__.py


#!/usr/bin/env python
# -*- coding: utf-8 -*-

# version
from src.__version__ import __version__
# util
from src.util.plotting import line_plot


def get_version():
    """
    Return the version number,like package name v0.0.0
    """
    return f"Package name v{__version__}"


__all__ = ["line_plot"]

By writing __all__ = ["line_plot "], you can avoid the correction confirmation of code formatting tools such as pylint (although you can change the setting of pylint to hide the correction confirmation). Also, you can use line_plot just by doingfrom src import *, but ʻimport *` is deprecated ...

Register with PyPI

If you have the minimum code, setup file, and version number, you can register your package on test-PyPI or PyPI!

3. Test operation

It is unavoidable that bugs will occur (it can also be a driving force for development), but we want to eliminate bugs as much as possible before publishing so as not to wear down the spirit. I don't think it's necessary to write tests 100% exactly, but I think developers are required to write test code and check the operation in advance, at least for the method that users mainly use.

Creating a test

There are several known test code creation / execution packages such as ʻunittest, pytest, doctest, nose. I personally use pytest`.

Autorun test

Basically, it is "develop locally & test locally", but if the number of test code increases and it takes more than 10 minutes for one test, it may not be possible to handle it locally.

In that case, consider running the test automatically with the help of the Continuous integration (CI) tool. Known are GitHub Actions (created from the GitHub repository screen), Travis CI, Semaphore. If it is an Open source project, you can use it for free to some extent.

Thank you very much for Semaphore. There aren't many articles in Japanese, so I plan to write an article at a later date.

For data science

In the case of a package for data science (downloading a CSV file from a database / analyzing data), the input arguments to each function and each class change from moment to moment, and the developer predicts and tests all the input values. Is difficult to make. I think the following three patterns can be considered as solutions.

--Check the type of input data (ʻis instance (data, pandas.DataFrame) , set (data.columns) .issubset (fields) `) --Check the output data type --Create virtual input data and check if the output data is as expected

4. Document creation

Documents are essential to convey and promote how to use and use the package.

Creating a docstring

First, write the docstring for each Python function / class / method (preferably in English). I think it is better to write in parallel when implementing the code.

There are reStructuredText-style, Google-style, and numpy-style as notations, but Google-style, which is less likely to be redundant up and down, is recommended. (When I was using Python only for the Private project, I wrote it in my own way similar to Numpy-style, but I moved to Google-style.)

Use case documentation

If you want to document the usage example, Jupyter Notebook expression (.ipynb) may be useful. You can view the description, code, and results together.

Example: CovsirPhy: Usage (quickest version)

Convert to Markdown / html

You can use pandoc or sphinx (api-doc) to semi-automatically convert docstring and .ipynb files to markdown / html format. I will omit it in this article because it will be long, but it is easy because you can create a homepage only with Python without knowing the details of HTML and CSS.

Publishing the document

Use GitHub Pages to update the html file in the GitHub repository from time to time. You can publish and update the document just by yourself.

Afterword

I omitted the details and rushed, but thank you for browsing. Please let us know if there is any excess or deficiency or other methods. It will be updated from time to time.

Thank you for your hard work!

Recommended Posts

Roadmap for publishing Python packages
python [for myself]
Roadmap for beginners
Settings for uploading Python packages locally to PyPI
Python basics ② for statement
Python packages and modules
About Python, for ~ (range)
python textbook for beginners
Using Python #external packages
python for android Toolchain
OpenCV for Python beginners
Install Python (for Windows)
[Python] for statement error
Python environment for projects
Python memo (for myself): Array
Python list, for statement, dictionary
Python for Data Analysis Chapter 4
Modern Python for intermediate users
Learning flow for Python beginners
Python 3.6 installation procedure [for Windows]
BigQuery integration for Python users
Python learning plan for AI learning
Set Up for Mac (Python)
Understand Python packages and modules
Search for strings in Python
Python Tkinter notes (for myself)
OpenCV3 installation for Python3 @macOS
[Python] xmp tag for photos
Python environment construction For Mac
Techniques for sorting in Python
pp4 (python power for anything)
Python3 environment construction (for beginners)
Python 3 series installation for mac
Python #function 2 for super beginners
Python template for Codeforces-manual test-
Basic Python grammar for beginners
3 months note for starting Python
Qt for Python app self-update
Python for Data Analysis Chapter 2
100 Pandas knocks for Python beginners
Checkio's recommendation for learning Python
Keyword arguments for Python functions
[For organizing] Python development environment
[Python] Sample code for Python grammar
Python for super beginners Python #functions 1
[Python / PyQ] 4. list, for statement
Simple HTTP Server for python
[Python + Selenium] Tips for scraping
Python #list for super beginners
~ Tips for beginners to Python ③ ~
Extract only Python for preprocessing
Indentation formatting for python scripts
Introduction to Python For, While
About "for _ in range ():" in python
tesseract-OCR for Python [Japanese version]
[Python] Iterative processing (for, while)
Python for Data Analysis Chapter 3
Install dlib for Python (Windows)
Check for memory leaks in Python
Python3 standard input for competitive programming
[Package cloud] Manage python packages with package cloud