Python packages and modules

ChangeLog

- 2020-04-18: Corrected the description of the behavior in the `__main__` module.

Overview

Let's take a quick look at Python packages and modules, which many people seem to use with only a vague understanding. This write-up is not very organized, so I plan to rewrite it someday.

Motivation

I have Python machine learning code written by someone else. The file I want to execute is at `./a/b/c.py` relative to the current directory. It follows the common pattern of behaving in two ways depending on the value of `__name__`. Inside it, `./a/b/d.py` is imported with an **absolute import**, `import d`.

Assuming PYTHONPATH is left at its default, which of the following two works correctly changes: running the file with `python ./a/b/c.py`, or importing it with `import a.b.c` from another Python file. When the former works, the latter does not; and if I rewrite `c.py` to use a relative import so that the latter works, the former stops working. Finding out what on earth is going on is the motivation for this study.

Let me explain the latter situation a little more. I run `pytest ./test/<test file>.py` or `python -m pytest ./test/<test file>.py` for unit testing. The point is that the test file is not in the current directory.

References

I consulted the v3.8.2 version of each.

- Python Tutorial, 6. Modules: https://docs.python.org/ja/3/tutorial/modules.html
- Python Setup and Usage, 1. Command line and environment: https://docs.python.org/ja/3/using/cmdline.html
- Python Language Reference, 5. The import system: https://docs.python.org/ja/3/reference/import.html

Module = file

Note that the tutorial refers to top-level files, that is, files launched with `python <file>`, as **scripts**.

Importing side

In the simplest terms, I understand it as follows:

- The import statement `import m` imports the **module** `m`.
- The module `m` is a **Python file** called `m.py`.
- This lets you access the **names** (variable names, function names, class names, etc.) defined in `m.py` as `m.n`.

In practice, however:

- You often need to include the package path in addition to the module name (described later).
- You can also import packages (described later).
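As a minimal sketch of the simplest case, the following writes a hypothetical `m.py` and `main.py` into a temporary directory and runs `main.py` in a child interpreter. The file names and the value `42` are made up for illustration.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # m.py defines a single name, n.
    Path(tmp, "m.py").write_text("n = 42\n")
    # main.py imports the module m and reaches n through the module object.
    Path(tmp, "main.py").write_text("import m\nprint(m.n)\n")
    done = subprocess.run(
        [sys.executable, "main.py"], cwd=tmp, capture_output=True, text=True
    )

output = done.stdout.strip()
print(output)
```

The child process is used only so that the example does not have to touch `sys.path` of the running interpreter.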

Imported side

- A module can refer to its own module name through `__name__`. In an `m` imported as above, `__name__` is `"m"`.
- However, at the top level, that is, when the module is the file passed as an argument to the `python` command, `__name__` is `"__main__"`. Think of the top-level environment as running as the `__main__` module.
- Modules can import other modules.
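The two values of `__name__` can be observed with a small experiment. This is only a sketch: `m.py` and `main.py` are hypothetical files created in a temporary directory, and each run happens in a child interpreter.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # m.py reports the name it is running under.
    Path(tmp, "m.py").write_text("print('m.py sees', __name__)\n")
    # main.py imports m, then reports its own name.
    Path(tmp, "main.py").write_text("import m\nprint('main.py sees', __name__)\n")

    def run(script):
        """Run a script from tmp in a child interpreter and return its output lines."""
        done = subprocess.run(
            [sys.executable, script], cwd=tmp, capture_output=True, text=True
        )
        return done.stdout.splitlines()

    via_import = run("main.py")  # m is imported; main.py is the top level
    as_script = run("m.py")      # m.py itself is the top level

print(via_import)
print(as_script)
```

The same file `m.py` reports `m` when it is imported and `__main__` when it is passed to `python` directly, which is what the usual `if __name__ == "__main__":` pattern relies on.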

Module search path (sys.path)

In the above example, the module to be imported, `m` (that is, the file `m.py`), is searched for in the directories listed in `sys.path` (the `path` variable of the `sys` module), in order from the beginning of the list. By default, the entries are arranged in the following order:

  1. The directory containing the **script**, or the current directory if no script is specified (for example, in an interactive session started with `python`)
  2. The value of the PYTHONPATH environment variable
  3. The installation-dependent default (probably including the directory where pip installs libraries)

Here, the first item is the tricky one: **the directory containing the executed file is searched, not the current directory.** The two coincide only when you execute a file located in the current directory.
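To check the first rule, the sketch below runs a script that lives two directories below the current directory and compares `sys.path[0]` with both candidates. The layout `a/b/show.py` is a made-up stand-in, and paths are normalized with `os.path.realpath` so that symlinked temporary directories do not confuse the comparison.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    script_dir = Path(tmp, "a", "b")
    script_dir.mkdir(parents=True)
    # The script only prints the first entry of sys.path.
    Path(script_dir, "show.py").write_text("import sys\nprint(sys.path[0])\n")

    # Run from tmp, so the current directory and the script directory differ.
    done = subprocess.run(
        [sys.executable, os.path.join("a", "b", "show.py")],
        cwd=tmp, capture_output=True, text=True,
    )
    first_entry = os.path.realpath(done.stdout.strip())
    expected_script_dir = os.path.realpath(script_dir)
    current_dir = os.path.realpath(tmp)

print("sys.path[0] is the script directory:", first_entry == expected_script_dir)
print("sys.path[0] is the current directory:", first_entry == current_dir)
```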

Solving the example given in "Motivation" (1)

This almost reveals the cause of the problem I described in "Motivation." When you run `python ./a/b/c.py` as in the former case, the first rule puts `./a/b` at the beginning of `sys.path`. `import d` then searches this directory first, and the import succeeds because `./a/b/d.py` is actually there.

On the other hand, when pytest is executed, `sys.path` starts with `./test`, so `import a.b.c` fails in the first place; and even if you work around that by setting PYTHONPATH, this time the `import d` in `c.py` fails.

If `import d` is changed to a **relative import** such as `from . import d`, the former pattern will indeed fail [confirmation required].
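The situation from "Motivation" can be reproduced in a temporary directory. This is a sketch under assumptions: the contents of `c.py` and `d.py` are minimal stand-ins for the real code, and `python -c "import a.b.c"` run from the project root stands in for "imported from another Python file".

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp, "a", "b")
    pkg.mkdir(parents=True)
    Path(tmp, "a", "__init__.py").write_text("")
    Path(pkg, "__init__.py").write_text("")
    Path(pkg, "d.py").write_text("VALUE = 'from d'\n")
    # c.py uses the absolute import described in the article.
    Path(pkg, "c.py").write_text("import d\nprint(d.VALUE)\n")

    def run(*args):
        """Start a child interpreter in tmp with the given arguments."""
        return subprocess.run(
            [sys.executable, *args], cwd=tmp, capture_output=True, text=True
        )

    # Former pattern: a/b is put at the front of sys.path, so d is found.
    as_script = run(os.path.join("a", "b", "c.py"))
    # Latter pattern: tmp is searched, and there is no top-level d there.
    as_import = run("-c", "import a.b.c")

print(as_script.returncode, as_script.stdout.strip())
print(as_import.returncode, as_import.stderr.strip().splitlines()[-1])
```

The script run succeeds, while the import run ends with a `ModuleNotFoundError` for `d`.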

sys.path anti-pattern

When the module directory structure is complicated, or when you want dynamic imports, trying to get by with dynamically rewriting `sys.path` inside your Python program is said to be **a bad move** [citation needed]. Making full use of `importlib` seems to be the right approach.
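One way `importlib` can replace `sys.path` rewriting is to load a file directly from its path. The sketch below is an illustration, not necessarily what the article had in mind: `load_module_from_path` and `plugin_example.py` are hypothetical names, while the `importlib.util` calls are the standard ones.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path


def load_module_from_path(name, path):
    """Load the file at `path` as a module called `name` without touching sys.path."""
    spec = importlib.util.spec_from_file_location(name, str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's code inside the new module
    return module


path_before = list(sys.path)
with tempfile.TemporaryDirectory() as tmp:
    plugin_file = Path(tmp, "plugin_example.py")
    plugin_file.write_text("GREETING = 'hello from plugin'\n")
    plugin = load_module_from_path("plugin_example", plugin_file)

print(plugin.GREETING)
print("sys.path unchanged:", sys.path == path_before)
```

Note that this sketch does not register the module in `sys.modules`, so a later `import plugin_example` would not find it; whether that matters depends on the use case.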

Package = directory

A **package** is a collection of the above modules gathered in a directory. The directory name becomes the package name. Directories can be nested, and the nesting corresponds to the package path (a dot-separated sequence of package names).

Importing side

- `import a.b.c` imports the module `c` from the package `a.b`. That is, it imports `a/b/c.py` under one of the directories on the module search path.
- To be recognized as a regular package `p`, the directory `p` must contain the **file `p/__init__.py`**. (Since Python 3.3, a directory without it is treated as a namespace package.)

From here on, many people seem to have only a vague understanding. So do I.

- **You can import the package itself.** That is, you can write `import a.b`.
- When the package itself, for example `p`, is imported, the result of running `p/__init__.py` as a module goes into the namespace of `p`.
- When the package `a.b` is imported, `a` is imported first, and then `a.b`. That is, `a/__init__.py` is executed, then `a/b/__init__.py`.
- When the module `a.b.c` is imported, those files are executed as above, and then `a/b/c.py` is executed.
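The execution order can be made visible by putting a `print` in each file. This is a small sketch in a temporary directory; the printed labels are arbitrary.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp, "a", "b")
    pkg.mkdir(parents=True)
    # Each file announces itself when it is executed.
    Path(tmp, "a", "__init__.py").write_text("print('a/__init__.py')\n")
    Path(pkg, "__init__.py").write_text("print('a/b/__init__.py')\n")
    Path(pkg, "c.py").write_text("print('a/b/c.py')\n")

    done = subprocess.run(
        [sys.executable, "-c", "import a.b.c"],
        cwd=tmp, capture_output=True, text=True,
    )

executed = done.stdout.splitlines()
for line in executed:
    print(line)
```

The lines appear from the outermost package inward, ending with the module itself.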

Imported side

When the package `p` is imported, the variable `__path__` inside `p/__init__.py` refers to a list of strings containing the directory of `p`.
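A quick way to look at `__path__` is to print it from inside `__init__.py`. The sketch below assumes a regular package `p` (one with an `__init__.py`) created in a temporary directory.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg_dir = Path(tmp, "p")
    pkg_dir.mkdir()
    # __init__.py prints the type of __path__ and its first entry.
    Path(pkg_dir, "__init__.py").write_text(
        "print(type(__path__).__name__)\nprint(__path__[0])\n"
    )
    done = subprocess.run(
        [sys.executable, "-c", "import p"],
        cwd=tmp, capture_output=True, text=True,
    )
    kind, first_entry = done.stdout.splitlines()
    # Normalize both sides so symlinked temp directories compare equal.
    points_at_package_dir = os.path.realpath(first_entry) == os.path.realpath(pkg_dir)

print(kind)
print(points_at_package_dir)
```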

from ... import ... statement

- If you write a plain import statement, `import a.b.c`, you must write `a.b.c.n` to refer to the name `n` in the module `c`.
- However, with `from a.b import c` you can refer to it as `c.n`.
- What you can import with `from ... import ...` is:
  - A name `n`. After `from a.b.c import n`, you can refer to it simply as `n`.
  - A module `c`. As explained just above.
  - A package `b`, with `from a import b`. After that, the package `b` can be referred to simply as `b` instead of `a.b`.

If you import the package `b` as above, `a/b/__init__.py` is executed. This file is often empty, and when it is empty the interpreter does nothing, so **importing `b` alone does not let you access the module `c` as `b.c`.**
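The effect of an empty `__init__.py` can be checked with `hasattr`. In this sketch the package layout is hypothetical, and `main.py` is a throwaway script that inspects `b` before and after the submodule `a.b.c` is imported explicitly.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

main_src = (
    "from a import b\n"
    "print(hasattr(b, 'c'))\n"  # c has not been imported yet
    "import a.b.c\n"
    "print(hasattr(b, 'c'))\n"  # importing the submodule binds it on the package
)

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp, "a", "b")
    pkg.mkdir(parents=True)
    Path(tmp, "a", "__init__.py").write_text("")  # empty
    Path(pkg, "__init__.py").write_text("")       # empty
    Path(pkg, "c.py").write_text("n = 1\n")
    Path(tmp, "main.py").write_text(main_src)
    done = subprocess.run(
        [sys.executable, "main.py"], cwd=tmp, capture_output=True, text=True
    )

results = done.stdout.splitlines()
print(results)
```

If `b.c` should be available right after `from a import b`, one option is for `a/b/__init__.py` to import `c` itself.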

Relative import from imported module

When importing other modules, the absolute import described above is usually used, and the module is searched for along `sys.path`. With the `from ... import ...` statement above, you can also do a **relative import**.

Within an imported module (that is, a .py file), you can import packages, modules, names, and so on by relative position. In that case, `.` and `..` are resolved against the **package path** in which the current module lives. That is, if the current module is `a.b.m`, so that the package it belongs to is `a.b`, then `.` is `a.b` and `..` is `a`.

Note that the `__main__` module started with `python <file>` does not have a package path. An important consequence is that **you cannot do relative imports from a module running as `__main__` in this way.**

In addition, it feels as though something like `import ..p.q.m` ought to be allowed, but it is not: relative imports are only available in the `from ... import ...` form.
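The restriction on `__main__` can be confirmed with the same layout as in "Motivation", this time with `c.py` rewritten to a relative import. As before, this is a sketch with stand-in file contents, and `python -c "import a.b.c"` plays the role of the importing file.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp, "a", "b")
    pkg.mkdir(parents=True)
    Path(tmp, "a", "__init__.py").write_text("")
    Path(pkg, "__init__.py").write_text("")
    Path(pkg, "d.py").write_text("VALUE = 'from d'\n")
    # c.py now uses a relative import.
    Path(pkg, "c.py").write_text("from . import d\nprint(d.VALUE)\n")

    def run(*args):
        """Start a child interpreter in tmp with the given arguments."""
        return subprocess.run(
            [sys.executable, *args], cwd=tmp, capture_output=True, text=True
        )

    # Imported as a.b.c: the package path is a.b, so "." resolves.
    as_import = run("-c", "import a.b.c")
    # Run as a script: c.py is __main__ and has no package path.
    as_script = run(os.path.join("a", "b", "c.py"))

print(as_import.returncode, as_import.stdout.strip())
print(as_script.returncode, as_script.stderr.strip().splitlines()[-1])
```

This is the mirror image of the absolute-import case: the import succeeds and the script run fails with an `ImportError`.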

Solving the example given in "Motivation" (2)

This solves the second question. If `a/b/c.py` is rewritten to use a relative import such as `from . import d`, it works fine when `c` is imported with `import a.b.c`, but when it is run as `python a/b/c.py`, `c.py` runs as the module `__main__` and relative imports are not available. That is what produced the disappointing behavior.

Furthermore, as a consequence, the modules imported by code that you want to use both as a script and as a module, like `a/b/c.py` in the example, need to be installed with `pip install`. Then absolute imports always work, so either way of using the code works.

Unresolved issues

I also confirmed that the import behavior differs depending on how pytest is started. I tried putting a print statement in the test script (`-s` is the option that stops pytest from capturing standard output):

- With `pytest -s test/<test file.py>`, `sys.path` starts with the test directory, followed by the system default values.
- But with `python -m pytest -s test/<test file.py>`, the empty string `""` is added after the test directory. Presumably this means the current directory.

Part of the reason for this is known. If you specify a module with `-m` when starting the Python interpreter, the current directory is added to the beginning of `sys.path`. In this case, it may be pytest rather than the interpreter that adds `test` to `sys.path`, because here `-s test/<test file.py>` is just an argument, not the name of the script to be executed [to be investigated]. I also want to check how settings such as `sys.path` look when starting a command installed with pip.
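The interpreter's part of this behavior can be isolated without pytest. In the sketch below, `show_path.py` is a hypothetical stand-in for a module started with `-m`, and `test/some_test.py` is a made-up argument that does not need to exist. On recent Python 3 versions the first entry is expected to be the current directory written out in full rather than as an empty string.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # The module prints sys.path[0] and the arguments it received.
    Path(tmp, "show_path.py").write_text(
        "import sys\nprint(sys.path[0])\nprint(sys.argv[1:])\n"
    )
    done = subprocess.run(
        [sys.executable, "-m", "show_path", "test/some_test.py"],
        cwd=tmp, capture_output=True, text=True,
    )
    first_entry, received_args = done.stdout.splitlines()
    # Normalize both sides so symlinked temp directories compare equal.
    first_is_current_dir = os.path.realpath(first_entry) == os.path.realpath(tmp)

print(first_is_current_dir)
print(received_args)
```

The trailing path reaches the module only through `sys.argv`, so any `sys.path` entry for the `test` directory would have to come from the program being run, not from the interpreter.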

In closing

Corrections of any mistakes are welcome.
