Get an abstract understanding of Python modules and packages

Introduction

About the Python import system

--Module is a .py file --The package is the directory where you put __init__.py

You may understand that, By moving away from physical files and thinking a bit more abstractly, I think we can get a better view of the entire system, such as module names, relative imports, initialization timing, and name visibility.

Note that this article does not cover the ** namespace package [^ 1] ** (because I don't understand it).

References

I browsed the v3.8.2 version of both.

--Python language reference, 7.11. import statement: https://docs.python.org/ja/3/reference/simple_stmts.html#the-import-statement --Python Language Reference, 5. Import System: https://docs.python.org/ja/3/reference/import.html --Python Tutorial, 6. Modules: https://docs.python.org/ja/3/tutorial/modules.html

environment

I confirmed the operation with Python v3.8.2. Verify the operation as a test case of pytest.

pip install pytest

See the appendix for checking the operation of pytest itself.

Overview

--A module is a namespace unit that can be imported. --All code runs in a module (namespace). The top level module name is __main__. --When you import a module, you can see the module and its namespace from the current namespace. --A package is a module that can have submodules. --Once imported modules are cached globally.

A brief summary of the attributes:

__name__ __path__ __package__
module: module名 None Parent package name
package: package名 packageのパス package名

What is a namespace?

This is a commonly used concept in programming languages ** This is a function that allows you to use "names" such as variable names x in different places in a program with different meanings without fear of collision. ** ** For example, builtins.open and ʻos.open` are both function names, but because they belong to different namespaces. It can be defined and used as different. The "module" described below is this ** namespace unit **.

There is a glossary in the Python docs with a "namespace" entry: https://docs.python.org/ja/3/glossary.html#term-namespace

What is a module? What is an import?

As mentioned in the overview, a module is a ** namespace unit that can be imported **.

Namespace unit

First, because modules are namespace units, for modules m and n, the names in m`` x (= mx) and the names in n x (=nx) Is distinguished.

All code works in a namespace

Second, less consciously, all the code runs in a module, that is, in a namespace. When the code declares and defines a name, it is registered in that namespace. Immediately after the Python interpreter is launched, the code is running in the ** __ main__ module namespace **. This is often used in ** a way to tell if a .py file was launched directly or imported **.

Module import

The module can then be imported. Importing a module means

--Find and cache the module and --Load and initialize, --Binding a module to the current namespace

That is. The document says 2 steps, but if you think of it as 3 steps, the processing unit is easy to understand. Simply put, by importing the module m as ʻimport m`

--Search for module m in sys.path and register the module object in the global cache sys.modules. --Register a name in that namespace by loading and initializing the module m. For example, m.f or m.C. --Register the name m in the current namespace so that it points to the module m.

Start the Python interpreter in interactive mode and see for yourself. (Partially shaped for readability)

[.../module-experiments]$ python
Python 3.8.2 (default, Apr 18 2020, 18:08:23) 
[Clang 10.0.1 (clang-1001.0.46.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.

>>> print(__name__)
__main__

>>> import m
>>> sys.modules['m']
<module 'm' from '.../module-experiments/m.py'>
>>> m
<module 'm' from '.../module-experiments/m.py'>
>>> m.f
<function f at 0x108452b80>
>>> m.C
<class 'm.C'>

You can see that the module immediately after starting is __main__. You can also see that by importing m, you can access m, m.f and m.C.

The imported modules are cached in the local namespace. Therefore, like any other name, modules imported within a function cannot be accessed by other functions. However, since the cache is global, it can be accessed.

What is a package?

A package is just like a module, except that you can search for and import submodules within it. By using packages, you can create a hierarchical structure of modules.

Put simply

The name of the module (or package) will then be something like ʻa.b.c separated by dots. By importing the module ʻa.b.c with ʻimport a.b.c`

--Import module (package) ʻa --Import the submodule (package)b (ʻa.b) in the namespace of module ʻa --Import the submodulec (ʻa.b.c) in the namespace of module ʻa.b`

This allows you to access the module ʻa.b.c with ʻa.b.c. ** Note: You will not be able to access it with c. ** **

You can also see that this rule allows you to import packages and modules in the same way.

Of course, ʻa.d and ʻa.d.e cannot be accessed immediately after importing ʻa.b.c. If you do ʻimport a.d.e here, ʻa` exists in the cache, so

--Import the submodule (package) d (ʻa.d) in the namespace of module ʻa --Import the submodule ʻe (ʻa.d.e) in the namespace of module ʻa.d`

Only happens, so at these points the module is initialized.

A little more detail

Technically, a package is a module that has the __path__ attribute. This attribute is used when looking for subpackages. In other words, in ʻimport a.b.c`,

--Find ʻa in sys.path and import --In the namespace of module ʻa, find ʻa.b from ʻa.__path__ and import it. --In the namespace of module ʻa.b, find ʻa.b.c in ʻa.b.path` and import it.

That is happening.

Note that the module containing the package has the __name__ attribute. For imported modules, this value is the fully qualified name of the module (such as ʻa.b.c`).

Package import

Since a package is a module, it can be imported with the import statement like a module.

import ... as ...

If the import statement has a ʻas clause, the imported module itself is bound to the name below ʻas. For example, ʻimport a.b as b allows you to access the module ʻa.b directly by name b.

At this time, since the module ʻa is loaded and initialized, the information about the module ʻa (and of course, ʻa.b) is cached. However, ʻa is not bound to the namespace.

>>> import sys
>>> import a.b as b
['/Users/shinsa/git/python-experiments/module-experiments/a']
a

>>> sys.modules['a']
<module 'a' from '/Users/shinsa/git/python-experiments/module-experiments/a/__init__.py'>
>>> sys.modules['a.b']
<module 'a.b' from '/Users/shinsa/git/python-experiments/module-experiments/a/b/__init__.py'>

>>> b
<module 'a.b' from '/Users/shinsa/git/python-experiments/module-experiments/a/b/__init__.py'>

>>> a
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'a' is not defined

Module name and name are different

The module name comes immediately after ʻimport, and you cannot use the defined name in this place. Therefore, just because ʻimport a.b as b is used, ʻimport a.b.c cannot be written as ʻimport b.c.

Import with from clause

This form of import takes the case of from a.b import c. c can be a module, a package, or a name. The following processing is performed.

--Import ʻa.b. However, the name is not bound. --Attempts to bind c. --If c is the name defined in ʻa.b, bind ʻa.b.c to the name c. --If c is a module or package, import ʻa.b.c (but do not bind the name) and bind ʻa.b.c to the name c`.

If the ʻasclause is specified, instead of bindingc in the above example, it is bound to the name specified in the ʻas clause.

from ... import *

If the name to import is *, that is, from M import *, then in the module M

--The name specified by __all__ if defined --Otherwise, names defined with M that do not start with _

Are all bound in the current namespace.

Relative import

In the from ... import ... statement, you can use relative package representations in the from clause. For example, in the initialization script for package ʻa.b, or in the script for module ʻa.b.c.

--from . means from a.b, --from .. means from a, --from .d is from a.d,

Represent each. You may have more dots.

PEP 366

Relative package calculations were based on the __name__ attribute. Then, for example, when you start the interpreter with python -m a.b.c, ʻa / b / c.py operates as a main module. Due to this, there was a problem that, for example, ʻa.b.d could not be relatively imported.

Newer versions of Python now calculate relative imports based on the __package__ attribute, which resolves the above call-time issue (PEP 366). However, if you start it with python <path> /a/b/c.py, you need to set the attributes manually.

Default implementation: Import system for file system

So far, modules, packages, and __path__ have been explained abstractly. In other words, I'm not talking about a specific file system at all. In recent versions, Python's import system has become a highly abstract and powerful system.

By default, the import system operates on a file system. By default, modules are represented by .py files and packages are represented by directories. The details are as follows, and it can be seen that it fits the abstract data structure so far.

--The contents of sys.path are regarded as the file path and searched. ʻImport m searches the directories contained in sys.pathin order, andm.py in one of the directories is imported. --Package ʻa.b is loaded and initialized by executing ʻa / b / __ init__.py. --Loading and initializing the module ʻa.b.c is done by executing ʻa / b / c.py. --The path attribute value of package ʻa.b is the directory path of b.

Conversely, for example, a URI-based import system can be implemented, and the service discovery mechanism can be implemented using Python's import system.

Finally: A little dissatisfied

The Python documentation is confusing because the information is scattered all over the place. It's still understandable if you do so to avoid duplication of information, Instead, each piece of information is subtly duplicated but scattered. In addition, some information is out of date.

Recommended Posts

Get an abstract understanding of Python modules and packages
Python packages and modules
Understand Python packages and modules
Python Basic Course (14 Modules and Packages)
Modules and packages in Python are "namespaces"
Full understanding of Python threading and multiprocessing
Organize python modules and packages in a mess
[Python] A rough understanding of iterators, iterators, and generators
List of python modules
Python asynchronous processing ~ Full understanding of async and await ~
Python: Get a list of methods for an object
[python] Get quotient and remainder
Source installation and installation of Python
Full understanding of Python debugging
Get rid of dirty data with Python and regular expressions
Sample of HTTP GET and JSON parsing with python of pepper
Environment construction of python and opencv
The story of Python and the story of NaN
MIDI packages in Python midi and pretty_midi
multiprocessing memorandum
Instant multiprocessing
Full understanding of Python threading and multiprocessing
Installation of SciPy and matplotlib (Python)
Note: Get the first and last items of Python OrderedDict non-destructively
Python and DB: Understanding DBI cursors
Proper use of Python visualization packages
[Python] How to get the first and last days of the month
Get the attributes of an object
This and that of python properties
[Python] Understanding the potential_field_planning of Python Robotics
Get and set the value of the dropdown menu using Python and Selenium
Coexistence of Python2 and 3 with CircleCI (1.0)
Summary of Python indexes and slices
Get the title and delivery date of Yahoo! News in Python
Reputation of Python books and reference books
Get images of OpenStreetMap and Geographical Survey Institute maps with Python + py-staticmaps
Get a list of packages installed in your current environment with python
Comparing the basic grammar of Python and Go in an easy-to-understand manner
Open an Excel file in Python and color the map of Japan
Get the number of articles accessed and likes with Qiita API + Python
Get images of OpenStreetMap and Geographical Survey Institute maps with Python + staticmap
Get and estimate the shape of the head using Dlib and OpenCV with python
Get the return value of an external shell script (ls) with python3
Explanation of creating an application for displaying images and drawing with Python
Installation of Visual studio code and installation of python
A rough understanding of python-fire and a memo
Python virtual environment and packages on Ubuntu
The story of making Python an exe
Extraction of tweet.js (json.loads and eval) (Python)
[Python] Package and distribute your own modules
Connect a lot of Python or and and
[python] Get a list of instance variables
(Note) Bulk upgrade of python installed packages
Summary of modules and classes in Python-TensorFlow2-
[Python] Get the character code of the file
Easy introduction of python3 series and OpenCV3
[Python] Various combinations of strings and values
[Python] Get a list of folders only
Idempotent automation of Python and PyPI setup
Get rid of DICOM images in Python
Project Euler # 1 "Multiples of 3 and 5" in Python
Python3> pass statement> create the smallest class / contents of functions and conditions> when thinking at an abstract level
Get the stock price of a Japanese company with Python and make a graph