New in Python 3.9 (2)-Sort directed acyclic graphs in Python

Introduction

New changes in Python 3.9 scheduled to be released in October 2020 are summarized in the article What's new in Python 3.9 (Summary) I am. I have a separate article on a topic that seems to be a little voluminous, but this is the second one, about sorting directed acyclic graphs.

Directed acyclic graph and topological sort

First of all, the graph here is not a line graph, a bar graph, or a chart that visually represents data, but a graph of graph theory. A graph is a type of data structure that consists of nodes (vertices) and edges (branches) that connect them. By giving a node or edge some meaning, you can represent relevant information.

There are several types of graphs, and whether the first fork has a direction at the edge. A graph with a direction is called a directed graph (left side), and a graph without a direction is called an undirected graph (right side). スクリーンショット 2020-05-31 13.11.10.JPG

Edges connect nodes, but you can use a directed graph if there is some kind of master-slave relationship between the connected nodes, and an undirected graph if there is an equal relationship.

And the point is whether it is a cycle graph or not. If there is a node that returns to you by following the edge, it is called a cyclic graph, and if it is not, it is called a non-cycle graph. The Directed Acyclic Graph (DAG) has two characteristics, both directed and non-circulated, and is the graph we are dealing with this time.

This directed acyclic graph (DAG) can be used to represent a variety of information. In a strange example, the crypto asset IOTA uses DAG. Ordinary crypto assets use blockchain, and I think it can be called a one-way DAG, but IOTA connects it with a DAG that has multiple edges. it's interesting.

In a slightly more common application, there is a task scheduling problem with dependencies. Each task is a node, and the dependency of the task is represented by an edge. For example, if A → B, B cannot be started until A ends. Even if you have a list of tasks with complicated dependencies, you can see that they can be expressed in DAG by breaking them down into task-to-task relationships.

For example, B and C cannot start until A is over. And D cannot start until B and C are finished, such a dependency is expressed as follows. スクリーンショット 2020-05-31 14.29.53.JPG

A more complicated example would be something like this: スクリーンショット 2020-05-31 14.31.53.JPG

It's a little derailed, but there is a tool for visualizing DAGs called Graphviz. This describes the DAG in a unique format called the dot format, draws it, and converts it to an image format such as PNG. What's interesting is that there are several drawing algorithms available that will give you completely different outputs with the same input.

For example, if you write the second example above in dot format, it will look like this.

digraph example {
  graph [
    labeljust = "c"
  ];
  7, 5, 3, 11, 8, 2, 9, 10;
  7 -> 11;
  7 -> 8;
  5 -> 11;
  3 -> 8;
  3 -> 10;
  11 -> 2;
  11 -> 9;
  11 -> 10;
  8 -> 9;
}

If you try to output this with Graphviz's drawing tool, you can make so many variations. all.png

it's interesting. It doesn't look like the very same DAG, but they are all drawn from the same dot file.

Now, there is an order relationship between the tasks represented by the DAG, so you want to know in what order you should put them away. That is a schedule issue. For that purpose, it is necessary to arrange them one-dimensionally while satisfying the dependencies, which is called ** topological sort **.

Although the introduction has been lengthened, Python 3.9 adds a feature called TopologicalSorter to the functools module, which allows this topological sort to be done as standard.

Try using Topological Sorter

Let's try the example with A, B, C, D nodes shown at the very beginning.

>>> from functools import TopologicalSorter
>>> ts = TopologicalSorter()
>>> ts.add('B', 'A')
>>> ts.add('C', 'A')
>>> ts.add('D', 'B', 'C')
>>> tuple(ts.static_order())
('A', 'B', 'C', 'D')

After creating an empty sorter with TopologySorter (), we are adding nodes one by one. TopologySorter.add () lists the nodes to be registered in the first argument and the nodes that have dependencies on the subsequent arguments. Nodes that do not have dependent nodes (ʻAin this case) can skip registration (because they will appear as dependent when registering other nodes). Then, the result sorted byTopologySorter.static_order ()` is returned by generator. Since it is difficult to see as it is, it is converted to tuple and displayed.

If the graph is decided from the beginning, you can give it to the constructor and make it all at once.

>>> from functools import TopologicalSorter
>>> graph = {'B': {'A'}, 'C': {'A'}, 'D': {'B', 'C'}}
>>> ts = TopologicalSorter(graph)
>>> tuple(ts.static_order())
('A', 'B', 'C', 'D')

Here, what is given to the constructor is dictionary type data, in which the key is a node and the node (a set) on which the value depends. When specifying multiple, the set type is used here, but it seems that it can be a tuple or a list.

This is all you need to sort, but there are also some methods for when you want a little more control. First of all, "what can be done at the moment". It will be a list of tasks whose dependencies are satisfied, using get_ready ().

As an example, let's use the slightly complicated example above.

>>> from functools import TopologicalSorter
>>> graph = {
...    2: {11},
...    8: {7, 3},
...    9: {11, 8},
...   10: {11, 3},
...   11: {7, 5},
... }
>>> ts = TopologicalSorter(graph)
>>> ts.prepare()
>>> ts.get_ready()
(3, 7, 5)

After creating TopologySorter by giving the dependency data of the graph in the constructor, first prepare (). This is a method that performs pre-processing for subsequent processing. We are also checking if the graph registered here is patrol, and if it is patrol, CycleError will be displayed. In the above example, this prepare () was not called, but it is automatically called in static_order () (it is actually called when fetching the first value of generator).

Subsequent calls to get_ready () will return (3, 7, 5). It is listed as "executable" among the nodes included in the graph that have no dependencies. Calling prepare () again will return () this time. It means, "(For now) there is nothing else."

I will color the graph above. Light green indicates that the node is viable. wiki-1.png

So what if the 5 and 7 of the feasible tasks are done? The done () method teaches that to the sorter. If you call the completed tasks side by side as arguments, the internal state will change. So if you run get_ready () again, this time it will return (11,).

>>> ts.done(5,7)
>>> ts.get_ready()
(11,)

The figure is changed to a darker color to indicate that 5 and 7 are complete. And 11 is ready to run because all the dependent tasks have been completed. It matches the result of ts.get_ready () above. wiki-2.png

We will repeat this, but let's complete it in the order of 11 38.

>>> ts.done(11)
>>> ts.get_ready()
(2,)
>>> ts.done(3)
>>> ts.get_ready()
(8, 10)
>>> ts.done(8)
>>> ts.get_ready()
(9,)

wiki-3.png wiki-4.png wiki-5.png

You can check if there are any unexecuted tasks in the graph with ʻis_active (). In the above state, 2, 9, and 10 have not been executed yet, so you can see that the return value of ʻis_active () changes before and after making them done ().

>>> ts.is_active()
True
>>> ts.done(2, 9 ,10)
>>> ts.is_active()
False

Summary

I investigated and used the new ToplogicalSorter function added to the functools module of Python 3.9. I thought it might be useful for deciding the order of batch processing with dependencies.

Recommended Posts

New in Python 3.9 (2)-Sort directed acyclic graphs in Python
Bubble sort in Python
What's new in Python 3.5
New in Python 3.4.0 (1)-pathlib
Custom sort in Python3
What's new in Python 3.6
What's new in Python 3.10 (Summary)
Naturally sort Path in Python
python in mongodb in descending sort
What's new in Python 3.4.0 (2) --enum
What's new in Python 3.9 (Summary)
Sort by date in python
What's new in python3.9 Merge dictionaries
Sort large text files in Python
[Python] Sort
Python # sort
When specifying multiple keys in python sort
Line graphs and scale lines in python
New features in Python 3.4.0 (3)-Single-dispatch generic functions
Implemented Stooge sort in Python3 (Bubble sort & Quicksort)
Quadtree in Python --2
CURL in python
Metaprogramming in Python
Python 3.3 in Anaconda
Geocoding in python
SendKeys in Python
[Python] Sort the list of pathlib.Path in natural sort
Meta-analysis in Python
Unittest in python
Epoch in Python
Discord in Python
[CpawCTF] Q14. [PPC] Try writing Sort! In Python
Sudoku in Python
DCI in Python
quicksort in python
A memo about writing merge sort in Python
nCr in python
Draw graphs in Julia ... Leave the graphs to Python
[Neta] Thread-safe Sleep Sort function in Python (threading)
N-Gram in Python
Programming in python
Plink in Python
Constant in python
Lifegame in Python.
FizzBuzz in Python
Sqlite in python
StepAIC in Python
Multiple graphs are displayed in one window (python)
Create a new page in confluence with Python
N-gram in python
LINE-Bot [0] in Python
Csv in python
Disassemble in Python
Reflection in Python
Constant in python
nCr in Python.
format in python
Scons in Python3
Puyo Puyo in python
Sort list elements in a specified order in Python
python in virtualenv