[PYTHON] Misunderstandings and interpretations of Luigi's dependencies

I misunderstood how dependencies work in the Luigi framework, and it cost me a lot of trouble. Others may run into the same problem, so I am writing it down.

What is a dependency

Misunderstanding

- Wait for the task being depended on to complete

Actually

- Wait for the task being depended on to complete
- That task must succeed before the subsequent processing can continue

What counts as success or failure of a task in a dependency?

Task success

A file is written to the target specified by `output`

Task failure

An exception is raised in the task

The problem this misunderstanding causes

For example, consider the following case: a process reads a list of about 1000 URLs from the file obtained via `input` and downloads a file from each URL. I think this is a common process, but there is a trap here. I don't want to download 1000 files serially, and I want to pass parameters to each task based on the data read from `input`, so I think it would be written with [dynamic dependencies](http://luigi.readthedocs.io/en/stable/tasks.html#dynamic-dependencies). If even one of the 1000 generated tasks fails, the subsequent processing is not executed. However, it is quite common for one or two tasks to fail because of a web server malfunction or a mistyped URL, and it is a problem if that stops the subsequent processing. In this case, my conclusion is that the subsequent processing task and the task that generates the download tasks should not depend on each other, and that the processing should be written outside of Luigi.
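As one possible way to write the download step outside of Luigi, here is a minimal sketch using only the standard library. The function names `fetch` and `fetch_all` and the `fetch_one` and `max_workers` parameters are assumptions made for this illustration, not part of Luigi or of the original process. The point is that individual failures are recorded instead of stopping everything.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download one URL and return its body as bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_all(urls, fetch_one=fetch, max_workers=8):
    """Fetch URLs in parallel and collect failures instead of aborting."""
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        future_to_url = {pool.submit(fetch_one, url): url for url in urls}
        for future in as_completed(future_to_url):
            url = future_to_url[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                # One bad URL or server error must not stop the others.
                failures[url] = exc
    return results, failures
```

The subsequent processing can then work on `results` and log or retry whatever ended up in `failures`, so one or two broken URLs no longer block the rest of the pipeline.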
