That's all for the title!
…… Because it's something, with a few examples.
As anyone who has used it knows, workflow manager Luigi abstracts the data generated by a task and treats it as an inherited class of Target
. For the time being, it works if you can implement the ʻexists () method to see if it already exists. For example, if it is a local file,
LocalTarget` is used, but if you write it according to the tutorial for the time being, it will be as follows.
from luigi import Task, ExternalTask, LocalTarget
import pandas as pd
class RawFile(ExternalTask):
def output(self):
return LocalTarget('path/to/file.csv')
class Aggregation(Task):
def requires(self):
yield RawFile()
def run(self):
df = pd.read_csv(self.input()[0], sep='\t', parse_dates=[7, 8], encoding='cp932')
...
Well ... well, here comes some other task that depends on RawFile
. Let's think about that. I'm gradually starting to write this kind of code ...
class Plot(Task):
def requires(self):
yield RawFile()
def run(self):
df = pd.read_csv(
Wait a minute. Do you remember the right options to pass to the CSV file parser for every project you're involved in? ** I don't remember. ** Would you like to write an option every time? I don't want to write ** absolutely ** many times. Luigi abstracts Target into the path of the original data, but it's definitely dull. That's right, ** you just have to abstract the reading **.
from luigi import Task, ExternalTask, LocalTarget
import pandas as pd
class RawFileTarget(LocalTarget):
path = 'path/to/file.csv'
def __init__(self):
super(RawFileTarget, self).__init__(path)
def load(self):
return pd.read_csv(self.fn, sep='\t', parse_dates=[7, 8], encoding='cp932')
class RawFile(ExternalTask):
def output(self):
return RawFileTarget()
Let's define Target
, which definesload ()
, and so on.
That way, any dependent task
class Aggregation(Task):
def requires(self):
yield RawFile()
def run(self):
df = self.input()[0].load()
...
class Plot(Task):
def requires(self):
yield RawFile()
def run(self):
df = self.input()[0].load()
...
It's refreshing. Later, the person in charge of data handling was given a CSV file with different specifications from the one with the words "I'm sorry, the one I was given before, I made a mistake ...", but I misunderstood. Even if the passed file is Excel, you can just rewrite RawFileTarget.load ()
. Happy!
As the title says, I'm happy to write save ()
so that it corresponds to load ()
, but I'll omit the code example.
Recommended Posts