I usually write only about niche topics, so the response to my last post felt great, and I pushed a little further. This is that story.
The code for this post is here. What it computes is the same as last time. https://github.com/keisuke-yanagisawa/study/blob/20151205/luigi/param_tuning.py
The theme this time is **I want to create a general-purpose task**. I write many different tasks, and it would be nice to have a general-purpose one I can reuse anywhere. So I made a general-purpose task that does the following:
- Aggregate the results of parameter tuning
- Output, in CSV format, only the parameter sets with the best value
import re

import luigi
import pandas as pd


class param_tuning(luigi.Task):
    tasks = luigi.Parameter()                     # list of luigi.Task instances to aggregate
    text_format = luigi.Parameter()               # regex with named groups (?P<name>...) describing each output
    reduce_pivot = luigi.Parameter()              # name of the group (column) used for aggregation
    reduce_rule = luigi.Parameter(default="min")  # aggregation function: "min" or "max"
    out_file = luigi.Parameter()                  # output file name

    def requires(self):
        return self.tasks

    def output(self):
        return luigi.LocalTarget(self.out_file)

    def run(self):
        # Build a pandas DataFrame with one row per task output
        results = []
        for task in self.requires():
            with task.output().open() as taskfile:
                string = taskfile.read()
            match = re.search(self.text_format, string)
            if match is None:
                raise ValueError("text_format did not match the output of %s" % task)
            results.append(match.groupdict())
        df = pd.DataFrame(results)
        # The regex captures strings, so convert the pivot column to numbers
        df[self.reduce_pivot] = pd.to_numeric(df[self.reduce_pivot])
        values = df[self.reduce_pivot]

        # Aggregate the parameter tuning results
        if self.reduce_rule == "min":
            best_val = values.min()
        elif self.reduce_rule == "max":
            best_val = values.max()
        else:
            raise ValueError("reduce_rule must be min or max. your input is %s" % self.reduce_rule)

        # Reorder the columns so that the pivot column comes last
        column_order = [key for key in df.columns if key != self.reduce_pivot] + [self.reduce_pivot]
        df = df[column_order]

        # Write only the best rows as CSV through the luigi target
        with self.output().open("w") as out:
            df[df[self.reduce_pivot] == best_val].to_csv(out, index=False)
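Writing the various aggregation steps by hand was tedious, so I left them to pandas. The mechanism is very simple: it reads the output file of each task returned by requires(), extracts the values with the regular expression, and lets pandas pick out the best rows.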
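For readers without luigi or pandas at hand, the logic of run() can be sketched with only the standard library. This is a minimal sketch, not the code from the repository: the function name best_rows and its plain-string inputs are hypothetical stand-ins for the files each task writes, and rows are returned as lists of strings instead of a CSV file.

```python
import re

# Number pattern from the post: a decimal (optionally signed) or an unsigned integer
NUM = r"[-+]?\d*\.\d+|\d+"
TEXT_FORMAT = "(?P<cost>" + NUM + "),(?P<gamma>" + NUM + "),(?P<error>" + NUM + ")"


def best_rows(outputs, text_format, reduce_pivot, reduce_rule="min"):
    """Hypothetical stdlib version of param_tuning.run().

    Parses each output string with text_format and returns (header, rows):
    header lists the column names with the pivot moved to the end, and rows
    holds every row whose pivot value equals the best (min or max) value.
    """
    reducers = {"min": min, "max": max}
    if reduce_rule not in reducers:
        raise ValueError("reduce_rule must be min or max, got %r" % reduce_rule)

    records = []
    for text in outputs:
        match = re.search(text_format, text)
        if match is None:
            raise ValueError("text_format did not match %r" % text)
        records.append(match.groupdict())
    if not records:
        raise ValueError("no outputs to aggregate")

    # Compare the pivot column as numbers, not as strings
    best = reducers[reduce_rule](float(r[reduce_pivot]) for r in records)

    # Same column order as param_tuning: pivot column last
    header = [k for k in records[0] if k != reduce_pivot] + [reduce_pivot]
    rows = [[r[k] for k in header] for r in records if float(r[reduce_pivot]) == best]
    return header, rows


header, rows = best_rows(["1,0.1,0.25", "10,0.01,0.12", "100,0.001,0.12"], TEXT_FORMAT, "error")
print(header)
print(rows)
```

As in the luigi task, ties are kept: every parameter set that reaches the best value is output, not just the first one.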
The inputs are a little confusing. tasks, reduce_pivot/reduce_rule, and out_file should be easy to understand, but making the task general-purpose led to an awkward interface in which you pass in a regular expression.
To run the code posted on GitHub as is, something like the following should work:
python param_tuning.py main_task --local-scheduler
To use this general-purpose task, you prepare two separate tasks: a "task that calculates with given parameters" and a "task for the main function".
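The main task then only has to build the list of evaluation tasks, one per parameter combination, and hand it to param_tuning as tasks. The sketch below shows that wiring without luigi. make_eval_tasks, format_result, and the cost/gamma keyword arguments are hypothetical names; it assumes the evaluation task takes its parameters as keyword arguments, as luigi tasks do.

```python
import itertools


def make_eval_tasks(task_factory, costs, gammas):
    """Create one evaluation task per (cost, gamma) combination.

    task_factory is a stand-in for the evaluation task class; in the luigi
    setup it would be something like task_param_eval (hypothetical signature).
    """
    return [task_factory(cost=c, gamma=g) for c, g in itertools.product(costs, gammas)]


def format_result(cost, gamma, error):
    """Format one result as the one-line cost,gamma,error CSV the aggregation parses."""
    return "%s,%s,%s" % (cost, gamma, error)


# dict stands in for the task class, so each "task" is just its parameters
tasks = make_eval_tasks(dict, [1, 10], [0.1, 0.01])
print(len(tasks))
print(format_result(10, 0.01, 0.12))
```

In the luigi version, the list returned by make_eval_tasks would be passed along as param_tuning(tasks=..., text_format=..., reduce_pivot="error", out_file=...).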
Writing the regular expression in the main task is probably the hardest part (I hadn't used regular expressions much myself), so I'll explain it.
This time, the calculation task task_param_eval outputs a one-line CSV file with the fields cost, gamma, error, so text_format is specified as follows.
s = "[-+]?\d*\.\d+|\d+" ## float or int expression
text_format = "(?P<cost>"+s+"),(?P<gamma>"+s+"),(?P<error>"+s+")"
Writing (?P<name>...) gives a group the name name. These names become the header of the output CSV and are what reduce_pivot refers to, so choose them carefully.
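Here is a small check of what the named groups capture, using the same s and text_format as above on a made-up output line. It also shows one pitfall of this pattern: exponent notation is not covered, and since re.search is unanchored, part of a value can be matched silently. Using re.fullmatch on the stripped line is one possible way to make such input fail loudly; that is a suggestion, not what the posted code does.

```python
import re

s = r"[-+]?\d*\.\d+|\d+"  # float or int expression, as in the post
text_format = "(?P<cost>" + s + "),(?P<gamma>" + s + "),(?P<error>" + s + ")"

# A line as task_param_eval might write it (values are made up)
groups = re.search(text_format, "10,0.01,0.125").groupdict()
print(groups["cost"], groups["gamma"], groups["error"])  # 10 0.01 0.125

# Pitfall: exponent notation is not covered, and re.search is unanchored,
# so "1e-3" is silently read as cost "3"
print(re.search(text_format, "1e-3,0.1,0.2").group("cost"))  # 3

# Anchoring the whole line makes such input fail instead (returns None)
print(re.fullmatch(text_format, "1e-3,0.1,0.2"))  # None
```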
I am desperately bad at modularizing my code, but I think it's very good when a framework like this forces you to split things up "whether you like it or not". Creating a general-purpose task this time is one of the benefits of that modularization. ... I would appreciate any guidance and encouragement, including pointers that my coding isn't good in the first place.
For extracting numbers with regular expressions, I borrowed from the Stack Overflow answer below. http://stackoverflow.com/questions/4703390/how-to-extract-a-floating-number-from-a-string-in-python