[PYTHON] Parameter tuning with luigi

Recently I have been using luigi, a data-flow control framework. I find it relatively easy to use, so I thought I would write a little evangelizing piece about it. (There isn't much material in Japanese...)

For luigi itself, please refer to the detailed descriptions at http://qiita.com/colspan/items/453aeec7f4f420b91241 and http://qiita.com/keisuke-nakata/items/0717c0c358658964f81e.

To briefly explain what is good about it: luigi runs each subclass of luigi.Task (= a task) to completion, one by one, to obtain the overall result. Because data is handed between tasks only through files, the parts already computed stay on disk even if a bug surfaces midway or a computation time limit is exceeded, so the run can be resumed. (Perhaps data cannot be handed over in memory...? → Addendum: there is luigi.mock.)
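The resume behaviour rests on a simple rule: a step whose output file already exists counts as done. The sketch below imitates that rule in plain Python without luigi. The helper run_once and the file name step1.txt are hypothetical and only illustrate the idea; this is not luigi's actual implementation.

```python
import os
import tempfile

def run_once(path, compute):
    """Run compute() only if path does not exist yet; return the file content."""
    if not os.path.exists(path):
        result = compute()
        # Write under a temporary name first, then rename, so a crash
        # never leaves a half-written file that looks like a finished step.
        tmp_path = path + ".tmp"
        with open(tmp_path, "w") as f:
            f.write(str(result))
        os.replace(tmp_path, path)
    with open(path) as f:
        return f.read()

calls = []

def expensive():
    calls.append(1)  # record that the computation really ran
    return 42

workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "step1.txt")

first = run_once(target, expensive)   # computes and writes the file
second = run_once(target, expensive)  # file exists, so expensive() is skipped
```

Deleting the output file is then the way to force a step to be recomputed.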

Machine learning with luigi

It is sad to have to redo every calculation when parameter tuning is done in memory and the process dies partway through. I figured luigi's strengths would help here, so I wrote some code for the time being: https://github.com/keisuke-yanagisawa/study/blob/20151204/luigi/param_tuning.py

Running it requires three things; going by what the code uses, these are presumably luigi, scikit-learn, and numpy.

Run this program as follows:

python param_tuning.py task_param_tuning --local-scheduler

The command-line argument specifies the root task, here task_param_tuning. luigi is normally used with a scheduler process running, but since setting that up is a hassle, I run it standalone with --local-scheduler.

Let's take a look at the tasks.

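import luigi
import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import cross_val_score

class task_param_eval(luigi.Task):
    data = luigi.Parameter()
    C = luigi.FloatParameter()
    gamma = luigi.FloatParameter()

    def requires(self):
        return []
    def output(self):
        # Name the file after both values, in order, so that (C=1, gamma=2)
        # and (C=2, gamma=1) do not end up sharing one output file.
        return luigi.LocalTarget("temp/C%s_gamma%s.txt" % (self.C, self.gamma))
    def run(self):
        # self.data is only a hashable stand-in, so load the arrays here.
        dataset = datasets.load_diabetes()
        model = svm.SVR(C=self.C, gamma=self.gamma)

        # cross_val_score returns a "score" (higher is better), not an "error",
        # so negate it to recover the mean absolute error.
        results = -cross_val_score(model, dataset.data, dataset.target, scoring="neg_mean_absolute_error")
        with self.output().open("w") as out_file:
            out_file.write(str(np.mean(results)))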

The code itself is quite simple, isn't it? It evaluates an SVR by cross-validation and writes the mean of the resulting scores to a file.

Keep in mind that a luigi task basically overrides a set of three methods: requires, output, and run. The output file path is wrapped in luigi.LocalTarget(), which you can treat as a magic incantation for now.

Arguments to a task are declared with luigi.Parameter() and its relatives. Internally, luigi seems to look at these parameters to decide what counts as the same task: tasks with the same name but different parameters are each executed, while a task with the same name and the same parameters is not executed twice. (This is why parameter values are required to be hashable.)
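The connection between "not executed twice" and "hashable" can be seen in a small plain-Python sketch. The TaskRegistry class below is hypothetical and is not how luigi is implemented; it only shows that once the parameters form part of a lookup key, their values have to be hashable.

```python
class TaskRegistry:
    """Remembers which (task name, parameters) combinations have already run."""

    def __init__(self):
        self._done = {}

    def run(self, name, func, **params):
        # The parameters become part of a dict key, so every value must be hashable.
        key = (name, tuple(sorted(params.items())))
        if key not in self._done:
            self._done[key] = func(**params)
        return self._done[key]

calls = []

def evaluate(C, gamma):
    calls.append((C, gamma))  # record each real execution
    return C * gamma

registry = TaskRegistry()
registry.run("evaluate", evaluate, C=1.0, gamma=2.0)  # runs
registry.run("evaluate", evaluate, C=1.0, gamma=2.0)  # same parameters: skipped
registry.run("evaluate", evaluate, C=2.0, gamma=1.0)  # different parameters: runs
```

Passing an unhashable value such as a list would fail at the dictionary lookup with a TypeError, which is the same kind of constraint the data parameter runs into.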

Next, let's look at a task that calls the above task multiple times.

class task_param_tuning(luigi.Task):

    cost_list = luigi.Parameter(default="1,2,5,10")
    gamma_list = luigi.Parameter(default="1,2,5,10")
    
    data = datasets.load_diabetes()

    def requires(self):
        # One evaluation task per (C, gamma) pair, as a flat list.
        return [task_param_eval(data=frozenset(self.data), # values should be hashable
                                C=float(C), gamma=float(gamma))
                for C in self.cost_list.split(",")
                for gamma in self.gamma_list.split(",")]
    def output(self):
        return luigi.LocalTarget("results.csv")
    def run(self):

        results = {}

        for task in self.requires():
            with task.output().open() as taskfile:
                results[(task.C, task.gamma)] = float(taskfile.read())
        
        best_key = min(results,  key=results.get)
        with self.output().open("w") as out_file:
            out_file.write("%s,%s,%.4f\n" %(best_key[0], best_key[1], results[best_key]))

I haven't studied luigi enough to know how to pass multiple values as one parameter (I will probably get scolded for this), so for now I separate them with commas; please overlook that. In this code I wanted to write out the parameters C and gamma of each task_param_eval, so run loops with for task in self.requires(). If you only need to read the files produced by the required tasks, self.input() gives the same result as calling output() on each task returned by self.requires().
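The comma-separated workaround and the selection of the best pair can be tried out without luigi. In the sketch below, parse_float_list is a hypothetical helper and the scores are made-up stand-ins for the values that would be read back from the task output files.

```python
from itertools import product

def parse_float_list(text):
    """Turn a string such as "1,2,5,10" into [1.0, 2.0, 5.0, 10.0]."""
    return [float(item) for item in text.split(",")]

cost_list = parse_float_list("1,2,5,10")
gamma_list = parse_float_list("1,2,5,10")

# Every (C, gamma) combination; each one would become a task_param_eval.
grid = list(product(cost_list, gamma_list))

# Stand-in scores (hypothetical): pretend the error is smallest at C=5, gamma=2.
results = {(C, gamma): abs(C - 5.0) + abs(gamma - 2.0) for C, gamma in grid}

# min() iterates over the keys and ranks them by their stored error.
best_key = min(results, key=results.get)
```

Keeping the (C, gamma) tuple as the dictionary key is what lets run write the winning parameters next to the winning score.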
