Auto-complete YAML content with Python

Summary of this article

It is common to write settings in a YAML file and load them from Python. My impression is that this is especially common in machine learning.

However, the result of loading is a dictionary object, so its key names and hierarchical structure have to be checked by eye in the YAML file. I couldn't stand having to go back and check the YAML while coding.

This article introduces one way to do this, for people who want to **define config in a YAML file and have its items auto-completed while writing Python**.

The code can be found at https://github.com/Nkriskeeic/configer. It is packaged as a module and can be installed with pip.

Writing settings in a YAML file

First, let's look at an example of settings written in a YAML file.

config.yml


model:
    in_channels: 3
    n_blocks: 10
    block:
        channels: 64
        activation: relu
    out_channels: 1

Settings in YAML files are often written hierarchically, as shown above. Reading them from a Python script may vary in the details, but it usually looks something like this.

main.py


import yaml

# safe_load parses a stream or a YAML string, so open the file first
with open('./config.yml') as f:
    config = yaml.safe_load(f)

model = Model(
    in_channels = config['model']['in_channels'],
    out_channels = config['model']['out_channels'],
    n_blocks = config['model']['n_blocks'],
    block_channels = config['model']['block']['channels'],
    ...
)

First, I couldn't stand **hard-coding these dictionary key names**. Second, because of the hierarchy defined in YAML, I couldn't stand **going back to the YAML file to recall which values were defined at which level**.
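To make the first complaint concrete, here is a small sketch of what happens with a misspelled key. It uses a plain dictionary of the shape `yaml.safe_load` would return for config.yml, and `read_n_blocks` is a hypothetical helper written only for this illustration. Because the key is just a string, nothing flags the typo until the line actually runs.

```python
# Plain dictionary with the same shape yaml.safe_load would return for config.yml
config = {
    "model": {
        "in_channels": 3,
        "n_blocks": 10,
        "block": {"channels": 64, "activation": "relu"},
        "out_channels": 1,
    }
}


def read_n_blocks(cfg: dict) -> int:
    # Hypothetical helper with a deliberately misspelled key ("n_blobks")
    return cfg["model"]["n_blobks"]


try:
    read_n_blocks(config)
    error_message = None
except KeyError as exc:
    # The mistake only surfaces here, at runtime
    error_message = f"missing key: {exc}"

print(error_message)
```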

Ideally, **if you define config in a YAML file, the items set in YAML should be auto-completed while writing the Python script**.

main.py[ideal]


config = get_config('./config.yml')
model = Model(
    in_channels = config.model.in_channels,
    out_channels = config.model.out_channels,  # <- dot access should auto-complete, ideally with type checking
    ...
)

In YACS, the template for the YAML file is defined in a Python script, so I expected the YAML contents to be auto-completed in Python. However, my PyCharm did not complete them.

yacs


from yacs.config import CfgNode as CN

_C = CN()

_C.MODEL = CN()
_C.MODEL.IN_CHANNELS = 3

def get_cfg_defaults():
  return _C.clone()

config = get_cfg_defaults()
config.MODEL.IN_CHANNELS # <--Not auto-completed

So I decided to create my own framework such that **if you define config in a YAML file, the items set in YAML are auto-completed while writing the Python script**.

Generate the corresponding Python script from a YAML file

My first idea was this: **if a Python file with the same content can be generated automatically from the YAML file, auto-completion will work by referring to that Python file during implementation**. The image is as follows.

config.yml


hoge: piyo

YAML-> Python conversion

config.py


hoge: str = 'piyo'

However, this simple conversion makes it hard to express YAML's hierarchical structure in Python.

So I made it a little more elaborate and handled the hierarchy by **turning every level of the structure into a data class**.

config.yml


model:
    in_channels: 3
    n_blocks: 10
    block:
        channels: 64
        activation: relu
    out_channels: 1

YAML-> Python conversion

config.py


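from dataclasses import dataclass

# frozen=True also keeps instances hashable, which Python 3.11+
# requires for dataclass defaults such as ModelBlock()
@dataclass(frozen=True)
class ModelBlock:
    channels: int = 64
    activation: str = 'relu'

@dataclass(frozen=True)
class Model:
    in_channels: int = 3
    n_blocks: int = 10
    block: ModelBlock = ModelBlock()
    out_channels: int = 1

@dataclass(frozen=True)
class Config:
    model: Model = Model()

Config().model.block.channels  # <- every level can be auto-completed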

I use data classes for two reasons: the automatically generated magic methods reduce the amount of code, and setting `frozen=True` prevents accidental changes to member variables. Type annotations are generated as well. The conversion is implemented recursively.
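The recursive conversion can be sketched roughly as follows. This is a minimal illustration, not the code from the repository: `class_name`, `generate_dataclasses`, and `yaml_dict_to_source` are names made up for this example. It assumes that every key is a valid Python identifier and that leaf values are plain scalars (int, float, str, bool); lists and null values are not handled.

```python
from typing import Any, Dict, List


def class_name(path: List[str]) -> str:
    # ["model", "block"] -> "ModelBlock"; the root (empty path) becomes "Config"
    if not path:
        return "Config"
    return "".join(part.title().replace("_", "") for part in path)


def generate_dataclasses(tree: Dict[str, Any], path: List[str], out: List[str]) -> str:
    """Append dataclass source for `tree` to `out` and return the class name."""
    name = class_name(path)
    body = []
    for key, value in tree.items():
        if isinstance(value, dict):
            # Recurse first so that nested classes are emitted before their parent
            child = generate_dataclasses(value, path + [key], out)
            body.append(f"    {key}: {child} = {child}()")
        else:
            # The annotation is taken from the type of the YAML value
            body.append(f"    {key}: {type(value).__name__} = {value!r}")
    out.append(f"@dataclass(frozen=True)\nclass {name}:\n" + "\n".join(body) + "\n")
    return name


def yaml_dict_to_source(tree: Dict[str, Any]) -> str:
    out = ["from dataclasses import dataclass\n"]
    generate_dataclasses(tree, [], out)
    return "\n\n".join(out)


# Dictionary of the shape yaml.safe_load would return for config.yml
config_dict = {
    "model": {
        "in_channels": 3,
        "n_blocks": 10,
        "block": {"channels": 64, "activation": "relu"},
        "out_channels": 1,
    }
}

source = yaml_dict_to_source(config_dict)
print(source)

# Sanity check: the generated source is importable Python
namespace = {"__name__": "generated_config"}
exec(source, namespace)
cfg = namespace["Config"]()
print(cfg.model.block.channels)
```

In real use the generated source would be written to config.py so that the IDE can index it; the `exec` call here only confirms that the output is valid.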

On its own, this merely converts a YAML file into a Python script. But since the config is now a class, I wanted to generate it with some convenient settings-related methods attached.

Writing the generator with string concatenation alone looked painful, so I decided to use prestring, a third-party module for generating Python source code.

The implementation generates a single config.py by combining several files. The image is as follows.

config.yml


model:
    in_channels: 3
    n_blocks: 10
    block:
        channels: 64
        activation: relu
    out_channels: 1

YAML-> Python conversion

config.py


from dataclasses import dataclass

@dataclass(frozen=True)
class ModelBlock:
    channels: int = 64
    activation: str = 'relu'

@dataclass(frozen=True)
class Model:
    in_channels: int = 3
    n_blocks: int = 10
    block: ModelBlock = ModelBlock()
    out_channels: int = 1

@dataclass(frozen=True)
class Config:
    model: Model = Model()

    def some_cool_method(self):
        ...

class ConfigGenerator:
    def generate(self):
        ...
        return Config()

The Config class is responsible for holding the contents of the YAML. The ConfigGenerator class exists because, when the setting values are actually read from a YAML file, they have to be checked against the current Config class for inconsistencies such as unknown names or mismatched types.
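The kind of check described here might look like the sketch below. It is an assumption about the general approach rather than the repository's code: `build` is a hypothetical function that walks a loaded dictionary alongside a frozen dataclass, rejects unknown names and mismatched types, and returns a new instance with the loaded values applied.

```python
from dataclasses import dataclass, fields, is_dataclass, replace
from typing import Any, Dict


@dataclass(frozen=True)
class ModelBlock:
    channels: int = 64
    activation: str = "relu"


@dataclass(frozen=True)
class Model:
    in_channels: int = 3
    block: ModelBlock = ModelBlock()


def build(default: Any, loaded: Dict[str, Any], where: str = "") -> Any:
    """Return a copy of `default` with `loaded` applied, checking names and types."""
    known = {f.name for f in fields(default)}
    changes = {}
    for key, value in loaded.items():
        label = f"{where}{key}"
        if key not in known:
            raise KeyError(f"unknown setting: {label}")
        current = getattr(default, key)
        if is_dataclass(current):
            if not isinstance(value, dict):
                raise TypeError(f"{label}: expected a mapping")
            # Nested level: validate it recursively
            changes[key] = build(current, value, label + ".")
        elif type(value) is not type(current):
            raise TypeError(
                f"{label}: expected {type(current).__name__}, got {type(value).__name__}"
            )
        else:
            changes[key] = value
    # dataclasses.replace creates a new instance, so frozen classes are fine
    return replace(default, **changes)


model = build(Model(), {"block": {"channels": 32}})
print(model)
```

Comparing against the type of the default value is a deliberately strict choice in this sketch; for example, it rejects `3.0` where the default is `3`.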

With this, the goal is achieved: **if you define config in a YAML file, the items set in YAML are auto-completed while writing the Python script** (by converting the contents of the YAML file into Python classes).

main.py


config = ConfigGenerator().generate()
model = Model(
    in_channels = config.model.in_channels,
    out_channels = config.model.out_channels,  # <- dot access auto-completes, and type annotations are shown too
    ...
)

The drawback is a restriction: **every time you edit the YAML file, you have to run a command from the terminal to convert it into a Python script**.

If a one-second command spares you from typos and from going back to the YAML file, I think that is acceptable.

Additional features

Since the config is now a data class, I thought it would be convenient to add various methods to it through the automatic generation.

Overriding setting values

When settings change frequently during experiments, situations arise such as "the defaults are written in `default.yml` and some values are overridden in `exp1.yml`". For readability, there are also situations such as "the defaults are written in `default.yml`, the model settings are overridden in `model.yml`, and the dataset settings are overridden in `dataset.yml`".

In those situations, it would be convenient to be able to write the following.

main.py


config = ConfigGenerator() \
    .update_by(['exp1.yml']) \
    .generate()
config = ConfigGenerator() \
    .update_by(['model.yml', 'dataset.yml']) \
    .generate()

I added this capability to ConfigGenerator because it seemed convenient. The implementation is simple: each YAML file passed to `update_by` is loaded as a dictionary, its variable names and types are checked, and the corresponding values are overwritten.

For now, a single call can take multiple YAML files at once, but if those files try to overwrite the same setting with different values, an error is raised.
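The conflict rule can be sketched like this. It is a simplified illustration under my own assumptions, not the actual `update_by` implementation: `flatten` and `merge_overrides` are hypothetical helpers, the override files are represented by already-loaded dictionaries, and the `dataset` section is made up for the example.

```python
from typing import Any, Dict, Iterator, List, Tuple


def flatten(tree: Dict[str, Any], prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield ("model.block.channels", 64)-style pairs from a nested dict."""
    for key, value in tree.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            yield from flatten(value, path + ".")
        else:
            yield path, value


def merge_overrides(
    named_trees: List[Tuple[str, Dict[str, Any]]]
) -> Dict[str, Tuple[Any, str]]:
    """Merge the overrides of one call; the same key with different values is an error."""
    merged: Dict[str, Tuple[Any, str]] = {}
    for source, tree in named_trees:
        for path, value in flatten(tree):
            if path in merged and merged[path][0] != value:
                old_value, old_source = merged[path]
                raise ValueError(
                    f"{path}: {old_value!r} in {old_source} conflicts with {value!r} in {source}"
                )
            # Remember which file each value came from
            merged.setdefault(path, (value, source))
    return merged


overrides = merge_overrides([
    ("model.yml", {"model": {"n_blocks": 20}}),
    ("dataset.yml", {"dataset": {"batch_size": 8}}),
])
print(overrides)
```

Keeping the source file name next to each value costs little here and is what makes a per-value "changed by" message possible later.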

pprint function

Displaying the setting values in an easy-to-read form when the script runs helps prevent unexpected accidents.

I made the output look like this.

python


config = ConfigGenerator() \
    .update_by(['model.yml']) \
    .update_by(['exp01.yml']) \
    .generate()

config.pprint(wait_yes=True)  # <- execution does not continue until you check the output and enter YES

Output result

default from /config/default.yml
model:
    in_channels: 3
    n_blocks: 20 (default 10, changed by /config/model.yml)
    block:
        channels: 32 (default 64, changed by /config/exp01.yml)
        activation: relu
    out_channels: 1

A warning message is shown for any value that a file changes from its default. This lets you notice immediately when you run the code with settings that differ from what you intended.
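A display like this could be produced along the following lines. This is only a sketch under assumptions: `render` is a hypothetical function, the defaults are a plain nested dictionary, and `changes` maps a dotted path to a `(new_value, source_file)` pair. The `wait_yes` confirmation prompt is left out.

```python
from typing import Any, Dict, List, Tuple


def render(
    defaults: Dict[str, Any],
    changes: Dict[str, Tuple[Any, str]],
    prefix: str = "",
    indent: int = 0,
) -> List[str]:
    """Return display lines, annotating values that differ from the default."""
    lines: List[str] = []
    pad = "    " * indent
    for key, value in defaults.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.extend(render(value, changes, path + ".", indent + 1))
        elif path in changes:
            new_value, source = changes[path]
            lines.append(f"{pad}{key}: {new_value} (default {value}, changed by {source})")
        else:
            lines.append(f"{pad}{key}: {value}")
    return lines


defaults = {
    "model": {
        "in_channels": 3,
        "n_blocks": 10,
        "block": {"channels": 64, "activation": "relu"},
        "out_channels": 1,
    }
}
changes = {
    "model.n_blocks": (20, "/config/model.yml"),
    "model.block.channels": (32, "/config/exp01.yml"),
}

lines = render(defaults, changes)
print("\n".join(lines))
```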
