It is common to write settings in a YAML file and load them from Python. My impression is that this is especially common in machine learning.
However, the loaded result is a dictionary object, so its key names and hierarchy have to be checked by eye against the YAML file. I could not stand having to go back and check the YAML while coding.
This article introduces one approach for people who want to **define config in a YAML file and still have its items auto-completed while writing Python**.
The code is available at https://github.com/Nkriskeeic/configer. It is packaged as a module and can be installed with pip.
First, let's look at an example of writing a setting value in a YAML file.
config.yml
model:
  in_channels: 3
  n_blocks: 10
  block:
    channels: 64
    activation: relu
  out_channels: 1
Setting values are often written hierarchically in YAML files, as shown above. The code that reads them from a Python script varies a little from project to project, but it usually looks something like this.
main.py
import yaml

# yaml.safe_load takes a stream or YAML text, not a path, so open the file first.
with open('./config.yml') as f:
    config = yaml.safe_load(f)

model = Model(
    in_channels=config['model']['in_channels'],
    out_channels=config['model']['out_channels'],
    n_blocks=config['model']['n_blocks'],
    block_channels=config['model']['block']['channels'],
    ...
)
First, I could not stand hard-coding these **dictionary key names**. Second, I could not stand **going back to the YAML file to recall which value was defined at which level of the hierarchy**.
Ideally, **once a config is defined in a YAML file, the items set in YAML would be auto-completed while writing the Python script**.
main.py[ideal]
config = get_config('./config.yml')
model = Model(
    in_channels=config.model.in_channels,
    out_channels=config.model.out_channels,  # <- dot access auto-completes; ideally the type is checked too
    ...
)
In YACS, the template of the YAML file is defined in a Python script, so I expected the Python side to be able to auto-complete the YAML contents. In my PyCharm, however, they were not completed.
yacs
from yacs.config import CfgNode as CN

_C = CN()
_C.MODEL = CN()
_C.MODEL.IN_CHANNELS = 3

def get_cfg_defaults():
    return _C.clone()

config = get_cfg_defaults()
config.MODEL.IN_CHANNELS  # <- not auto-completed
So I decided to build my own framework in which **once a config is defined in a YAML file, the items set in YAML are auto-completed while writing the Python script**.
My first idea was that **if a Python file with the same content could be generated automatically from the YAML file, the editor could auto-complete by referring to that Python file**. The idea looks like this.
config.yml
hoge: piyo
YAML-> Python conversion
config.py
hoge: str = 'piyo'
However, with this simple conversion it is hard to express YAML's hierarchical structure in Python.
So I made the conversion a little more elaborate and handled it by **turning every level of the hierarchy into a data class**.
config.yml
model:
  in_channels: 3
  n_blocks: 10
  block:
    channels: 64
    activation: relu
  out_channels: 1
YAML-> Python conversion
config.py
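from dataclasses import dataclass

# frozen=True makes instances immutable and hashable, which also lets them
# be used as default values for fields of other data classes.
@dataclass(frozen=True)
class ModelBlock:
    channels: int = 64
    activation: str = 'relu'

@dataclass(frozen=True)
class Model:
    in_channels: int = 3
    n_blocks: int = 10
    block: ModelBlock = ModelBlock()
    out_channels: int = 1

@dataclass(frozen=True)
class Config:
    model: Model = Model()

Config().model.block.channels  # <- every level can be auto-completed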
I use data classes for two reasons: the automatically generated magic methods reduce the amount of code to write, and setting `frozen=True` prevents accidental changes to member variables.
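As a small illustration of the `frozen=True` point, the sketch below shows what happens when a field of a frozen data class is assigned to. It reuses the `ModelBlock` class from the example above; the `changed` and `same` variables are only there for the demonstration.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class ModelBlock:
    channels: int = 64
    activation: str = 'relu'


block = ModelBlock()

try:
    block.channels = 128  # assigning to a field of a frozen data class
except FrozenInstanceError:
    changed = False  # the assignment was rejected
else:
    changed = True

# __eq__ (and __repr__, __init__) are generated automatically.
same = block == ModelBlock(channels=64, activation='relu')
```

The accidental assignment is rejected at runtime instead of silently changing the experiment settings.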
Type annotations are generated as well. The conversion is implemented recursively.
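The recursive conversion can be sketched roughly as follows. This is a minimal illustration using plain string building, not the actual implementation in the repository; the names `class_name`, `render_class`, and `generate_source` are hypothetical, and only `str`, `int`, `float`, `bool`, and nested mappings are handled (lists and `null` values would need extra care).

```python
from typing import Any, Dict, List


def class_name(path: List[str]) -> str:
    # ['model', 'block'] -> 'ModelBlock'; the root (empty path) becomes 'Config'.
    if not path:
        return 'Config'
    return ''.join(w.capitalize() for part in path for w in part.split('_'))


def render_class(node: Dict[str, Any], path: List[str], out: List[str]) -> str:
    """Append the data class source for `node` to `out` and return its class name."""
    name = class_name(path)
    fields = []
    for key, value in node.items():
        if isinstance(value, dict):
            # Nested mapping: emit the child class first, then refer to it.
            child = render_class(value, path + [key], out)
            fields.append(f'    {key}: {child} = {child}()')
        else:
            # Leaf value: the annotation is taken from the Python type of the value.
            fields.append(f'    {key}: {type(value).__name__} = {value!r}')
    if not fields:
        fields.append('    pass')
    out.append(f'@dataclass(frozen=True)\nclass {name}:\n' + '\n'.join(fields) + '\n')
    return name


def generate_source(config: Dict[str, Any]) -> str:
    out: List[str] = ['from dataclasses import dataclass\n']
    render_class(config, [], out)
    return '\n\n'.join(out)


# The dictionary that loading config.yml would produce.
config = {
    'model': {
        'in_channels': 3,
        'n_blocks': 10,
        'block': {'channels': 64, 'activation': 'relu'},
        'out_channels': 1,
    }
}
source = generate_source(config)

# Execute the generated source to confirm that it is valid Python.
namespace: Dict[str, Any] = {}
exec(source, namespace)
```

Child classes are emitted before their parents, so that each class already exists when it is used as a default value.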
This alone only converts a YAML file into a Python script. Since the config is now a class, though, I wanted to add some convenient settings-related methods to it before generating.
Writing the generator purely with string concatenation looked painful, so I decided to use prestring, a third-party module for generating Python scripts.
The idea is to generate a single config.py by combining several files.
config.yml
model:
  in_channels: 3
  n_blocks: 10
  block:
    channels: 64
    activation: relu
  out_channels: 1
YAML-> Python conversion
config.py
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelBlock:
    channels: int = 64
    activation: str = 'relu'

@dataclass(frozen=True)
class Model:
    in_channels: int = 3
    n_blocks: int = 10
    block: ModelBlock = ModelBlock()
    out_channels: int = 1

@dataclass(frozen=True)
class Config:
    model: Model = Model()

    def some_cool_method(self):
        ...

class ConfigGenerator:
    def generate(self):
        ...
        return Config()
The `Config` class is responsible for holding the contents of the YAML.
The `ConfigGenerator` class exists because, when the setting values are actually read from the YAML file, they need to be checked against the current `Config` class for inconsistencies and for differing types.
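The kind of check described here might look like the sketch below. It is an assumption about how such a check could be written, not the repository's code: `apply_loaded` is a hypothetical helper that walks a loaded dictionary, rejects keys that the data class does not define, rejects values whose type differs from the default, and returns an updated copy.

```python
from dataclasses import dataclass, fields, is_dataclass, replace
from typing import Any, Dict


@dataclass(frozen=True)
class ModelBlock:
    channels: int = 64
    activation: str = 'relu'


@dataclass(frozen=True)
class Model:
    in_channels: int = 3
    block: ModelBlock = ModelBlock()


def apply_loaded(config: Any, loaded: Dict[str, Any]) -> Any:
    """Return a copy of `config` updated from `loaded`, checking names and types."""
    known = {f.name for f in fields(config)}
    changes: Dict[str, Any] = {}
    for key, value in loaded.items():
        if key not in known:
            raise KeyError(f'{key!r} is not defined in {type(config).__name__}')
        current = getattr(config, key)
        if is_dataclass(current):
            if not isinstance(value, dict):
                raise TypeError(f'{key!r}: expected a mapping')
            # Recurse into the nested data class.
            changes[key] = apply_loaded(current, value)
        elif type(value) is not type(current):
            raise TypeError(
                f'{key!r}: expected {type(current).__name__}, got {type(value).__name__}'
            )
        else:
            changes[key] = value
    # replace() builds a new instance, so it also works with frozen data classes.
    return replace(config, **changes)


updated = apply_loaded(Model(), {'block': {'channels': 32}})
```

The exact type comparison is deliberately strict in this sketch (an `int` is not accepted where the default is a `float`); a real implementation may want to be more lenient.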
With this, the goal is achieved: **once a config is defined in a YAML file, the items set in YAML are auto-completed while writing the Python script** (by converting the contents of the YAML file into Python classes).
main.py
config = ConfigGenerator().generate()
model = Model(
    in_channels=config.model.in_channels,
    out_channels=config.model.out_channels,  # <- dot access auto-completes and the type annotation is shown too
    ...
)
The catch is that **every time you edit the YAML file, you have to run a command from the terminal to convert it into a Python script**.
If a one-second command means no more typos and no more trips to the YAML file, I consider that acceptable.
Since the config is now a data class, I thought it would be convenient to add various class methods to it through the automatic generation.
When setting values change frequently during experiments, you run into situations like "the default settings are written in `default.yml`, and some values are updated in `exp1.yml`". To keep things readable, there are also situations like "the default settings are written in `default.yml`, the model settings are overridden by `model.yml`, and the dataset settings are overridden by `dataset.yml`".
In those situations it would be convenient to be able to write the following:
main.py
config = ConfigGenerator() \
    .update_by(['exp1.yml']) \
    .generate()
config = ConfigGenerator() \
    .update_by(['model.yml', 'dataset.yml']) \
    .generate()
So I decided to add this feature to `ConfigGenerator`.
For the YAML passed to `update_by`, all that is needed is to load it as a dictionary, check the types and variable names, and overwrite the corresponding values.
For now, multiple YAML files can be passed in parallel, but if two files try to overwrite the same setting with different values, an error is raised.
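The conflict check for files passed in parallel could be sketched as below. This is only an illustration under assumptions: `flatten` and `merge_parallel` are hypothetical helpers, the file contents are given directly as already-loaded dictionaries, and the `dataset.name` key is made up for the example.

```python
from typing import Any, Dict, Iterator, Tuple


def flatten(node: Dict[str, Any], prefix: str = '') -> Iterator[Tuple[str, Any]]:
    """Yield ('model.block.channels', 32)-style pairs for every leaf value."""
    for key, value in node.items():
        path = f'{prefix}{key}'
        if isinstance(value, dict):
            yield from flatten(value, path + '.')
        else:
            yield path, value


def merge_parallel(overrides: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Merge overrides keyed by file name; fail if two files disagree on a value."""
    merged: Dict[str, Any] = {}
    source: Dict[str, str] = {}
    for filename, content in overrides.items():
        for path, value in flatten(content):
            if path in merged:
                if merged[path] != value:
                    raise ValueError(
                        f'{path} is set to {merged[path]!r} by {source[path]} '
                        f'and to {value!r} by {filename}'
                    )
                continue  # same value in both files: nothing to do
            merged[path] = value
            source[path] = filename
    return merged


merged = merge_parallel({
    'model.yml': {'model': {'n_blocks': 20}},
    'dataset.yml': {'dataset': {'name': 'mnist'}},  # hypothetical key
})
```

Flattening to dotted paths keeps the comparison simple: two files conflict exactly when they assign different values to the same path.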
Displaying the setting values in an easy-to-read form when the script runs helps prevent unexpected accidents.
I made them display like this.
python
config = ConfigGenerator() \
    .update_by(['model.yml']) \
    .update_by(['exp01.yml']) \
    .generate()
config.pprint(wait_yes=True)  # <- shows the settings and does not run the rest of the code until you confirm with YES
Output result
default from /config/default.yml
model:
  in_channels: 3
  n_blocks: 20 (default 10, changed by /config/model.yml)
  block:
    channels: 32 (default 64, changed by /config/exp01.yml)
    activation: relu
  out_channels: 1
A warning message is shown whenever some file changes a value from its default. This lets you notice immediately when the code is running with settings different from the ones you planned.
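An annotated display of this kind could be produced along the following lines. This is a rough sketch rather than the actual `pprint` implementation: `render_lines` is a hypothetical helper, and it assumes the changes have already been recorded as a mapping from a dotted path to the new value and the file that set it.

```python
from typing import Any, Dict, List, Tuple


def render_lines(
    defaults: Dict[str, Any],
    changed: Dict[str, Tuple[Any, str]],
    indent: int = 0,
    prefix: str = '',
) -> List[str]:
    """Render the settings as indented lines, marking values that differ from the default."""
    lines: List[str] = []
    pad = '  ' * indent
    for key, value in defaults.items():
        path = prefix + key
        if isinstance(value, dict):
            lines.append(f'{pad}{key}:')
            lines.extend(render_lines(value, changed, indent + 1, path + '.'))
        elif path in changed:
            new_value, filename = changed[path]
            lines.append(f'{pad}{key}: {new_value} (default {value}, changed by {filename})')
        else:
            lines.append(f'{pad}{key}: {value}')
    return lines


defaults = {
    'model': {
        'in_channels': 3,
        'n_blocks': 10,
        'block': {'channels': 64, 'activation': 'relu'},
        'out_channels': 1,
    }
}
changed = {
    'model.n_blocks': (20, '/config/model.yml'),
    'model.block.channels': (32, '/config/exp01.yml'),
}
print('\n'.join(render_lines(defaults, changed)))
```

Keeping the default next to the overridden value is what makes an unplanned change easy to spot before the experiment starts.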