Is there a way to customize hydra run dir and sweep subdir based on values of a sweep? #2644

nathimel · 2023-03-17T04:50:07Z

(This question was originally posted on Stack Overflow.)

I am using hydra to organize my configurations and output folders for a project. This project involves first specifying a particular objective that can differ in a few parameters. Then that objective can be optimized by different kinds of optimization procedures, where each optimization has its own specific hyperparameters.

So far, it has been straightforward to use Hydra's multirun (sweep) functionality to explore different objectives for a single kind of optimization. I have been using string interpolation in my main config.yaml file. Suppose objective and experiment are config groups, and optimization is a config group under experiment.

filepaths: 
  # filenames
  objective_data: objective_data.csv # contains intermediate output useful for all experiments
  experiment_data: experiment_output.csv # contains final output specific only to a single experiment configuration
  # important subdirectories to keep distinct
  objective_subdir: param1=${objective.feature1.param1}/param2=${objective.feature1.param2}
  experiment_subdir: optimization=${experiment.optimization.name}/num_trials=${experiment.num_trials}/learning_rate=${experiment.optimization.learning_rate}/seed=${seed}
  leaf_subdir: ${filepaths.objective_subdir}/${filepaths.experiment_subdir}

hydra:
  run:
    dir: outputs/${filepaths.leaf_subdir}
  job:
    chdir: True
    config:
      override_dirname:
        exclude_keys:
          - filepaths.leaf_subdir
          - objective.feature1.param1
          - objective.feature1.param2
          - experiment.optimization.name
          - experiment.num_trials
          - experiment.optimization.learning_rate
          - seed
  sweep:
    dir: multirun
    subdir: ${filepaths.leaf_subdir}

And my conf directory structure is the following:

conf
├── config.yaml
├── objective
│   ├── basic.yaml
│   └── feature1
│       └── params.yaml
└── experiment
    ├── basic.yaml
    └── optimization
        ├── reinforcement_learning.yaml
        └── replicator_dynamic.yaml

When I use multisweep, I can get the following directory structure:

multirun
├── multirun.yaml
└── param1=10
    └── param2=10
        ├── objective_data.csv
        ├── optimization=reinforcement_learning
        │   └── num_trials=100
        │       └── learning_rate=1e-4
        │           └── rd_hparam=5
        │               └── seed=42
        │                   └── output.csv
        │
        └── optimization=replicator_dynamic
            └── num_trials=1
                └── learning_rate=1e-4
                    └── rd_hparam=5
                        └── seed=42
                            └── output.csv

But really, I don't want to branch on rd_hparam under the reinforcement_learning optimization, or on learning_rate under the replicator_dynamic optimization, because those values have nothing to do with those configurations: learning_rate is a hyperparameter specific to reinforcement_learning, and rd_hparam is a hyperparameter specific to replicator_dynamic.

However, it would be useful to keep other keys, like seed, excluded from the override_dirname (as described here), so that they can serve as the innermost parent directories of my job-specific output.csv files.

But I really want to use multisweep to run the different optimization jobs all at once. Ideally, I want something like the following directory structure:

multirun
├── multirun.yaml
└── param1=10
    └── param2=10
        ├── objective_data.csv
        ├── optimization=reinforcement_learning
        │   └── num_trials=100
        │       └── learning_rate=1e-4
        │           └── seed=42
        │               └── output.csv
        │
        └── optimization=replicator_dynamic
            └── num_trials=1
                └── rd_hparam=5
                    └── seed=42
                        └── output.csv

I've mainly been looking into ways of combining override_dirname, as explained in the hydra tutorial here, with string interpolation. But this only gets me so far: what I really need is a way to make the job-specific subdirectory conditional on the values I'm sweeping over.

The main reason I'm worried that what I want to do is not possible is that it appears that I want to change fields of hydra.run and hydra.sweep, which are populated at runtime, according to the docs. I know that I can access them, but can I change their fields? If I could edit these fields somehow, then I would probably just implement all of my folder logic in code, instead of trying to do sophisticated string interpolation.

Less relevant, but here are some things I've looked into:

I initially had some suspicion that what I should use is some kind of custom resolver from OmegaConf, because then perhaps I could pass the config object or fields of it to a custom-built resolver function that has precisely the logic I want. But whether and how to do this is not clear to me.
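
For concreteness, here is a minimal sketch of what such a resolver might look like. It assumes that each optimization config exposes a name key alongside its hyperparameters, and that the resolver can be handed the experiment.optimization node. The table RELEVANT_HPARAMS, the function hparam_subdir, and the resolver name are all hypothetical names made up for illustration; OmegaConf.register_new_resolver is the only library call used. Whether Hydra resolves this correctly inside hydra.run.dir and hydra.sweep.subdir is exactly the open question here, so treat this as untested against those fields.

```python
from typing import Any, Mapping

# Hypothetical table: which hyperparameters should appear in the output path
# for each optimization. The key names are taken from the question above.
RELEVANT_HPARAMS = {
    "reinforcement_learning": ("learning_rate",),
    "replicator_dynamic": ("rd_hparam",),
}


def hparam_subdir(optimization: Mapping[str, Any]) -> str:
    """Return 'key=value/...' using only the hyperparameters that are
    relevant to the selected optimization; irrelevant keys are ignored."""
    name = optimization["name"]
    keys = RELEVANT_HPARAMS.get(name, ())  # unknown optimizations add no path segment
    return "/".join(f"{key}={optimization[key]}" for key in keys)


def register() -> None:
    """Expose the helper to config files as a custom resolver.

    Imported lazily so the helper above stays usable without omegaconf.
    This would need to be called before the @hydra.main function runs.
    """
    from omegaconf import OmegaConf

    OmegaConf.register_new_resolver("hparam_subdir", hparam_subdir)
```

Under these assumptions, the learning_rate segment of experiment_subdir might be replaced by something like ${hparam_subdir:${experiment.optimization}}, so that a replicator_dynamic job would get rd_hparam=5 and a reinforcement_learning job would get its learning_rate segment instead. Keeping the branching in a plain function also makes it easy to check outside of Hydra.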

Finally, I've been avoiding trying to edit paths from hydra.utils.get_original_cwd by hand, because hydra seems to have a lot of support for automatic subdirectory creation.
