RFC: Configuration management
Closed this issue · 1 comment
kks32 commented
Configuration Management
Summary
Use a configuration file instead of flags
Motivation
Currently, the code is configured through command-line flags defined in train.py. This approach has limitations:
- Flag definitions are not easily reusable across different scripts
- Configurations are hard to version control
- Flags do not support hierarchical configuration
Design Detail
Implement a configuration system using a library like Hydra or OmegaConf. This would allow:
- YAML-based configuration files
- Easy overriding of config values from command line
- Hierarchical configurations
- Better version control of configurations
Example:
# Top-level configuration
mode: train

# Data configuration
data:
  path: /path/to/your/data
  batch_size: 2
  noise_std: 6.7e-4

# Model configuration
model:
  path: models/
  file: null
  train_state_file: train_state.pt

# Output configuration
output:
  path: rollouts/
  filename: rollout

# Training configuration
training:
  steps: 20000000
  validation_interval: null
  save_steps: 5000
  learning_rate:
    initial: 1e-4
    decay: 0.1
    decay_steps: 5000000

# Hardware configuration
hardware:
  cuda_device_number: null
  n_gpus: 1

# Logging configuration
logging:
  tensorboard_dir: logs/

constants:
  input_sequence_length: 6
  num_particle_types: 9
  kinematic_particle_id: 3
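
As a rough sketch of how such a file could be consumed (assuming OmegaConf and a file named config.yaml with the layout above; the filename and the helper function are illustrative, not part of the current code):

# Minimal sketch using OmegaConf, one of the candidate libraries above.
# Assumes the YAML above is saved as "config.yaml"; all names are illustrative.
from omegaconf import OmegaConf

def load_config(path: str = "config.yaml"):
    cfg = OmegaConf.load(path)            # hierarchical, dot-accessible config
    cli_cfg = OmegaConf.from_cli()        # e.g. `python train.py data.batch_size=4`
    return OmegaConf.merge(cfg, cli_cfg)  # command-line values override the file

if __name__ == "__main__":
    cfg = load_config()
    print(cfg.mode)                             # -> train
    print(cfg.data.batch_size)                  # -> 2, unless overridden
    print(cfg.training.learning_rate.initial)   # nested keys work the same way

Hydra builds on OmegaConf, so the same YAML layout would also work there, with config groups and command-line overrides available out of the box.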
Drawbacks
Why should we not do this? Please consider the impact on users.
- Breaking change to the existing flag-based workflow.
Rationale and Alternatives
Why is this design the best in the space of possible designs?
- Easier set-up with DesignSafe and other CI tools; all configuration files can be tracked in version control.
What other designs have been considered and what is the rationale for not choosing them?
What is the impact of not doing this?
- Command-line arguments become too long and are harder to keep track of than a configuration file.
Unresolved questions
What parts of the design do you expect to resolve through the RFC process before this gets merged?
- Basic configuration that replicates all existing flags (see the sketch below for one way the entry point could consume it).
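
One way the flag replication could look, sketched with Hydra (the entry-point structure and key names below are assumptions based on the example config above, not the actual flags in train.py):

# Illustrative Hydra entry point replacing flag parsing in train.py.
# Assumes config.yaml (as above) sits next to the script; names are not final.
import hydra
from omegaconf import DictConfig

@hydra.main(config_path=".", config_name="config", version_base=None)
def main(cfg: DictConfig) -> None:
    # Values that used to come from flags are read from the config instead,
    # e.g. what was a --batch_size flag becomes cfg.data.batch_size.
    if cfg.mode == "train":
        print(f"Training for {cfg.training.steps} steps "
              f"with batch size {cfg.data.batch_size}")

if __name__ == "__main__":
    main()

Individual values could still be overridden per run, e.g. `python train.py training.steps=1000`, which keeps the quick-experimentation use case that flags serve today.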