RFC: Configuration management

Question

Closed this issue 4 months ago · 1 comments

kks32 commented 4 months ago

Configuration Management

Summary

Use a configuration file instead of flags

Motivation

Currently, the code uses flags for configuration, which are defined in train.py. This approach has limitations:

It's not easily reusable across different scripts
It's hard to version control configurations
It doesn't support hierarchical configurations

Design Detail

Implement a configuration system using a library such as Hydra or OmegaConf. This would allow:

YAML-based configuration files
Easy overriding of config values from command line
Hierarchical configurations
Better version control of configurations

Example:

# Top-level configuration
mode: train

# Data configuration
data:
  path: /path/to/your/data
  batch_size: 2
  noise_std: 6.7e-4

# Model configuration
model:
  path: models/
  file: null
  train_state_file: train_state.pt

# Output configuration
output:
  path: rollouts/
  filename: rollout

# Training configuration
training:
  steps: 20000000
  validation_interval: null
  save_steps: 5000
  learning_rate:
    initial: 1e-4
    decay: 0.1
    decay_steps: 5000000

# Hardware configuration
hardware:
  cuda_device_number: null
  n_gpus: 1

# Logging configuration
logging:
  tensorboard_dir: logs/

constants:
  input_sequence_length: 6
  num_particle_types: 9
  kinematic_particle_id: 3

Drawbacks

Why should we not do this? Please consider the impact on users.
This is a breaking change to the existing flag-based workflow.

Rationale and Alternatives

Why is this design the best in the space of possible designs?

Configuration files make set-up easier with DesignSafe and other CI tools, and they let us keep track of every configuration that was used.

What other designs have been considered and what is the rationale for not choosing them?

What is the impact of not doing this?

Command-line arguments are too long and are harder to keep track of.

Unresolved questions

What parts of the design do you expect to resolve through the RFC process before this gets merged?

Basic configuration (replicate all flags)
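
One possible way to replicate the flags without breaking existing commands is a thin shim that accepts the old flat flags and files them into the nested layout from the example. The sketch below uses only `argparse`. The flag names (`--data_path`, `--batch_size`, `--ntraining_steps`) and the `FLAG_TO_KEY` table are hypothetical, since the flags defined in train.py are not listed in this RFC.

```python
import argparse

# Hypothetical legacy flag name -> dotted key in the proposed config layout.
FLAG_TO_KEY = {
    "data_path": "data.path",
    "batch_size": "data.batch_size",
    "ntraining_steps": "training.steps",
}


def flags_to_config(argv):
    """Turn flat legacy flags into a nested dict of config overrides."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--data_path", type=str, default=None)
    parser.add_argument("--batch_size", type=int, default=None)
    parser.add_argument("--ntraining_steps", type=int, default=None)
    args = parser.parse_args(argv)

    config = {}
    for flag, dotted_key in FLAG_TO_KEY.items():
        value = getattr(args, flag)
        if value is None:
            continue  # flag not given: leave this key to the config file
        *parents, leaf = dotted_key.split(".")
        node = config
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = value
    return config


print(flags_to_config(["--batch_size", "4", "--ntraining_steps", "1000"]))
# {'data': {'batch_size': 4}, 'training': {'steps': 1000}}
```

The resulting dict contains only the keys that were passed explicitly, so it can be merged on top of the values loaded from the YAML file.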

Changelog

Answer 1 · 2024-06-28T12:53:34.000Z

Fixed in #81