YAML roundtrip not working correctly
petosa opened this issue · 5 comments
Hello, I am using hydra-zen to save config yaml files, ship them over to another system, and then load them back in and instantiate them. However, load_from_yaml
is not working as expected - it loads back in a DictConfig
rather than a hydrazen type
and thus none of the type validation + coercion works when instantiating the loaded yaml. This is particularly an issue for enum types.
Repro:
import omegaconf
from omegaconf import MISSING, OmegaConf
import dataclasses
from enum import Enum
from hydra_zen import instantiate, builds, save_as_yaml, load_from_yaml, to_yaml
class Color(Enum):
red = "RED"
blue = "BLUE"
@dataclasses.dataclass
class Example:
color: Color = MISSING
number: int = MISSING
built = builds(Example, color="blue", number="3")
print(type(built))
print(built)
print(instantiate(built))
save_as_yaml(built, "saved.yaml")
loaded = load_from_yaml("saved.yaml")
print()
print(type(loaded))
print(loaded)
print(instantiate(loaded))
The first 3 prints make sense:
<class 'type'>
<class 'types.Builds_Example'>
Example(color=<Color.blue: 'BLUE'>, number=3)
Proper type validation and coercion is done. When the yaml is saved and loaded back in however, type validation + coercion does not happen, since load_from_yaml
returns a DictConfig
rather than a type
. The results of instantiate
do not match. From the last 3 prints:
<class 'omegaconf.dictconfig.DictConfig'>
{'_target_': '__main__.Example', 'color': 'blue', 'number': 3}
Example(color='blue', number=3)
As you can see, Example
is instantiated with the string blue
rather than the enum <Color.blue: 'BLUE'>
.
Thanks for taking the time to write this up, and for the nice repro.
The issue that you are hitting here is unfortunate... While I think this is more a Hydra/OmegaConf issue, I would certainly like to help hydra-zen users avoid this issue.
I think this quick solution may suit your needs and may work quite generally:
def load_object_from_yaml(x: str):
loaded = load_from_yaml("saved.yaml")
return OmegaConf.to_object(OmegaConf.create(instantiate(loaded)))
Here it is in action for your example:
from omegaconf import MISSING, OmegaConf
import dataclasses
from enum import Enum
from hydra_zen import instantiate, builds, save_as_yaml, load_from_yaml, to_yaml, just
def load_object_from_yaml(x: str):
loaded = load_from_yaml("saved.yaml")
return OmegaConf.to_object(OmegaConf.create(instantiate(loaded)))
def pp(x):
print(x, "::", type(x))
class Color(Enum):
red = "RED"
blue = "BLUE"
@dataclasses.dataclass
class Example:
color: Color = MISSING
number: int = MISSING
built = builds(Example, color="blue", number="3")
pp(instantiate(built))
save_as_yaml(built, "saved.yaml")
loaded = load_object_from_yaml("saved.yaml")
print()
pp(loaded)
it prints:
Example(color=<Color.blue: 'BLUE'>, number=3) :: <class '__main__.Example'>
Example(color=<Color.blue: 'BLUE'>, number=3) :: <class '__main__.Example'>
instantiate(loaded))
returnsExample('blue', number=3)
. This is where Hydra might change its behavior and see that the returned dataclass has an enum-type field that had not been converted.OmegaConf.create(...)
returns the omegaconf DictConfig-type{'color': <Color.blue: 'BLUE'>, 'number': 3}
. Here, omegaconf leverages the annotation inExample
to produce the appropriate enum value.OmegaConf.to_object(...)
converts the DictConfig into the original backing dataclass.
load_from_yaml
is currently just an alias for OmegaConf.load
, which does not perform any instantiation. That being said, I think it would be reasonable for hydra-zen to add a load_from_yaml(..., to_object: bool)
option, which would effectively do these extra steps for the user.
@petosa let me know if the above solution works for you in general and if the proposed enhancement to load_from_yaml
seems sufficient.
@Jasha10 perhaps you have a better solution.
@rsokl thank you for your fast reply and suggestion. After playing around with your solution for a while, I noticed some shortcomings. In general, these are due to differences in functionality between omegaconf and hydrazen (the latter is more robust than the former).
Here are 3 specific situations where the proposed solution does not work (can attach some code snippets later if that'd help).
- enum type coercion does not work if
Example
is a function instead of a dataclass. - Omegaconf runs into a validation error if
Example
is a normal object (not a dataclass) with an__init__
function. - Omegaconf runs into a
MissingMandatoryValue
error if it loads a yaml with missing fields (hydrazen handles missing values gracefully).
I think the right thing is to have load_from_yaml
parse the yaml file into a hydrazen build type
. That way the call to instantiate
will benefit from robust hydrazen instantiation without having to fall back on Omegaconf.
Thanks so much for looking further into this, @petosa . I will investigate those failcases and will try to devise a more robust solution.
One workaround, which does not rely on OmegaConf's implicit conversion of str
to Enum
, is to call builds(Color, value="BLUE")
so as to create an object whose _target_
points to Color
:
import dataclasses
from enum import Enum
from hydra_zen import builds, instantiate, load_from_yaml, save_as_yaml
from omegaconf import MISSING
def pp(x):
print(x, "::", type(x))
class Color(Enum):
red = "RED"
blue = "BLUE"
@dataclasses.dataclass
class Example:
color: Color = MISSING
number: int = MISSING
built = builds(Example, color=builds(Color, value="BLUE"), number="3")
pp(instantiate(built))
save_as_yaml(built, "saved.yaml")
loaded = load_from_yaml("saved.yaml")
pp(instantiate(loaded))
print()
pp(loaded)
$ python script.py
Example(color=<Color.blue: 'BLUE'>, number=3) :: <class '__main__.Example'>
Example(color=<Color.blue: 'BLUE'>, number=3) :: <class '__main__.Example'>
{'_target_': '__main__.Example', 'color': {'_target_': '__main__.Color', 'value': 'BLUE'}, 'number': 3} :: <class 'omegaconf.dictconfig.DictConfig'>
Here are 3 specific situations where the proposed solution does not work (can attach some code snippets later if that'd help).
I believe my proposal above should work around the 3 situations you mentioned, @petosa.