Generation not following the constraints when validator is specified
HarshTrivedi opened this issue · 10 comments
Description
The generator is not following the constraints when a custom validator is specified.
URL to code causing the issue
No response
MCVE
from pydantic import validator
from polyfactory.factories.pydantic_factory import ModelFactory
class Product(BaseModel):
price: float = Field(Required, description="Price of the product in dollars (> 0).", gt=0)
@validator("price")
def process_price(cls, value): return round(value, 2)
class ProductFactory(ModelFactory[Product]):
__model__ = Product
for e in range(1000):
Product(**ProductFactory.build().dict())
Running this code results in:
pydantic.error_wrappers.ValidationError: 1 validation error for Product
price
ensure this value is greater than 0 (type=value_error.number.not_gt; limit_value=0)
Steps to reproduce
Just run the given code.
Screenshots
No response
Logs
No response
Litestar Version
Not sure what Litestar is.
Platform
- Linux
- Mac
- Windows
- Other (Please specify in the description above)
By constraint you mean the gt value specified in field or the logic in validator?
@Goldziher I can attest that it happens for the gt
values specified.
EDIT: Note that I am using Pydantic version 2, the OP may only refer to Pyndatic version 1.
EDIT2: I was only referring to the gt
metadata not being picked up from Field(gt=0)
, but I tried a minimal reproducible example on my side and it was working, so disregard my comments.
This is because field_info
somehow loses the values of metadata
inside PydanticFieldMeta.from_field_info
despite having it in cls.__model__.model_fields.items()
in the ModelFactory.get_model_fields
classmethod.
cls._fields_metadata = [
PydanticFieldMeta.from_field_info(
field_info=field_info,
field_name=field_name,
random=cls.__random__,
use_alias=not cls.__model__.model_config.get("populate_by_name", False),
randomize_collection_length=cls.__randomize_collection_length__,
min_collection_length=cls.__min_collection_length__,
max_collection_length=cls.__max_collection_length__,
)
for field_name, field_info in cls.__model__.model_fields.items()
]
@Goldziher I can attest that it happens for the
gt
values specified.This is because
field_info
somehow loses the values ofmetadata
insidePydanticFieldMeta.from_field_info
despite having it incls.__model__.model_fields.items()
in theModelFactory.get_model_fields
classmethod.cls._fields_metadata = [ PydanticFieldMeta.from_field_info( field_info=field_info, field_name=field_name, random=cls.__random__, use_alias=not cls.__model__.model_config.get("populate_by_name", False), randomize_collection_length=cls.__randomize_collection_length__, min_collection_length=cls.__min_collection_length__, max_collection_length=cls.__max_collection_length__, ) for field_name, field_info in cls.__model__.model_fields.items() ]
Thank you. PRS are welcome..
@litestar-org/members and @litestar-org/maintainers . Any volunteers?
Aside from gt
etc constraints, does polyfactory take into account @validator
at all? I looked at it briefly but didn't find any (obvious) reference. Also can't tell how the field_info
, _fields_metadata
etc mentioned above are related to this.
Aside from
gt
etc constraints, does polyfactory take into account@validator
at all? I looked at it briefly but didn't find any (obvious) reference. Also can't tell how thefield_info
,_fields_metadata
etc mentioned above are related to this.
I don't know whats happening there. We don't do anything with valudators. Maybe pydantic modifying some values.
What's happening is that first polyfactory creates correctly a value greater than zero, but sometimes that's close to zero say 0.00234. Then the validator above returns round(0.00234, 2) == 0
and when passed back to the constructor it fails the constraint validation. If polyfactory doesn't do anything with validators, that explains the issue.
Wow, great catch !
So basically the issue is that the pydantic validator is either imprecise or it modifies the input?
Not sure what you mean by imprecise but yes Pydantic (ab)uses the term "validator" to include parsing of input data:
validators allow complex data schemas to be clearly and easily defined, validated, and parsed.
Calling something like round(value, 2
) a "validator" or even "parser" seems like a stretch, it's more like a general value transformer. Terminology aside, the issue here is more subtle:
- The originally posted MCVE is not a polyfactory bug or missing feature; the test is wrong. What the test does is equivalent to the following:
# this works:
# (a) the input is greater than 0
# (b) the output is rounded to 2 decimals
p = Product(price=0.000234)
assert p.price == 0.0
# This fails: the rounded value is not greater than 0
Product(**p.dict())
Basically the specific model definition does not roundtrip: the usually expected deserialize(serialize(model)) == model
invariant does not hold. This may (or may not) be ok but it has nothing to do with polyfactory.
- By default the
@validator
decorator applies after regular field validation (e.g. thegt
check in the MCVE). This can change by passingpre=True
to cause the validator to be called prior to other validation, for example:
@validator("price", pre=True)
def process_price(cls, value):
return round(value, 2)
Now Product(price=0.000234)
is invalid so in this case polyfactory should not have generated this value for price
.
- Even without
pre=True
, a validator can always raise an error:
validators should either return the parsed value or raise a ValueError, TypeError, or AssertionError (assert statements may be used).
For example say the price validator was like this:
@validator("price")
def process_price(cls, value):
value = round(value, 2)
assert value > 0
return value
Just like in case (2), Product(price=0.000234)
is invalid and polyfactory should not have generated this value for price
.
Closing this since supporting custom validators wouldn't be possible. @HarshTrivedi, if you meant something else, please feel free to reopen the issue.