Generation not following the constraints when validator is specified

Question

Generation not following the constraints when validator is specified

HarshTrivedi opened this issue a year ago · 10 comments

HarshTrivedi commented a year ago

Description

The generator is not following the constraints when a custom validator is specified.

URL to code causing the issue

No response

MCVE

from pydantic import validator
from polyfactory.factories.pydantic_factory import ModelFactory

class Product(BaseModel):
    price: float = Field(Required, description="Price of the product in dollars (> 0).", gt=0)

    @validator("price")
    def process_price(cls, value): return round(value, 2)

class ProductFactory(ModelFactory[Product]):
    __model__ = Product

for e in range(1000):
    Product(**ProductFactory.build().dict())

Running this code results in:

pydantic.error_wrappers.ValidationError: 1 validation error for Product
price
  ensure this value is greater than 0 (type=value_error.number.not_gt; limit_value=0)

Steps to reproduce

Just run the given code.

Screenshots

No response

Logs

No response

Litestar Version

Not sure what Litestar is.

Platform

Linux
Mac
Windows
Other (Please specify in the description above)

Answer 1 · 2023-06-10T10:48:00.000Z

By constraint you mean the gt value specified in field or the logic in validator?

Answer 2 · 2023-08-01T07:19:03.000Z

@Goldziher I can attest that it happens for the gt values specified.

EDIT: Note that I am using Pydantic version 2, the OP may only refer to Pyndatic version 1.

EDIT2: I was only referring to the gt metadata not being picked up from Field(gt=0) , but I tried a minimal reproducible example on my side and it was working, so disregard my comments.

This is because field_info somehow loses the values of metadata inside PydanticFieldMeta.from_field_info despite having it in cls.__model__.model_fields.items() in the ModelFactory.get_model_fields classmethod.

                cls._fields_metadata = [
                    PydanticFieldMeta.from_field_info(
                        field_info=field_info,
                        field_name=field_name,
                        random=cls.__random__,
                        use_alias=not cls.__model__.model_config.get("populate_by_name", False),
                        randomize_collection_length=cls.__randomize_collection_length__,
                        min_collection_length=cls.__min_collection_length__,
                        max_collection_length=cls.__max_collection_length__,
                    )
                    for field_name, field_info in cls.__model__.model_fields.items()
                ]

Answer 3 · 2023-08-01T07:39:45.000Z

@Goldziher I can attest that it happens for the gt values specified.

This is because field_info somehow loses the values of metadata inside PydanticFieldMeta.from_field_info despite having it in cls.__model__.model_fields.items() in the ModelFactory.get_model_fields classmethod.

                cls._fields_metadata = [
                    PydanticFieldMeta.from_field_info(
                        field_info=field_info,
                        field_name=field_name,
                        random=cls.__random__,
                        use_alias=not cls.__model__.model_config.get("populate_by_name", False),
                        randomize_collection_length=cls.__randomize_collection_length__,
                        min_collection_length=cls.__min_collection_length__,
                        max_collection_length=cls.__max_collection_length__,
                    )
                    for field_name, field_info in cls.__model__.model_fields.items()
                ]

Thank you. PRS are welcome..

Answer 4 · 2023-08-01T11:30:43.000Z

@litestar-org/members and @litestar-org/maintainers . Any volunteers?

Answer 5 · 2023-08-02T17:29:35.000Z

Aside from gt etc constraints, does polyfactory take into account @validator at all? I looked at it briefly but didn't find any (obvious) reference. Also can't tell how the field_info, _fields_metadata etc mentioned above are related to this.

Answer 6 · 2023-08-02T18:00:15.000Z

Aside from gt etc constraints, does polyfactory take into account @validator at all? I looked at it briefly but didn't find any (obvious) reference. Also can't tell how the field_info, _fields_metadata etc mentioned above are related to this.

I don't know whats happening there. We don't do anything with valudators. Maybe pydantic modifying some values.

Answer 7 · 2023-08-02T19:24:49.000Z

What's happening is that first polyfactory creates correctly a value greater than zero, but sometimes that's close to zero say 0.00234. Then the validator above returns round(0.00234, 2) == 0 and when passed back to the constructor it fails the constraint validation. If polyfactory doesn't do anything with validators, that explains the issue.

Answer 8 · 2023-08-02T19:49:08.000Z

Wow, great catch !

So basically the issue is that the pydantic validator is either imprecise or it modifies the input?

Answer 9 · 2023-08-03T05:42:05.000Z

Not sure what you mean by imprecise but yes Pydantic (ab)uses the term "validator" to include parsing of input data:

validators allow complex data schemas to be clearly and easily defined, validated, and parsed.

Calling something like round(value, 2) a "validator" or even "parser" seems like a stretch, it's more like a general value transformer. Terminology aside, the issue here is more subtle:

The originally posted MCVE is not a polyfactory bug or missing feature; the test is wrong. What the test does is equivalent to the following:

    # this works: 
    # (a) the input is greater than 0
    # (b) the output is rounded to 2 decimals
    p = Product(price=0.000234)
    assert p.price == 0.0
    
    # This fails: the rounded value is not greater than 0
    Product(**p.dict())

Basically the specific model definition does not roundtrip: the usually expected deserialize(serialize(model)) == model invariant does not hold. This may (or may not) be ok but it has nothing to do with polyfactory.

By default the @validator decorator applies after regular field validation (e.g. the gt check in the MCVE). This can change by passing pre=True to cause the validator to be called prior to other validation, for example:

    @validator("price", pre=True)
    def process_price(cls, value): 
        return round(value, 2)

Now Product(price=0.000234) is invalid so in this case polyfactory should not have generated this value for price.

Even without pre=True, a validator can always raise an error:

validators should either return the parsed value or raise a ValueError, TypeError, or AssertionError (assert statements may be used).

For example say the price validator was like this:

    @validator("price")
    def process_price(cls, value): 
        value = round(value, 2)
        assert value > 0
        return value

Just like in case (2), Product(price=0.000234) is invalid and polyfactory should not have generated this value for price.

Answer 10 · 2023-09-18T13:23:39.000Z

Closing this since supporting custom validators wouldn't be possible. @HarshTrivedi, if you meant something else, please feel free to reopen the issue.