mitchelllisle/sparkdantic

Bug: Nullability information of overridden field ignored

dan1elt0m opened this issue · 1 comment

Description

The create_spark_schema function ignores the nullability of a field whose Spark type has been overridden. This happens because create_spark_schema passes the overriding Spark native type, rather than the Pydantic type, to _get_spark_type. Unlike the Pydantic type, a Spark native type carries no nullability information, so _type_to_spark consistently returns False for nullability.
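
To illustrate the underlying gap (independent of sparkdantic's internals): the Python annotation carries the nullability, while a Spark native type object does not.

from typing import get_args

from pyspark.sql.types import LongType

# The Pydantic annotation keeps nullability: `int | None` includes NoneType.
annotation = int | None
print(type(None) in get_args(annotation))  # True

# The Spark native type used for the override has no such information:
# LongType() says nothing about whether the column may contain nulls.
print(LongType())  # LongType()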

Steps to Reproduce

  1. Define a Pydantic model with an optional field and add a spark_type override.
  2. Generate the schema.
  3. Observe that nullability is always set to False.
from pydantic import Field
from pyspark.sql.types import LongType
from sparkdantic import SparkModel

class MyModel(SparkModel):
    o: int | None = Field(spark_type=LongType)


print(MyModel.model_spark_schema())

Output: StructType([StructField('o', LongType(), False)])

Expected Behavior

Use the nullability information of the overridden field.
Output: StructType([StructField('o', LongType(), True)])
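
A quick check of the expected behavior, reusing MyModel from the reproduction above (an illustrative assertion, not an existing test in the project):

from pyspark.sql.types import LongType, StructField, StructType

# With correct behavior, the override changes only the data type;
# nullable stays True because the annotation is `int | None`.
expected = StructType([StructField('o', LongType(), True)])
assert MyModel.model_spark_schema() == expected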

Possible Solution

See my PR: #359
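
For context, one way to address this (a rough sketch with hypothetical helper names, not necessarily how the PR implements it) is to take the data type from the override but derive the nullable flag from the original Pydantic annotation:

from typing import get_args

from pydantic.fields import FieldInfo
from pyspark.sql.types import DataType, StructField

def _is_nullable(annotation) -> bool:
    # NoneType among the annotation's args means the field is optional.
    return type(None) in get_args(annotation)

def build_struct_field(name: str, info: FieldInfo, override: type[DataType]) -> StructField:
    # Hypothetical helper: the Spark data type comes from the override,
    # but nullability is read from the Pydantic annotation.
    return StructField(name, override(), nullable=_is_nullable(info.annotation))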

Released in v0.24.1. Thanks for another fix! 🎉