Bug: Nullability information of overridden field ignored
dan1elt0m opened this issue · 1 comment
dan1elt0m commented
Description
The create_spark_schema function overlooks the nullability details of an overridden field. This happens because create_spark_schema invokes the _get_spark_type function with the replacement Spark native type rather than the Pydantic type. Unlike the Pydantic type, a Spark native type carries no nullability information, so the _type_to_spark function consistently returns False for nullability.
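To illustrate where the nullability information actually lives (the helper below is a hypothetical illustration, not sparkdantic's internal API): the Pydantic annotation exposes the Optional-ness, while a bare Spark native type cannot.

# Hypothetical illustration: nullability is only recoverable from the Pydantic
# annotation; a substituted Spark type such as LongType() carries no such information.
from types import UnionType
from typing import Union, get_args, get_origin

from pyspark.sql.types import LongType


def is_nullable(annotation) -> bool:
    """True for Optional-style annotations such as `int | None`."""
    return get_origin(annotation) in (Union, UnionType) and type(None) in get_args(annotation)


print(is_nullable(int | None))   # True  - the Pydantic annotation keeps the info
print(is_nullable(LongType()))   # False - the bare Spark type cannot tell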
Steps to Reproduce
- Define a Pydantic model with an optional field and add a spark_type override.
- Generate the schema.
- Observe that the nullability is always set to False.
from pydantic import Field
from pyspark.sql.types import LongType
from sparkdantic import SparkModel
class MyModel(SparkModel):
o: int | None = Field(spark_type=LongType)
print(MyModel.model_spark_schema())
Output: StructType([StructField('o', LongType(), False)])
Expected Behavior
The nullability information of the overridden field should be used.
Output: StructType([StructField('o', LongType(), True)])
Possible Solution
See my PR: #359
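For readers skimming the issue, here is a rough, self-contained sketch of the general idea (hypothetical helper names, not sparkdantic's internals and not necessarily what the PR implements): determine nullability from the Pydantic annotation before swapping in the overridden Spark type.

# Minimal sketch, assuming Field(spark_type=...) stores the override in
# json_schema_extra; the helpers here are hypothetical, not sparkdantic's API.
from types import UnionType
from typing import Union, get_args, get_origin

from pydantic import BaseModel, Field
from pyspark.sql.types import IntegerType, LongType, StructField, StructType


def _is_nullable(annotation) -> bool:
    # `int | None` and `Optional[int]` both expose NoneType through get_args.
    return get_origin(annotation) in (Union, UnionType) and type(None) in get_args(annotation)


def build_schema(model: type[BaseModel]) -> StructType:
    fields = []
    for name, info in model.model_fields.items():
        nullable = _is_nullable(info.annotation)   # decided from the Pydantic annotation
        extra = info.json_schema_extra or {}       # assumption: the override lands here
        spark_type = extra.get("spark_type", IntegerType)()  # IntegerType is just a fallback for this sketch
        fields.append(StructField(name, spark_type, nullable))
    return StructType(fields)


class MyModel(BaseModel):
    o: int | None = Field(spark_type=LongType)


print(build_schema(MyModel))
# StructType([StructField('o', LongType(), True)])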
mitchelllisle commented
Released in v0.24.1. Thanks for another fix! 🎉