Primary language: Python. License: MIT.

pydantic-spark

This library converts a Pydantic class to a Spark schema, or generates Python code from a Spark schema.

Install

pip install pydantic-spark

Pydantic class to Spark schema

import json
from typing import Optional

from pydantic_spark.base import SparkBase

class TestModel(SparkBase):
    key1: str
    key2: int
    key3: Optional[str]

schema_dict: dict = TestModel.spark_schema()
print(json.dumps(schema_dict))
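
To make the shape of such a schema easier to picture, here is a minimal sketch that assumes the returned dict follows Spark's JSON schema layout (a `struct` with a `fields` list). The sample dict is hand-written for illustration, not captured output from the library, and `describe_fields` is a hypothetical helper, not part of pydantic-spark.

```python
import json
from typing import List

# Hand-written sample in Spark's JSON schema layout. That spark_schema()
# returns this layout is an assumption; this is not captured library output.
sample_schema = {
    "type": "struct",
    "fields": [
        {"name": "key1", "type": "string", "nullable": False, "metadata": {}},
        {"name": "key2", "type": "long", "nullable": False, "metadata": {}},
        {"name": "key3", "type": "string", "nullable": True, "metadata": {}},
    ],
}


def describe_fields(schema: dict) -> List[str]:
    """Hypothetical helper: one summary line per top-level field."""
    lines = []
    for field in schema["fields"]:
        nullability = "nullable" if field["nullable"] else "required"
        # Nested types (struct, array, map) are dicts, so serialize them.
        field_type = field["type"]
        type_name = field_type if isinstance(field_type, str) else json.dumps(field_type)
        lines.append(f"{field['name']}: {type_name} ({nullability})")
    return lines


for line in describe_fields(sample_schema):
    print(line)
```

If PySpark is installed, a dict in this layout can usually be turned into a schema object with `StructType.fromJson(schema_dict)` from `pyspark.sql.types`.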

Coerce type

pydantic-spark provides a coerce_type option for type coercion. When it is set on a field, the corresponding column in the generated schema uses the specified coercion type rather than the type implied by the field's annotation.

import json
from pydantic import Field
from pydantic_spark.base import SparkBase, CoerceType

class TestModel(SparkBase):
    key1: str = Field(extra_json_schema={"coerce_type": CoerceType.integer})

schema_dict: dict = TestModel.spark_schema()
print(json.dumps(schema_dict))
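
The idea behind coercion can be sketched as "an explicit override wins over the inferred type". The snippet below illustrates that rule only. It is not the library's implementation: `SketchCoerceType`, `DEFAULT_SPARK_TYPES`, and `resolve_spark_type` are hypothetical names, and the type mapping is an assumption.

```python
from enum import Enum
from typing import Optional


class SketchCoerceType(str, Enum):
    """Hypothetical stand-in for the library's CoerceType (values assumed)."""

    integer = "integer"
    string = "string"
    double = "double"


# Assumed default mapping from Python annotations to Spark type names.
DEFAULT_SPARK_TYPES = {str: "string", int: "long", float: "double", bool: "boolean"}


def resolve_spark_type(
    annotation: type, coerce_type: Optional[SketchCoerceType] = None
) -> str:
    """Return the Spark type name, preferring an explicit coercion."""
    if coerce_type is not None:
        # The explicit coercion overrides whatever the annotation implies.
        return coerce_type.value
    return DEFAULT_SPARK_TYPES[annotation]


print(resolve_spark_type(str))
print(resolve_spark_type(str, SketchCoerceType.integer))
```

Under these assumptions, a field annotated as `str` resolves to a string column by default and to an integer column once the coercion is supplied.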

Install for developers

Install package

  • Requirement: Poetry 1.*

poetry install

Run unit tests

pytest
coverage run -m pytest  # with coverage
# or (depends on your local env)
poetry run pytest
poetry run coverage run -m pytest  # with coverage

Run linting

Linting is checked in the GitHub workflow. To fix and review issues locally, run:

black .   # Auto fix all issues
isort .   # Auto fix all issues
pflake8 . # Only display issues, fixing is manual