Nixtla/statsforecast

[Spark, cross-validation] Issue with ":" as model aliases using cross_validation in Spark

Closed · 1 comment

What happened + What you expected to happen

Using Spark with models whose aliases contain : raises a SchemaError exception.
Using WindowAverage(window_size=10, alias='WindowAverage_season_11') is fine, but using WindowAverage(window_size=10, alias='WindowAverage:season_11') (notice the : before season) yields an error.

Versions / Dependencies

Databricks==13.3 LTS
Apache Spark==3.4.1
statsforecast==1.7.5
python==3.10.12

Reproduction script

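import numpy as np
import pandas as pd
from pyspark.sql import SparkSession
from statsforecast import StatsForecast
from statsforecast.models import WindowAverage

spark = SparkSession.builder.getOrCreate()

# Ten daily observations for a single series
pdf = pd.DataFrame({
    'unique_id': 10 * ['id_1'],
    'ds': pd.date_range(start='2023-01-01', periods=10, freq='D'),
    'y': np.random.randint(1, 100, size=10),
})
sdf = spark.createDataFrame(pdf)

# This alias works; changing it to 'WindowAverage:season_11' raises the SchemaError
sf = StatsForecast(
    models=[WindowAverage(window_size=11), WindowAverage(window_size=10, alias='WindowAverage_season_11')],
    freq='W',
    n_jobs=1,
)
cross_validation_df_1 = sf.cross_validation(df=sdf, h=1, step_size=1, n_windows=1)
cross_validation_df_1.show(5)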

Issue Severity

None

Hey. This restriction comes from Spark: model aliases become column names in the output, so you should only use letters, numbers, and underscores in them (ref).
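
One possible workaround is to clean the alias before passing it to the model. The snippet below is only a sketch: sanitize_alias is a hypothetical helper, not part of statsforecast, and it assumes that replacing every disallowed character with an underscore is acceptable for your naming scheme.

```python
import re


def sanitize_alias(alias: str) -> str:
    """Hypothetical helper: replace every character that is not an ASCII
    letter, digit, or underscore with '_', so the alias is safe to use
    as a column name."""
    return re.sub(r"[^0-9A-Za-z_]", "_", alias)


safe_alias = sanitize_alias("WindowAverage:season_11")
print(safe_alias)  # WindowAverage_season_11
```

The result can then be passed as alias=safe_alias when constructing the model. Note that two different aliases can collapse to the same cleaned name (for example 'a:b' and 'a_b'), so it may be worth checking for duplicates.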