staticSchema error in chapter 3 in s
LeilaGhods opened this issue · 1 comments
I get the following error when running this code from Chapter 3 (Structured Streaming) in Python:
streamingDataFrame = spark.readStream\
.schema(staticSchema)\
.option("maxFilesPerTrigger", 1)\
.format("csv")\
.option("header", "true")\
.load("/data/retail-data/by-day/*.csv")
NameError: name 'staticSchema' is not defined
NameError Traceback (most recent call last)
in
2 #How many files read together is identified by maxFilesPerTrigger
3 streamingDataFrame = spark.readStream\
----> 4 .schema(staticSchema)\
5 .option("maxFilesPerTrigger", 1)\
6 .format("csv")\
NameError: name 'staticSchema' is not defined
Can anyone help me with this? I am running the code on a Databricks community cluster.
Thanks,
I think you are missing the earlier step that defines staticSchema. It has to run before the streaming read:
staticDataFrame = spark.read.format("csv")\
.option("header", "true")\
.option("inferSchema", "true")\
.load("/data/retail-data/by-day/*.csv")
staticSchema = staticDataFrame.schema
For reference:
Chambers, Bill; Zaharia, Matei. Spark: The Definitive Guide: Big Data Processing Made Simple (p. 63). O'Reilly Media. Kindle Edition.
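Putting the two pieces together: the streaming read only works once staticSchema exists in the session, so the static read has to run first. Below is a minimal sketch of that ordering. It assumes `spark` is the SparkSession that the notebook already provides; the helper names `infer_static_schema` and `build_streaming_df` are made up for illustration and do not come from the book.

```python
DATA_PATH = "/data/retail-data/by-day/*.csv"  # same path as in the question


def infer_static_schema(spark, path=DATA_PATH):
    """Read the CSV files once as a static DataFrame and return the inferred schema."""
    static_df = (
        spark.read.format("csv")
        .option("header", "true")
        .option("inferSchema", "true")
        .load(path)
    )
    return static_df.schema


def build_streaming_df(spark, schema, path=DATA_PATH):
    """Build the streaming DataFrame, reusing a schema obtained beforehand."""
    return (
        spark.readStream
        .schema(schema)
        .option("maxFilesPerTrigger", 1)  # read one file per trigger
        .format("csv")
        .option("header", "true")
        .load(path)
    )


# In a notebook where `spark` already exists (assumption), run in this order:
# staticSchema = infer_static_schema(spark)
# streamingDataFrame = build_streaming_df(spark, staticSchema)
```

Passing the schema as an argument makes the dependency explicit, so running the cells out of order cannot trigger the NameError from the question.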