databricks/Spark-The-Definitive-Guide

staticSchema error in chapter 3 in s

LeilaGhods opened this issue · 1 comment

I get the following error when running this code from Chapter 3 (Structured Streaming)

in Python

streamingDataFrame = spark.readStream\
    .schema(staticSchema)\
    .option("maxFilesPerTrigger", 1)\
    .format("csv")\
    .option("header", "true")\
    .load("/data/retail-data/by-day/*.csv")

NameError: name 'staticSchema' is not defined

NameError Traceback (most recent call last)
in
2 #How many files read together is identified by maxFilesPerTrigger
3 streamingDataFrame = spark.readStream
----> 4 .schema(staticSchema)
5 .option("maxFilesPerTrigger", 1)
6 .format("csv")\

NameError: name 'staticSchema' is not defined

Can anyone help me with this? I am running the code on a Databricks community cluster.

Thanks,

I think you are missing this:

staticDataFrame = spark.read.format("csv")\
    .option("header", "true")\
    .option("inferSchema", "true")\
    .load("/data/retail-data/by-day/*.csv")

staticSchema = staticDataFrame.schema

For reference:

Chambers, Bill; Zaharia, Matei. Spark: The Definitive Guide: Big Data Processing Made Simple (p. 63). O'Reilly Media. Kindle Edition.
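
To make the ordering explicit, here is a minimal sketch that wraps both steps in one function, so the schema always exists before the streaming read uses it. The function name `read_static_then_stream` and its parameters are hypothetical and not from the book; the Spark calls are the same ones shown above, and `spark` is assumed to be the active SparkSession that a Databricks notebook provides.

```python
def read_static_then_stream(spark, path):
    """Hypothetical helper: infer a schema from a static read, then reuse it for streaming."""
    # Static (batch) read: let Spark infer the column types from the CSV files.
    static_df = (
        spark.read.format("csv")
        .option("header", "true")
        .option("inferSchema", "true")
        .load(path)
    )
    # This is the value the streaming code refers to as staticSchema;
    # it has to be defined before the streaming read runs.
    static_schema = static_df.schema

    # Streaming read: reuse the schema inferred above.
    streaming_df = (
        spark.readStream
        .schema(static_schema)
        .option("maxFilesPerTrigger", 1)  # read one file per trigger
        .format("csv")
        .option("header", "true")
        .load(path)
    )
    return static_schema, streaming_df
```

In a notebook this could be called as `staticSchema, streamingDataFrame = read_static_then_stream(spark, "/data/retail-data/by-day/*.csv")`. The same effect is achieved by simply running the cell that defines `staticSchema` before the cell that creates `streamingDataFrame`.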