Chapter_3_sort_code_missing
adigiosaffatte opened this issue · 2 comments
adigiosaffatte commented
When introducing the streaming DataFrame, the following code:
from pyspark.sql.functions import window, column, desc, col
staticDataFrame\
  .selectExpr(  # variant of select that accepts SQL expressions
    "CustomerId",
    "(UnitPrice * Quantity) as total_cost",
    "InvoiceDate") \
  .groupBy(
    col("CustomerId"), window(col("InvoiceDate"), "1 day")) \
  .sum("total_cost") \
  .show(5)
produces output different from what is shown in the chapter, because the sort step before show(5) is missing.
I think the correct code should be:
from pyspark.sql.functions import window, column, desc, col
staticDataFrame\
  .selectExpr(  # variant of select that accepts SQL expressions
    "CustomerId",
    "(UnitPrice * Quantity) as total_cost",
    "InvoiceDate") \
  .groupBy(
    col("CustomerId"), window(col("InvoiceDate"), "1 day")) \
  .sum("total_cost") \
  .sort(desc("sum(total_cost)")) \
  .show(5)
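To illustrate why the sort matters, here is a minimal plain-Python sketch of the same idea (group by customer and day, sum the cost, then take the largest totals). It does not use Spark: the sample rows, the daily_totals and top_n helpers, and the use of calendar dates in place of Spark's "1 day" window are all assumptions made for illustration only.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample rows: (CustomerId, UnitPrice, Quantity, InvoiceDate)
rows = [
    (17850, 2.55, 6, "2010-12-01 08:26:00"),
    (17850, 3.39, 6, "2010-12-01 08:28:00"),
    (13047, 1.69, 32, "2010-12-01 08:34:00"),
    (12583, 3.75, 24, "2010-12-01 08:45:00"),
    (13047, 2.10, 6, "2010-12-02 09:00:00"),
]

def daily_totals(rows):
    """Sum UnitPrice * Quantity per (CustomerId, calendar day)."""
    totals = defaultdict(float)
    for customer_id, unit_price, quantity, invoice_date in rows:
        day = datetime.strptime(invoice_date, "%Y-%m-%d %H:%M:%S").date()
        totals[(customer_id, day)] += unit_price * quantity
    return totals

def top_n(totals, n):
    """Return the n groups with the largest total, in descending order."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]

totals = daily_totals(rows)

# Without an explicit sort, "the first n groups" is just whatever order the
# aggregation happened to produce; with the sort, it is the n largest totals.
for (customer_id, day), total in top_n(totals, 2):
    print(customer_id, day, round(total, 2))
```

In the Spark version, the .sort(desc("sum(total_cost)")) line plays the role of top_n's sort: without it, show(5) prints five groups in no guaranteed order, not the five highest totals.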
bllchmbrs commented
go ahead and make a pull request to fix this please
adigiosaffatte commented
just made the pull request