databricks/Spark-The-Definitive-Guide

Structured Streaming

AsTheSeaRises opened this issue · 1 comment

Hi - I am following the Structured Streaming example from the console (using pyspark). I successfully read the JSON files (which I load from S3) and set the files per trigger. However, once I start the 'activityQuery' using the code below, I can't get back into the shell because it just runs continuously. As a result, I cannot execute the 'activityQuery.awaitTermination()' command or the spark.sql SELECT command to see the activity_counts table. If I open another shell and run pyspark, none of the tables are available or visible.

activityQuery = activityCounts.writeStream\
    .queryName("activity_counts")\
    .format("memory")\
    .outputMode("complete")\
    .start()

This isn't really much to go on... This chapter has worked for the hundreds of folks who have gone through the book, so I have to chalk this up to user error. I would recommend starting over on this particular chapter to see if you can get it working properly. If you sincerely believe that you've exhausted every debugging option on your own, you'll need to provide much more detail before someone can properly guide you to a solution.