japila-books/spark-structured-streaming-internals

Reading a JSON file multiple times into a Structured Streaming application

yashyi opened this issue · 1 comment

Hi,

I have recently started working on Structured Streaming and cannot find a solution to the following scenario. I need your guidance here.

I am trying to find keyword matches (using a regex) on a streaming data column. All my keywords are kept in a JSON file. I want to update the JSON file with a new set of keywords every 2 hours while my streaming application is up and running, but changes made to the JSON file are not reflected in the application.

I tried the following approaches, but neither worked:

  1. Update the JSON file with new keywords.
  2. Read the updated JSON file into a DataFrame and apply a transformation.

test.txt

Can you please use https://stackoverflow.com/questions/tagged/spark-structured-streaming for this question to reach the Spark Structured Streaming community? That would greatly increase the chance of a solution soon(er).


Just to give you some ideas to get started on your issue: once a file is loaded, it may stay in memory for some time and won't be reloaded. In such a case I'd rather upload a new file with the necessary changes, or use Delta Lake. Let's talk it over further on SO.
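
To illustrate the reload idea outside of Spark: the gist is to re-read the keywords file whenever it changes on disk, rather than relying on a DataFrame that was loaded once at startup. Below is a minimal plain-Python sketch of that pattern (the file layout `{"keywords": [...]}` and the class name `KeywordMatcher` are my own assumptions, not anything from Spark); in a real job you would apply the same check per micro-batch, e.g. inside `foreachBatch`.

```python
import json
import os
import re


class KeywordMatcher:
    """Match text against keywords from a JSON file, reloading the
    file whenever its modification time changes on disk.

    Assumed (hypothetical) file format: {"keywords": ["foo", "bar"]}
    """

    def __init__(self, path):
        self.path = path
        self._mtime = None
        self._pattern = None

    def _refresh(self):
        # Cheap staleness check: only re-read and recompile the regex
        # when the file's mtime differs from the last one we saw.
        mtime = os.path.getmtime(self.path)
        if mtime != self._mtime:
            with open(self.path) as f:
                keywords = json.load(f)["keywords"]
            self._pattern = re.compile(
                "|".join(re.escape(k) for k in keywords)
            )
            self._mtime = mtime

    def matches(self, text):
        self._refresh()
        return bool(self._pattern.search(text))
```

The same check-then-reload step, run at the start of each micro-batch, is one common workaround for the "file stays in memory" behaviour described above; Delta Lake removes the need for it entirely, since each read sees the table's latest version.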