Help w Spark?

Question

Closed this issue 7 years ago · 6 comments

Hi there, I have the test working with the Docker container and the Java server/Python script. My next test is to get it working with Spark. Do you have a working example of BroadcastMatcher.scala? I'm trying to get things to work with your sample Scala code block, but I think I need to import a few classes, and I want to make sure I'm referencing the right config file, etc. It looks like the broadcast points to the PostGIS database to pull the OSM data to each node in my Spark cluster. Is that right?

Thanks for any help/advice. Great project!

Answer 1 · 2017-06-23T08:58:29.000Z

For Spark, we upload the bfmap file to HDFS. This allows the Spark executors to read the map from HDFS, which should be better than having them all connect to the PostgreSQL database at the same time.

To create a bfmap file from the PostgreSQL database, use this code: https://gist.github.com/jongiddy/67c7ace4e7394e1e5f3bea978ddf74ec (this is set to run inside a Vagrant virtual machine, but changing the hardwired /vagrant paths will make it suitable for other environments).

To read the bfmap file from HDFS, we created a HadoopMapReader: https://gist.github.com/jongiddy/b68be517274a424df84d2bea4cdd6354

And our BroadcastMatcher then looks like this (although I have edited out some application-specific code): https://gist.github.com/jongiddy/286857e09f9881854a725634ca82b515
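For readers who cannot open the gists, the general idea of loading the map once per executor JVM can be sketched in plain Java. This is a minimal illustration, not the code from the gists: `RoadMapStub`, `LazyMapHolder`, `LOAD_COUNT`, and the HDFS URI are hypothetical names invented for the example, and the real map loading (for instance through a HadoopMapReader) is replaced by a stub. It uses the initialization-on-demand holder idiom, which relies on JVM class initialization for thread safety and needs no explicit locking.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for an expensive, read-only resource such as a road map. */
final class RoadMapStub {
    final String source;

    RoadMapStub(String source) {
        this.source = source;
    }
}

/** Loads the resource at most once per JVM, on first use. */
final class LazyMapHolder {
    // Counts loads so the load-once behaviour can be observed.
    static final AtomicInteger LOAD_COUNT = new AtomicInteger();

    private LazyMapHolder() {}

    // The JVM initializes this nested class exactly once, in a thread-safe way,
    // the first time get() touches it.
    private static final class Holder {
        // Hypothetical URI; a real job would take it from configuration.
        static final RoadMapStub INSTANCE = load("hdfs:///path/to/map.bfmap");
    }

    private static RoadMapStub load(String uri) {
        LOAD_COUNT.incrementAndGet();
        // Assumption: a real implementation would read the bfmap file here.
        return new RoadMapStub(uri);
    }

    /** Returns the shared instance, loading it on the first call. */
    static RoadMapStub get() {
        return Holder.INSTANCE;
    }
}
```

Every task running in the same JVM that calls `LazyMapHolder.get()` receives the same instance, so the expensive load happens once per executor rather than once per record or per task.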

Answer 2 · 2017-06-23T12:45:46.000Z

@jongiddy Thanks, for sharing that! BTW: ... double-checked locking, you could have let me know. ;)

Answer 3 · 2017-06-23T13:34:26.000Z

@smattheis No worries! To be clear, we never saw a problem caused by the locking. I think I added that while debugging a thread-safety issue that actually occurred in a part of my code.

Answer 4 · 2017-06-23T16:47:57.000Z

@jongiddy @smattheis thanks guys this is awesome

Answer 5 · 2017-06-24T05:34:50.000Z

@jim2 I'm glad it helps. If you have any more questions, feel free to keep this issue open, or open another one. Once the question/problem regarding Spark is resolved for you, please close the issue.

@jongiddy Alright. Anyways, your reference in the comment explains that it's a code smell. I never knew that.

Answer 6 · 2017-07-12T12:45:59.000Z

I'm closing this as it seems to be resolved.