Issues Using Indexed Columns While Doing a Spatial Join

Question

mdbuck opened this issue 7 years ago · 3 comments

Attached is a driver application that demonstrates failures when performing a spatial join between a point DataFrame and a polygon DataFrame using the new indexing feature in Magellan 1.0.5. Any of the following may happen:

a NullPointerException may be thrown;
an OutOfMemoryError may be thrown;
the JVM may crash;
the application may finish normally, but contains.show() prints gibberish to the console.

magellan-wkt-within.zip

Answer 1 · 2017-10-02T20:26:58.000Z

How many nodes are you using? Is it a single driver with no workers? How big is your driver node? And how much data are we talking about (polygons and points)?

Answer 2 · 2017-10-03T12:59:04.000Z

The driver application is a command-line application that starts Spark with spark.master set to local[1], so there is a single local driver and no separate workers.

The data is small: the polygon table contains 5 rows, and the largest polygon has 9 nodes (vertices); the point table contains 8 rows.

I have simplified the driver application. Please see attached.

PolygonDriver2.zip
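
With inputs this small (5 polygons, 8 points), the expected result of the within join can be worked out independently of Spark and Magellan, which helps tell a wrong join result apart from a display problem. The following is only a minimal sketch in plain Python using a standard ray-casting test; it is not Magellan's implementation, and the coordinates and names below are hypothetical placeholders, not the data from the attached archive.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Ring = List[Point]  # polygon outer ring as a list of (x, y) vertices


def point_in_polygon(p: Point, ring: Ring) -> bool:
    """Even-odd ray-casting test. Points exactly on an edge are not handled specially."""
    x, y = p
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through p can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate at which this edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def brute_force_within(points: Dict[str, Point],
                       polygons: Dict[str, Ring]) -> List[Tuple[str, str]]:
    """Return every (point id, polygon id) pair where the point lies inside the polygon."""
    return [(pid, gid)
            for pid, p in points.items()
            for gid, ring in polygons.items()
            if point_in_polygon(p, ring)]


# Hypothetical sample data, standing in for the real point and polygon tables.
polygons = {
    "square": [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)],
    "triangle": [(5.0, 0.0), (9.0, 0.0), (7.0, 3.0)],
}
points = {"a": (1.0, 1.0), "b": (7.0, 1.0), "c": (10.0, 10.0)}

expected_pairs = brute_force_within(points, polygons)
print(expected_pairs)
```

Comparing a list like this against the rows returned by the indexed join would show whether the join itself is producing wrong matches or whether only the console output is corrupted.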

Answer 3 · 2017-10-16T11:31:26.000Z

Any more news on this?

Thanks.