harsha2010/magellan

Issues Using Indexed Columns While Doing a Spatial Join

mdbuck opened this issue · 3 comments

Attached is a driver application that demonstrates failures when doing a spatial join between a point DataFrame and a polygon DataFrame using the new indexing feature in Magellan 1.0.5. Any of the following may occur:

  • a NullPointerException may be thrown;
  • an OutOfMemoryError may be thrown;
  • the JVM may crash;
  • the application may finish normally, but contains.show() prints gibberish to the console.

magellan-wkt-within.zip

How many nodes are you using? A single driver and no workers? How big is your driver node? And how much data are we talking about (polygons and points)?

The driver application is a command-line application that starts Spark with spark.master set to local[1].

The data is small: the polygon table contains 5 rows with the largest polygon containing 9 nodes; the point table contains 8 rows.

I have simplified the driver application. Please see attached.

PolygonDriver2.zip
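
For readers without the attachments, the kind of setup described above (Spark on local[1], a handful of points and polygons, an indexed spatial join) can be sketched roughly as follows. This is not the attached driver. It assumes the Magellan DSL as shown in the project README (`magellan.Utils.injectRules`, `index`, `within`, `point`). The `PolygonRecord` case class, the object name, the coordinates, and the index precision of 30 are illustrative assumptions, and the sketch is not claimed to reproduce the failures listed in the report.

```scala
import magellan.{Point, Polygon}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.magellan.dsl.expressions._

// Hypothetical record type for illustration; not taken from the attachment.
case class PolygonRecord(polygon: Polygon)

object IndexedJoinSketch {
  def main(args: Array[String]): Unit = {
    // Single-threaded local Spark, matching spark.master = local[1] from the report.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("indexed-join-sketch")
      .getOrCreate()
    import spark.implicits._

    // Register Magellan's optimizer rules so the join can make use of the index.
    magellan.Utils.injectRules(spark)

    // One illustrative square polygon (closed ring, a single ring starting at offset 0).
    val ring = Array(
      Point(1.0, 1.0), Point(1.0, -1.0),
      Point(-1.0, -1.0), Point(-1.0, 1.0),
      Point(1.0, 1.0))
    val polygons = spark.sparkContext
      .parallelize(Seq(PolygonRecord(Polygon(Array(0), ring))))
      .toDF()

    // A few illustrative points built from (x, y) pairs.
    val points = spark.sparkContext
      .parallelize(Seq((0.0, 0.0), (0.5, -0.5), (2.0, 2.0)))
      .toDF("x", "y")
      .select(point($"x", $"y").as("point"))

    // Indexed spatial join; the precision value 30 is an arbitrary assumption.
    val contains = points.join(polygons index 30).where($"point" within $"polygon")
    contains.show()

    spark.stop()
  }
}
```

Running the same join without `index 30` gives a non-indexed baseline, which may help tell whether a problem is specific to the indexed path.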

Any more news on this?

Thanks.