cylondata/cylon

Corrupted result when joining tables containing list data types

JayjeetAtGithub opened this issue · 6 comments

I am trying to use Cylon to join two Arrow tables in a local context. Each table has some columns (fields) of list data type. The join returns an OK status, but on inspecting the joined table, the list-type fields turn out to be corrupted, while the fixed-width fields are fine. You can find my code here. I am joining the files table_1.parquet and table_2.parquet. Here is the schema of table_1 and table_2.

@JayjeetAtGithub thanks for reaching out. Can you also attach the schema files here?

I've opened a PR with a possible fix: #616.
But there are some limitations, AFAIU.

  1. Distributed joins are not supported for list types (Cylon communication ops, e.g. shuffle and gather, don't support list types ATM).
  2. Using list types as keys is not supported, even for local joins (we need to add list-type hashers and comparators to make this work).
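
To illustrate the case that remains supported (a local join on fixed-width keys, with list columns carried along as payload), here is a minimal pure-Python sketch. It is not Cylon's or Arrow's API and not necessarily what #616 does; all function names are hypothetical. It only models the Arrow-style list layout (an offsets array plus a flat values array) and shows that gathering list rows by join indices means rebuilding the offsets rather than copying fixed-size slots.

```python
# Illustrative sketch only: a pure-Python model, not Cylon's or Arrow's API.
# All function names below are hypothetical.

def make_list_column(rows):
    """Encode Python lists in an Arrow-style layout: (offsets, flat values)."""
    offsets, values = [0], []
    for row in rows:
        values.extend(row)
        offsets.append(len(values))
    return offsets, values


def take_list_column(column, indices):
    """Gather rows of a list column by index, rebuilding the offsets."""
    offsets, values = column
    out_offsets, out_values = [0], []
    for i in indices:
        # Row i spans values[offsets[i]:offsets[i + 1]]; its length varies.
        out_values.extend(values[offsets[i]:offsets[i + 1]])
        out_offsets.append(len(out_values))
    return out_offsets, out_values


def decode_list_column(column):
    """Turn (offsets, values) back into a list of Python lists."""
    offsets, values = column
    return [values[offsets[i]:offsets[i + 1]] for i in range(len(offsets) - 1)]


def inner_join_indices(left_keys, right_keys):
    """Hash join on fixed-width keys; returns matching (left, right) row indices."""
    table = {}
    for j, key in enumerate(right_keys):
        table.setdefault(key, []).append(j)
    left_idx, right_idx = [], []
    for i, key in enumerate(left_keys):
        for j in table.get(key, []):
            left_idx.append(i)
            right_idx.append(j)
    return left_idx, right_idx


# Keys are fixed-width integers; the list columns are payload only.
left_keys = [1, 2, 3]
left_lists = make_list_column([[10, 11], [20], [30, 31, 32]])
right_keys = [3, 1]
right_lists = make_list_column([["c"], ["a", "b"]])

left_idx, right_idx = inner_join_indices(left_keys, right_keys)
joined_left = decode_list_column(take_list_column(left_lists, left_idx))
joined_right = decode_list_column(take_list_column(right_lists, right_idx))
```

Only the integer keys are hashed and compared here. Using the list columns themselves as keys would additionally require hashing and comparing variable-length rows, which is the second limitation above.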

Hi @nirandaperera, thanks a lot for the fix. It works great. For now I am interested only in local joins, and my keys are fixed-width fields, so the fix should be good.

@JayjeetAtGithub great to hear that. Let me add a test case for this and then merge the fix.

Fixed with #616