51zero/eel-sdk

Enhanced error handling for HiveWriter

Opened this issue · 3 comments

An enhancement request if possible...

When an exception is thrown from the underlying format writer (Parquet, ORC), it would be nice if we could trap it higher up the stack in the HiveSink and rethrow it with the offending column named in the message.
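To make the request concrete, here is a minimal sketch of the idea, not EEL's actual code. It assumes the writer handles one column value at a time; `ColumnAwareWrite`, `Column`, `ColumnWriteException` and `convertRow` are hypothetical names invented for illustration, and the per-column converter stands in for whatever the Parquet/ORC writer does with each value.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration only: none of these types exist in EEL.
public class ColumnAwareWrite {

    /** Hypothetical exception that carries the name of the failing column. */
    public static class ColumnWriteException extends RuntimeException {
        public final String column;

        public ColumnWriteException(String column, Object value, Throwable cause) {
            super("Failed to write column '" + column + "' (value: " + value + "): "
                    + cause.getMessage(), cause);
            this.column = column;
        }
    }

    /** Hypothetical column descriptor: a name plus a conversion that may throw. */
    public static class Column {
        final String name;
        final Function<Object, Object> converter;

        public Column(String name, Function<Object, Object> converter) {
            this.name = name;
            this.converter = converter;
        }
    }

    /**
     * Converts each value in the row with its column's converter. Any runtime
     * failure is rethrown with the column name and offending value attached,
     * keeping the original exception as the cause.
     */
    public static Object[] convertRow(List<Column> columns, Object[] row) {
        Object[] out = new Object[row.length];
        for (int i = 0; i < columns.size(); i++) {
            Column c = columns.get(i);
            try {
                out[i] = c.converter.apply(row[i]);
            } catch (RuntimeException e) {
                throw new ColumnWriteException(c.name, row[i], e);
            }
        }
        return out;
    }
}
```

With something like this, a `NumberFormatException` raised while converting a value would surface as a message naming the column, and the original exception would still be available through `getCause()`.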

This would greatly help when trying to find data issues coming from the source (JdbcSource).

What happens at the moment?

Just to be clear it's NOT an EEL bug.

Whatever the Parquet exception is, it is propagated up the call stack.

In this case I think it gave a number format exception showing the value, which is a string. If I knew which column it was, I could then query Oracle to find the offending value and check the type.

There are other cases where it fails to convert a Timestamp and shows the value, but in this case there are around 6 Timestamp columns in the source.

Something for the future maybe - It's a potential time saver.😄

Oh I see, improve the error message.