cloudera-labs/hms-mirror

EXPORT_IMPORT data duplication with subsequent runs

dstreev opened this issue · 1 comment

By default, the IMPORT process doesn't DROP existing data, so each additional run appends to the current dataset.

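If you're using this process to OVERWRITE an existing table, you may not get the results you'd expect: rows from earlier runs stay in place and the re-imported rows are added alongside them, leaving duplicate data.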
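To make the duplication concrete, here is a toy Python model. It is not Hive or hms-mirror code, and the function names are hypothetical. A table is just a list of rows. `import_append` mimics an IMPORT that keeps existing rows, and `import_reload` mimics clearing the target first. Running each twice shows why a repeated append-style import doubles the data.

```python
from typing import List, Tuple

Row = Tuple[int, str]


def import_append(target: List[Row], exported: List[Row]) -> None:
    """Toy model (hypothetical): keep existing rows, append the exported ones."""
    target.extend(exported)


def import_reload(target: List[Row], exported: List[Row]) -> None:
    """Toy model (hypothetical): clear the target first, then load the exported rows."""
    target.clear()
    target.extend(exported)


exported: List[Row] = [(1, "a"), (2, "b")]

appended: List[Row] = []
for _ in range(2):  # simulate running the same migration twice
    import_append(appended, exported)

reloaded: List[Row] = []
for _ in range(2):
    import_reload(reloaded, exported)

print(len(appended))  # 4: every row now appears twice
print(len(reloaded))  # 2: the target matches the export
```

The append model grows with every run, while the reload model ends each run with exactly the exported rows.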

Further research shows that this is the normal behavior of the EXPORT_IMPORT Hive process. Take precautions when you need to 'reload' the data (for example, clear the target table before re-running). hms-mirror is primarily a one-time migration tool and doesn't check existing data for these conditions.
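One possible precaution is to drop the target table before re-importing it. The sketch below is hypothetical and is not part of hms-mirror. It just builds the HiveQL you might run by hand. It assumes a managed (non-external) target table, where `DROP TABLE ... PURGE` also removes the data files. For external tables, `DROP TABLE` leaves the files in place, so this alone would not prevent duplication. The table name and export path are illustrative placeholders.

```python
from typing import List


def reload_statements(table: str, export_path: str) -> List[str]:
    """Hypothetical helper: emit HiveQL that drops the target table and then
    re-imports it, so a repeated run replaces data instead of appending.

    Assumes a managed table; identifiers and paths are not quoted or validated.
    """
    return [
        # PURGE skips the trash, so the old data files are removed immediately.
        f"DROP TABLE IF EXISTS {table} PURGE;",
        # Re-create the table and load the exported data from the export location.
        f"IMPORT TABLE {table} FROM '{export_path}';",
    ]


# Illustrative placeholders, not real paths from the issue.
for stmt in reload_statements("sales.orders", "/tmp/export/sales/orders"):
    print(stmt)
```

Review the generated statements before running them. Dropping with `PURGE` is irreversible.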