ExpediaGroup/circus-train

Circus Train should also sync the external schema for Avro Tables when replication mode is METADATA_UPDATE

abhimanyugupta07 opened this issue · 1 comment

As a user of CT,

I want Circus Train to also sync the external schema for an Avro table when the replication mode is either METADATA_UPDATE or METADATA_MIRROR.

Context
At the moment, when the replication mode is METADATA_UPDATE or METADATA_MIRROR, the table location does not need to be provided in the config file. As a result, the null check at https://github.com/HotelsDotCom/circus-train/blob/master/circus-train-avro/src/main/java/com/hotels/bdp/circustrain/avro/transformation/AbstractAvroSerDeTransformation.java#L50 returns empty and the Avro transformation is not triggered, so the schema file is not copied to the replica.

This was discovered while working on: #131

Related PR: #141

Possible Solution
CT can detect that the table location has not been provided and, in the Avro transformation, get the table location from the target HMS instead.
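
A minimal sketch of that fallback, not the actual Circus Train code: the class `ReplicaLocationFallback` and the method `resolveTableLocation` are hypothetical names introduced for illustration, and the target HMS lookup is abstracted as a `Supplier<String>` so the example stays self-contained. In a real transformation the supplier would presumably wrap a metastore client call that reads the existing replica table's storage location.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical helper, not part of Circus Train.
class ReplicaLocationFallback {

  /**
   * Prefers the table location from the config file. If none was configured
   * (as is allowed for METADATA_UPDATE), falls back to the location that the
   * target HMS already holds for the replica table.
   *
   * @param configuredLocation location from the config file, may be null or blank
   * @param targetHmsLookup assumed to return the replica table's current
   *        location from the target HMS, or null if the table does not exist
   */
  static Optional<String> resolveTableLocation(
      String configuredLocation, Supplier<String> targetHmsLookup) {
    if (configuredLocation != null && !configuredLocation.trim().isEmpty()) {
      return Optional.of(configuredLocation);
    }
    // Only query the target HMS when the config gives us nothing.
    String existing = targetHmsLookup.get();
    if (existing == null || existing.trim().isEmpty()) {
      return Optional.empty();
    }
    return Optional.of(existing);
  }
}
```

With this shape, the Avro transformation would skip copying the schema only when both sources come back empty, for example on a first replication where the replica table does not exist yet.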

This is not needed for METADATA_MIRROR, as that is an exact metadata copy of the table, so all locations will be the same.