talariadb/talaria

Compatibility problems introduced by the sink config change

atlas-comstock opened this issue · 8 comments

https://github.com/talariadb/talaria/pull/86/files

This MR introduced sink configs, which are meant to abstract writers; however, it is not compatible with old config files.

I would prefer that we avoid introducing compatibility problems whenever possible. cc @kelindar @tardunge

@atlas-comstock I could not quite follow the issue you are facing, so just in case I understood correctly:
The MR (pull 86) has breaking changes. One needs to update the config file and configure sinks as a list instead of a map.
With sinks as a map, we cannot use multiple sinks from the same provider, because the last key-value pair read overrides the previous ones. This matters for fan-out streams/sinks with different filters and different destinations that belong to a single data-storage provider.
For example:

      sinks:
        - bigquery:
            # sink conf for table-1
            - filter: filter based on some constraint

        - bigquery:
            # sink conf for table-2
            - filter: filter based on a different constraint

Any thoughts on achieving this behaviour without compromising on backward compatibility for configs?
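One possible direction, sketched here purely as an illustration rather than what pull 86 actually does, would be a custom unmarshaller that accepts both the legacy map form and the new list form. The type names below are made up and this assumes gopkg.in/yaml.v2:

```go
package config

import "fmt"

// SinkConfig is a placeholder for a single sink's settings; the real fields
// in talaria's config package will differ.
type SinkConfig struct {
	Filter string `yaml:"filter"`
}

// Sinks is a list of provider -> sink-config entries, which allows several
// sinks from the same provider.
type Sinks []map[string]SinkConfig

// UnmarshalYAML accepts either the new list form or the legacy map form,
// converting the latter into a single-entry-per-provider list.
func (s *Sinks) UnmarshalYAML(unmarshal func(interface{}) error) error {
	// Try the new list-of-maps form first.
	var list []map[string]SinkConfig
	if err := unmarshal(&list); err == nil {
		*s = list
		return nil
	}

	// Fall back to the legacy map form used by old config files.
	var legacy map[string]SinkConfig
	if err := unmarshal(&legacy); err != nil {
		return fmt.Errorf("sinks: expected a list or a map: %v", err)
	}
	for provider, conf := range legacy {
		*s = append(*s, map[string]SinkConfig{provider: conf})
	}
	return nil
}
```

With something along these lines, old map-style configs would keep loading while new configs could list multiple sinks per provider.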

> The MR (pull 86) has breaking changes. One needs to update the config file and configure sinks as a list instead of a map.

Yeah, the understanding is correct. Let me reply with more thoughts on this tomorrow.

@atlas-comstock can you please test this pre-release 1.6.0 version with the changes in your config?
Thanks.

@tardunge Sure, let me try it today.

@tardunge hi, sorry for the delay. Updating the config file works:

      sinks:
        - azure:
            xxxxxxx

I don't have a good idea for solving the compatibility issue; let's see if @kelindar has one.
If not, I am okay with this. Thanks!

@atlas-comstock Thanks for testing, mate.
Also, the data in talaria is currently erased from disk right after compaction is done.
If we isolate this flushing process, with intervals specific to each sink and with checkpointing, we can keep the data persisted until the end of the TTL duration per table.
This gives more control over how the data is flushed per sink, as some sinks, such as cloud storage, can benefit from larger compaction intervals.
This will also bring some changes to the config file.
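As a rough illustration of what that config change could carry per sink (the names below are hypothetical, not from the codebase), each sink entry could hold its own flush settings:

```go
package config

import "time"

// SinkFlush is a hypothetical per-sink flush configuration; the actual
// fields and names would be decided when the change is made.
type SinkFlush struct {
	Interval   time.Duration `yaml:"interval"`   // per-sink flush interval, e.g. larger for cloud-storage sinks
	Checkpoint bool          `yaml:"checkpoint"` // track flush progress so data can persist until the table TTL
}
```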
@kelindar Any thoughts on this?

fixed.