Incremental Data Updates

Question

itdependsnetworks opened this issue 2 years ago · 8 comments

Environment

DiffSync version: 1.x

Proposed Functionality

Provide the ability to sync updates as they happen. This may be a specific implementation of #142, but I think it makes sense to consider.

Use Case

There are times when near-real-time sync is required. Consider a workflow that adds a device to the SoR: that change then needs to be propagated to all other systems, such as monitoring systems.

Answer 1 · 2022-12-20T16:37:11.000Z

I don't quite follow what "Provide the ability to sync updates as they happen." means. Do you want to subscribe to changes webhook-like?

Answer 2 · 2022-12-20T16:39:07.000Z

"Do you want to subscribe to changes webhook-like?"

Correct.

Answer 3 · 2022-12-20T16:40:57.000Z

So we would be looking at the implementation of something that either listens to webhooks/similar if that functionality is available on the source system or periodically queries out to the source system and calculates the diff, syncing if there is any?
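The polling variant described above can be sketched roughly as follows. This is a minimal, stdlib-only illustration and does not use the DiffSync API; `compute_diff`, `apply_diff`, and `poll_once` are hypothetical names, and both systems are assumed to be representable as plain `{identifier: attributes}` dictionaries.

```python
def compute_diff(source, target):
    """Compare two {identifier: attributes} snapshots (hypothetical helper)."""
    diff = {"create": {}, "update": {}, "delete": []}
    for key, attrs in source.items():
        if key not in target:
            diff["create"][key] = attrs      # present in source only
        elif target[key] != attrs:
            diff["update"][key] = attrs      # present in both, attributes differ
    diff["delete"] = sorted(k for k in target if k not in source)
    return diff


def apply_diff(target, diff):
    """Mutate the target snapshot so that it matches the source."""
    target.update(diff["create"])
    target.update(diff["update"])
    for key in diff["delete"]:
        del target[key]


def poll_once(fetch_source, target):
    """One polling cycle: fetch, diff, and sync only if something changed."""
    diff = compute_diff(fetch_source(), target)
    changed = any(diff.values())             # empty dicts and lists are falsy
    if changed:
        apply_diff(target, diff)
    return changed
```

A scheduler or a simple loop with a sleep would call `poll_once` periodically; the webhook variant would instead call it (or something narrower) when a notification arrives.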

Answer 4 · 2022-12-20T16:44:27.000Z

I don't know, to be honest; my mind was in the Kafka bus mindset. That being said, it is likely more about the signature than the actual integration.

Answer 5 · 2022-12-20T16:49:05.000Z

So what kind of API should diffsync specifically offer to facilitate this? The functionality for creating a diff and not syncing it is already there, so you could feasibly write an integration that triggers diffsync based on an event on a bus, couldn't you?

Answer 6 · 2022-12-20T17:06:42.000Z

Just spitballing here for 30 seconds.

Create a method called something like "incremental_namespace".
This would provide a namespace that would basically allow you to:
- Only include creates or updates
- Only include models in the namespace
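One way to read the two rules above is as a filter over a list of pending changes. The sketch below is only an assumption about what such a method might do; `Change` and this standalone `incremental_namespace` function are hypothetical and are not part of DiffSync.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """Hypothetical record describing one pending change."""
    action: str       # "create", "update" or "delete"
    model: str        # model name, e.g. "device"
    identifier: str   # unique identifier of the object


def incremental_namespace(changes, models):
    """Keep only creates/updates for the models that belong to the namespace."""
    allowed = set(models)
    return [
        change
        for change in changes
        if change.action in ("create", "update") and change.model in allowed
    ]


changes = [
    Change("create", "device", "rtr1"),
    Change("delete", "device", "rtr2"),
    Change("update", "site", "nyc"),
    Change("update", "device", "rtr3"),
]
selected = incremental_namespace(changes, models=["device"])
```

Here `selected` holds only the create for `rtr1` and the update for `rtr3`; the delete and the `site` change are left for a full sync.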

Answer 7 · 2022-12-21T14:46:52.000Z

Outcome of a verbal discussion:

Think about the possibility of having, next to the plain load function, load_$model_name methods that load specific models (possibly by identifiers) together with their dependencies, and possibly a load_all_$model_name(filters) method that loads all instances of a model matching a specific set of filters. It is currently unclear to me whether the current child/parent relationships are modeled in enough detail to facilitate this use case.

How does this help us?

This would enable an outside integration listening to an event bus to act only on the consequences of a specific event, which could be orders of magnitude faster than running the entire synchronization.
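To make the idea concrete, here is a rough stdlib-only sketch of an adapter with targeted load methods and an event handler that uses them. Everything in it is an assumption for illustration: `InventoryAdapter`, `load_device`, `load_all_device`, `handle_event`, and the event shape are hypothetical, the in-memory `records` list stands in for the remote source system, and loading of dependencies is left out.

```python
class InventoryAdapter:
    """Hypothetical adapter; `records` stands in for a remote source system."""

    def __init__(self, records):
        self._records = records
        self.store = {}  # loaded objects, keyed by device name

    def _add(self, record):
        self.store[record["name"]] = dict(record)

    def load(self):
        """Full load: pull every record (the expensive path)."""
        for record in self._records:
            self._add(record)

    def load_device(self, name):
        """Targeted load of a single device by identifier."""
        for record in self._records:
            if record["name"] == name:
                self._add(record)
                return True
        return False

    def load_all_device(self, **filters):
        """Load every device whose attributes match all given filters."""
        matched = [
            record for record in self._records
            if all(record.get(key) == value for key, value in filters.items())
        ]
        for record in matched:
            self._add(record)
        return len(matched)


def handle_event(event, adapter):
    """Load only what the event refers to instead of running a full load."""
    if event.get("model") == "device":
        return adapter.load_device(event["name"])
    return False  # unknown model: leave it to the next full sync
```

An integration listening on a bus would call `handle_event` per message and then diff and sync just the loaded objects.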

Answer 8 · 2023-10-27T14:51:39.000Z

Need to think about the listener?
Need to think about the publisher?
Need to think about being protocol non-specific.
Need to keep in mind that transactional data will likely not have any child data.