DiffStream

Project to examine performance and reliability of keeping multiple distributed caches in sync by sending diffs instead of the full object on each update.

Example (for simplicity, without the IPC).

>>> from cache import cache
>>>

Create the producer and consumer caches and some data in a dict.

>>> producer = cache.DiffCache.producer()
>>> consumer = cache.DiffCache.consumer()
>>>
>>> data = {'key': 314159, 'a': 'Some really, really important text', 'b': 42, 'c': {'c1': True, 'c2': False}, 'd': list(range(10))}
>>> pprint.pprint(data)
{'a': 'Some really, really important text',
 'b': 42,
 'c': {'c1': True, 'c2': False},
 'd': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
 'key': 314159}
>>>

Updating the producer cache will produce a DataMsg which can be applied to the consumer cache to bring it up to date.

>>> msg = producer.update(data)
>>> _, _, data_ = consumer.update(msg)
>>> pprint.pprint(data_)
{'a': 'Some really, really important text',
 'b': 42,
 'c': {'c1': True, 'c2': False},
 'd': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
 'key': 314159}
>>>

When changes are made to the data, subsequent DataMsg update will only contain the diff needed to sync the cached dicts.

>>> data['c']['c3'] = 'Maybe'
>>> pprint.pprint(data)
{'a': 'Some really, really important text',
 'b': 42,
 'c': {'c1': True, 'c2': False, 'c3': 'Maybe'},
 'd': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
 'key': 314159}
>>>
>>> msg = producer.update(data)
>>> _, _, data_ = consumer.update(msg)
>>> pprint.pprint(data_)
{'a': 'Some really, really important text',
 'b': 42,
 'c': {'c1': True, 'c2': False, 'c3': 'Maybe'},
 'd': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
 'key': 314159}
>>>

Why go though the trouble with the diff? To avoid sending all of the dict data when only a subset is changed.

>>> import pprint, json
>>> buf = json.dumps(data).encode()
>>> type(buf), len(buf)
(<class 'bytes'>, 151)
>>> buf_ = msg.encode()
>>> type(buf_), len(buf_)
(<class 'bytes'>, 115)

christianreimer/DiffStream

DiffStream