GenSync is a framework for efficiently synchronizing similar data across multiple devices.
The framework provides a shared library for benchmarking and optimizing a variety of state-of-the-art data synchronization protocols, either offline or directly embedded within application code. In one typical use-case, an application would use GenSync for its core data synchronization needs, and developers can compare and optimize the performance of different synchronization protocols to suit their needs. Alternatively, users could utilize the GenSync library to profile synchronization usage for their application, and then experiment with synchronization protocols offline to improve perfromance.
The current implementation of the framework includes four families of data synchronization protocols (and their variants)):
- FullSync - a trivial protocol that transfers all data from one device to another for a local comparison
- CPISync - based on Characteristic Polynomial Interpolation
- IBLTSync - based on Invertible Bloom Lookup Tables
- CuckooSync - based on Cuckoo tables
Suppose that laptop A
has a list of contacts:
Alice, Jim, Jane, Rick, Bob
and cellphone B
has the contacts:
Alice, Jim, Suzie, Jane, Rick
Then an efficient synchronization protocol might quickly idenfity the differences (Rick
on the laptop, and Suzie
on the cellphone)
and exchange only these contacts, rather than sending the entire contact list from one device to another.
The source code for this library is divided among several repositories.
- gensync-core - the core framework code that is used by other repositories.
- gensync-lib - used to produce the GenSync library and associated headers.
- gensync-macports - used to produce the macports version of the GenSync library for MacOS.
- gensync-benchmarking - used for including benchmarking capabilities.
Each of these protocols is implemented as a peer-to-peer protocol. For purposes of explanation, one peer is called a client and the other a server .
- FullSync family
- FullSync
- The client sends all its data to the server, which determines what differs between the two devices and notifies the client of the differences.
- FullSync
- CPISync family
- CPISync
- Implements the Characteristic Polynomial Interpolation protocol from MTZ03 with a known bound on the number of differences between client and server. The client sends evaluations of the characteristic polynomial of hashes of its data to the server. The server interpolates (and factors) a rational function that identifies the differences between client and server, and then reports the hashes of these differences back to the client. A final communication round exchanges the actual data corresponding to the identified hashes.
- ProbCPISync
- Perform CPISync without a known bound on the number of differences between client and server. Client and server start with a guess on the difference bound. If the guess is wrong, it is doubled and the process is repeated, until a correct bound is identified, whereupon CPISync is run.
- InteractiveCPISync
- Performs CPISync interactively according to the protocol of MT02. No bound on differences is needed, and both computation and communication are efficient.
- CPISync_OneLessRound
- A variant of CPISync that does not using hashes ( i.e., it operates directly on the data ). This avoides the final communication round where hash inverses are exchanged between client and server.
- OneWayCPISync
- A variant of CPISync that communications only in one direction, providing the server with information about the clients data (but not vice versa). This requires no feedback from server to client, and the transmitted data can be stored as a string for subsequent synchronization.
- CPISync
- IBLTSync family
- IBLTSync
- Client and server encode and exchange their data within an Invertible Bloom Lookup Table. Subtracting these tables and deecoding the result provides the differences between client and server
- OneWayIBLTSync A variant of IBLTSync that communications only in one direction, providing the server with information about the clients data (but not vice versa). This requires no feedback from server to client, and the transmitted data can be stored as a string for subsequent synchronization.
- Set of Sets Sync*
- This implements an IBLT-based synchronization of sets of sets described in MM18.
- IBLTSync
- CuckooSync family
- CuckooSync
- Client and server encode and exchange their data within a cuckoo filter. A comparison of the filters yields the differences between client and server
- CuckooSync
If you use this software, please cite at least the following paper (pdf, DOI):
@article{bovskov2022gensync,
title={Gensync: A new framework for benchmarking and optimizing reconciliation of data},
author={Bo{\v{s}}kov, Novak and Trachtenberg, Ari and Starobinski, David},
journal={IEEE Transactions on Network and Service Management},
volume={19},
number={4},
pages={4408--4423},
year={2022},
publisher={IEEE}
}
The following works are also significant to this software implementation:
-
[MTZ03] Y. Minsky, A. Trachtenberg, and R. Zippel, "Set Reconciliation with Nearly Optimal Communication Complexity", IEEE Transactions on Information Theory, 49:9. http://ipsit.bu.edu/documents/ieee-it3-web.pdf
-
[MT02] Y. Minsky and A. Trachtenberg, "Scalable set reconciliation" 40th Annual Allerton Conference on Communication, Control, and Computing, 2002. http://ipsit.bu.edu/documents/BUTR2002-01.pdf
-
[DTA03] D. Starobinski, A. Trachtenberg and S. Agarwal, "Efficient PDA synchronization" IEEE Transactions on Mobile Computing 2:1, pp. 40-51 (2003). http://ipsit.bu.edu/documents/efficient_pda_web.pdf
-
[SCT06] S. Agarwal, V. Chauhan and A. Trachtenberg, "Bandwidth efficient string reconciliation using puzzles" IEEE Transactions on Parallel and Distributed Systems 17:11,pp. 1217-1225 (2006). http://ipsit.bu.edu/documents/puzzles_journal.pdf
-
[KLT03] M.G. Karpovsky, L.B. Levitin. and A. Trachtenberg, "Data verification and reconciliation with generalized error-control codes" IEEE Transactions on Information Theory 49:7, pp. 1788-1793 (2003).
-
[EGUV11] D. Eppstein, M.T. Goodrich, F. Uyeda, and G. Varghese. "What's the difference?: efficient set reconciliation without prior context." ACM SIGCOMM Computer Communication Review 41.4 (2011): 218-229.
-
[GM11] M.T. Goodrich and M. Mitzenmacher. "Invertible bloom lookup tables." 49th Annual Allerton Conference on Communication, Control, and Computing (Allerton), 2011.
-
[MM18] M. Mitzenmacher and T. Morgan. "Reconciling graphs and sets of sets." Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems. ACM, 2018.
-
More at https://people.bu.edu/trachten.
Elements of the GenSync project code have been worked on, at various points, by:
- Ari Trachtenberg
- Sachin Agarwal
- Paul Varghese
- Jiaxi Jin
- Jie Meng
- Alexander Smirnov
- Eliezer Pearl
- Sean Brandenburg
- Zifan Wang
- Novak Boškov
- Xingyu Chen
- Nathan Strahs
- NSF
- Red Hat Collaboratory
- Professors:
- Ari Trachtenberg, trachten@bu.edu, Boston University
- David Starobinski, staro@bu.edu, Boston University