interval_dict
An associative array / interval library for c++17/20:
- Key / value lookup dictionary (like
std::map
etc.) - Associations are only valid over specific intervals. Useful for building time-varying lookup dictionaries.
Overview:
Interval Dictionaries are an extension of the associative containers that allow efficient lookup of Key→Values that are valid only over a specified interval.
Two stage lookup:
- Stage 1: Search, removal, and insertion operations for key → interval/values have logarithmic complexity (
std::map
under the hood). - Stage 2: Values specified by query interval: Search operations have logarithmic complexity. Different implementations have different complexity for insertions and deletions.
Documentation:
Woefully incomplete documentation here. Better documentation coming soon!
Special features:
1. Chaining lookups
interval dict
dictionaries can be "chained" together:
Dictionaries of A
→B
and B
→C
can be combined to a dictionary of A
→C
.
The associations will take into account the overlapping intervals for A
→B
and B
→C
2. Filling interval gaps
Missing data can produce breaks / gaps in associations.
For example, data for company
→website
may have gaps every weekend if the spider only scrapes on weekdays.
We can fill gaps by either
- extending Friday data ("forward filling" by 2 days), or
- fill in weekend gaps only if Friday and Monday values match.
Gap-filling functions in interval dict
support diverse, practical use-cases.
3. Bidirectional lookup
It is often useful to have associative types that support two-way lookup: both key → value and value → key.
For a one-off, make an inverse dictionary (interval_dict.invert()
).
Other times, it is more convenient or efficient to use a dictionary that supports two-way lookup directly. Under the hood, the bidirectional interval map stores the data in both direction, so there is no savings in space.
License:
Dependencies
-
Interval handling uses the Boost Interval Container Library (ICL).
IntervalDict
thus supports many interval variants: open, close, left-open, right-open, dynamic, discrete (dates) and continuous (floats) underlying types etc. -
Lazy / synchronous iteration of intervals uses co-routines, provided by
generator<T>
from cppcoro.
This requires a compiler that supports the coroutines TS (C++17) or C++20.
Supported Compilers
The code is known to work on the following compilers:
This reflects the requirements of the cppcoro library
Current Status:
interval dict
is under active development.
-
Support open/close/mixed intervals comprising
- int / double
- boost:gregorian::date
- boost::posix_time::ptime
Empty intervals are ignored
-
Two choices of implementation:
-
IntervalDictICL
ofinterval_dict
uses a Boost ICLinterval_map
ofstd::set
This was straightforward to implement, and stores the underlying data as disjoint intervals for quick lookups.Search operations have complexity. However, for insertion/removal of elements, this data structure can result in polynomial or worse complexity .
This typically is noticeable if there are large number of intervals (>50,000) for a single key. This happens if the associations are very imbalanced (each key has many tens of thousands of values with overlapping validity intervals).
-
The second implementation
IntervalDictITree
ofinterval_dict
uses an Interval Tree. This stores the underlying data as continguous non-overlapping intervals.Search operations have complexity. However, insertion and deletions can be much more efficient: .
Implemented using tinloaf/ygg and boost::intrusive, both red-black trees underneath.
-
-
Tests 100% coverage
-
Bidirectional dictionary. This can be parameterised for different implementation in each direction.
In progress:
- Benchmarks
- For successively overlapping intervals
- For nested intervals (like a pyramid)
- Simulate a random proportion of values swapping between different keys at successive intervals
- Fuzzing tests for implementation 1 vs 2
- Abandon ygg and move to boost::intrusive
- Write docs
- Doxygen for new implementation
- Doxygen html for website
- Build and integration tests for website