Note: This project is currently being worked in a closed source repo as a part of https://zerosync.dev/. To be re-opened as part of Zero in a few months.
This project is not ready for public consumption. All APIs are currently being overhauled to better support integration with existing databases. Join and Reduce are naively implemented at the moment.
Incremental view maintenance for JavaScript.
Problem --
You have a pipeline:
data.map(...).filter(...).reduce(...)
which you don't want to re-run from scratch each time something in data
changes. Instead you'd like to only run against the data that changed and have your result incrementally updated. That's what this is for.
- Simple Task App & Linear clone
- A walkthrough of building an app on top of materialite & differential dataflow
pnpm install
cd buildall
pnpm watch
Prerequisite: pnpm install
pnpm test
A walkthrough of building a React Tasks app with Materialite can be found here: https://github.com/vlcn-io/materialite/blob/main/demos/react/walkthrough/walkthrough.md
The main concepts in Materialite
are:
- Sources
- Streams
- Materialized Views
Everything starts with a Materialite
instance. This instance is used to create data sources, connect third party sources and commit transactions. Sources emit streams against which computation (map/filter/reduce/join/count/union) can be written. Those streams can then be materialized into an incrementally maintained collection that represents the results of the computation.
const materialite = new Materialite();
const source = materialite.newSortedSet(comparator);
const view = source.stream.map(...).filter(...).materialize();
materialite.tx(() => {
source.add(...);
source.add(...);
source.add(...);
});
After having a Materialite
instance, sources is the first concept to understand.
All computation starts with a source that is obtained from a materialite
instance. A source is a collection to which you:
- Add and remove values via
source.add
andsource.remove
- Attach pipelines
Along with the ability to define custom sources, there are three built in sources in Materialite:
- Stateless Set
- Sorted Set
- Immutable Sorted Set
Once you have a source you can build compute pipelines against it by attaching map
, reduce
, filter
, join
, count
, etc. to that source's stream
.
A stateless set is a set which does not hold onto its value. Values added or removed from a stateless set are passed along to the compute pipelines attached to the set.
The advantage of a stateless set is that you only pay the memory cost of your final view and any operators which allocate. The disadvantage is that all pipelines must be attached to the set before any values are added or removed. If pipelines are attached later, they have no way of getting previously added values.
const materialite = new Materialite();
const source = materialite.newStatelessSet<number>();
// pipelines should be added to the source before values
// needed by the pipeline are added to the source.
source.stream.filter(...).map(...)...;
source.add(...);
source.delete(...);
A set which is sorted and holds onto its current state. Values added or removed from the set are passed along to the compute pipelines attached to the set.
The advantage of this stateful set, over a stateless set, is that pipelines can be attached to the source even after values have already been added. These pipelines will be run with all historical values contained in the set in order to catch them up to current time.
Another nice property of the sorted set is that it is sorted. Materialized views that use the same ordering as the underlying set can be constructed more quickly. This is useful in situations where queries can change dynamically. E.g., users modifying a set of filters.
const Task = {
id: string;
dueDate: number;
};
const materialite = new Materialite();
const source = materialite.newSortedSet<Task>((l, r) => {
const comp = l.dueDate - r.dueDate;
// If items have the same date we need to compare on primary key to
// ensure the items are distinct
if (comp === 0) {
return l.id.localeCompare(r.id);
}
return comp;
});
source.add(...);
source.delete(...);
// we can inspect all the values in the source
// since it retains them
const set = source.data;
for (const v of set) {
console.log(v);
}
// We can attach pipelines even after values were added to the source.
// The pipelines will receive those past and all future values.
source.stream.filter(...).map(...)...;
The same as sortedSet
except that the underlying set is a persistent or immutable data structure. This means that every modification of the set creates a new version of that set. Materialite
uses structural sharing between versions so this copying is relatively cheap and efficient.
ImmutableSortedSet
does not hold on to old versions automatically but allows the developer to retain them if needed.
E.g.,
const materialite = new Materialite();
const source = materialite.newImmutableSortedSet<number>(comparator);
const v1_data = source.data;
const v2_data = source.add(1);
const v3_data = source.add(2);
const v4_data = source.delete(2);
If the three sources above are not sufficient or you'd like to integrate directly to something like SQLite, IndexedDB or some other key-value store connectSource
is available to do this.
Each source has an associated stream
object. It is against the stream
object that compute pipelines are built using map
, filter
, reduce
, join
, etc. Each operation against a stream returns a new stream.
const timesThree = source.stream.map((n) => n * 3);
This allows forking streams and creating a graph of computation. In other words, you can share sections of streams with other streams.
const timesThree = source.stream.map((n) => n * 3);
const timesThreeEven = timesThree.filter((n) => n % 2 === 0);
const timesThreeOdd = timesThree.filter((n) => n % 2 === 1);
const timesThreeEvenCubed = timesThreeEven.map((n) => Math.pow(n, 3));
A stream can either be used directly, to observe and/or operate on each delta or it can be materialized into a "view."
In practice this means that a JavaScript collection is created which represents the result of the computation of the stream. As data is added or removed from the source, this collection is incrementally updated.
There are a few kinds of materializations:
- ImmutableSortedSet
- MutableSortedSet
- Array (plain old JS Array)
- CopyOnWriteArray (plain old JS Array but we produce a new one on each update)
- Map (plain old JS Map)
- CopyOnWriteMap (plain old JS Map but we produce a new copy on each update)
- Custom
installed & built.
cd demos/react
pnpm dev
- Comparator for difference stream so equality can be key based?
- Better performing tree other than red-black tree? For
Index
and consolidation functions - Re-write Join and Reduce to be more efficient
- Handle
take
and pagination - Explore materialization structures
- Sorted sources for joins?