inrupt/pod-server

Add Mongodb backend

michielbdejong opened this issue · 5 comments

Redis is a cache that can also persist. Mongo is a persister that can also cache. :)
Mongo is more popular than Redis, although that is probably almost entirely down to its use as a model store in Rails and similar MVC frameworks.

But I do like the idea of having a persistence-oriented persistence layer, because data loss should be the biggest concern in a pod-server, much more important than latency/throughput.

So it makes sense to implement a MongoDB backend as well!

Just looking at the docs, we could maybe use https://docs.mongodb.com/manual/tutorial/model-embedded-one-to-many-relationships-between-documents/ for the Container-Member relationship...

Ah no, "to model large hierarchical data sets", "use normalized data models", see https://docs.mongodb.com/manual/core/data-model-design/#data-modeling-embedding. So I'll probably have to use https://docs.mongodb.com/manual/core/transactions/#transactions-api

OK, so answer: https://stackoverflow.com/questions/16523621/atomicity-and-cas-operations-in-mongodb and https://docs.mongodb.com/manual/core/write-operations-atomicity/#update-if-current

I'll use MongoDB documents that just say { "value": "..." } and then set the current value as an update condition. This is going to be very bad in terms of performance compared to Redis, though: in Redis I only have to do WATCH, whereas in Mongo I actually have to retrieve the current version of the document. For large blobs that's going to be a nightmare.

Note that this is not necessary for containers, they're basically read-only except for the 'DELETE' operation, and for that we can set the update condition to { "members": [] }, thanks to solid/solid-spec#172.
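A rough sketch of that conditional delete, for illustration only. The document shape, the function names, and the choice of `deleteOne` (rather than an update) are all assumptions, not pod-server code. With the real driver the filter would be passed to `collection.deleteOne(filter)` and the result's `deletedCount` checked; the in-memory stand-in here just mirrors that behaviour.

```typescript
// Hypothetical container document: one MongoDB document per container.
interface ContainerDoc {
  _id: string;        // the container's URI
  members: string[];  // names of the contained resources
}

// Filter for collection.deleteOne(filter). Matching on `members: []` means
// the delete only goes through while the container is still empty.
function deleteIfEmptyFilter(id: string): { _id: string; members: string[] } {
  return { _id: id, members: [] };
}

// In-memory stand-in for deleteOne, so the control flow can be followed
// without a server. Returns true when a document was deleted
// (deletedCount === 1 with the real driver).
function deleteIfEmpty(store: Map<string, ContainerDoc>, id: string): boolean {
  const doc = store.get(id);
  if (!doc || doc.members.length !== 0) return false;
  return store.delete(id);
}
```

If the delete matches nothing, the container either does not exist or has gained a member in the meantime, and the DELETE request should fail.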

Ah, for updating large blobs we can add a 'version' field, which is retrieved on getBlob and then used as a condition (and updated!) on setData.
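Roughly what I have in mind for the 'version' field, as a sketch under assumptions: the `BlobDoc` shape and the helper names are made up for illustration. With the real driver, the filter/update pair would go to `collection.updateOne(filter, update)`, and a `matchedCount` of 0 would mean someone else wrote first. The in-memory stand-in below only mimics that so the logic can be followed without a server.

```typescript
// Hypothetical document shape: one MongoDB document per blob.
interface BlobDoc {
  _id: string;      // the blob's URI
  value: string;    // the blob body
  version: number;  // bumped on every successful write
}

// Arguments for collection.updateOne(filter, update). The filter only
// matches if nobody has written since `expectedVersion` was read, so the
// large `value` never has to be sent back as a condition.
function versionedUpdate(id: string, expectedVersion: number, newValue: string) {
  return {
    filter: { _id: id, version: expectedVersion },
    update: { $set: { value: newValue }, $inc: { version: 1 } },
  };
}

// In-memory stand-in for that updateOne call. Returns true when the write
// was applied (matchedCount === 1 with the real driver).
function setIfVersion(
  store: Map<string, BlobDoc>,
  id: string,
  expectedVersion: number,
  newValue: string,
): boolean {
  const doc = store.get(id);
  if (!doc || doc.version !== expectedVersion) return false; // lost the race
  store.set(id, { ...doc, value: newValue, version: doc.version + 1 });
  return true;
}
```

A writer that gets `false` back would re-read the blob (picking up the new version) and decide whether to retry or report a conflict.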

We've used Redis as a primary data store with no issues. It isn't just a cache database; it's a persistent database with caching abilities as well (by way of expiration and LRU config).

We did use MongoDB for a lot of things for a while, but when you're only looking data up by key (or URI) values, there is no need for MongoDB.

That said, MongoDB is a good option, and would even allow you to use its _id field as the primary URI.

I think it would be nice to be able to abstract away whether it is MongoDB or Redis by way of a "key store", which can be backed by either. Users could specify either REDIS_URL or MONGO_URL; if neither is provided, fall back to whatever the default is (e.g. the fs store).
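To make that concrete, here is a minimal sketch. The `KeyStore` interface, its method names, and `chooseStore` are hypothetical; only the REDIS_URL and MONGO_URL variable names and the fs fallback come from the suggestion above. Preferring Redis when both variables are set is an arbitrary assumption.

```typescript
// Hypothetical interface describing what the pod needs from a store,
// independent of which database sits behind it.
interface KeyStore {
  get(key: string): Promise<{ value: string; version: number } | undefined>;
  // Resolves to false when `expectedVersion` is stale (someone else wrote first).
  setIfVersion(key: string, value: string, expectedVersion: number): Promise<boolean>;
  delete(key: string): Promise<boolean>;
}

type Backend = "redis" | "mongodb" | "fs";

interface StoreConfig {
  backend: Backend;
  url?: string; // absent for the fs fallback
}

// Pick a backend from environment-style settings.
// Assumption: Redis wins if both URLs are set.
function chooseStore(env: Record<string, string | undefined>): StoreConfig {
  if (env.REDIS_URL) return { backend: "redis", url: env.REDIS_URL };
  if (env.MONGO_URL) return { backend: "mongodb", url: env.MONGO_URL };
  return { backend: "fs" };
}
```

In a Node process this would be called as `chooseStore(process.env)`, and the result used to construct whichever `KeyStore` implementation matches.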

So really what I'm saying is: define the pod's requirements for a data store, rather than letting the choice of data store define the requirements.

We're pretty happy with Redis atm; may reopen later if pod providers explicitly ask us for this feature.