tywalch/electrodb

Immutable collections

Closed this issue · 0 comments

Before I explain the concept of "immutable collections", let me first say thank you for writing this library. I investigated a couple of others and this is the best one I found. I have only tried parts of it and gone through the docs, but so far it is great.

I use the "immutable collections" concept in a few of my projects and it works well, so I was wondering if I could get your feedback and see whether it would be possible to implement in electrodb. If not, I will handle it by creating a separate service.

The way you have implemented collections is impressive, but I see one drawback: all collections use GSIs. GSIs are great, but nothing is free. These are my concerns around GSIs:

  • to read data from collections, all GSIs have to project all of the table's data, which increases storage costs and adds transfer costs; every time we update an attribute on the table, all GSIs are updated, and I believe DynamoDB copies the whole item to every GSI,
  • on top of that, backups, global tables, and point-in-time recovery add quite a bit of complexity, and with a high number of GSIs this can become a problem
  • in a scenario with 20 or more GSIs on a very large table, costs can be high and some processes (backups, restoring data) can be slower

So my solution is to maintain my own index in the same table, under a different PK and SK, for every attribute that is immutable and needs to be queryable. Say we store organizations and each organization has a type (SMALL, MEDIUM, LARGE), and for this example assume that attribute never changes. Since we will never change the type after creating an organization, I don't want to use a GSI for it, but I still want to be able to query organizations by type. To do that, when creating an organization record I also create a second item that acts as the index (no other attributes, just PK and SK). Because the attribute is immutable, I never have to update that index (whereas with a GSI, every update to the organization record would trigger data replication to all GSIs).

To query all organizations of a given type, I would use that index. The only downside is that I need two queries (assuming I don't fetch more than 100 records and they all fit under 16 MB). The first query gets all index records for a specific organization type, and the second query fetches the organizations' details.

// Organization entity
{
    "pk": "org#1",
    "sk": "...",
    "type": "SMALL"
}
// OrganizationByType entity
{
    "pk": "orgByType#SMALL",
    "sk": "org#1"
}
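Using the two item shapes above, the two-step query could be sketched roughly as follows. The helper names and the exact key expressions are my own assumptions for illustration, not electrodb API:

```javascript
// Step 1: build the DynamoDB Query parameters for the index lookup by type.
// This would return only the slim index items (pk/sk, no other attributes).
function buildIndexQuery(type) {
  return {
    KeyConditionExpression: "#pk = :pk",
    ExpressionAttributeNames: { "#pk": "pk" },
    ExpressionAttributeValues: { ":pk": `orgByType#${type}` },
  };
}

// Step 2: each index item's sk holds the organization's pk, so we can
// derive the keys for the follow-up lookup of the full organization records.
function indexItemsToOrgPks(indexItems) {
  return indexItems.map((item) => item.sk);
}
```

The second query (or a BatchGetItem, if the full primary keys are known) would then use those pks to load the organization details.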

So the main benefits are:

  • decreases the number of required GSIs, which should be reflected in costs (this approach is not recommended for attributes that can change)
  • assuming the table runs in on-demand mode and all our indexes have a unique PK, the stored data is minimal and we don't have to worry about provisioning (with GSIs we use GSI overloading, so we don't know what kind of load to expect or what provisioning is required, and running all GSIs in on-demand mode would be expensive)
  • this solution would even work for attributes that can change, but it would then require two updates

Drawbacks:

  • when we create an organization we also need to create the type record, so to keep consistency at 100% we need a transaction that inserts both at the same time; with DynamoDB's 25-item transaction limit, that supports up to 24 immutable attributes (the entity item plus one index item each), and more than that would require some kind of compensation process to be implemented
  • when querying, we need at least two queries
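The transactional write from the first drawback could be sketched like this, building a single TransactWriteItems payload that creates the organization entity and its immutable type index item together. The table name, key shapes, and helper name are assumptions for illustration:

```javascript
// Sketch: create an organization and its "orgByType" index item atomically.
// The sk of the entity item is assumed to mirror its pk here; the real
// sort-key shape would depend on the actual model.
function buildCreateOrgTransaction(tableName, org) {
  const entityItem = {
    pk: `org#${org.id}`,
    sk: `org#${org.id}`,
    type: org.type,
  };
  // Slim index item: pk/sk only, never updated because type is immutable.
  const indexItem = {
    pk: `orgByType#${org.type}`,
    sk: `org#${org.id}`,
  };
  return {
    TransactItems: [
      { Put: { TableName: tableName, Item: entityItem } },
      { Put: { TableName: tableName, Item: indexItem } },
    ],
  };
}
```

The payload would then be passed to the DynamoDB client's TransactWriteItems call, so either both items are written or neither is.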

Let me know what you think.