albe/node-event-storage

Project Status

xabitrigo opened this issue · 3 comments

I'm considering an event sourcing system for my backend. My intention is to have an in-memory application state, with any modification to this state stored as an event.

I want to know what the status of this project is, specifically:

  • Is your intention to keep maintaining it?
  • Would I be able to recover the current state (or near) from a system crash?
  • Why do you say it's not production-ready? Lack of some feature or some known bugs?

Thank you.

albe commented

Hey, thanks for considering this project. I'll try to answer your questions as of the current time:

  • Is your intention to keep maintaining it?

Yes, I want to keep maintaining this project for as long as my time allows. It started from a work project that needed a fast embedded storage for domain events, one that does not need to rescan the whole database on every start (as other embedded document stores such as NeDB or TingoDB do). Unfortunately, the focus of that work project has shifted, and with it some of my focus on this storage, because I'm not actively using it at the moment. Still, this is to date my most loved pet project.

  • Would I be able to recover the current state (or near) from a system crash?

The storage engine only appends to files, which makes catastrophic data loss very unlikely; it should only happen in the case of hard disk failure. If that is something you need to cater for, you need a backup strategy, but that is easy to set up: because the files are append-only, you can safely create copies of them.
Also, commits are written to disk at least once per Node event loop iteration, or whenever a given number of documents is in the queue or the write buffer is full. The latter two can be configured with the options maxWriteBufferDocuments and writeBufferSize respectively. That way you can bound the maximum data loss in terms of number of documents/events or data size.
If strict durability is a concern, you might also want to consider using the syncOnFlush option, which will force a disk sync operation on every write, but comes at a very high performance cost. See also https://github.com/albe/node-event-storage#durability
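To make the flush policy described above more concrete, here is a minimal sketch of a buffered writer that flushes once per event loop iteration, when a document count is reached, or when the buffer would exceed a byte size. This is not the library's implementation: the class `BufferedWriter` and its `write` callback are hypothetical, only the option names maxWriteBufferDocuments and writeBufferSize are taken from the text above, and the use of `setImmediate` is an assumption. The syncOnFlush behaviour is not modelled here.

```javascript
// Hypothetical sketch of the flush policy, NOT node-event-storage's actual code.
class BufferedWriter {
  // `write` is a hypothetical callback that receives each flushed batch.
  constructor(write, { maxWriteBufferDocuments = 0, writeBufferSize = 16 * 1024 } = {}) {
    this.write = write;
    this.maxDocs = maxWriteBufferDocuments; // 0 means "no document limit"
    this.maxBytes = writeBufferSize;
    this.docs = [];
    this.bytes = 0;
    this.scheduled = false;
  }

  append(document) {
    const size = Buffer.byteLength(document);
    // Flush first if this document would overflow the buffer.
    if (this.bytes > 0 && this.bytes + size > this.maxBytes) {
      this.flush();
    }
    this.docs.push(document);
    this.bytes += size;
    // Flush immediately once the configured number of documents is queued.
    if (this.maxDocs > 0 && this.docs.length >= this.maxDocs) {
      this.flush();
      return;
    }
    // Otherwise make sure a flush happens in the current event loop iteration.
    if (!this.scheduled) {
      this.scheduled = true;
      setImmediate(() => {
        this.scheduled = false;
        this.flush();
      });
    }
  }

  flush() {
    if (this.docs.length === 0) return;
    const batch = this.docs.join('');
    this.docs = [];
    this.bytes = 0;
    this.write(batch);
  }
}

// Usage: with a limit of two documents, the second append triggers a flush.
const flushed = [];
const writer = new BufferedWriter((batch) => flushed.push(batch), {
  maxWriteBufferDocuments: 2,
  writeBufferSize: 1024,
});
writer.append('{"type":"A"}\n');
writer.append('{"type":"B"}\n');
```

In a model like this, whatever is still sitting in the buffer at the moment of a crash is what can be lost, which is why smaller limits trade throughput for a tighter bound on data loss.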
Regarding recovery, there is currently no automatic process in place for the case that a document has not been written fully (this has not yet happened to me). This is high on my list of things to improve, because reliability is important to me as well. If corruption happens, you can however truncate the storage after the last valid document (see https://github.com/albe/node-event-storage/blob/master/src/Storage.js#L419). What's missing is an automatic process that detects corruption and then starts the truncation.
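As an illustration of what "truncate after the last valid document" could look like, here is a rough sketch. It assumes a made-up file format of one JSON document per line, which is not the storage's real on-disk format, and the function `truncateAfterLastValidDocument` is hypothetical; it does not use the truncate method linked above.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: assumes newline-delimited JSON, NOT the real storage format.
// Scans from the start, stops at the first incomplete or unparsable document
// and cuts the file off right after the last valid one.
function truncateAfterLastValidDocument(file) {
  const data = fs.readFileSync(file);
  let validEnd = 0;
  let pos = 0;
  while (pos < data.length) {
    const newline = data.indexOf(0x0a, pos);
    if (newline === -1) break; // no terminator: the last write was torn
    try {
      JSON.parse(data.toString('utf8', pos, newline));
    } catch (e) {
      break; // corrupted document: everything from here on is discarded
    }
    pos = newline + 1;
    validEnd = pos;
  }
  if (validEnd < data.length) {
    fs.truncateSync(file, validEnd);
  }
  return validEnd;
}

// Usage: two complete documents followed by a torn third one.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'storage-sketch-'));
const file = path.join(dir, 'events.log');
fs.writeFileSync(file, '{"id":1}\n{"id":2}\n{"id":3,"pay');
const validEnd = truncateAfterLastValidDocument(file);
```

Because the file is append-only, everything before the last valid document is untouched by a torn write, so cutting off the tail is enough to get back to a consistent state in this simplified model.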

  • Why do you say it's not production-ready? Lack of some feature or some known bugs?

The only real reason is that it is not battle-tested enough. We used it in the work project mentioned above and repeatedly stored hundreds of megabytes of event data in it during the testing phases without problems. We also added a replication protocol based on Raft (https://raft.github.io/) on top, also without problems. However, since the project was put on hold, no further experience could be collected beyond this. What keeps me from removing this warning is the need to collect more information on behaviour, especially regarding failure scenarios and data loss, and then to cover this with tests. So I'm motivated to work on this part if anything comes up.

Hope this is of help to you. Let me know if there are any further questions.

albe commented
  • Would I be able to recover the current state (or near) from a system crash?

Regarding recovery, there is currently no automatic process in place for the case that a document has not been written fully (this has not yet happened to me).

A little status update: this is currently in the works as part of #31 in PR #107, and the most important prerequisite for solving #24 has been implemented with #80.

The goal is to have crash recovery in version 0.8.

albe commented

A further update, now that 0.8 has been released (it took quite long, but I'm still on this project): with #155, the storage can be recovered from torn writes after a crash. This does not yet include syncing up indexes that have fallen behind due to a crash, or commits of multiple events where the write failed partway through. That is the next step.