khonsulabs/bonsaidb

Project Status

Closed this issue · 8 comments

Hey, just stumbled across BonsaiDB and it looks really neat! The last commit was in August, so I just wanted to check before I get too far into it: is this project still being maintained?

ecton commented

Hi, thank you for asking! It is still in active development, but its progress has definitely slowed. There is a pending file format redesign, and I plan on offering at least migration tools to help anyone currently using BonsaiDb move to the new file format.

Since I haven't really documented this on GitHub anywhere, here's a rough timeline of what happened:

  • May 2022: Discovered File::flush doesn't call fsync, and learned about tmpfs. The summary is that once I called the correct method to actually trigger fsync, design decisions I had made early on caused BonsaiDb/Nebari's transactional writes to be quite slow. For light applications, the speed would still have been perfectly acceptable, but under any significant write load, the database would become a bottleneck much sooner than PostgreSQL or SQLite would.
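To illustrate the distinction for anyone unfamiliar (a minimal sketch, not BonsaiDb's actual code): flush() only moves buffered bytes into the OS page cache, while sync_all() performs the fsync that makes the write durable.

```rust
use std::fs::File;
use std::io::{BufWriter, Write};

// Write data durably: flush() alone only empties the userspace buffer into
// the OS page cache; sync_all() issues the fsync that pushes it to disk.
fn write_durably(path: &str, data: &[u8]) -> std::io::Result<()> {
    let file = File::create(path)?;
    let mut writer = BufWriter::new(file);
    writer.write_all(data)?;
    writer.flush()?;              // userspace buffer -> kernel page cache
    writer.get_ref().sync_all()?; // kernel page cache -> disk (fsync)
    Ok(())
}
```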

  • May 2022: Updated Nebari with new transaction batching. This change significantly improved performance, but there were still two fsync operations per transaction. The only way I could see improving things would be changing my approach to how data was stored.
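The batching idea can be sketched like this (the names are illustrative, not Nebari's API): queued commits share a single durable sync instead of paying one fsync per transaction.

```rust
// Hypothetical sketch of transaction batching: instead of one fsync per
// commit, queued commits are flushed together so N transactions share a
// single sync. The sync here is simulated by a counter.
struct BatchCommitter {
    pending: Vec<Vec<u8>>,
    syncs: usize, // counts how many fsync-equivalent operations occurred
}

impl BatchCommitter {
    fn new() -> Self {
        Self { pending: Vec::new(), syncs: 0 }
    }

    // Queue a transaction's payload without syncing yet.
    fn queue(&mut self, payload: Vec<u8>) {
        self.pending.push(payload);
    }

    // Commit everything queued with one (simulated) durable sync.
    fn commit_batch(&mut self) -> usize {
        let committed = self.pending.len();
        self.pending.clear();
        self.syncs += 1; // one sync covers the whole batch
        committed
    }
}
```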

  • June 2022: While trying to measure and understand the performance of various file synchronization mechanisms, I discovered that SQLite on macOS isn't actually ACID compliant.

  • July 2022: I wrote an overview of my goals for Sediment, a storage layer I plan to sit below Nebari, which BonsaiDb uses for its underlying database implementation.

  • August 2022: I got Sediment to the point of benchmarking, and I felt pretty good about its overall performance relative to other embedded stores. However, while preparing a new blog post, I ran the same benchmark against PostgreSQL and discovered that PostgreSQL outperformed them all. Why? It turns out write-ahead logging is the fastest way to get incoming writes to disk.

  • September 2022: I wrote my own WAL implementation, inspired in part by sharded-log. Because I again used a new benchmarking implementation, I lost track of PostgreSQL's performance. I knew I was outperforming sharded-log in my particular benchmark suite, and it was that progress that made me start a new blog post to let anyone following the BonsaiDb blog know what was going on.

    While writing that post, I realized I needed to compare it against PostgreSQL. What I found shocked me -- PostgreSQL's much simpler single-writer-at-a-time WAL design outperformed both my implementation and sharded-log significantly, even with a large number of threads all competing to write at the same time. I scrapped the blog post and began rewriting my implementation to be inspired by PostgreSQL instead.
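The single-writer design can be sketched like this (an illustration of the concept, not OkayWAL's implementation): every thread funnels its append through one lock, so log entries are written strictly serially even under heavy contention.

```rust
use std::sync::Mutex;

// Minimal sketch of a PostgreSQL-style single-writer WAL: all threads
// funnel appends through one lock, serializing writes into a single log.
#[derive(Default)]
struct SingleWriterWal {
    log: Mutex<Vec<Vec<u8>>>,
}

impl SingleWriterWal {
    // Only one thread may append at a time; the mutex enforces ordering.
    fn append(&self, entry: Vec<u8>) {
        self.log.lock().unwrap().push(entry);
    }

    fn entry_count(&self) -> usize {
        self.log.lock().unwrap().len()
    }
}
```

Even with many threads competing for the lock, a design like this can win because each append is a short critical section followed by one shared sync, rather than coordination across shards.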

  • October 2022: I finished my rewrite of OkayWAL, and I saw the mountain of work ahead of me to get everything tied back together. I was a bit burned out, and I needed a break.

  • December 2022: I've begun a rewrite of Sediment due to changing some of my goals now that there is a WAL in front of the storage layer. It's still early in development.

Here's the list of what needs to be done for this new storage layer to be integrated:

  • The new version of Sediment that utilizes the WAL needs to be completed, and it needs to meet some basic performance goals.
  • Nebari will need to be rewritten to utilize Sediment for multi-tree storage instead of one-tree-per-file.
  • BonsaiDb will need to be updated to work with the new changes to Nebari.
  • A tool for migrating Nebari databases to the new version needs to be written.
  • A tool for migrating BonsaiDb databases to the new version needs to be written.
  • A significant amount of testing and fuzzing needs to be done before trusting the new stack.

I hope this assuages any fears about BonsaiDb being worked on. But I also completely understand if this lack of certainty regarding its performance deters people from trying it out.

At the end of the day, each time I think of building a project with Rust, I would still reach for BonsaiDb even in its current state. It's that basic love of what I've built that will keep me going on this project for a long time.

First off thank you for your comprehensive reply, that was all the information I'd hoped for and more 😄

I hope this assuages any fears about BonsaiDB being worked on.

It does - there's still a small concern of bus count in the back of my head, but it's exciting to hear all of this continued thought has been put into the project. Honestly, it sounds like a massive undertaking for a single developer - I'm glad you were able to notice the burnout and take a break when you needed it.

Having re-developed my application in a few different databases in search of a suitable embedded DB, I'm quite excited by the high level struct-centric interfaces, ACID transactions, and versioned map/reduce architecture of BonsaiDB. I still have to validate that BonsaiDB can run on the extremely memory-constrained embedded platform I'm aiming to support, but regardless I am excited to hear that this project is continuing to receive attention - it feels on the right track in many important respects.

With respect to Sediment, I wanted to ask if you had any thoughts on ReDB, which I think came out after your work on Sediment began. I haven't seen it mentioned anywhere yet in your stuff, so I wanted to make sure it was at least on your radar.

ecton commented

It does - there's still a small concern of bus count in the back of my head, but it's exciting to hear all of this continued thought has been put into the project. Honestly, it sounds like a massive undertaking for a single developer - I'm glad you were able to notice the burnout and take a break when you needed it.

I definitely would like to improve on the bus count! There are a few people who have been dabbling in contributing to BonsaiDb, and I'm hoping that once development around my new format stabilizes, I'll be able to attract more people to those projects as well. It's pretty understandable not to want to learn a mountain of code when its future is uncertain.

I still have to validate that BonsaiDB can run on the extremely memory-constrained embedded platform I'm aiming to support

One contributor was able to get BonsaiDb running on a Raspberry Pi. I don't have much embedded hardware yet, but it's something I've been wanting to tinker with more in the coming years. The main limitation for running BonsaiDb in embedded environments is its reliance on std. If you run into any specific problems, please don't hesitate to open an issue!

With respect to Sediment, I wanted to ask if you had any thoughts on ReDB, which I think came out after your work on Sediment began. I haven't seen it mentioned anywhere yet in your stuff, so I wanted to make sure it was at least on your radar.

It did come out in the midst of my experiments, and it looks like a great project. One serious thought I still have is whether Nebari should exist at all, or whether BonsaiDb should just use another database format. The two main arguments for pursuing my new format are:

  • BonsaiDb/CouchDB were designed with the idea of being able to embed extra information inside of the B+Tree structures. This is how the map/reduce functionality is powered -- the reduced values can be stored directly in the B+Tree so that a reduce query doesn't need to visit all of the nodes in the tree to come up with an aggregate result. From what I could find, no other database engine written in Rust supports embedding extra information inside of the B+Tree structure itself, while it's a key feature of Nebari.
  • Nearly every other embedded database engine does not utilize a write-ahead log. In my testing, a write-ahead log is absolutely critical for insert performance. A developer can always use a write-ahead log manually in front of the database, but having it built-in seems like a very valuable feature.
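The annotated B+Tree idea from the first point can be sketched like this (hypothetical types, not Nebari's): each interior node caches the reduced value of its children's subtrees, so an aggregate query reads those annotations instead of walking the whole tree.

```rust
// Hypothetical sketch of an "annotated" B+Tree interior node: alongside
// each child, the node caches that subtree's reduced value (here, a sum),
// so an aggregate can be answered without descending to the leaves.
enum Node {
    Leaf { values: Vec<u64> },
    Interior { children: Vec<Node>, cached_sums: Vec<u64> },
}

impl Node {
    // Build an interior node, computing the cached reduction per child.
    fn interior(children: Vec<Node>) -> Node {
        let cached_sums = children.iter().map(Node::reduce_full).collect();
        Node::Interior { children, cached_sums }
    }

    // Full reduction by walking every node (what the annotations avoid).
    fn reduce_full(&self) -> u64 {
        match self {
            Node::Leaf { values } => values.iter().sum(),
            Node::Interior { children, .. } => {
                children.iter().map(Node::reduce_full).sum()
            }
        }
    }

    // Fast reduction using the embedded annotations: O(children), not
    // O(nodes in the subtree).
    fn reduce_cached(&self) -> u64 {
        match self {
            Node::Leaf { values } => values.iter().sum(),
            Node::Interior { cached_sums, .. } => cached_sums.iter().sum(),
        }
    }
}
```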

I feel like Sediment has a unique offering that's worth exploring, and I'm hopeful that I finally have the right combination of strategies to get the performance I'm looking for. If I don't, I'll definitely be considering using another format again.

I hope your experiments are successful at getting BonsaiDb running on that hardware!

That annotated B+ Tree is a really neat innovation, certainly not something I'd stumbled onto before.

Thanks for explaining all that. I did a little highly unscientific testing. Bonsai with an empty database allocated about 2MB of RAM (I used jemalloc to collect allocation stats), which should fit nicely in my system's 64MB of RAM.

It appears that memory usage scales in proportion to the database size; after a few hundred thousand inserts I had about 11MB of memory allocated. Should memory usage stop growing at some point, or is it linear with the number of db entries (e.g. due to some in-memory BTree or something)?

I empirically observed the slowness you talked about in your blog post, but I won't know how much that will affect me until I get some real hardware to test on. If I remember correctly, the system runs at 400MHz, so it should be many times slower than my laptop, but that might still be fine for my very limited throughput (around 10 writes per second to record some sensor data).

ecton commented

Oops, I just added issue #263 to allow configuring the size of the internal cache. It currently will expand to 2,000 entries, each of which can hold up to 160KB. Unless you're writing large payloads, you can assume the maximum cache size should end up being roughly 2,000 * average document size. Most of the other things that BonsaiDb keeps in memory are small and shouldn't grow based on the data size.
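For anyone estimating their own footprint, the arithmetic above is just entry count times average document size (a rough upper bound, not a guaranteed limit):

```rust
// Back-of-envelope estimate of the worst-case cache footprint described
// above: entry count * average document size. The 2,000-entry figure is
// the cache limit from this thread; avg_doc_bytes is your own workload.
fn estimated_cache_bytes(entries: usize, avg_doc_bytes: usize) -> usize {
    entries * avg_doc_bytes
}
```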

The new format I'm designing will have some increased memory usage to keep track of various on-disk state, but it should still be able to be used comfortably in a low-memory environment.

I'm very happy to hear that you were able to get it working! One other thing to note about speed: if you're testing on macOS, BonsaiDb currently issues an fcntl(F_FULLFSYNC) because that's what Rust's File::sync_data() does under the hood. This is absolutely correct behavior for true ACID compliance, but it's known to be slow compared to fsync on Linux. This is made even worse by BonsaiDb currently requiring two fsyncs per transaction.

Syncing data is still slow on Linux, but it's markedly slower on macOS. Ultimately, ~10 writes per second should never have any problems with BonsaiDb, assuming the underlying fsync operation succeeds in a reasonable amount of time. I'm very hopeful you won't have any problems on real hardware.
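If you want to feel the sync cost on your own platform, a rough sketch like this works (illustrative, not a rigorous benchmark; on macOS, sync_data is the call that maps to fcntl(F_FULLFSYNC)):

```rust
use std::fs::OpenOptions;
use std::io::Write;
use std::time::{Duration, Instant};

// Time a handful of write + sync_data cycles to get a feel for the cost
// of durable syncs on the current platform.
fn timed_syncs(path: &str, count: u32) -> std::io::Result<Duration> {
    let mut file = OpenOptions::new()
        .create(true)
        .write(true)
        .truncate(true)
        .open(path)?;
    let start = Instant::now();
    for i in 0..count {
        file.write_all(&i.to_le_bytes())?;
        file.sync_data()?; // the durable sync being discussed
    }
    Ok(start.elapsed())
}
```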

Oops, I just added issue #263 to allow configuring the size of the internal cache. It currently will expand to 2,000 entries, each of which can hold up to 160KB. Unless you're writing large payloads, you can assume the maximum cache size should end up being roughly 2,000 * average document size. Most of the other things that BonsaiDb keeps in memory are small and shouldn't grow based on the data size.

Interesting, I was about to change the hard-coded cache size locally, but in re-running my test program to get a baseline I found that the memory usage had dropped back to 3MB.

It looks like my full dataset of hundreds of thousands of entries is still there, so it seems the memory increase I was experiencing may be a memory leak related to large numbers of inserts (in the hundreds of thousands). This is fine for my usage, as I can just periodically restart my server (it would take almost a year to do as many inserts as my example did), but I thought it worth reporting anyway just as a heads-up.

Edit: Wait, I just realized this may simply be due to having a cold cache. I will try loading 2000 entries and see if the problem returns.

Edit 2: I've added some cache warming that loads the first 2000 entries, and the memory usage remains the same. So it appears my initial assessment that this is not a cache size problem may be correct.

ecton commented

Interesting. I am not aware of any memory leaks. On the main branch, I worked on inserting truly massive sets of data and did not notice memory leaks. If you end up having any further observations, please let me know!

Will do!