This is a simple benchmark that tests how quick Postgres and MongoDB can concurrently read and write records to a table/collection that has at least 100M records in it.
This kind of benchmark is important to show which DB performs better as an event log store - think thermostat readings, or when a user (dis)connected to a chat room, and similar events that just get logged and never updated.
"But why not go for InfluxDB/TimescaleDB/Clickhouse?" I hear you ask. I don't want to add unnecessary complexity if I don't have to. Having a new DB that I have to learn to use and maintain is something I'd like to avoid given the size of my team. The only reason I'm looking into Mongo is because I have an expert in-house.
To run the benchmark execute
docker compose run -it --rm app ./benchmark
While you can run this locally, my suggestion is that you rent out a VPS and let the benchmark run there over night. This is very taxing on your SSD and renting a VPS gives you a consistent comparison platform.
There are many configuration options that you can pass to ./benchmark
. To see
them pass --help
. They allow you to configure the concurrency and size of the
read-write test, the base size of the database-under-test, define which tests
to perform against which database instances, and more.
Here are a few things I have learned while performing this benchmark. Watch out for them when running you own:
- The stock Postgres container is configured to use up to 1.5GB of RAM. I changed the config to allow up to 4GB - same as Mongo, though Mongo doesn't seem to benefit much from being capped or uncapped.
- The stock MongoDB configuration doesn't enable write-ahead logs/journaling. Without this enabled MongoDB will write data to disk once every minute. This means that any unexpected interrupt will lead to the last minute of data being lost. With journaling enabled MongoDB can recover from an unexpected stop. Journaling has a significant performance impact which is dependant on the number of clients connections to an instance.
- Postgres has a mode where it works similarly to MongoDB. If you disable
synchronous_commit
then data will be written to disk every 100ms instead of being written immediately. This improves write speed 5x to 10x which makes it comparable to MongoDB. - Running this benchmark on Apple Silicone Macs will skew the results. For historical reasons, Macs have different ways of instructing the OS to write data to disk from other Unix systems. TL;DR calling fsync on a Mac doesn't actually persist the data to disk, and this has been known for years but was recently under the spotlight. This means that if one database-under-test isn't aware of that and implements persistence to disk using different system calls from the other software-under-test this could cause it to perform better than it would on other hardware or operating systems. Therefore, a fairer comparison between different databases is to run this benchmark on a Linux machine, as it ensures that fsync behaves exactly the same way for all software.
The following are the results of this benchmark with the test cases defined in cases.yml. (reference the docker-compose.yml file to see the individual DB configs).
Test cases:
cluster_mongo
- tests how MongoDB performs with journaling enabled. I can't use a database that can lose 1min of data, or one without a hot-standby or read-replica.optimized_postgres
- tests how PG performs when configured to use the same amount of resources as MongoDB.mongo
- tests how the stock MongoDB config compares to the one with journaling enabled.async_postgres
- tests how fast PG write performance can go if 100ms of data loss is acceptable.
System:
AWS m7a.large
CPU: 2 vCPU
RAM: 8 GB RAM
SWAP: 5 GB
DISK: 200GB
Output:
Comparison:
Conclusions:
Please remember that all benchmarks are bullshit. What is important to me might not be important to you.
I'm interested in how these databases perform when there are 100M+ records in a table. If a DB is super performant with 1-5M records that isn't of much use to me since my dataset has 500M records currently and is growing.
I didn't test partitioning because I'd just go with Postgres or MariaDB then no matter how large the performance difference would be. As I said, having one DB to look at and maintain is much easier for me than having two.
- Postgres 11 vs Mongo 4 (Postgres wins in all aspects, though this is very out-of-date now)
- MongoDB docs: Synchronous writing and default disk flushing config
- MongoDb docs: Journaling
- Postgres' synchronous_commit command
- Difference in fsync semantics on MacOS and other Unix system
- Erlang issue, from 15 years ago, about the MacOS fsync semantics
- Hacker News discussion about the fsync semantics on MacOS and synchronous write performance in M1 Macs