Mongo VS Postgres - Large-table performance comparison

This is a simple benchmark that tests how quick Postgres and MongoDB can concurrently read and write records to a table/collection that has at least 100M records in it.

This kind of benchmark is important to show which DB performs better as an event log store - think thermostat readings, or when a user (dis)connected to a chat room, and similar events that just get logged and never updated.

"But why not go for InfluxDB/TimescaleDB/Clickhouse?" I hear you ask. I don't want to add unnecessary complexity if I don't have to. Having a new DB that I have to learn to use and maintain is something I'd like to avoid given the size of my team. The only reason I'm looking into Mongo is because I have an expert in-house.

Running

To run the benchmark execute

docker compose run -it --rm app ./benchmark

While you can run this locally, my suggestion is that you rent out a VPS and let the benchmark run there over night. This is very taxing on your SSD and renting a VPS gives you a consistent comparison platform.

There are many configuration options that you can pass to ./benchmark. To see them pass --help. They allow you to configure the concurrency and size of the read-write test, the base size of the database-under-test, define which tests to perform against which database instances, and more.

Notes

Here are a few things I have learned while performing this benchmark. Watch out for them when running you own:

The stock Postgres container is configured to use up to 1.5GB of RAM. I changed the config to allow up to 4GB - same as Mongo, though Mongo doesn't seem to benefit much from being capped or uncapped.
The stock MongoDB configuration doesn't enable write-ahead logs/journaling. Without this enabled MongoDB will write data to disk once every minute. This means that any unexpected interrupt will lead to the last minute of data being lost. With journaling enabled MongoDB can recover from an unexpected stop. Journaling has a significant performance impact which is dependant on the number of clients connections to an instance.
Postgres has a mode where it works similarly to MongoDB. If you disable synchronous_commit then data will be written to disk every 100ms instead of being written immediately. This improves write speed 5x to 10x which makes it comparable to MongoDB.
Running this benchmark on Apple Silicone Macs will skew the results. For historical reasons, Macs have different ways of instructing the OS to write data to disk from other Unix systems. TL;DR calling fsync on a Mac doesn't actually persist the data to disk, and this has been known for years but was recently under the spotlight. This means that if one database-under-test isn't aware of that and implements persistence to disk using different system calls from the other software-under-test this could cause it to perform better than it would on other hardware or operating systems. Therefore, a fairer comparison between different databases is to run this benchmark on a Linux machine, as it ensures that fsync behaves exactly the same way for all software.

Result

The following are the results of this benchmark with the test cases defined in cases.yml. (reference the docker-compose.yml file to see the individual DB configs).

Test cases:

cluster_mongo - tests how MongoDB performs with journaling enabled. I can't use a database that can lose 1min of data, or one without a hot-standby or read-replica.
optimized_postgres - tests how PG performs when configured to use the same amount of resources as MongoDB.
mongo - tests how the stock MongoDB config compares to the one with journaling enabled.
async_postgres - tests how fast PG write performance can go if 100ms of data loss is acceptable.

System:

AWS m7a.large
CPU: 2 vCPU
RAM: 8 GB RAM
SWAP: 5 GB
DISK: 200GB

Output:

Comparison:

Conclusions:

Disclaimer

Please remember that all benchmarks are bullshit. What is important to me might not be important to you.

I'm interested in how these databases perform when there are 100M+ records in a table. If a DB is super performant with 1-5M records that isn't of much use to me since my dataset has 500M records currently and is growing.

I didn't test partitioning because I'd just go with Postgres or MariaDB then no matter how large the performance difference would be. As I said, having one DB to look at and maintain is much easier for me than having two.

monorkin/mongo-vs-pg-large-table-performance

Mongo VS Postgres - Large-table performance comparison

Running

Notes

Result

Disclaimer

Other people's benchmarks & other relevant link