Documentation party

Question

Closed this issue 2 years ago · 1 comment

What is IceDB?
Who uses it
How to use it
Architecture
- How the log works (data formats)
- How the data parts work
- How merging works
- How tombstone cleaning works
Why IceDB, and what makes it different
- Why not BigQuery
- Why not Athena
- Why not Spark/EMR
- Why not ClickHouse/Timescale/Redshift/etc.
Performance
Cost comparison to BigQuery for the same dataset and queries
When to merge and tombstone clean
Parameters
Tips and tricks
- Merge and tombstone coordination for multiple ingestion nodes of the same table
- Large batch inserts
- Schema validation before insert (the schema needs to stay consistent, which is easiest if a single host manages a table exclusively); see the tips in #85 for detecting schema changes in the cache and checking them against a serializable transaction
- Pairing with Redpanda for ingest works well
- Self-batching like in https://github.com/danthegoodman1/icedb/blob/main/examples/api-full.py
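
To make the self-batching tip above concrete, here is a minimal sketch of an in-memory buffer that flushes by row count or age. It is not taken from `api-full.py` and does not call any IceDB API: `BatchBuffer`, `insert_fn`, `max_rows`, and `max_age_seconds` are hypothetical names, and `insert_fn` stands in for whatever function actually writes a batch of rows to the table.

```python
import threading
import time
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]


class BatchBuffer:
    """Collects rows in memory and hands them to insert_fn in batches.

    insert_fn is a placeholder (assumption) for the real insert call.
    """

    def __init__(
        self,
        insert_fn: Callable[[List[Row]], None],
        max_rows: int = 1000,
        max_age_seconds: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._insert_fn = insert_fn
        self._max_rows = max_rows
        self._max_age = max_age_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._rows: List[Row] = []
        self._first_row_at: Optional[float] = None

    def add(self, row: Row) -> None:
        """Buffer one row; flush if the batch is full or too old.

        The age check only runs when a row arrives, so a quiet buffer
        still needs a periodic flush() from the caller.
        """
        with self._lock:
            if not self._rows:
                self._first_row_at = self._clock()
            self._rows.append(row)
            too_big = len(self._rows) >= self._max_rows
            too_old = self._clock() - self._first_row_at >= self._max_age
            batch = self._take() if (too_big or too_old) else None
        # Call insert_fn outside the lock so slow writes do not block add().
        if batch:
            self._insert_fn(batch)

    def flush(self) -> None:
        """Write whatever is buffered, e.g. on shutdown or from a timer."""
        with self._lock:
            batch = self._take()
        if batch:
            self._insert_fn(batch)

    def _take(self) -> List[Row]:
        # Caller must hold the lock.
        batch, self._rows = self._rows, []
        self._first_row_at = None
        return batch


if __name__ == "__main__":
    written: List[List[Row]] = []
    buf = BatchBuffer(written.append, max_rows=3)
    for i in range(7):
        buf.add({"user_id": "a", "n": i})
    buf.flush()
    print([len(b) for b in written])  # sizes of the batches handed to insert_fn
```

Larger batches mean fewer files per insert and less merge work later, at the cost of rows sitting in memory until the next flush, so the two thresholds are a latency versus file-count trade-off.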

Answer 1 · 2023-08-12T18:02:17.000Z

Performance testing: compare against other solutions such as BigQuery, Athena, ClickHouse, and Spark/EMR