danthegoodman1/icedb

Documentation party


  • What is icedb?
  • Who uses it
  • How to use it
  • Architecture
    • How the log works (data formats)
    • How the data parts work
    • How merging works
    • How tombstone cleaning works
  • Why icedb? What makes it different
    • Why not BigQuery
    • Why not Athena
    • Why not Spark/EMR
    • Why not ClickHouse/Timescale/Redshift/etc.
  • Performance
  • Cost comparison with BigQuery for the same dataset and queries
  • When to merge and tombstone clean
  • Parameters
  • Tips and tricks
    • Merge and tombstone coordination for multiple ingestion nodes of the same table
    • Large batch inserts
    • Schema validation before insert (the schema must stay consistent; this is easiest if a single host manages a table exclusively). See the tips in #85 for detecting schema changes in the cache and checking them against a serializable transaction
    • Pairing with Redpanda for ingestion works well
    • Self-batching like in https://github.com/danthegoodman1/icedb/blob/main/examples/api-full.py
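For the self-batching tip, here is a minimal sketch of the general idea, assuming rows are buffered in memory and handed to a caller-supplied flush function once either a row-count or an age threshold is reached. The `SelfBatcher` class, `max_rows`, `max_age_s`, and `tick()` are hypothetical names for illustration; this is not the code in `examples/api-full.py` and does not call any icedb API.

```python
import threading
import time
from typing import Callable, List, Optional


class SelfBatcher:
    """Hypothetical buffer that groups rows and passes each batch to flush_fn."""

    def __init__(self, flush_fn: Callable[[List[dict]], None],
                 max_rows: int = 1000, max_age_s: float = 5.0):
        self.flush_fn = flush_fn      # assumed: a wrapper that inserts one batch into the table
        self.max_rows = max_rows      # flush once this many rows are buffered
        self.max_age_s = max_age_s    # or once the oldest buffered row is this old (seconds)
        self._rows: List[dict] = []
        self._first_ts: Optional[float] = None
        self._lock = threading.Lock()

    def add(self, row: dict) -> None:
        with self._lock:
            if not self._rows:
                self._first_ts = time.monotonic()  # age is measured from the first buffered row
            self._rows.append(row)
            if len(self._rows) >= self.max_rows:
                self._flush_locked()

    def tick(self) -> None:
        """Call periodically (e.g. from a timer thread) to flush batches that are too old."""
        with self._lock:
            if self._rows and time.monotonic() - self._first_ts >= self.max_age_s:
                self._flush_locked()

    def close(self) -> None:
        """Flush whatever is left, e.g. on shutdown."""
        with self._lock:
            if self._rows:
                self._flush_locked()

    def _flush_locked(self) -> None:
        # Swap the buffer out, then flush; the caller already holds the lock,
        # so flushes are serialized with respect to add() and tick().
        batch, self._rows = self._rows, []
        self._first_ts = None
        self.flush_fn(batch)
```

For example, `SelfBatcher(my_insert, max_rows=3)` fed seven rows calls `my_insert` twice with three rows each, and `close()` flushes the remaining row. The size threshold keeps each write reasonably large, which generally means fewer small data parts to merge later, while the age threshold caps how long a row can wait before it is written.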

Performance testing: compare against other solutions such as BigQuery, Athena, ClickHouse, and Spark/EMR.