apache/horaedb

support either nanosecond timestamp precision or composite primary keys

Opened this issue · 1 comments

Describe This Problem

my use case is trade data analytics for signal or strategy indicators. a trade has:

  • timestamp: the time (in nanosecond precision) of the execution of the trade
  • base: the tag/symbol of the commodity or security that was bought or sold
  • quote: the tag/symbol of the currency, commodity or security that the price of base is given in
  • exchange: the tag/symbol of the exchange where the trade was executed
  • id: the (uint64) id used by the exchange to uniquely identify the trade. due to the volume of trades, most exchanges use a composite key of (id, base, quote), so there are always collisions if base, quote, exchange and id are not considered together
  • price: the price (in quote) for 1 unit of base
  • size: the quantity of base in the trade. a positive size indicates a buy. a negative size indicates a sell

i am struggling to document a table definition in horaedb that does not lose trades. since there are many trades which occur within the same millisecond. nanosecond precision appears to be unavailable.

i have attempted to work around the millisecond precision limitation by using a composite primary key (time, exchange, quote, id). each base gets it's own table which allows for multi-exchange series and combining quotes where their underlying values are equivalent (ie: usdt == usdc). however it appears that horaedb accepts the definition but ignores any value in the composite key that is not the timestamp. this results in all but the first trade in a given millisecond being discarded as a duplicate.

Proposal

i would appreciate ideas about how to utilise nanosecond timestamp precision or how to correctly define a composite key that will actually work.

Additional Context

here's the table definition i have tried:

CREATE TABLE IF NOT EXISTS {base} (
    time timestamp NOT NULL,
    id uint64,
    exchange string,
    quote string,
    price double,
    size double,
    TIMESTAMP KEY(time),
    PRIMARY KEY(time, id, exchange, quote)
) ENGINE=Analytic with (enable_ttl='false')

here's an example of source data where base is btc, quote is usd and exchange is coinbase:
https://api.exchange.coinbase.com/products/BTC-USD/trades?limit=10&after=1000

Hi, nanoseconds is not supported in horaedb now.

For your case, I think you could add another column to bypass this limit,

CREATE TABLE IF NOT EXISTS {base} (
    time timestamp NOT NULL,
    time2 uint64 NOT NULL,
    id uint64,
    exchange string,
    quote string,
    price double,
    size double,
    TIMESTAMP KEY(time),
    PRIMARY KEY(time, time2, id, exchange, quote)
) ENGINE=Analytic with (enable_ttl='false')

Here we add another column time2 to primary keys, and it's defined as time(nanoseconds) % 1e6, so rows with same time value won't be overwritten.