Dashbase IDL definitions
To store and transmit structured data in a compact, language-independent way, we have defined Avro schemas here.
Currently only Avro is supported, because both the Hadoop and Kafka projects support it natively.
A DashbaseEvent defines a record to be inserted into Dashbase, and is composed of three parts:
Timestamp of the event's creation, in milliseconds; defaults to 0 if not specified.
Dashbase columns define how the record is to be stored.
- meta columns - structured data; not tokenized, supports aggregations, e.g. topn. Examples: host, response code, etc.
- number columns - numeric data; indexed as numbers, supports numeric aggregations, e.g. sum/min/max/avg. Examples: latency, byte count, etc.
- text columns - unstructured text; tokenized, supports full-text queries. Examples: log messages, agents, etc.
- id columns - optimized for optional id information; like meta, not tokenized. Aggregations are not supported.
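To make the difference between the column types concrete, here is a minimal, self-contained sketch of the kind of handling each type implies. It is an illustration only and not how Dashbase implements indexing: the class `ColumnSemanticsSketch` and its methods `topN`, `summarize` and `tokenize` are hypothetical names, and the tokenizer (lowercase, split on non-alphanumeric characters) is an assumption chosen for simplicity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration of the column semantics; not part of Dashbase.
public class ColumnSemanticsSketch {

    // Meta-style handling: values are kept whole (not tokenized), so they can
    // be counted exactly and ranked, which is what a top-n aggregation needs.
    public static List<String> topN(List<String> metaValues, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (String value : metaValues) {
            counts.merge(value, 1, Integer::sum);
        }
        List<String> ranked = new ArrayList<>(counts.keySet());
        // Highest count first; ties are broken alphabetically.
        ranked.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return new ArrayList<>(ranked.subList(0, Math.min(n, ranked.size())));
    }

    // Number-style handling: values stay numeric, so sum/min/max/avg are available.
    public static DoubleSummaryStatistics summarize(List<Double> numbers) {
        DoubleSummaryStatistics stats = new DoubleSummaryStatistics();
        for (double value : numbers) {
            stats.accept(value);
        }
        return stats;
    }

    // Text-style handling: the value is split into tokens so that individual
    // words can be matched. The splitting rule here is an assumption.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(topN(Arrays.asList("web-1", "web-2", "web-1"), 1));
        DoubleSummaryStatistics latency = summarize(Arrays.asList(12.5, 7.5, 40.0));
        System.out.println(latency.getSum() + " " + latency.getMin() + " "
                + latency.getMax() + " " + latency.getAverage());
        System.out.println(tokenize("dashbase is cool"));
    }
}
```

An id column would behave like the meta case without the counting step: the value is stored whole and looked up exactly, but never aggregated.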
Raw data and its storage can be configured via:
- omitPayload - if true, raw data storage is skipped. Examples would be metrics or click data, where raw event bytes are typically not used, and storing them would be wasteful.
DashbaseEventBuilder should be used to build a DashbaseEvent instance.
Example:
DashbaseEventBuilder eventBuilder =
    new DashbaseEventBuilder()
        .withOmitPayload(false)
        .withTimeInMillis(System.currentTimeMillis())
        .addMeta("tags", "green")
        .addNumber("num", 1234.0)
        .addText("text", "dashbase is cool");
DashbaseEvent event = eventBuilder.build();