apibara/dna

Improve sink/integration developer experience

fracek opened this issue · 3 comments

fracek commented

Is your feature request related to a problem? Please describe.
At the moment, the developer experience of using sinks/integrations is not ideal:

  • the data filter is a JSON file, so we need external code generation to generate filters
  • the JavaScript script is only used for transformation
  • options are set using CLI arguments or environment variables

Describe the solution you'd like
We should unify everything to improve the developer experience. We do that by using the javascript file for both configuration and transformation.

import { Configuration, Filter, StarknetFilter, StarknetBlock, PostgresSink } from '@apibara/integration'

export const config: Configuration<StarknetFilter, PostgresSink> = {
  type: 'starknet',
  stream: {
    url: 'https://mainnet.starknet.a5a.ch',
    bearerToken: Deno.env.get('DNA_TOKEN'),
    // other options
  },
  startingCursor: 123_456,
  filter: Filter().withHeader({ weak: false }).toObject(),
  sink: {
    type: 'postgres',
    options: {
      connectionUrl: 'postgres://....',
    }
  }
}

export default function transform(batch: StarknetBlock[]) {
  // do something with data
}

Notice that the configuration needs to be generic over the filter and sink types.
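
To illustrate what "generic over the filter and sink types" could mean, here is a minimal sketch. Only the names Configuration, StarknetFilter and PostgresSink come from the proposal; every field shape below is an assumption made for illustration, not the actual package API.

```typescript
// Hypothetical shapes: the fields of these types are assumptions.
interface StarknetFilter {
  header?: { weak: boolean };
}

interface PostgresSink {
  type: "postgres";
  options: { connectionUrl: string };
}

// Generic over the filter and the sink, so each network/sink pair is
// type-checked against its own filter and option types.
interface Configuration<TFilter, TSink extends { type: string }> {
  type: string;
  stream: { url: string; bearerToken?: string };
  startingCursor?: number;
  filter: TFilter;
  sink: TSink;
}

const exampleConfig: Configuration<StarknetFilter, PostgresSink> = {
  type: "starknet",
  stream: { url: "https://mainnet.starknet.a5a.ch" },
  startingCursor: 123_456,
  filter: { header: { weak: false } },
  sink: { type: "postgres", options: { connectionUrl: "postgres://...." } },
};
```

With this shape, passing Postgres options to a different sink type, or a filter for another network, would fail at compile time rather than at runtime.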

One of the challenges is that for the hosted service we want users to connect to the streams using the internal network (to avoid paying for egress charges), so we cannot let them freely select the stream url and token and instead we want to override that config.

We achieve this by applying the configuration sources in the following order, where each source overrides the ones before it (so 4 has the highest priority).

  1. defaults
  2. config from script
  3. environment variables
  4. command line arguments

This way, users can use any value in the script for testing, and when they deploy to the hosted service we override the problematic values.
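
The priority order can be sketched as a layered merge. This is a minimal sketch, assuming flat option names; the Options type, the resolveOptions helper and the internal stream URL are hypothetical and only there to show the override behaviour.

```typescript
// Hypothetical flat view of the options; the real option set is not defined here.
type Options = Record<string, string | number | undefined>;

// Merge layers from lowest to highest priority. A later layer wins,
// but only for the keys it actually sets (undefined values are skipped).
function resolveOptions(...layers: Options[]): Options {
  const resolved: Options = {};
  for (const layer of layers) {
    for (const key of Object.keys(layer)) {
      const value = layer[key];
      if (value !== undefined) {
        resolved[key] = value;
      }
    }
  }
  return resolved;
}

// 1. defaults
const defaults: Options = { streamUrl: "https://mainnet.starknet.a5a.ch" };
// 2. config from script (values the user picked for local testing)
const fromScript: Options = { bearerToken: "dev-token", startingCursor: 123_456 };
// 3. environment variables (e.g. set by the hosted service; URL is made up)
const fromEnv: Options = { streamUrl: "http://stream.internal:7171" };
// 4. command line arguments
const fromCli: Options = { bearerToken: "xxx" };

const options = resolveOptions(defaults, fromScript, fromEnv, fromCli);
```

Here the stream URL ends up being the one from the environment and the bearer token the one from the command line, while startingCursor keeps the value from the script because no higher layer sets it.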

We will provide a new apibara cli tool that is the entrypoint for running Apibara scripts. For example:

  • apibara run script.ts: runs the script
  • apibara run script.ts --stream.bearer-token=xxx: overrides the stream bearer token

We want to keep the sink abstraction extensible to encourage developers to build their own to integrate with their favourite tools. We do that by delegating the execution of the script to another tool based on the value of sink.type.

The execution trace of apibara run is as follows:

  • reads and validates script.
  • gets value of sink.type.
  • forwards script and cli flags to apibara-sink-<sink.type> (e.g. apibara-sink-postgres); the executable is expected to be in $PATH.

In the future, we can replace the third step with a more sophisticated approach where the sink and the runner communicate through a gRPC service, but for now that adds complexity for no clear benefit.
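
The three steps of the execution trace can be sketched as a function that turns a script's config into the command to run. This is a sketch under assumptions: the ScriptConfig and SinkCommand types, the sinkCommand helper and the validation rule for sink.type are all hypothetical, and actually spawning the process is left out.

```typescript
// The part of the exported config that the runner needs to inspect (assumed shape).
interface ScriptConfig {
  sink: { type: string };
}

interface SinkCommand {
  executable: string; // looked up in $PATH by whoever spawns it
  args: string[];
}

function sinkCommand(config: ScriptConfig, scriptPath: string, cliFlags: string[]): SinkCommand {
  // Step 1: validate the script's config. The allowed character set is an
  // assumption, chosen so the value is safe to embed in an executable name.
  const sinkType = config.sink.type;
  if (!/^[a-z0-9][a-z0-9-]*$/.test(sinkType)) {
    throw new Error("invalid sink.type: " + JSON.stringify(sinkType));
  }
  // Steps 2 and 3: derive the executable name from sink.type and forward
  // the script and the cli flags unchanged.
  return {
    executable: "apibara-sink-" + sinkType,
    args: [scriptPath].concat(cliFlags),
  };
}

const command = sinkCommand(
  { sink: { type: "postgres" } },
  "script.ts",
  ["--stream.bearer-token=xxx"],
);
```

Keeping this step a pure function makes it easy to test; the runner would then hand command.executable and command.args to whatever process-spawning API it uses.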

By convention, sink options can be overridden as follows:

  • cli --<sink.type>.<option-name> (e.g. --postgres.connection-url)
  • env var <SINK_TYPE>_<OPTION_NAME> (e.g. POSTGRES_CONNECTION_URL)

Configuration through env variables is important for production since we can't hard-code secrets in the script.
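
The naming convention above can be derived mechanically from the sink type and the option name. A minimal sketch, assuming options are written in camelCase in the script; the words, cliFlag and envVar helpers are hypothetical names.

```typescript
// Split a camelCase, snake_case or kebab-case name into lowercase words:
// "connectionUrl" -> ["connection", "url"].
function words(name: string): string[] {
  return name
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2")
    .split(/[\s_-]+/)
    .filter((w) => w.length > 0)
    .map((w) => w.toLowerCase());
}

// cli: --<sink.type>.<option-name>
function cliFlag(sinkType: string, option: string): string {
  return "--" + sinkType + "." + words(option).join("-");
}

// env var: <SINK_TYPE>_<OPTION_NAME>
function envVar(sinkType: string, option: string): string {
  return words(sinkType).concat(words(option)).join("_").toUpperCase();
}

const flagName = cliFlag("postgres", "connectionUrl");
const envName = envVar("postgres", "connectionUrl");
```

For the Postgres example, flagName is --postgres.connection-url and envName is POSTGRES_CONNECTION_URL, matching the two bullets above.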

Additional context
The configuration approach is similar to Grafana K6, and the multi-binary approach is similar to Pulumi.

  • How do you want to read the JS/TS file? Deno?
  • I suggest we avoid nesting. I'd prefer the config to look like this:
export const config: Configuration<StarknetFilter, PostgresSink> = {
  type: 'starknet',
  streamUrl: 'https://mainnet.starknet.a5a.ch',
  bearerToken: Deno.env.get('DNA_TOKEN'),
  // other options
  startingCursor: 123_456,
  filter: Filter().withHeader({ weak: false }).toObject(),
  sinkType: 'postgres',
  sinkOptions: {
    connectionUrl: 'postgres://....',
  }
}
  • --stream.bearer-token=xxx does not look very standard to me. I'd prefer it to be just --bearer-token or --stream-bearer-token
  • Why do we need to prefix the sink options with --<sink.type>? It looks cumbersome to me
fracek commented
  • Yes, the JavaScript runtime is Deno. The TS -> JS transpilation is provided by a crate that leverages swc.
  • I agree it looks better without nesting.
  • The dot in the name is used by some tools (Grafana, geth, etc.) to mimic the nesting in the config. If we remove nesting, it's not needed anymore.
  • The prefix is needed to avoid clashing when using env variables. So let's say I need to use a bearer token to authenticate with my db, then it's in the DB_BEARER_TOKEN env variable and doesn't clash with the stream bearer token.
fracek commented

This was done in #188