dbt-labs/dbt-core

MVP: Remotely Callable Tasks

cmcarthur opened this issue · 1 comments

Background: #1141

In the Alternative Entrypoints issue, we discussed two fundamental changes to how users are able to interact with dbt:

  • Give dbt a way to load a manifest file from disk and deserialize it
  • Give dbt a way to use that manifest to take small, specific actions, e.g.
    • compile a SQL string
    • run a SQL query against the warehouse

This issue expands on the second of the two: giving dbt a way to use a manifest to take small, specific actions, such as compiling and running arbitrary SQL queries.

For the first pass, we plan to implement new tasks and new task types that allow dbt to operate as a JSON-RPC server. This is very much intended to be an MVP, and we plan to expand the breadth and depth of interactivity rapidly to enable more use cases.

There are four tasks / task types to be implemented:

  • ServerTask: On startup, performs a full dbt compile, then operates as a JSON-RPC server for handling interactive requests.
  • RemoteCallableTask (abstract): Designates a dbt Task as being remotely callable. Subclasses implement the method name and whether the task is callable synchronously and/or asynchronously. (We can punt on async for the immediate future, since this issue only requires sync calls.) The base class itself implements a standard exception handler and logging of the request/response cycle. It should also register each task as a unique method in the JSON-RPC server.
  • CompileSQLTask: see below
  • RunSQLTask: see below
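
To make the RemoteCallableTask responsibilities above concrete, here is a minimal sketch in plain Python. It is not dbt's actual class: the `METHODS` registry, the `METHOD_NAME` / `SUPPORTS_SYNC` / `SUPPORTS_ASYNC` attributes, the `handle` / `call` split, the error response shape, and the `EchoTask` example are all assumptions made for illustration. The only borrowed fact is that JSON-RPC 2.0 reserves codes -32000 to -32099 for implementation-defined server errors.

```python
import logging
from abc import ABC, abstractmethod

logger = logging.getLogger("rpc_sketch")

# Hypothetical registry: JSON-RPC method name -> task class.
METHODS = {}


class RemoteCallableTask(ABC):
    """Illustrative base class; names and shapes are assumptions."""

    METHOD_NAME = None      # set by each concrete subclass
    SUPPORTS_SYNC = True
    SUPPORTS_ASYNC = False  # async is out of scope for this issue

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Only register classes that declare their own METHOD_NAME, so a
        # subclass that merely inherits one is not flagged as a duplicate.
        name = cls.__dict__.get("METHOD_NAME")
        if name is not None:
            if name in METHODS:
                raise ValueError(f"duplicate RPC method: {name}")
            METHODS[name] = cls

    @abstractmethod
    def handle(self, **kwargs):
        """Do the work and return the payload for the 'data' field."""

    def call(self, request_id, **kwargs):
        # Standard logging and exception handling live in the base class.
        logger.info("request id=%s method=%s", request_id, self.METHOD_NAME)
        try:
            data = self.handle(**kwargs)
            response = {"id": request_id, "result": "Success.", "data": data}
        except Exception as exc:
            # -32000 sits in the range JSON-RPC 2.0 reserves for server errors.
            response = {
                "id": request_id,
                "error": {"code": -32000, "message": str(exc)},
            }
        logger.info("response id=%s ok=%s", request_id, "error" not in response)
        return response


class EchoTask(RemoteCallableTask):
    """Toy subclass used only to exercise the base class."""

    METHOD_NAME = "echo"

    def handle(self, text):
        return {"text": text}
```

With this shape, `METHODS["echo"]().call("abc", text="hi")` returns a success envelope, and a call with missing arguments is turned into an error response by the shared handler rather than crashing the server.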

CompileSQLTask

The CompileSQLTask takes a base64-encoded Jinja SQL string as an argument, and spits out a compiled version of that SQL string. Extends RemoteCallableTask. Synchronous only.

kwargs:

  • base64-encoded jinja sql
  • timeout_seconds: A limit in seconds to put on the compilation. None means no timeout, and I imagine that is a reasonable place to start.

returns:

{
  "id": "<uuid>",
  "result": "Success.",
  "data": {
    "raw_sql": "...",
    "compiled_sql": "...",
    "timing": [
      {
        "type": "compilation",
        "started_at": "...",
        "finished_at": "..."
      }
    ]
  }
}
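
As a rough illustration of the kwargs for CompileSQLTask, a client could build a request like the one below. The method name `compile_sql` and the parameter name `sql` are placeholders (this issue does not fix them); only the base64 encoding and the optional `timeout_seconds` come from the description above.

```python
import base64
import uuid


def build_compile_request(jinja_sql, timeout_seconds=None):
    """Build a JSON-RPC 2.0 request dict for a hypothetical compile method."""
    encoded = base64.b64encode(jinja_sql.encode("utf-8")).decode("ascii")
    return {
        "jsonrpc": "2.0",
        "id": str(uuid.uuid4()),
        "method": "compile_sql",  # placeholder method name
        "params": {
            "sql": encoded,  # placeholder parameter name
            "timeout_seconds": timeout_seconds,  # None means no timeout
        },
    }


def decode_sql(params):
    """Server side: recover the Jinja SQL string from the request params."""
    return base64.b64decode(params["sql"]).decode("utf-8")
```

Base64 keeps quotes, newlines, and Jinja delimiters in the SQL from needing any JSON escaping on the client side.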

RunSQLTask

The RunSQLTask does the same stuff as CompileSQLTask, but in addition it actually runs the compiled SQL and returns the query results in tabular format. Extends CompileSQLTask. Synchronous only.

kwargs:

  • base64-encoded jinja sql
  • timeout???

returns:

{
  "id": "<uuid>",
  "result": "Success.",
  "data": {
    "raw_sql": "...",
    "compiled_sql": "...",
    "timing": [
      {
        "type": "compilation",
        "started_at": "...",
        "finished_at": "..."
      },
      {
        "type": "execute",
        "started_at": "...",
        "finished_at": "..."
      }
    ],
    "table": ... tabular format ...
  }
}
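
One thing the timing array makes easy is splitting compile time from execute time on the client. A small sketch, assuming the `started_at` / `finished_at` values are ISO 8601 strings that `datetime.fromisoformat` can parse (the timestamp format is not specified in this issue):

```python
from datetime import datetime


def timing_durations(response):
    """Return {timing type: elapsed seconds} for a response shaped as above."""
    durations = {}
    for entry in response["data"]["timing"]:
        # Assumption: timestamps are ISO 8601 strings.
        started = datetime.fromisoformat(entry["started_at"])
        finished = datetime.fromisoformat(entry["finished_at"])
        durations[entry["type"]] = (finished - started).total_seconds()
    return durations
```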

Implementation Notes

Manifests

We'll need the ability to take a real manifest and a manifest partial representing a single fake "node", and compile only that fake "node". We should not have to implement any fancy methods of combining multiple manifests into one, since the fake "node" should never overlap with a real node in the real manifest. THIS MEANS THAT COMPILING CUSTOM MACROS WILL NOT BE SUPPORTED BY THIS VERSION. That's OK for right now. We can solve the technical challenges involved with incorporating these partial manifests later on.
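
A sketch of the "real manifest plus one fake node" idea, using plain dicts. The manifest is simplified here to a dict with a `nodes` mapping keyed by `unique_id`, and `compile_fn` is a hypothetical stand-in for whatever compiles a single node; neither is meant to mirror dbt's internal classes.

```python
def compile_single_node(real_manifest, fake_node, compile_fn):
    """Compile only `fake_node`, resolved against the real manifest's nodes."""
    node_id = fake_node["unique_id"]
    if node_id in real_manifest["nodes"]:
        # The fake node should never overlap with a real one.
        raise ValueError(f"node id collides with a real node: {node_id}")

    # Shallow-copy the node mapping so the real manifest is left untouched.
    nodes = dict(real_manifest["nodes"])
    nodes[node_id] = fake_node
    combined = {**real_manifest, "nodes": nodes}

    # compile_fn is a hypothetical callable: (manifest, node_id) -> result.
    return compile_fn(combined, node_id)
```

Because the fake node is only added to a copy, nothing needs to be cleaned up after the request, and concurrent requests cannot see each other's nodes.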

dbt's JSON-RPC spec

To start, dbt should use the minimal JSON-RPC spec, and lean on its JSON schemas to provide contracts for its responses. Whenever possible, though, we should put the meaningful payload under the data field of the response, so that we have room to expand the set of required fields later on.
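
For a sense of how small "minimal" can be, here is a sketch of a dispatcher over a dict of handlers. The error codes are the standard JSON-RPC 2.0 ones (-32700 parse error, -32600 invalid request, -32601 method not found). The success envelope follows the response shape proposed in this issue, and the `dispatch` / `error_response` helpers are hypothetical. It assumes params are always passed by name.

```python
import json

PARSE_ERROR = -32700
INVALID_REQUEST = -32600
METHOD_NOT_FOUND = -32601


def error_response(request_id, code, message):
    return {"id": request_id, "error": {"code": code, "message": message}}


def dispatch(raw_body, methods):
    """Route one JSON-RPC request string to a handler in `methods`."""
    try:
        request = json.loads(raw_body)
    except ValueError:  # json.JSONDecodeError is a ValueError subclass
        return error_response(None, PARSE_ERROR, "Parse error")

    if not isinstance(request, dict) or "method" not in request:
        return error_response(None, INVALID_REQUEST, "Invalid Request")

    request_id = request.get("id")
    handler = methods.get(request["method"])
    if handler is None:
        return error_response(request_id, METHOD_NOT_FOUND, "Method not found")

    # Assumption: params is a JSON object (by-name parameters only).
    data = handler(**request.get("params", {}))
    return {"id": request_id, "result": "Success.", "data": data}
```

Keeping everything task-specific under `data` means new top-level fields can be added later without colliding with anything a task returns.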

Tasks

Tasks currently take all of their inputs via configs. This is OK, but for this functionality to be maximally useful, it would be better if they accepted a structured set of kwargs at either runtime or instantiation time. For example, you could create a RunTask with a dynamic selection syntax.
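
A toy sketch of what that could look like. The class below is hypothetical and not dbt's RunTask: the `select` kwarg and the dict-based config are assumptions, and the point is only the precedence order (call-time kwargs, then instantiation kwargs, then config).

```python
class RunTask:
    """Hypothetical task that prefers explicit kwargs over config values."""

    def __init__(self, config, **kwargs):
        self.config = config
        # Instantiation-time kwargs override whatever the config says.
        self.select = kwargs.get("select", config.get("select"))

    def run(self, **kwargs):
        # Runtime kwargs override the instantiation-time value.
        select = kwargs.get("select", self.select)
        return {"selected": select}
```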

Resolved in #1301