typedb-osi/typedb-loader

Leverage workflow description language syntax

sitch opened this issue · 0 comments

Problem to Solve

Replace handwritten migrators with type-safe composable workflow syntax.

Current Workaround

Currently, we have to manage batching/pooling manually wherever it is needed, and build mutations with a lot of string interpolation.
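To make the workaround concrete, here is a minimal sketch of what a handwritten migrator tends to look like. The row shape, the helper names (`batched`, `location_insert`) and the batch size are all made up for illustration, and no TypeDB client is involved; the point is only the manual slicing and string building.

```python
from typing import Dict, Iterator, List


def batched(rows: List[Dict[str, str]], size: int) -> Iterator[List[Dict[str, str]]]:
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def location_insert(row: Dict[str, str]) -> str:
    """Build one TypeQL insert by string interpolation (hypothetical row shape)."""
    # Escape backslashes and double quotes so the value stays inside the literal.
    name = row["name"].replace("\\", "\\\\").replace('"', '\\"')
    return f'insert $l isa {row["type"]}, has name "{name}";'


rows = [
    {"type": "city", "name": "Utrecht"},
    {"type": "city", "name": "Berlin"},
    {"type": "country", "name": "Germany"},
]

# Each inner list is one batch of query strings that would be sent together.
batches = [[location_insert(r) for r in chunk] for chunk in batched(rows, 2)]
```

Every new entity or relation type needs another hand-rolled function like `location_insert`, which is the repetition this issue is about.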

Proposed Solution

Write a command extension for OpenWDL (Spec 1.0), or adopt a similar syntax.

Given a schema.tql:

  define
  name sub attribute,
      value string;
  location sub entity,
      abstract,
      owns name @key,
      plays location-hierarchy:superior,
      plays location-hierarchy:subordinate;
  area sub location;
  city sub location;
  country sub location;
  location-hierarchy sub relation,
      relates superior,
      relates subordinate;
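
As a rough illustration of how typed structs might be generated from a schema like this one, the sketch below parses the `sub`, `owns` and `value` clauses with string splitting and emits WDL struct text for entity types. It is not how typedb-loader works today. The value-type mapping, the naming rule and the decision to skip relations are assumptions, and a real implementation would walk the TypeQL AST instead of splitting strings.

```python
import re
from typing import Dict, List, Tuple

# Assumed mapping from TypeQL value types to WDL primitive types.
WDL_TYPES = {"string": "String", "long": "Int", "double": "Float", "boolean": "Boolean"}

SCHEMA = """
define
name sub attribute,
    value string;
location sub entity,
    abstract,
    owns name @key,
    plays location-hierarchy:superior,
    plays location-hierarchy:subordinate;
area sub location;
city sub location;
country sub location;
location-hierarchy sub relation,
    relates superior,
    relates subordinate;
"""


def parse_schema(tql: str) -> Dict[str, dict]:
    """Map each declared type to its parent, owned attributes and value type."""
    types: Dict[str, dict] = {}
    body = tql.replace("define", "", 1)
    for statement in body.split(";"):
        clauses = [c.strip() for c in statement.split(",") if c.strip()]
        if not clauses:
            continue
        head = re.fullmatch(r"([\w-]+)\s+sub\s+([\w-]+)", clauses[0])
        if head is None:
            continue
        owns = [c.split()[1] for c in clauses[1:] if c.startswith("owns ")]
        value = next((c.split()[1] for c in clauses[1:] if c.startswith("value ")), None)
        types[head.group(1)] = {"parent": head.group(2), "owns": owns, "value": value}
    return types


def resolve(types: Dict[str, dict], name: str) -> Tuple[str, List[str]]:
    """Return the built-in root type and the attributes owned along the sub chain."""
    chain = []
    while name in types:
        chain.append(name)
        name = types[name]["parent"]
    attrs: List[str] = []
    for type_name in reversed(chain):  # root first, so inherited attributes lead
        for attr in types[type_name]["owns"]:
            if attr not in attrs:
                attrs.append(attr)
    return name, attrs


def struct_name(type_name: str) -> str:
    """location-hierarchy -> LocationHierarchy."""
    return "".join(part.capitalize() for part in type_name.split("-"))


def to_wdl_structs(types: Dict[str, dict]) -> str:
    """Emit one WDL struct per entity type; relations are skipped in this sketch."""
    blocks = []
    for type_name in types:
        root, attrs = resolve(types, type_name)
        if root != "entity":
            continue
        fields = []
        for attr in attrs:
            wdl_type = WDL_TYPES.get(types.get(attr, {}).get("value"), "String")
            fields.append(f"    {wdl_type} {attr.replace('-', '_')}")
        blocks.append("struct " + struct_name(type_name) + " {\n" + "\n".join(fields) + "\n}")
    return "\n\n".join(blocks)


wdl = to_wdl_structs(parse_schema(SCHEMA))
print(wdl)
```

For this schema the sketch emits `Location`, `Area`, `City` and `Country` structs, each with a single `String name` field inherited from `location`. How relation roles should be typed is left open.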

We create a workflow description:

version 1.0

import "schema.tql.wdl" # Generated from schema.tql: codegen traverses the schema AST and emits the typed struct mapping.

struct LocationBatch {
    Array[Location] locations
    Array[Area] areas
    Array[City] cities
    Array[Country] countries
    Array[LocationHierarchy] location_hierarchies
}

task cast_entity_relations {
  input {
    LocationBatch batch
  }
  
  command <<<
    typedb_loader <<CODE
    for {name, lat, long} <- %locations% {
      insert
      $location isa location,
        has? %name%,
        has %lat%,
        has %long%;
    }
    CODE
  >>>

  output {
    TypeDBBatch[Location] result = read_batch_result(stdout())
  }

  meta {
    concurrency: 32
    batch_size: 1000
  }
}

Given a typedb_loader extension, this would already be a valid program description. It could be improved further with better syntax for projecting source data onto the subgraph to insert.
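
To show what the `for` template could expand to, here is a small sketch of one possible interpretation. It assumes `%field%` is replaced by the attribute label plus the record's value, and that `has?` means "omit the clause when the value is missing". Neither is defined anywhere yet, `expand` is a made-up name, and `lat`/`long` come from the template above (the example schema does not define them). `json.dumps` stands in for proper TypeQL literal escaping.

```python
import json
import re

# Matches `has %field%` and the optional form `has? %field%`.
CLAUSE = re.compile(r"has(\??)\s+%(\w+)%")

TEMPLATE = """
insert
$location isa location,
  has? %name%,
  has %lat%,
  has %long%;
"""


def expand(template: str, record: dict) -> str:
    """Render one insert statement for one record (assumed semantics, see above)."""
    head, *clauses = [part.strip() for part in template.strip().rstrip(";").split(",")]
    rendered = [" ".join(head.split())]  # collapse the newline after `insert`
    for clause in clauses:
        match = CLAUSE.fullmatch(clause)
        if match is None:
            rendered.append(clause)  # pass through anything that is not a placeholder
            continue
        optional, field = match.group(1) == "?", match.group(2)
        value = record.get(field)
        if value is None:
            if optional:
                continue  # `has?` clauses are dropped when the value is absent
            raise KeyError(f"missing required field: {field}")
        rendered.append(f"has {field} {json.dumps(value)}")
    return ", ".join(rendered) + ";"


full = expand(TEMPLATE, {"name": "Utrecht", "lat": 52.09, "long": 5.12})
partial_record = expand(TEMPLATE, {"lat": 1.5, "long": 2.5})
print(full)
print(partial_record)
```

A record without `lat` or `long` raises `KeyError` here, whereas a missing `name` just shortens the statement. Whether that is the right behaviour for required fields is a design question for the extension.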

Additional Information

The current loader approach certainly works, but it is neither readable nor convenient. This type of solution has the added benefit of composability: you can have one task for csv->subgraph, another for subgraph->batch_upsert, and so on.
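
The composability point can be sketched outside WDL as plain function composition. All stage names below (`csv_to_subgraph`, `subgraph_to_batches`, `batch_upsert`) are hypothetical, and `batch_upsert` is a stub that only counts what it would write instead of talking to TypeDB.

```python
import csv
import io
from functools import partial, reduce
from typing import Callable, Dict, List

CSV_TEXT = "name,kind\nUtrecht,city\nGermany,country\nBerlin,city\n"


def csv_to_subgraph(text: str) -> List[Dict[str, str]]:
    """Stage 1: parse CSV text into one record per row."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def subgraph_to_batches(records: List[Dict[str, str]], batch_size: int) -> List[List[Dict[str, str]]]:
    """Stage 2: group records into batches of at most `batch_size`."""
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]


def batch_upsert(batches: List[List[Dict[str, str]]]) -> Dict[str, int]:
    """Stage 3 (stub): report what would be written; no database call is made."""
    return {"batches": len(batches), "records": sum(len(b) for b in batches)}


def compose(*stages: Callable) -> Callable:
    """Chain stages left to right, feeding each output into the next stage."""
    return lambda value: reduce(lambda acc, stage: stage(acc), stages, value)


pipeline = compose(
    csv_to_subgraph,
    partial(subgraph_to_batches, batch_size=2),
    batch_upsert,
)
result = pipeline(CSV_TEXT)
print(result)
```

In the WDL version each stage would be its own task and the workflow would wire outputs to inputs, so a stage such as the CSV reader could be swapped without touching the upsert task.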