Leverage workflow description language syntax
sitch opened this issue · 0 comments
Problem to Solve
Replace handwritten migrators with type-safe composable workflow syntax.
Current Workaround
Currently, we have to manage batching/pooling by hand and build mutations with a lot of string interpolation.
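To make the pain point concrete, here is a minimal Python sketch of the kind of hand-rolled batching and string interpolation described above. It is not the actual loader code: the helper names (`chunked`, `escape`, `insert_city`, `build_batches`) are hypothetical, and no TypeDB driver is called, it only builds the query strings.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def chunked(rows: Iterable[Dict[str, str]], size: int) -> Iterator[List[Dict[str, str]]]:
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def escape(value: str) -> str:
    """Escape backslashes and double quotes for a quoted string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def insert_city(row: Dict[str, str]) -> str:
    """Build one insert statement by string interpolation (hypothetical helper)."""
    return f'insert $c isa city, has name "{escape(row["name"])}";'


def build_batches(rows: Iterable[Dict[str, str]], batch_size: int = 1000) -> List[List[str]]:
    """Group the interpolated statements into batches of `batch_size`."""
    return [[insert_city(row) for row in batch] for batch in chunked(rows, batch_size)]
```

Every new entity type needs another `insert_*` function like this one, and nothing checks the interpolated strings against the schema, which is what the proposal below tries to address.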
Proposed Solution
Write a command extension for OpenWDL (Spec 1.0), or adopt a similar syntax.
Given a schema.tql:
define
  name sub attribute,
    value string;
  location sub entity,
    abstract,
    owns name @key,
    plays location-hierarchy:superior,
    plays location-hierarchy:subordinate;
  area sub location;
  city sub location;
  country sub location;
  location-hierarchy sub relation,
    relates superior,
    relates subordinate;
We create a workflow description:
import "schema.tql.wdl" # Codegen: traverse the schema AST and emit the typed struct mapping.
struct LocationBatch {
  Array[Location] locations
  Array[Area] areas
  Array[City] cities
  Array[Country] countries
  Array[LocationHierarchy] location_hierarchies
}
task cast_entity_relations {
  input {
    LocationBatch batch
  }
  command <<<
    typedb_loader <<CODE
    for {name, lat, long} <- %locations% {
      insert
        $location isa location,
          has? %name%,
          has %lat%,
          has %long%;
    }
    CODE
  >>>
  output {
    TypeDBBatch[Location] result = read_batch_result(stdout())
  }
  meta {
    concurrency: 32
    batch_size: 1000
  }
}
This would already be a valid program description today, given a typedb_loader extension. It could be improved further with better syntax for projecting source -> subgraph for insertion.
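The import of "schema.tql.wdl" in the workflow description assumes a codegen step that turns the schema into typed structs. A rough Python sketch of that step follows. It is a toy splitter rather than a real TypeQL parser: it only reads `sub`, `value` and `owns` clauses, ignores roles, and the mapping from TypeQL value types to WDL types (`WDL_TYPES`) is an assumption, as is the function name `generate_structs`.

```python
from typing import Dict, List

# Assumed mapping from TypeQL value types to WDL primitive types.
WDL_TYPES = {"string": "String", "long": "Int", "double": "Float", "boolean": "Boolean"}
ROOTS = {"entity", "relation", "attribute"}


def pascal_case(label: str) -> str:
    """location-hierarchy -> LocationHierarchy"""
    return "".join(part.capitalize() for part in label.split("-"))


def generate_structs(schema: str) -> str:
    """Emit one WDL struct per entity/relation type, with owned attributes as fields."""
    body = schema.split("define", 1)[1]
    value_types: Dict[str, str] = {}   # attribute label -> TypeQL value type
    parents: Dict[str, str] = {}       # type label -> supertype label
    owned: Dict[str, List[str]] = {}   # type label -> directly owned attributes
    for statement in body.split(";"):
        clauses = [c.strip() for c in statement.split(",") if c.strip()]
        if not clauses:
            continue
        label, _sub, parent = clauses[0].split()  # "<label> sub <parent>"
        parents[label] = parent
        owned[label] = []
        for clause in clauses[1:]:
            words = clause.split()
            if words[0] == "value":
                value_types[label] = words[1]
            elif words[0] == "owns":
                owned[label].append(words[1])

    def all_owned(label: str) -> List[str]:
        """Owned attributes, including those inherited from supertypes."""
        inherited = [] if parents[label] in ROOTS else all_owned(parents[label])
        return inherited + owned[label]

    structs = []
    for label in parents:
        if label in value_types:  # attributes become fields, not structs
            continue
        fields = [
            f"    {WDL_TYPES[value_types[attr]]} {attr.replace('-', '_')}"
            for attr in all_owned(label)
        ]
        structs.append("struct " + pascal_case(label) + " {\n" + "\n".join(fields) + "\n}")
    return "\n".join(structs)
```

Run against the schema.tql above, this yields structs such as `City` with a single `String name` field inherited from `location`. Role players for `LocationHierarchy` are left out here, since how to type them is exactly the open projection question.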
Additional Information
The current loader approach certainly works, but it is anything but readable or convenient. This type of solution has the added benefit of composability: you can have one task for csv->subgraph, another for subgraph->batch_upsert, and so on.
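As a loose illustration of that composability, the two tasks can be modelled as plain functions and chained. This is only a Python analogy for what WDL task wiring would give us: `City`, `csv_to_subgraph`, `subgraph_to_batches` and `compose` are hypothetical names, and the second stage only groups typed records into batches rather than performing a real upsert.

```python
import csv
import io
from dataclasses import dataclass
from typing import Callable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")


@dataclass(frozen=True)
class City:
    """Typed record standing in for a generated schema struct."""
    name: str


def csv_to_subgraph(text: str) -> List[City]:
    """Task 1: parse CSV text with a `name` column into typed records."""
    reader = csv.DictReader(io.StringIO(text))
    return [City(name=row["name"]) for row in reader]


def subgraph_to_batches(cities: List[City], batch_size: int = 2) -> List[List[City]]:
    """Task 2: group records into fixed-size batches ready for a loader."""
    return [cities[i:i + batch_size] for i in range(0, len(cities), batch_size)]


def compose(first: Callable[[A], B], second: Callable[[B], C]) -> Callable[[A], C]:
    """Chain two tasks so the output of `first` feeds `second`."""
    return lambda value: second(first(value))


pipeline = compose(csv_to_subgraph, subgraph_to_batches)
```

Because each stage has a typed input and output, a mismatch between stages shows up when the pipeline is wired together, not when a query string fails at load time.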