partiql/partiql-lang-kotlin

Redesign ExprValue to support alternative backends

vgapeyev opened this issue · 0 comments

The ExprValue class is the implementation of the PartiQL data model. Most ExprValue subclasses constitute a "vanilla", or direct, implementation of the data model, while a couple of subclasses address consuming other data: CsvRowExprValue for CSV and IonStructExprValue for Ion. While there is some laziness/on-demand behavior in their implementations, data originating as Ion or CSV is translated into the vanilla data model before PartiQL queries are executed on it.

A few shortcomings of the current ExprValue design have become apparent lately:

  • An interaction with Ion data causes loss of annotations in it (Issue #1093). That's because annotations are not in the PartiQL data model and Ion data is fully translated into the vanilla implementation.
  • There are many more data formats (Avro came up recently) and access APIs (e.g. cursor-like APIs over relational data, such as JDBC) that could be seen through the lens of the PartiQL data model. However, it is not straightforward to follow the example of IonStructExprValue or CsvRowExprValue to accommodate them.
  • Even with successfully following the example, an implementation would suffer from duplicate data representation: native and vanilla PartiQL.
  • Using subclassing as the language/architecture mechanism for connecting to an alternative data model leaves it unclear (in the outcome) where the boundary lies between the vanilla PartiQL data model and the alternative data models.

It should be possible to execute PartiQL queries directly on various "PartiQL-datamodel-like" data, by translating the queries into calls to the native APIs for the data.
A technical approach to this can be by redesigning ExprValue as an API that can have multiple implementations: one being the vanilla PartiQL data model, while others being wrappers over the native APIs for other kinds of data.
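One possible shape for such an API, sketched below with hypothetical names (the real ExprValue surface is larger and different): a small read-side interface that the vanilla in-memory model implements directly, and that a wrapper over a foreign representation can implement without copying data into the vanilla model first.

```kotlin
// Hypothetical sketch: the data model as an interface with
// multiple implementations. Names here are illustrative only.
enum class ValueType { NULL, MISSING, BOOL, INT, STRING, LIST, BAG, STRUCT }

interface Value {
    val type: ValueType
    fun longValue(): Long = throw UnsupportedOperationException()
    fun stringValue(): String = throw UnsupportedOperationException()
    fun fields(): Sequence<Pair<String, Value>> = emptySequence()
    fun elements(): Sequence<Value> = emptySequence()
}

// "Vanilla" in-memory implementation.
class IntValue(private val v: Long) : Value {
    override val type = ValueType.INT
    override fun longValue() = v
}

// A wrapper over a native representation -- here a plain Kotlin map
// stands in for a native API such as ion-java or a JDBC row cursor.
class MapStructValue(private val map: Map<String, Long>) : Value {
    override val type = ValueType.STRUCT
    override fun fields() =
        map.asSequence().map { (k, v) -> k to (IntValue(v) as Value) }
}

fun main() {
    // The evaluator would see only the Value interface, regardless
    // of whether the data is vanilla or wrapped.
    val row: Value = MapStructValue(mapOf("a" to 1L, "b" to 2L))
    println(row.fields().map { it.first }.toList())  // [a, b]
}
```

The key point of the sketch is that field access is evaluated lazily against the backing representation, so no up-front translation into a vanilla value tree is needed.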

The pursued solution amounts to reimagining the PartiQL data model as a collection of data construction and deconstruction requirements, which could be fulfilled by alternative concrete data models -- possibly only partially, and possibly by data models with additional features. The concrete data models mentioned above should probably be included in proving out this redesign:

  • The "vanilla" PartiQL in-memory data model, fully implementing exactly the features of the PartiQL data model, no more and no less.

  • In-memory Ion, backed by APIs of ion-java. Ion contains features (annotations, s-expressions) not present in PartiQL that are expected to be preserved, in some well-defined ways, while querying with PartiQL. (In minor ways, it also lacks support for some PartiQL features, such as bags and MISSING.)

  • CSV, which covers only a strict subset of the PartiQL data model: flat structs (rows) whose fields are strings, with no nesting.
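The "possibly partially" fulfillment above could be made explicit by having each backend declare which data-model features it supports, so the evaluator can detect mismatches up front. A minimal sketch, with hypothetical names and feature sets taken from the descriptions above:

```kotlin
// Hypothetical sketch of per-backend capability declaration.
enum class Feature { BAG, MISSING, NESTED_STRUCT, ANNOTATIONS, SEXP }

interface Backend {
    val name: String
    val supported: Set<Feature>
}

object VanillaBackend : Backend {
    override val name = "vanilla"
    // Exactly the PartiQL data model, no more and no less.
    override val supported = setOf(Feature.BAG, Feature.MISSING, Feature.NESTED_STRUCT)
}

object IonBackend : Backend {
    override val name = "ion"
    // Ion adds annotations and s-expressions but lacks bags and MISSING.
    override val supported = setOf(Feature.NESTED_STRUCT, Feature.ANNOTATIONS, Feature.SEXP)
}

object CsvBackend : Backend {
    override val name = "csv"
    // CSV: flat rows of strings, a strict subset with no extra features.
    override val supported = emptySet<Feature>()
}

// A query plan could check required features against a backend.
fun supportsAll(backend: Backend, required: Set<Feature>): Boolean =
    backend.supported.containsAll(required)
```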

Technically, this will amount to re-designing ExprValue as an API (a collection of interfaces) and providing several implementations of it. The current ExprValue implementation can be seen as an amalgamation of at least 3 implementations (vanilla, Ion, and CSV-backed). These will be teased apart.

It is likely that the API will have to be structured to explicitly address value construction vs value deconstruction/exploration, with the overall architecture allowing construction and deconstruction from different data models to be used in conjunction. This could be essential, for example, for aggregating data coming from CSV into collections.
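The construction/deconstruction split, and their use in conjunction across data models, might look roughly like the following (all names hypothetical): a reader deconstructs CSV-backed data while a builder from the vanilla model constructs the aggregate.

```kotlin
// Hypothetical split: readers (deconstruction) vs builders (construction).
interface ValueReader {
    fun rows(): Sequence<List<String>>   // CSV-like: flat rows of string fields
}

interface BagBuilder<T> {
    fun add(element: T)
    fun build(): List<T>                 // a List stands in for a PartiQL bag
}

// CSV-backed reader over in-memory text.
class CsvReader(private val text: String) : ValueReader {
    override fun rows() =
        text.lineSequence().filter { it.isNotBlank() }.map { it.split(",") }
}

// Vanilla in-memory builder.
class ListBagBuilder<T> : BagBuilder<T> {
    private val items = mutableListOf<T>()
    override fun add(element: T) { items += element }
    override fun build() = items.toList()
}

// Aggregation across data models: deconstruct CSV, construct a vanilla bag.
fun firstColumn(reader: ValueReader, builder: BagBuilder<String>): List<String> {
    for (row in reader.rows()) builder.add(row[0])
    return builder.build()
}

fun main() {
    println(firstColumn(CsvReader("a,1\nb,2"), ListBagBuilder()))  // [a, b]
}
```

The design choice illustrated: since CSV itself has no way to represent a collection of collections, the aggregate must be constructed by a different backend than the one the input was read from.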

It is also anticipated that the ExprValue redesign will imply reconsidering the very high-level APIs for constructing an evaluation environment (EvaluationSession) and evaluating a query within an environment.