Make `DENSITY` keyword optional and parallel to `GIVEN`
Schaechtle opened this issue · 0 comments
Overview
Why are we doing this?
This is the second "sprint" towards IQL-permissive (this issue describes the first). It will allow us to solve simpler sub-problems first.
Technical approach
We want IQL-permissive to work by translating from permissive ASTs to strict ASTs. If necessary, IQL-permissive can consult the schema and data tables and knows which column-variable is modeled by which model. Runtime errors are 100% acceptable.
The current implementation of IQL includes two different probability expressions. `PROBABILITY OF` expressions take distribution events and a model expression. `PROBABILITY DENSITY OF` expressions take density events and a model expression. IQL-permissive doesn't require the `DENSITY` keyword. In IQL-permissive, `PROBABILITY OF` expressions take (i) either a distribution event or a density event and (ii) a model expression.
We've reached a point where a keyword means something different in IQL-permissive than it does in IQL-strict/current. That means we need to decide whether this change merits a split (e.g. running two different grammars).
Examples
For the sake of readability, I am translating model expressions from query segments in permissive to query segments in strict and not ASTs to ASTs. We assume the following environment:
- `m` is a model
- `d` is a data table
- `foo`, `bar` and `baz` are columns in `d` and also column variables in `m`.
- the schema records that `foo`, `bar` are numerical, while `baz` is nominal.
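For concreteness, the environment above could be written down as plain data. This is only an illustrative sketch: the names `SCHEMA`, `TABLE_COLUMNS`, `MODEL_VARIABLES` and `stattype_of` are hypothetical and do not come from the IQL code base.

```python
# Hypothetical encoding of the assumed environment, for illustration only.

# Schema: a single mapping from column name to statistical type.
SCHEMA = {"foo": "numerical", "bar": "numerical", "baz": "nominal"}

# Columns of data table d.
TABLE_COLUMNS = {"d": ["foo", "bar", "baz"]}

# Column variables modeled by model m.
MODEL_VARIABLES = {"m": ["foo", "bar", "baz"]}


def stattype_of(column: str) -> str:
    """Look up the stattype of a column; unknown columns fail at runtime."""
    if column not in SCHEMA:
        raise KeyError(f"column {column!r} is not in the schema")
    return SCHEMA[column]
```

With this encoding, `stattype_of("baz")` returns `"nominal"`, and asking for a column that is not in the schema raises a `KeyError`, in line with runtime errors being acceptable.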
The example model expressions below should be viewed as part of complete `PROBABILITY OF` queries, i.e. queries like `SELECT [probability expression] UNDER m FROM d`.
Example model expressions
Below, ➡️ means "translate AST of sub-query in IQL-permissive to IQL-strict", for example:
PROBABILITY OF VAR foo = foo
➡️ PROBABILITY DENSITY OF VAR foo = foo
Note: when an event could be either a distribution event or a density event, it should be treated as a density event, as these can be handled more efficiently by the backend.
PROBABILITY OF VAR baz = baz
➡️ PROBABILITY DENSITY OF VAR baz = baz
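The rule in the note above can be sketched as a small function over a toy AST. The tuple-based AST shape and the names `is_density_event` and `strict_keyword` are hypothetical, not the real IQL AST. The sketch also assumes that an event can be a density event only if every comparison uses `=` and comparisons are combined only with `AND`; that is an assumption made for illustration, not something this issue specifies.

```python
# Hypothetical AST shapes, for illustration only (not the real IQL AST):
#   comparison: (operator, variable, column), e.g. ("=", "foo", "foo")
#   connective: ("AND" or "OR", left_event, right_event)


def is_density_event(event: tuple) -> bool:
    """Return True if the event can be treated as a density event."""
    head = event[0]
    if head == "AND":
        return is_density_event(event[1]) and is_density_event(event[2])
    if head == "OR":
        # Assumption: a disjunction is always a distribution event.
        return False
    # Assumption: only equality comparisons qualify as density events.
    return head == "="


def strict_keyword(event: tuple) -> str:
    """Pick the IQL-strict keyword for a permissive PROBABILITY OF event."""
    if is_density_event(event):
        # Prefer density whenever both readings are possible.
        return "PROBABILITY DENSITY OF"
    return "PROBABILITY OF"
```

Under these assumptions, `strict_keyword(("=", "baz", "baz"))` gives `"PROBABILITY DENSITY OF"`, while an event containing a `>` comparison gives `"PROBABILITY OF"`.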
`PROBABILITY OF` can take `,`. `,` always means `AND`. While this may seem redundant, having both is necessary so that certain permissive queries read clearly to outside users with little background in either programming or probability:
PROBABILITY OF VAR foo = foo, VAR baz = baz
➡️ PROBABILITY DENSITY OF VAR foo = foo AND VAR baz = baz
With events where `=` is the only operator, mixing nominal and numerical variables is trivial:
PROBABILITY OF VAR foo = foo, VAR bar = bar
➡️ PROBABILITY DENSITY OF VAR foo = foo AND VAR bar = bar
Distribution events as inputs in IQL-permissive `PROBABILITY OF` are always translated to `PROBABILITY OF` in IQL-strict:
PROBABILITY OF VAR foo > foo, VAR bar = bar
➡️ PROBABILITY OF VAR foo > foo AND VAR bar = bar
The following will result in a runtime error in IQL-permissive:
PROBABILITY OF VAR foo > foo AND VAR bar = bar
➡️ PROBABILITY OF VAR foo > foo AND VAR bar = bar
💥ERROR💥
`OR` is supported, too:
PROBABILITY OF VAR foo > foo OR VAR baz = baz
➡️ PROBABILITY OF VAR foo > foo OR VAR baz = baz
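Taken together, the non-error examples above can be reproduced by a small string-level sketch. It rewrites query segments rather than ASTs, mirroring how the examples are written here. The function name `translate_segment` is hypothetical. The sketch assumes that a `<` or `>` operator or an `OR` marks a distribution event, and it does not model the runtime-error case shown above.

```python
import re

PERMISSIVE_PREFIX = "PROBABILITY OF "


def translate_segment(permissive: str) -> str:
    """Translate a permissive PROBABILITY OF segment to a strict one.

    Illustrative only: works on strings, not ASTs, and only handles
    segments that start with the plain PROBABILITY OF keyword.
    """
    if not permissive.startswith(PERMISSIVE_PREFIX):
        raise ValueError(f"not a PROBABILITY OF segment: {permissive!r}")
    event = permissive[len(PERMISSIVE_PREFIX):]

    # "," always means AND.
    event = re.sub(r"\s*,\s*", " AND ", event)

    # Assumption: an inequality or an OR makes this a distribution event;
    # otherwise the event is treated as a density event.
    is_distribution = re.search(r"[<>]|\bOR\b", event) is not None
    keyword = "PROBABILITY OF" if is_distribution else "PROBABILITY DENSITY OF"
    return f"{keyword} {event}"
```

For example, `translate_segment("PROBABILITY OF VAR foo = foo, VAR baz = baz")` returns `"PROBABILITY DENSITY OF VAR foo = foo AND VAR baz = baz"`. A real implementation would operate on parsed ASTs and would need a proper tokenizer, since this regex would also rewrite commas that appear inside string literals.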
Non-goals
The following features for IQL-permissive will be tackled during later sprints:
- Removing the `VAR` keyword.
- Translating `GIVEN foo` into `GIVEN VAR foo=foo`.
- Changing the order of `GIVEN`, i.e. the ability to write `PROBABILITY OF foo GIVEN bar UNDER model` instead of `PROBABILITY OF foo UNDER model GIVEN bar`.
- Nesting of `GIVEN` is not strictly required.
Other non-goals for now (which might become important later)
- IQL-permissive does not need to ensure useful error messages are thrown.
- We'll assume one schema (i.e. a single mapping from column to stattype). In the future, different models may support different schemas.
Open issues
I (Ulli) should create a complete spec for IQL-permissive that Zane can work from and that can be extended to cover issues like this.