Replace `CONSTRAINED BY` and `CONDITIONED BY` with `GIVEN`
Overview
Why are we doing this?
This is the first "sprint" towards IQL-permissive. It will allow us to solve simpler sub-problems first.
Technical approach
We want IQL-permissive to work by translating from permissive ASTs to strict ASTs. To do that, IQL-permissive cannot be context-free: the translation can consult the schema and the data tables, and it knows which column variable is modeled by which model.
This is necessary for this PR, too.
Sometimes, `GIVEN` statements will have to be translated into strict ASTs that encode a model expression with both `CONSTRAINED BY` and `CONDITIONED BY`. Another way to think about this: `CONDITIONED BY` takes density events and `CONSTRAINED BY` takes distribution events, whereas `GIVEN` takes either type as well as conjunctions (i.e. `AND`-linked lists) that mix both types.
Examples
For the sake of readability, I am translating model expressions from query segments in permissive to query segments in strict and not ASTs to ASTs. We assume the following environment:
- `m` is a model.
- `d` is a data table.
- `foo`, `bar` and `baz` are columns in `d` and also column variables in `m`.
- the schema records that `foo` and `bar` are numerical, while `baz` is nominal.
The example model expressions below should be viewed as part of `PROBABILITY OF` queries, i.e. queries like `SELECT PROBABILITY OF VAR foo = foo UNDER [model expression] FROM data`. But the logic applies equally to `GENERATE`.
Example model expressions
Below, ➡️ means "translate AST of sub-query in IQL-permissive to IQL-strict", for example:
UNDER m GIVEN VAR foo = foo
➡️ under m CONDITIONED BY VAR foo = foo
Note: when an event could be either a distribution event or a density event, then it should be treated as a density event, as these can be more efficiently handled by the backend.
UNDER m GIVEN VAR baz = baz
➡️ under m CONDITIONED BY VAR baz = baz
`GIVEN` can take `,`. A `,` always means `AND`. While this may seem redundant, supporting both is necessary so that certain permissive queries read clearly to outside users with little background in either programming or probability:
UNDER m GIVEN VAR foo = foo, VAR bar = bar
➡️ under m CONDITIONED BY VAR foo = foo AND VAR bar = bar
With pure density events, mixing nominal and numerical variables is trivial:
UNDER m GIVEN VAR foo = foo, VAR baz = baz
➡️ under m CONDITIONED BY VAR foo = foo AND VAR baz = baz
`AND` also works with `GIVEN`:
UNDER m GIVEN VAR foo = foo AND VAR bar = bar
➡️ under m CONDITIONED BY VAR foo = foo AND VAR bar = bar
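Because `,` and `AND` mean the same thing inside `GIVEN`, a translator could normalize both into a single list of events before doing anything else. The following is a minimal Python sketch of that idea. It works on query text purely for readability (the actual translation is AST to AST), and `split_events` is a hypothetical helper name, not part of any existing code.

```python
import re
from typing import List

# Hypothetical helper, for illustration only: the real translation operates
# on ASTs, not on query strings.
# A separator is either a comma or the keyword AND surrounded by whitespace.
_SEPARATOR = re.compile(r"\s*,\s*|\s+AND\s+")


def split_events(given_clause: str) -> List[str]:
    """Split the text following GIVEN into events, treating `,` as `AND`."""
    parts = _SEPARATOR.split(given_clause.strip())
    return [part.strip() for part in parts if part.strip()]


with_comma = split_events("VAR foo = foo, VAR bar = bar")
with_and = split_events("VAR foo = foo AND VAR bar = bar")
```

Both calls yield the same list of two events, so everything downstream only has to deal with one form of conjunction.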
Distribution events in `GIVEN` expressions are translated to `CONSTRAINED BY` expressions:
UNDER m GIVEN VAR foo > foo
➡️ under m CONSTRAINED BY VAR foo > foo
These events can be mixed and matched, but may require chaining of `CONSTRAINED BY` and `CONDITIONED BY`:
UNDER m GIVEN VAR foo > foo AND VAR bar = bar
➡️ (under m CONDITIONED BY VAR bar = bar) CONSTRAINED BY VAR foo > foo
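It's preferred to avoid chaining where possible, though. Here `baz` is nominal, so `VAR baz = baz` can be treated as a distribution event and a single `CONSTRAINED BY` suffices: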
UNDER m GIVEN VAR foo > foo AND VAR baz = baz
➡️ under m CONSTRAINED BY VAR foo > foo AND VAR baz = baz
This last model expression could also be translated into an equivalent expression chaining `CONSTRAINED BY` and `CONDITIONED BY`.
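The examples above can be summarized as a small decision procedure. Below is a rough Python sketch of it, again on query text rather than ASTs. Several things here are assumptions read off the examples rather than a spec: that an equality on a numerical variable is only a density event, that an equality on a nominal variable can be read either way, that any other comparison is a distribution event, and that in a chained expression the "either" events go into `CONDITIONED BY`. The names `SCHEMA`, `kinds` and `translate_given` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Assumed environment from the examples: foo and bar numerical, baz nominal.
SCHEMA: Dict[str, str] = {"foo": "numerical", "bar": "numerical", "baz": "nominal"}

# An event is (column variable, operator, value), e.g. ("foo", ">", "foo").
Event = Tuple[str, str, str]


def kinds(event: Event, schema: Dict[str, str]) -> Set[str]:
    """Return the event types an event can be read as (assumption, see above)."""
    variable, op, _ = event
    if op != "=":
        return {"distribution"}             # e.g. VAR foo > foo
    if schema[variable] == "nominal":
        return {"density", "distribution"}  # nominal equality: either reading
    return {"density"}                      # numerical equality: density only


def render(events: List[Event]) -> str:
    """Render events as an AND-linked list in strict syntax."""
    return " AND ".join(f"VAR {var} {op} {val}" for var, op, val in events)


def translate_given(model: str, events: List[Event],
                    schema: Dict[str, str] = SCHEMA) -> str:
    """Translate `UNDER model GIVEN events` into a strict model expression."""
    if not events:
        return f"under {model}"
    density_only = [e for e in events if kinds(e, schema) == {"density"}]
    distribution_only = [e for e in events if kinds(e, schema) == {"distribution"}]
    if not distribution_only:
        # Everything can be a density event: prefer CONDITIONED BY.
        return f"under {model} CONDITIONED BY {render(events)}"
    if not density_only:
        # Everything can be a distribution event: avoid chaining.
        return f"under {model} CONSTRAINED BY {render(events)}"
    # Both kinds are strictly required, so the two clauses must be chained.
    conditioned = [e for e in events if "density" in kinds(e, schema)]
    return (f"(under {model} CONDITIONED BY {render(conditioned)}) "
            f"CONSTRAINED BY {render(distribution_only)}")


chained = translate_given("m", [("foo", ">", "foo"), ("bar", "=", "bar")])
unchained = translate_given("m", [("foo", ">", "foo"), ("baz", "=", "baz")])
```

With the assumed schema, `chained` and `unchained` reproduce the two mixed examples above. A real implementation would apply the same grouping to AST nodes and would also need a decision on `OR`, which this sketch ignores.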
Non-goals
The following features for IQL-permissive will be tackled during later sprints:
- Removing the `VAR` keyword.
- Translating `GIVEN foo` into `GIVEN VAR foo=foo`.
- Making the `DENSITY` keyword optional (this is related to this PR, though, because of the parallelism: `PROBABILITY` and `CONSTRAINED BY` both take a distribution event, while `PROBABILITY DENSITY` and `CONDITIONED BY` both take a density event).
- Changing the order of `GIVEN`, i.e. the ability to write `PROBABILITY OF foo GIVEN bar UNDER model` instead of `PROBABILITY OF foo UNDER model GIVEN bar`.
- Nesting of `GIVEN` is not strictly required.
Other non-goals for now (which might become important later)
- IQL-permissive does not need to ensure useful error messages are thrown.
- We'll assume one schema (i.e. a single mapping from column to stattype). In the future, different models may support different schemas.
Open issues
I (Ulli) should create a complete spec for IQL-permissive that Zane can work off of and that can be extended to cover issues like this.
Reminder for me: this needs to talk about `OR`!