probcomp/bayeslite

Add support for unnamed "implicit" populations and generators

fsaad opened this issue

fsaad commented

I propose implementing "implicit" populations and generators as a middle ground between two designs: (i) identifying a table, its population, and its generator by a single name, which reduces the number of names an end user needs to know about in the most common case; and (ii) keeping the full generality of multi-population tables and multi-generator populations implemented throughout bayeslite (see #588 for full details). Implicit populations and generators are created as follows:

CREATE TABLE satellites FROM 'satellites.csv';
CREATE POPULATION FOR satellites (GUESS STATTYPES OF (*));
CREATE GENERATOR FOR satellites USING loom (<modeling>);

In this workflow, the user specifies only the name "satellites" of the base table and creates a population and a generator without introducing any new names. The user therefore need not manage multiple names such as "satellites", "satellites_p", and "satellites_cc".
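To make the naming rule concrete, here is a minimal sketch using a hypothetical in-memory `Catalog` class (not bayeslite's metadata tables or API): an unnamed population takes its base table's name, and an unnamed generator takes its population's name, so the whole workflow above only ever uses "satellites".

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class Catalog:
    # Hypothetical stand-in for bayeslite's metadata; illustration only.
    tables: Set[str] = field(default_factory=set)
    # population name -> (base table name, implicit?)
    populations: Dict[str, Tuple[str, bool]] = field(default_factory=dict)
    # generator name -> (population name, implicit?)
    generators: Dict[str, Tuple[str, bool]] = field(default_factory=dict)

    def create_table(self, name: str) -> str:
        self.tables.add(name)
        return name

    def create_population(self, table: str, name: Optional[str] = None) -> str:
        if table not in self.tables:
            raise KeyError(f"no such table: {table}")
        implicit = name is None
        # An unnamed ("implicit") population reuses its base table's name.
        pop_name = table if implicit else name
        if pop_name in self.populations:
            raise ValueError(f"population already exists: {pop_name}")
        self.populations[pop_name] = (table, implicit)
        return pop_name

    def create_generator(self, population: str, name: Optional[str] = None) -> str:
        if population not in self.populations:
            raise KeyError(f"no such population: {population}")
        implicit = name is None
        # An unnamed ("implicit") generator reuses its population's name.
        gen_name = population if implicit else name
        if gen_name in self.generators:
            raise ValueError(f"generator already exists: {gen_name}")
        self.generators[gen_name] = (population, implicit)
        return gen_name


catalog = Catalog()
table = catalog.create_table("satellites")         # CREATE TABLE satellites FROM ...
population = catalog.create_population(table)      # CREATE POPULATION FOR satellites ...
generator = catalog.create_generator(population)   # CREATE GENERATOR FOR satellites ...
print(table, population, generator)  # satellites satellites satellites
```

Because the implicit name is derived rather than chosen, a second unnamed population on the same table collides with the first; the schema invariants proposed below are what would enforce this in the real database.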

  • grammar.y: Add support for unnamed populations/generators.
  • schema.py: Add implicit column to the bayesdb_population and bayesdb_generator tables.
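  • schema.py: Add SQLite triggers to the schema that enforce the following invariants for implicit populations/generators:
    • Names must be identical: an implicit population has the same name as its base table, and an implicit generator has the same name as its population.
    • A table with an implicit population has one and only one population.
    • A population with an implicit generator has one and only one generator.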
  • bql.py: Update interpreter to handle unnamed populations/generators, marking them as implicit.
  • bql.py: Update interpreter to handle renaming tables/populations that have implicit populations/generators by propagating the rename downwards (blocked on #598).
  • core.py: Add utilities for implicit populations/generators, such as checking whether a table has an implicit population or whether a population has an implicit generator.
  • test_core.py: Add tests for this whole shebang, including creation, triggers, dropping, duplicates, etc.
  • doc/bql.rst: Document using implicit populations/generators in create/rename modeling commands.
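The schema.py triggers could be sketched roughly as follows with Python's built-in sqlite3 module. The simplified `bayesdb_population` columns (`id`, `name`, `tabname`, `implicit`), the trigger names, and the error messages are assumptions for illustration, not bayeslite's actual schema. Only the population invariants are shown; the generator invariants would follow the same pattern with the population in place of the table, and UPDATE triggers (needed once renames exist) are omitted.

```python
import sqlite3

# Hypothetical, simplified schema; not bayeslite's actual definitions.
SCHEMA = """
CREATE TABLE bayesdb_population (
    id       INTEGER PRIMARY KEY,
    name     TEXT NOT NULL UNIQUE,
    tabname  TEXT NOT NULL,
    implicit INTEGER NOT NULL DEFAULT 0 CHECK (implicit IN (0, 1))
);

-- Invariant: an implicit population carries the name of its base table.
CREATE TRIGGER bayesdb_population_implicit_name
BEFORE INSERT ON bayesdb_population
BEGIN
    SELECT RAISE(ABORT, 'implicit population must be named after its table')
    WHERE NEW.implicit = 1 AND NEW.name != NEW.tabname;
END;

-- Invariant: a table with an implicit population has exactly one population,
-- so reject an implicit population on a table that already has one, and any
-- population on a table that already has an implicit one.
CREATE TRIGGER bayesdb_population_implicit_unique
BEFORE INSERT ON bayesdb_population
BEGIN
    SELECT RAISE(ABORT, 'table with an implicit population has exactly one population')
    WHERE EXISTS (
        SELECT 1 FROM bayesdb_population
        WHERE tabname = NEW.tabname AND (implicit = 1 OR NEW.implicit = 1)
    );
END;
"""


def connect() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def try_insert(db: sqlite3.Connection, name: str, tabname: str, implicit: int):
    """Insert a population row; return None on success or the error message."""
    try:
        with db:  # commit on success, roll back if a trigger aborts
            db.execute(
                "INSERT INTO bayesdb_population (name, tabname, implicit)"
                " VALUES (?, ?, ?)",
                (name, tabname, implicit),
            )
        return None
    except sqlite3.DatabaseError as e:
        return str(e)


db = connect()
r1 = try_insert(db, "satellites", "satellites", 1)    # implicit population: accepted
r2 = try_insert(db, "satellites_p", "satellites", 0)  # second population: rejected
r3 = try_insert(db, "orbits", "launches", 1)          # implicit but misnamed: rejected
print(r1, r2, r3, sep="\n")
```

Keeping these checks in triggers, rather than only in the interpreter, means the invariants hold no matter which code path writes the metadata.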
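The core.py checks and the downward rename propagation might look something like this sketch. The function names `table_has_implicit_population`, `population_has_implicit_generator`, and `rename_table` are hypothetical, the two-table schema is a simplified assumption rather than bayeslite's own, and only the metadata is updated (the ALTER TABLE on the data table itself is left out).

```python
import sqlite3

# Simplified, hypothetical metadata schema (not bayeslite's actual one).
SCHEMA = """
CREATE TABLE bayesdb_population (
    id       INTEGER PRIMARY KEY,
    name     TEXT NOT NULL UNIQUE,
    tabname  TEXT NOT NULL,
    implicit INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE bayesdb_generator (
    id            INTEGER PRIMARY KEY,
    name          TEXT NOT NULL UNIQUE,
    population_id INTEGER NOT NULL REFERENCES bayesdb_population(id),
    implicit      INTEGER NOT NULL DEFAULT 0
);
"""


def table_has_implicit_population(db: sqlite3.Connection, table: str) -> bool:
    row = db.execute(
        "SELECT EXISTS (SELECT 1 FROM bayesdb_population"
        " WHERE tabname = ? AND implicit = 1)",
        (table,),
    ).fetchone()
    return bool(row[0])


def population_has_implicit_generator(db: sqlite3.Connection, population: str) -> bool:
    row = db.execute(
        "SELECT EXISTS (SELECT 1 FROM bayesdb_generator AS g"
        " JOIN bayesdb_population AS p ON g.population_id = p.id"
        " WHERE p.name = ? AND g.implicit = 1)",
        (population,),
    ).fetchone()
    return bool(row[0])


def rename_table(db: sqlite3.Connection, old: str, new: str) -> None:
    """Update metadata for a table rename, propagating it downwards."""
    with db:  # one transaction: commit on success, roll back on error
        if table_has_implicit_population(db, old):
            # The implicit population shares the table's name, so rename it...
            db.execute(
                "UPDATE bayesdb_population SET name = ?"
                " WHERE tabname = ? AND implicit = 1",
                (new, old),
            )
            # ...and its implicit generator, which shares the population's name.
            db.execute(
                "UPDATE bayesdb_generator SET name = ? WHERE implicit = 1"
                " AND population_id IN"
                " (SELECT id FROM bayesdb_population WHERE name = ?)",
                (new, new),
            )
        # Every population on the table, implicit or not, points at the new name.
        db.execute(
            "UPDATE bayesdb_population SET tabname = ? WHERE tabname = ?",
            (new, old),
        )


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
with db:
    db.execute("INSERT INTO bayesdb_population (name, tabname, implicit)"
               " VALUES ('satellites', 'satellites', 1)")
    db.execute("INSERT INTO bayesdb_generator (name, population_id, implicit)"
               " VALUES ('satellites', 1, 1)")

rename_table(db, "satellites", "sats")
names = db.execute(
    "SELECT p.name, p.tabname, g.name FROM bayesdb_population AS p"
    " JOIN bayesdb_generator AS g ON g.population_id = p.id"
).fetchall()
print(names)  # [('sats', 'sats', 'sats')]
```

Renaming a population with an implicit generator would reuse the second UPDATE on its own; wrapping everything in one transaction keeps a failed rename (for example, a name clash on the UNIQUE column) from leaving the names out of sync.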