Engineering plan for restructuring the software architecture of bayeslite
fsaad opened this issue
The internal software and the user-facing language will undergo a dramatic
makeover.
The changes outlined in this ticket:
- will be completely backwards incompatible
- will render any existing bdb unusable
- will break all existing clients of bayeslite
- will render any existing BQL/MML documentation out of date
Remove the 'metamodel' concept in favor of 'backend'
- Rename the IBayesDBMetamodel interface to IBayesDB_Backend.
- Expunge all instances of "metamodel" in the source code.
- Expunge metamodels.crosscat.CrosscatMetamodel
- Remove all tests which target the internals of CrosscatMetamodel.
- Port all BQL compiler tests that use CrosscatMetamodel to
CGPM_Metamodel.
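For illustration, the renamed interface might look like the minimal sketch below, assuming an `abc`-based abstract class. The method names and signatures are hypothetical placeholders rather than the actual bayeslite API; they take a `population_id` instead of a `generator_id` to match the direction of this plan, and `ToyBackend` exists only to show that a concrete backend must implement every abstract method.

```python
import abc


class IBayesDB_Backend(abc.ABC):
    """Hypothetical sketch of the backend interface (formerly IBayesDBMetamodel)."""

    @abc.abstractmethod
    def name(self):
        """Return the backend's registered name, e.g. 'cgpm'."""

    @abc.abstractmethod
    def create_population(self, bdb, population_id, schema):
        """Set up backend-specific state for a new population."""

    @abc.abstractmethod
    def analyze_models(self, bdb, population_id, modelnos=None,
                       iterations=None, max_seconds=None):
        """Run inference on the models of a population."""


class ToyBackend(IBayesDB_Backend):
    """Hypothetical concrete backend used only to exercise the interface."""

    def __init__(self):
        self.populations = {}

    def name(self):
        return 'toy'

    def create_population(self, bdb, population_id, schema):
        # Remember the schema, keyed by population_id (no generator_id).
        self.populations[population_id] = schema

    def analyze_models(self, bdb, population_id, modelnos=None,
                       iterations=None, max_seconds=None):
        # Nothing to infer in a toy backend; only validate the population.
        if population_id not in self.populations:
            raise ValueError('unknown population %r' % (population_id,))
```

Attempting to instantiate `IBayesDB_Backend` directly raises `TypeError`, so a backend that forgets a method fails loudly at construction time.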
Merge tables, populations, generators, into one concept
- Remove CREATE POPULATION p FOR t
- Add CREATE POPULATION p FROM 'path'
- Creates sqlite table p which is the base table of bayesdb population p.
- Add CREATE POPULATION p FROM "t"
- Creates sqlite table p (as a clone of t), which is the base table of bayesdb
population p.
- Remove GENERATOR g FOR p.
- Merge the notion of generator_id into population_id, given the 1-1
correspondence.
- Replace generator_id with population_id in IBayesDB_Backend.
- Rename all generator methods in core to population methods.
- Remove all unused generator methods.
- Remove generator_id from all methods that take a population_id.
- Move stattype guessing/management from bayeslite into each backend.
- bayeslite provides a table to each backend to store its stattypes.
- create_generator (to be create_population) in runtime should place
entries in said table.
- Remove aggregation over metamodels in bqlfn.py.
- Remove MODELED BY from BQL grammar.
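A minimal sketch of how the runtime could carry out `CREATE POPULATION p FROM "t"` with plain `sqlite3`: clone `t` into the base table `p`, then record the stattypes in a table that bayeslite provides to the backend. The function `create_population_from_table` and the table `bayesdb_backend_stattype` are hypothetical names, not existing bayeslite objects.

```python
import sqlite3


def quote_name(name):
    # SQLite identifier quoting: wrap in double quotes, doubling any inside.
    return '"%s"' % (name.replace('"', '""'),)


def create_population_from_table(db, population, table, stattypes):
    """Hypothetical runtime step for CREATE POPULATION p FROM "t".

    `stattypes` maps column name -> stattype, e.g. as chosen by the backend.
    """
    # Clone t into the new base table p (column names and rows only).
    db.execute('CREATE TABLE %s AS SELECT * FROM %s'
               % (quote_name(population), quote_name(table)))
    # Table that bayeslite provides to the backend for storing its stattypes.
    db.execute('''
        CREATE TABLE IF NOT EXISTS bayesdb_backend_stattype (
            population  TEXT NOT NULL,
            name        TEXT NOT NULL,
            stattype    TEXT NOT NULL,
            PRIMARY KEY (population, name)
        )''')
    db.executemany(
        'INSERT INTO bayesdb_backend_stattype VALUES (?, ?, ?)',
        [(population, col, st) for col, st in sorted(stattypes.items())])
    db.commit()
```

`CREATE TABLE ... AS SELECT` copies column names and rows but not constraints such as `PRIMARY KEY`, so a real implementation may prefer to replay the original `CREATE TABLE` statement of `t` before copying rows.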
Updates to compiler and bqlfn
- Remove all generator_id parameters from bqlfn
- Remove all generator_id arguments emitted by the compiler.
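Once `generator_id` is gone, a bqlfn is keyed by `population_id` alone, and the SQL the compiler writes passes one fewer argument. The sketch below registers a hypothetical bqlfn with the standard-library `sqlite3.Connection.create_function`; the function name and the in-memory `POPULATION_VALUES` data are stand-ins for real backend state.

```python
import sqlite3

# Hypothetical backend state: population_id -> colno -> observed values.
POPULATION_VALUES = {
    7: {0: ['a', 'b', 'a', 'a']},
}


def bql_column_value_frequency(population_id, colno, value):
    """Hypothetical bqlfn keyed by population_id only (no generator_id)."""
    values = POPULATION_VALUES[population_id][colno]
    return values.count(value) / float(len(values))


db = sqlite3.connect(':memory:')
# Register the Python function so compiled SQL can call it with 3 arguments.
db.create_function('bql_column_value_frequency', 3, bql_column_value_frequency)

# SQL a compiler might emit once the generator_id argument is dropped.
compiled = 'SELECT bql_column_value_frequency(?, ?, ?)'
```

Running `db.execute(compiled, (7, 0, 'a')).fetchone()[0]` evaluates the frequency of `'a'` in column 0 of population 7, which is 0.75 for the data above.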
Clean up several items in the main schema.py
- Merge bayesdb_generator_column with bayesdb_population_column (#443)
- Merge bayesdb_generator with bayesdb_population
- Remove bayesdb_generator_column
- Rename bayesdb_generator_model to bayesdb_population_model
- Remove bayesdb_session
- Remove bayesdb_session_entries
- Remove bayesdb_stattype
- Simplify bayesdb_variable to exclude generator_id
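A sketch of what the core schema could reduce to after these merges, assuming SQLite DDL in the style of schema.py. The column lists are hypothetical and trimmed to the essentials: generator tables fold into population tables, `bayesdb_variable` carries neither `generator_id` nor a stattype (stattype management moves into backends), and `bayesdb_population_model` has no iterations column, since analysis statistics become a backend concern.

```python
import sqlite3

# Hypothetical post-merge core schema; no table has a generator_id column.
SCHEMA = '''
CREATE TABLE bayesdb_population (
    id          INTEGER NOT NULL PRIMARY KEY,
    name        TEXT COLLATE NOCASE NOT NULL UNIQUE,
    tabname     TEXT COLLATE NOCASE NOT NULL,
    backend     TEXT COLLATE NOCASE NOT NULL
);

CREATE TABLE bayesdb_variable (
    population_id   INTEGER NOT NULL REFERENCES bayesdb_population(id),
    colno           INTEGER NOT NULL,
    name            TEXT COLLATE NOCASE NOT NULL,
    PRIMARY KEY (population_id, colno),
    UNIQUE (population_id, name)
);

CREATE TABLE bayesdb_population_model (
    population_id   INTEGER NOT NULL REFERENCES bayesdb_population(id),
    modelno         INTEGER NOT NULL,
    PRIMARY KEY (population_id, modelno)
);
'''


def install_schema(db):
    """Create the sketched core tables on an open sqlite3 connection."""
    db.executescript(SCHEMA)
```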
Simplify population schema definition.
- GUESS STATTYPES OF (*);
- SET STATTYPES OF to
- Remove CATEGORICAL in favor of NOMINAL.
- Rationale: An ORDINAL variable is conceivably also CATEGORICAL (its discrete
categories merely have an ordering), so NOMINAL names the unordered case unambiguously.
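Because stattype guessing moves into each backend, each backend needs its own rule behind `GUESS STATTYPES OF (*)`. A minimal sketch of such a rule, assuming only the NUMERICAL and NOMINAL stattypes and a hypothetical `max_distinct_nominal` threshold; it is not bayeslite's actual guesser.

```python
def guess_stattype(values, max_distinct_nominal=20):
    """Hypothetical heuristic: guess 'numerical' or 'nominal' for one column.

    NULLs (None) are ignored. A column is numerical only if every value
    parses as a float and it has more than max_distinct_nominal distinct values.
    """
    distinct = set(v for v in values if v is not None)
    try:
        for v in distinct:
            float(v)
    except (TypeError, ValueError):
        # Some value is not a number, e.g. 'red'.
        return 'nominal'
    if len(distinct) > max_distinct_nominal:
        return 'numerical'
    # Few distinct numeric codes (e.g. 0/1 flags) are treated as nominal.
    return 'nominal'


def guess_stattypes(columns):
    """Guess a stattype for every column, as in GUESS STATTYPES OF (*)."""
    return {name: guess_stattype(vals) for name, vals in columns.items()}
```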
Simplify half-implemented or confusing features
- Remove codebook.py
- Merge "similarity" with "predictive relevance".
- Remove the "multi-row" aspect; the same effect can be achieved using SQL.
- Raise error if more than one matching row exists.
- Allow hypothetical rows for a "similarity" query.
- Remove WAIT from BQL ANALYZE.
- Remove checkpointing by seconds?
- wontfix.
- Remove ANALYSIS SCHEMA / ANALYSES.
- Remove WITH BASELINE syntax (created for ephemeral purposes).
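With the multi-row aspect removed, a similarity query has to resolve its row condition to exactly one row before evaluating anything. A minimal `sqlite3` sketch of that check, where `resolve_single_rowid` is a hypothetical helper and `BQLError` is a local stand-in for bayeslite's error type. It also rejects the zero-match case, which this plan does not spell out; hypothetical rows are not covered.

```python
import sqlite3


class BQLError(Exception):
    """Local stand-in for bayeslite's user-facing BQL error type."""


def resolve_single_rowid(db, table, condition, params=()):
    """Hypothetical helper: return the unique rowid matching `condition`.

    `condition` is an SQL expression compiled from the query's WHERE clause.
    """
    # LIMIT 2 is enough to distinguish one match from several.
    sql = 'SELECT _rowid_ FROM "%s" WHERE %s LIMIT 2' % (
        table.replace('"', '""'), condition)
    rowids = [r[0] for r in db.execute(sql, params)]
    if not rowids:
        raise BQLError('No row matches %s' % (condition,))
    if len(rowids) > 1:
        raise BQLError('More than one row matches %s' % (condition,))
    return rowids[0]
```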
Ad-hoc issues
- cgpm_metamodel.analyze needs to handle None iterations/seconds and
return.
- cgpm_metamodel._merge_user_constraints needs to handle the None case.
- cgpm_metamodel alter supports dependence only for all columns at once (easy to fix
now, post panelcat).
- This surfaced in test_bql.test_conditional_probability but was easily handled.
- We should still support the more general dependency case now that we have
fixed column transitions to be ergodic in lovecat; created ticket #592.
- cgpm_metamodel returns 'TypeError: expected string or buffer' when there
are no initialized models. Came up in test_core.test_bql for "analyze pe_cc
for 1 iteration wait".
- cgpm_metamodel checkpointing by seconds raises NotImplementedError instead of
BQLError.
- cgpm_metamodel does not store iterations in bayesdb_generator_model; see
test_bql.test_checkpoint__ci_slow.
- Removed the iterations column from bayesdb_generator_model; tracking
model analysis statistics is a backend-specific operation, not the concern
of anything in schema.py.
- cgpm_metamodel returns an error when asked to simulate a constrained
column; appeared in test_simulate.test_simulate_drawconstraint_error.
- Made one test case an error and another a valid simulation.
- cgpm_metamodel accepts constraints for unseen nominals. This came up in
test_codebook. Changed to make sure densities agree.
- The engine caching mechanism is not compatible with sqlite rollbacks. Issue
discovered in test_metamodels. See CGPM_Metamodel._engine_latest for
discussion and workaround.
- Move src/backends/crosscat_theta.schema.json to probcomp/crosscat.
- Move notes/crosscat-schema.txt to probcomp/crosscat.
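Two of the ad-hoc items concern how cgpm_metamodel's analyze reacts to unusual arguments. A minimal sketch of the intended guard logic, assuming a hypothetical `analyze` signature with a `run_iteration` callback: when neither an iteration count nor a time budget is given it returns immediately, and unsupported checkpointing by seconds surfaces as a `BQLError` (a local stand-in class) rather than a `NotImplementedError`.

```python
import time


class BQLError(Exception):
    """Local stand-in for bayeslite's user-facing BQL error type."""


def analyze(run_iteration, iterations=None, max_seconds=None,
            ckpt_seconds=None):
    """Hypothetical analyze guard logic; returns the iterations actually run."""
    if ckpt_seconds is not None:
        # Report unsupported options as a BQL error instead of letting a
        # NotImplementedError escape from backend internals.
        raise BQLError('Checkpointing by seconds is not supported.')
    if iterations is None and max_seconds is None:
        # No budget given at all: do nothing and return rather than crash.
        return 0
    deadline = None if max_seconds is None else time.monotonic() + max_seconds
    done = 0
    while iterations is None or done < iterations:
        if deadline is not None and time.monotonic() >= deadline:
            break
        run_iteration()
        done += 1
    return done
```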
Related tickets that will be resolved:
#284
#388
#427
#441
#443
#467
#469
#562
The merge commit above contains a partial resolution of this ticket, clearing out roughly 50% of the work items. This commit is considered stable, and all unchecked items in the list above are on hold until further notice.
The unchecked items under heading "Merge tables, populations, generators, into one concept" have mostly been addressed by a user-facing workaround per #596, which keeps the full generality of the existing system while allowing "implicit" populations/generators that inherit names from their base tables/populations.