diffix/reference

PoC: property-based testing for parity with reference

pdobacz opened this issue · 0 comments

Allow verifying the property: "all Diffix Elm implementations give the same results for the same query, the same DB, and the same anon params".

Summarizing the Slack thread, the plan is to have FsCheck generate a bunch of SQL queries, which will later be executed against different implementations of Diffix, currently reference (as the reference) and pg_diffix. PoC means to reach a minimally usable state quickly (a few days of work), cutting some corners here and there, and then see if we continue.

PoC plan in detail:

  1. Fix some default anon parameters, but without any noise (comparing under noise requires additional effort and planning)
  2. Fix some data set
  3. Dockerfile to build a pg_diffix image with the data set and all the required setup
  4. Generate test SQL queries (either anonymizing or standard, depending on which is easier), probably in the shape of: SELECT <random columns, simple expressions, simple aggregators> FROM <table> GROUP BY <random columns, simple expressions>
  5. Execute the SQL via QueryEngine.run to obtain the reference result
  6. Execute the SQL on pg_diffix via Npgsql.FSharp
  7. Normalize both results to some sensible form
  8. Compare results if both run without error. If one errors, the other one must error as well.
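
Steps 4, 7 and 8 could look roughly like the sketch below. It is a language-neutral illustration in Python, not the FsCheck code the PoC would actually use. The table name, column names and aggregator list are made up for the example, and `run_reference` / `run_pg_diffix` stand in for whatever ends up wrapping QueryEngine.run and the Npgsql.FSharp call. Normalization here treats a result as an unordered multiset of rows, on the assumption that row order is not significant without ORDER BY.

```python
import random
from collections import Counter

# Hypothetical schema and aggregators, for illustration only.
TABLE = "customers"
COLUMNS = ["city", "age", "active"]
AGGREGATORS = ["count(*)", "count(distinct age)", "sum(age)"]


def generate_query(rng):
    """Step 4: build one SELECT ... GROUP BY query from random columns and aggregators."""
    group_by = rng.sample(COLUMNS, rng.randint(1, len(COLUMNS)))
    aggs = rng.sample(AGGREGATORS, rng.randint(1, len(AGGREGATORS)))
    select_list = ", ".join(group_by + aggs)
    return f"SELECT {select_list} FROM {TABLE} GROUP BY {', '.join(group_by)}"


def normalize_value(value):
    """Map a cell to a canonical string so 5 and 5.0 compare equal."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return str(value).lower()
    if isinstance(value, (int, float)):
        return repr(round(float(value), 6))
    return str(value)


def normalize(rows):
    """Step 7: turn a result into an order-insensitive multiset of rows."""
    return Counter(tuple(normalize_value(v) for v in row) for row in rows)


def run_safely(run, query):
    """Run a query, returning ("ok", normalized rows) or ("error", message)."""
    try:
        return ("ok", normalize(run(query)))
    except Exception as exc:
        return ("error", str(exc))


def results_agree(run_reference, run_pg_diffix, query):
    """Step 8: both must error, or both must succeed with equal normalized rows."""
    ref = run_safely(run_reference, query)
    pg = run_safely(run_pg_diffix, query)
    if ref[0] == "error" or pg[0] == "error":
        return ref[0] == pg[0]
    return ref[1] == pg[1]
```

A property test would then generate many queries with `generate_query` and assert `results_agree` for each of them. Seeding the random generator keeps a failing query reproducible.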

Optional stretch goals would be to add:

  1. Additional simple SQL clauses like ORDER BY and LIMIT
  2. Some rudimentary comparison of errors, to make it hard for a suite consisting of trivial errors (like SQL syntax errors) to accidentally pass the test
  3. A second data set, as a drop-in replacement for the primary one.
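
For stretch goal 2, one rudimentary option is to bucket error messages into coarse categories and to cap the share of trivial errors in a run. The sketch below is a Python illustration under that assumption. The category names, the keyword lists and the idea of a threshold are all hypothetical, since the real error messages of reference and pg_diffix are not specified here.

```python
# Hypothetical keyword table; real messages from reference and pg_diffix may differ.
CATEGORIES = [
    ("syntax", ("syntax error", "parse error")),
    ("unknown_name", ("does not exist", "not found")),
    ("unsupported", ("not supported", "unsupported")),
]


def classify_error(message):
    """Map an error message to the first category whose keyword it contains."""
    lowered = message.lower()
    for category, needles in CATEGORIES:
        if any(needle in lowered for needle in needles):
            return category
    return "other"


def errors_agree(ref_message, pg_message):
    """Two errors agree if they fall into the same coarse category."""
    return classify_error(ref_message) == classify_error(pg_message)


def syntax_error_share(messages):
    """Fraction of error messages classified as syntax errors (0.0 if there are none)."""
    if not messages:
        return 0.0
    syntax = sum(1 for m in messages if classify_error(m) == "syntax")
    return syntax / len(messages)
```

A suite could then fail when `syntax_error_share` exceeds some agreed threshold, so that a generator producing mostly malformed SQL does not pass just because both implementations reject it.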