PoC property-based testing for testing parity with reference
pdobacz opened this issue · 0 comments
pdobacz commented
Allow verifying the property: "all Diffix Elm implementations give the same results on the same query/same DB/same anon params".
Summarizing the Slack thread, the plan is to have FsCheck generate a bunch of SQL queries, which will later be executed against different implementations of Diffix, currently `reference` (as the reference) and `pg_diffix`. PoC means reaching a minimally usable state quickly (a few days of work), cutting some corners here and there, and then seeing whether we continue.
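As a rough illustration of the property itself, here is a minimal sketch assuming the FsCheck 2.x API. `runReference` and `runPgDiffix` are hypothetical stubs, and the table and column names are made up; the real runners and generator are described in the plan.

```fsharp
// Sketch only: assumes the FsCheck 2.x API.
open FsCheck

/// Result rows, with every cell rendered as a string.
type Rows = string list list

/// Hypothetical stub standing in for the reference implementation.
let runReference (sql: string) : Rows = [ [ sql ] ]

/// Hypothetical stub standing in for pg_diffix.
let runPgDiffix (sql: string) : Rows = [ [ sql ] ]

/// Placeholder generator; table and column names are assumptions.
let sqlGen : Gen<string> =
  Gen.elements
    [ "SELECT count(*) FROM customers"
      "SELECT city, count(*) FROM customers GROUP BY city" ]

/// The parity property: both implementations return the same rows
/// for every generated query.
let parityProperty =
  Prop.forAll (Arb.fromGen sqlGen) (fun sql -> runReference sql = runPgDiffix sql)

Check.Quick parityProperty
```

Wrapping the generator with `Arb.fromGen` keeps it local to this property instead of registering it globally for all `string` values.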
PoC plan in detail:
- Fix some default anon parameters, but without any noise (comparing under noise requires some additional effort and planning)
- Fix some data set
- `Dockerfile` to build a `pg_diffix` image with the data set and all the required setup
- Generate test SQLs (either anonymizing or standard, depending on which is easier), probably in the shape of:
SELECT <random columns, simple expressions, simple aggregators> FROM <table> GROUP BY <random columns, simple expressions>
- Execute the SQL on `QueryEngine.run` to obtain the `reference` result
- Execute the SQL on `pg_diffix` via Npgsql.FSharp
- Normalize both results to some sensible form
- Compare results if both run without error. If one errors, the other one must error as well
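The generation, normalization, and comparison steps could be sketched as follows, again assuming the FsCheck 2.x API. The table name, column names, the choice of `count` aggregators, and the string-cell row representation are all assumptions made for illustration; executing the queries via `QueryEngine.run` and Npgsql.FSharp is left out.

```fsharp
// Sketch only: assumes the FsCheck 2.x API.
open FsCheck

// Hypothetical schema; the real one comes from the fixed data set.
let table = "customers"
let columns = [ "city"; "age"; "active" ]

/// A simple aggregator. The count variants work for any column type.
let aggregatorGen : Gen<string> =
  gen {
    let! column = Gen.elements columns
    return!
      Gen.elements
        [ "count(*)"
          sprintf "count(%s)" column
          sprintf "count(distinct %s)" column ]
  }

/// SELECT <columns, aggregator> FROM <table> GROUP BY <columns>
let queryGen : Gen<string> =
  gen {
    let! groupCount = Gen.choose (0, 2)
    let! rawGroupBy = Gen.listOfLength groupCount (Gen.elements columns)
    let groupBy = List.distinct rawGroupBy
    let! aggregator = aggregatorGen
    let selected = String.concat ", " (groupBy @ [ aggregator ])
    let groupClause =
      if List.isEmpty groupBy then ""
      else " GROUP BY " + String.concat ", " groupBy
    return sprintf "SELECT %s FROM %s%s" selected table groupClause
  }

/// One result row, with every cell rendered as a string (an assumption).
type Row = string list

/// Trim cells and sort rows, so that row order, which is not guaranteed
/// without ORDER BY, does not affect the comparison.
let normalize (rows: Row list) : Row list =
  rows
  |> List.map (List.map (fun (cell: string) -> cell.Trim()))
  |> List.sort

/// Both succeed with the same normalized rows, or both fail.
let agree (reference: Result<Row list, string>) (pgDiffix: Result<Row list, string>) : bool =
  match reference, pgDiffix with
  | Ok a, Ok b -> normalize a = normalize b
  | Error _, Error _ -> true
  | _ -> false
```

For example, `agree (Ok [ [ "a" ]; [ "b" ] ]) (Ok [ [ "b" ]; [ "a" ] ])` evaluates to `true`, because both sides are sorted before comparison.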
Optional stretch goals would be to add:
- Additional simple SQL clauses like `ORDER BY` and `LIMIT`
- Some rudimentary comparison of errors, to make it hard for a suite consisting of trivial errors (like SQL syntax errors) to accidentally pass the test
- Throwing in a second data set, as a drop-in replacement for the primary one.
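One possible shape for the rudimentary error comparison is sketched below. The error categories and the message-based `classify` function are purely hypothetical; neither implementation is known to report errors this way, and real code would more likely inspect error codes or exception types.

```fsharp
/// Coarse error categories (illustrative only).
type ErrorKind =
  | SyntaxError
  | UnsupportedQuery
  | Other

/// Hypothetical classifier based on the error message text.
let classify (message: string) : ErrorKind =
  let lower = message.ToLowerInvariant()
  if lower.Contains("syntax") then SyntaxError
  elif lower.Contains("unsupported") || lower.Contains("not supported") then UnsupportedQuery
  else Other

type Outcome =
  | Agree
  | Disagree of string
  | Discard // both failed on syntax, so the case says nothing about parity

/// Compare two results, distinguishing meaningful agreement from
/// trivial agreement on malformed SQL.
let compareOutcomes
  (reference: Result<string list list, string>)
  (pgDiffix: Result<string list list, string>)
  : Outcome =
  match reference, pgDiffix with
  | Ok a, Ok b when List.sort a = List.sort b -> Agree
  | Ok _, Ok _ -> Disagree "rows differ"
  | Ok _, Error _
  | Error _, Ok _ -> Disagree "only one implementation failed"
  | Error a, Error b ->
    match classify a, classify b with
    | SyntaxError, SyntaxError -> Discard
    | kindA, kindB when kindA = kindB -> Agree
    | kindA, kindB -> Disagree (sprintf "error kinds differ: %A vs %A" kindA kindB)
```

Counting `Discard` outcomes separately makes it visible when most generated queries are malformed. Note also that sorting rows before comparison, as done here, would hide ordering differences once `ORDER BY` is generated, and that `LIMIT` without `ORDER BY` can legitimately return different rows on each side, so both clauses need some care in the comparison.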