SQLsmith
<mba> "I love the smell of coredumps in the morning"
Description
SQLsmith is a random SQL query generator. Its paragon is Csmith, which proved valuable for quality assurance in C compilers.
It currently supports generating queries for PostgreSQL, SQLite 3 and MonetDB. To add support for another RDBMS, you need to implement two classes providing schema information about and connectivity to the device under test.
Besides developers of the RDBMS products, users developing extensions might also be interested in exposing their code to SQLsmith’s random workload.
During its prototyping stage, it already found about thirty bugs in PostgreSQL alphas, betas and releases, including security vulnerabilities in released versions. SQLsmith’s growing score list is maintained by its users in a wiki:
https://github.com/anse1/sqlsmith/wiki#score-list
Dependencies
- C++11
- libpqxx
optional:
- boost::regex in case your std::regex is broken
- SQLite3
- monetdb_mapi
Building on Debian Jessie
apt-get install build-essential autoconf autoconf-archive libpqxx-dev libboost-regex-dev libsqlite3-dev cd sqlsmith autoreconf -i # Not needed when building from a release tarball ./configure make
Building on OSX
In order to build on Mac OSX, assuming you use Homebrew, run the following
brew install libpqxx automake libtool autoconf autoconf-archive pkg-config cd sqlsmith autoreconf -i # Not needed when building from a release tarball ./configure make
Usage
SQLsmith connects to the target database to retrieve the schema for query generation and to send the generated queries to. Currently, all generated statements are rolled back. Beware that SQLsmith does call functions that could possibly have side-effects (e.g. pg_terminate_backend). Use a suitably underprivileged user for its connection to avoid this.
Example invocations:
# testing Postgres sqlsmith --verbose --target="host=/tmp port=65432 dbname=regression" # testing SQLite sqlsmith --verbose --sqlite="file:$HOME/.mozilla/firefox/places.sqlite?mode=ro" # testing MonetDB sqlsmith --verbose --monetdb="mapi:monetdb://localhost:50000/smith"
The following options are currently supported:
--target=connstr | target postgres database (default: libpq defaults) |
--sqlite=URI | target SQLite3 database |
--monetdb=URI | target MonetDB database |
--log-to=connstr | postgres db for logging errors into (default: don’t log) |
--verbose | emit progress output |
--version | show version information |
--seed=int | seed RNG with specified integer instead of PID |
--dry-run | print queries instead of executing them |
--max-queries=long | terminate after generating this many queries |
--exclude-catalog | don’t generate queries using catalog relations |
--dump-all-queries | dump queries as they are generated |
--dump-all-graphs | dump generated ASTs for debugging |
--rng-state=string | deserialize dumped rng state |
Sample output:
--verbose
makes sqlsmith emit some progress indication to stderr. A
symbol is output for each query sent to the server. Currently the
following ones are generated:
symbol | meaning | details |
---|---|---|
. | ok | Query generated and executed with ok sqlstate |
S | syntax error | These are bugs in sqlsmith - please report |
t | timeout | SQLsmith sets a statement timeout of 1s |
C | broken connection | These happen when a query crashes the server |
e | other error |
When you test against a RDBMS that doesn’t support some of SQLsmith’s grammar, there will be a burst of syntax errors on startup. These should disappear after some time as SQLsmith blacklists productions that consistently lead to errors.
--verbose
will also periodically emit error reports. In the
following example, these are mostly caused by the primitive type
system.
queries: 39000 (202.399 gen/s, 298.942 exec/s) AST stats (avg): height = 5.599 nodes = 37.8489 82 ERROR: invalid regular expression: quantifier operand invalid 70 ERROR: canceling statement due to statement timeout 44 ERROR: operator does not exist: point = point 27 ERROR: operator does not exist: xml = xml 22 ERROR: cannot compare arrays of different element types 11 ERROR: could not determine which collation to use for string comparison 5 ERROR: invalid regular expression: nfa has too many states 4 ERROR: cache lookup failed for index 2619 4 ERROR: invalid regular expression: brackets [] not balanced 3 ERROR: operator does not exist: polygon = polygon 2 ERROR: invalid regular expression: parentheses () not balanced 1 ERROR: invalid regular expression: invalid character range error rate: 0.00705128
The only one that looks interesting here is the cache lookup one. Taking a closer look at it reveals that it happens when you query a certain catalog view like this:
self=# select indexdef from pg_catalog.pg_indexes where indexdef is not NULL; FEHLER: cache lookup failed for index 2619
This is because the planner then puts pg_get_indexdef(oid)
in a
context where it sees non-index-oids, which causes it to croak:
QUERY PLAN ------------------------------------------------------------------------------------ Hash Join (cost=17.60..30.65 rows=9 width=4) Hash Cond: (i.oid = x.indexrelid) -> Seq Scan on pg_class i (cost=0.00..12.52 rows=114 width=8) Filter: ((pg_get_indexdef(oid) IS NOT NULL) AND (relkind = 'i'::"char")) -> Hash (cost=17.31..17.31 rows=23 width=4) -> Hash Join (cost=12.52..17.31 rows=23 width=4) Hash Cond: (x.indrelid = c.oid) -> Seq Scan on pg_index x (cost=0.00..4.13 rows=113 width=8) -> Hash (cost=11.76..11.76 rows=61 width=8) -> Seq Scan on pg_class c (cost=0.00..11.76 rows=61 width=8) Filter: (relkind = ANY ('{r,m}'::"char"[]))
Now this is more of a curiosity than a bug, but still illustrating how debugging with the help of SQLsmith might look like.
Large-scale testing
--log-to
allows logging of hundreds of sqlsmith instances into a
central PostgreSQL database. ./log.sql contains the schema sqlsmith
expects and some additional views to generate reports on the logged
contents.
It also contains a trigger to filter boring/known errors based on the contents of the tables known and known_re. I periodically COPY my filter tables for testing PostgreSQL into the files ./known_re.txt and ./known.txt to serve as a starting point.
Resources
License
SQLsmith is available under GPLv3. Use it at your own risk. It may damage your database (one of the purposes of this tool is to try and break things). See the file COPYING for details.
Authors
Andreas Seltenreich <seltenreich@gmx.de>
Bo Tang <tangloner@gmail.com>
Sjoerd Mullender <sjoerd@acm.org>