A random SQL query generator

Primary language: C++. License: GNU General Public License v3.0 (GPL-3.0).

SQLsmith

<mba> "I love the smell of coredumps in the morning"

Description

SQLsmith is a random SQL query generator. Its paragon is Csmith, which proved valuable for quality assurance in C compilers.

It currently supports generating queries for PostgreSQL, SQLite 3 and MonetDB. To add support for another RDBMS, you need to implement two classes: one providing schema information about the device under test, and one providing connectivity to it.

Besides developers of the RDBMS products, users developing extensions might also be interested in exposing their code to SQLsmith’s random workload.

During its prototyping stage, it already found about thirty bugs in PostgreSQL alphas, betas and releases, including security vulnerabilities in released versions. SQLsmith’s growing score list is maintained by its users in a wiki:

https://github.com/anse1/sqlsmith/wiki#score-list
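
As a rough illustration of the two-class split mentioned above, the sketch below pairs a schema provider with a connection wrapper for an in-memory SQLite database. SQLsmith itself is written in C++, so this Python is not its actual interface: the names Schema, Dut, SqliteSchema and SqliteDut and their methods are hypothetical and only show which responsibility lives where.

```python
import sqlite3
from abc import ABC, abstractmethod


class Schema(ABC):
    """Hypothetical role 1: describe the relations queries may reference."""

    @abstractmethod
    def tables(self):
        """Return a dict mapping table name to a list of column names."""


class Dut(ABC):
    """Hypothetical role 2: run one statement on the device under test."""

    @abstractmethod
    def test(self, stmt):
        """Execute stmt and raise an exception if the DBMS rejects it."""


class SqliteSchema(Schema):
    def __init__(self, conn):
        self.conn = conn

    def tables(self):
        names = [row[0] for row in self.conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        # Column 1 of each PRAGMA table_info row is the column name.
        return {
            name: [col[1] for col in
                   self.conn.execute(f'PRAGMA table_info("{name}")')]
            for name in names
        }


class SqliteDut(Dut):
    def __init__(self, conn):
        self.conn = conn

    def test(self, stmt):
        try:
            self.conn.execute(stmt).fetchall()
        finally:
            # Undo any changes, whether the statement succeeded or failed.
            self.conn.rollback()
```

A generator would read tables() once to learn what it can reference, then pass each generated statement to test() and classify whatever exception comes back.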

Dependencies

  • C++11
  • libpqxx

optional:

  • boost::regex in case your std::regex is broken
  • SQLite3
  • monetdb_mapi

Building on Debian Jessie

apt-get install build-essential autoconf autoconf-archive libpqxx-dev libboost-regex-dev libsqlite3-dev
cd sqlsmith
autoreconf -i # Not needed when building from a release tarball
./configure
make

Building on OSX

To build on Mac OS X with Homebrew, run the following:

brew install libpqxx automake libtool autoconf autoconf-archive pkg-config
cd sqlsmith
autoreconf -i # Not needed when building from a release tarball
./configure
make

Usage

SQLsmith connects to the target database both to retrieve the schema used for query generation and to execute the generated queries. Currently, all generated statements are rolled back. Beware that SQLsmith does call functions that may have side effects (e.g. pg_terminate_backend). Connect as a suitably underprivileged user to limit the damage such calls can do.

Example invocations:

# testing Postgres
sqlsmith --verbose --target="host=/tmp port=65432 dbname=regression"
# testing SQLite
sqlsmith --verbose --sqlite="file:$HOME/.mozilla/firefox/places.sqlite?mode=ro"
# testing MonetDB
sqlsmith --verbose --monetdb="mapi:monetdb://localhost:50000/smith"

The following options are currently supported:

--target=connstr    target postgres database (default: libpq defaults)
--sqlite=URI        target SQLite3 database
--monetdb=URI       target MonetDB database
--log-to=connstr    postgres db for logging errors into (default: don’t log)
--verbose           emit progress output
--version           show version information
--seed=int          seed RNG with specified integer instead of PID
--dry-run           print queries instead of executing them
--max-queries=long  terminate after generating this many queries
--exclude-catalog   don’t generate queries using catalog relations
--dump-all-queries  dump queries as they are generated
--dump-all-graphs   dump generated ASTs for debugging
--rng-state=string  deserialize dumped rng state
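
When sqlsmith is driven from a script, it can help to assemble the argument list in one place. The following is a minimal sketch that only builds the command line and executes nothing. The flags are the ones listed above; the helper name sqlsmith_argv and its parameters are hypothetical.

```python
import shlex


def sqlsmith_argv(target, seed, max_queries=None, dry_run=False, log_to=None):
    """Build an argument list for one sqlsmith run against PostgreSQL."""
    argv = ["sqlsmith", f"--target={target}", f"--seed={seed}"]
    if max_queries is not None:
        argv.append(f"--max-queries={max_queries}")
    if dry_run:
        argv.append("--dry-run")
    if log_to is not None:
        argv.append(f"--log-to={log_to}")
    return argv


argv = sqlsmith_argv(
    "host=/tmp port=65432 dbname=regression",
    seed=42,
    max_queries=1000,
    dry_run=True,
)
# shlex.join quotes the connection string so the line can be pasted into a shell.
print(shlex.join(argv))
```

Passing an explicit seed instead of relying on the PID gives each run a value that can be written down and supplied again later.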

Sample output:

--verbose makes sqlsmith emit some progress indication to stderr. A symbol is output for each query sent to the server. Currently the following ones are generated:

symbol  meaning            details
.       ok                 Query generated and executed with ok sqlstate
S       syntax error       These are bugs in sqlsmith - please report
t       timeout            SQLsmith sets a statement timeout of 1s
C       broken connection  These happen when a query crashes the server
e       other error
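
To get a quick summary from captured stderr, the symbols can be tallied with a few lines of Python. This is a sketch under one assumption not stated in this README: that progress symbols appear on lines containing nothing but symbols, so lines of the periodic error report can be skipped. The function name tally_progress is hypothetical.

```python
from collections import Counter

# Meanings copied from the symbol table above.
SYMBOLS = {
    ".": "ok",
    "S": "syntax error",
    "t": "timeout",
    "C": "broken connection",
    "e": "other error",
}


def tally_progress(text):
    """Count progress symbols on lines that consist only of known symbols."""
    counts = Counter()
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and all(ch in SYMBOLS for ch in stripped):
            counts.update(stripped)
    return {SYMBOLS[symbol]: n for symbol, n in counts.items()}
```

A non-zero count for "broken connection" is the one to look at first, since it indicates a query that crashed the server.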

When you test against an RDBMS that doesn’t support some of SQLsmith’s grammar, there will be a burst of syntax errors on startup. These should disappear after some time as SQLsmith blacklists productions that consistently lead to errors.
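
The blacklisting idea can be pictured as per-production bookkeeping: count how often each grammar production was used and how often the resulting query failed, then stop using productions that fail nearly every time. The sketch below illustrates that idea only and is not SQLsmith's actual implementation. The class name, method names and both thresholds are hypothetical.

```python
class ProductionFilter:
    """Track outcomes per grammar production and retire hopeless ones."""

    def __init__(self, min_attempts=100, max_error_ratio=0.99):
        # Both thresholds are illustrative assumptions, not SQLsmith's values.
        self.min_attempts = min_attempts
        self.max_error_ratio = max_error_ratio
        self.attempts = {}
        self.failures = {}

    def report(self, production, ok):
        """Record one executed query that used the given production."""
        self.attempts[production] = self.attempts.get(production, 0) + 1
        if not ok:
            self.failures[production] = self.failures.get(production, 0) + 1

    def allowed(self, production):
        """Return False once a production has failed consistently."""
        attempts = self.attempts.get(production, 0)
        if attempts < self.min_attempts:
            return True  # not enough evidence yet
        ratio = self.failures.get(production, 0) / attempts
        return ratio <= self.max_error_ratio
```

Requiring a minimum number of attempts before retiring a production is what would produce the initial burst of errors: every unsupported construct has to fail a few times before it is dropped.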

--verbose will also periodically emit error reports. In the following example, these are mostly caused by the primitive type system.

queries: 39000 (202.399 gen/s, 298.942 exec/s)
AST stats (avg): height = 5.599 nodes = 37.8489
82	ERROR:  invalid regular expression: quantifier operand invalid
70	ERROR:  canceling statement due to statement timeout
44	ERROR:  operator does not exist: point = point
27	ERROR:  operator does not exist: xml = xml
22	ERROR:  cannot compare arrays of different element types
11	ERROR:  could not determine which collation to use for string comparison
5	ERROR:  invalid regular expression: nfa has too many states
4	ERROR:  cache lookup failed for index 2619
4	ERROR:  invalid regular expression: brackets [] not balanced
3	ERROR:  operator does not exist: polygon = polygon
2	ERROR:  invalid regular expression: parentheses () not balanced
1	ERROR:  invalid regular expression: invalid character range
error rate: 0.00705128

The only one that looks interesting here is the cache lookup one. Taking a closer look at it reveals that it happens when you query a certain catalog view like this:

self=# select indexdef from pg_catalog.pg_indexes where indexdef is not NULL;
ERROR:  cache lookup failed for index 2619

This is because the planner then puts pg_get_indexdef(oid) in a context where it sees non-index-oids, which causes it to croak:

                                     QUERY PLAN                                     
------------------------------------------------------------------------------------
 Hash Join  (cost=17.60..30.65 rows=9 width=4)
   Hash Cond: (i.oid = x.indexrelid)
   ->  Seq Scan on pg_class i  (cost=0.00..12.52 rows=114 width=8)
         Filter: ((pg_get_indexdef(oid) IS NOT NULL) AND (relkind = 'i'::"char"))
   ->  Hash  (cost=17.31..17.31 rows=23 width=4)
         ->  Hash Join  (cost=12.52..17.31 rows=23 width=4)
               Hash Cond: (x.indrelid = c.oid)
               ->  Seq Scan on pg_index x  (cost=0.00..4.13 rows=113 width=8)
               ->  Hash  (cost=11.76..11.76 rows=61 width=8)
                     ->  Seq Scan on pg_class c  (cost=0.00..11.76 rows=61 width=8)
                           Filter: (relkind = ANY ('{r,m}'::"char"[]))

Now this is more of a curiosity than a bug, but it still illustrates what debugging with the help of SQLsmith might look like.
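
The periodic report shown earlier is easy to post-process. The sketch below parses the sample report and recomputes its error rate as the total error count divided by the number of queries, which reproduces the reported value for this sample. The function names parse_report and error_rate are hypothetical, and the parsing assumes the line layout seen in the sample.

```python
import re

REPORT = """\
queries: 39000 (202.399 gen/s, 298.942 exec/s)
AST stats (avg): height = 5.599 nodes = 37.8489
82 ERROR:  invalid regular expression: quantifier operand invalid
70 ERROR:  canceling statement due to statement timeout
44 ERROR:  operator does not exist: point = point
27 ERROR:  operator does not exist: xml = xml
22 ERROR:  cannot compare arrays of different element types
11 ERROR:  could not determine which collation to use for string comparison
5 ERROR:  invalid regular expression: nfa has too many states
4 ERROR:  cache lookup failed for index 2619
4 ERROR:  invalid regular expression: brackets [] not balanced
3 ERROR:  operator does not exist: polygon = polygon
2 ERROR:  invalid regular expression: parentheses () not balanced
1 ERROR:  invalid regular expression: invalid character range
error rate: 0.00705128
"""


def parse_report(text):
    """Return (query count, {error message: count}) from one report."""
    queries = None
    errors = {}
    for line in text.splitlines():
        m = re.match(r"queries:\s+(\d+)", line)
        if m:
            queries = int(m.group(1))
            continue
        m = re.match(r"(\d+)\s+ERROR:\s+(.*)", line)
        if m:
            message = m.group(2).strip()
            errors[message] = errors.get(message, 0) + int(m.group(1))
    return queries, errors


def error_rate(queries, errors):
    """Fraction of queries that ended in an error."""
    return sum(errors.values()) / queries


queries, errors = parse_report(REPORT)
print(f"{error_rate(queries, errors):.8f}")  # prints 0.00705128
```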

Large-scale testing

--log-to allows logging of hundreds of sqlsmith instances into a central PostgreSQL database. ./log.sql contains the schema sqlsmith expects and some additional views to generate reports on the logged contents.

It also contains a trigger to filter boring/known errors based on the contents of the tables known and known_re. I periodically COPY my filter tables for testing PostgreSQL into the files ./known_re.txt and ./known.txt to serve as a starting point.
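
The filtering idea can be tried outside the database as well. The sketch below treats one collection as exact messages and the other as regular expressions, in the spirit of the known and known_re tables. It is not the trigger from ./log.sql: the function names are hypothetical, and Python's re syntax differs in places from PostgreSQL's, so patterns may need adjusting.

```python
import re


def is_boring(message, known, known_re):
    """True if the message matches a known error exactly or by pattern."""
    if message in known:
        return True
    return any(re.search(pattern, message) for pattern in known_re)


def interesting(messages, known, known_re):
    """Keep only the messages that no filter entry accounts for."""
    return [m for m in messages if not is_boring(m, known, known_re)]


known = {"canceling statement due to statement timeout"}
known_re = [r"^invalid regular expression: ", r"^operator does not exist: "]
messages = [
    "invalid regular expression: quantifier operand invalid",
    "canceling statement due to statement timeout",
    "operator does not exist: point = point",
    "cache lookup failed for index 2619",
]
print(interesting(messages, known, known_re))
# prints ['cache lookup failed for index 2619']
```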

Resources

License

SQLsmith is available under GPLv3. Use it at your own risk. It may damage your database (one of the purposes of this tool is to try and break things). See the file COPYING for details.

Authors

Andreas Seltenreich <seltenreich@gmx.de>

Bo Tang <tangloner@gmail.com>

Sjoerd Mullender <sjoerd@acm.org>
