/go-randgen

a QA tool to random generate sql by bnf pattern

Primary LanguageGo

go randgen

中文文档

go version of mysql rangen for db fuzz test

Install with go get Command

go get -u  github.com/pingcap/go-randgen/cmd/go-randgen

try it:

go-randgen -h

How to Compile It

  • install go-bindata command
go get -u github.com/jteeuwen/go-bindata/...
  • compile go-randgen
make all

Features

  1. It has a built-in zz file, you can generate sqls just with a yy file
  2. Be able to generate sql without connection to db, so it is very fast if you just want sqls
  3. yy syntax is mysql randgen compatible, as long as it doesn't contain embedded perl code
  4. go-randgen supports embedded lua code blocks, just as mysql randgen supports perl code blocks
  5. Implemented with pure Golang, it's design is very flexible, easy to hack
  6. All exported functions are stateless other than cmd package, if needed, you can call them as a library

Quick Start

gentest

generate sqls to test window functions:

# -Y the path of yy file to use
# -Q sql num to generate by yy
# -B break ddls and generated sqls into two files
# if user doesn't specfied a zz file path, it will use built-in zz
./go-randgen gentest -Y examples/windows.yy -Q 10 -B

After execution, you will see a output.data.sql, which includes ddl (define table structrue) and dml (init data in tables), and a output.rand.sql, which includes sqls generated by specified yy file.

If you don't want to use built-in zz, you can write a zz by hand, and use -Z to specify path, the detail of zz syntax to see following Grammar Guide.

If you just want to generate some sqls by yy, without generating ddl, you can use --skip-zz to skip ddl generation, but yy file can not include key word related with table name or field name in this condition.

the detail of yy syntax to see following Grammar Guide.

gendata

generate table structrue and data in user specified dsns according to user specified or built-in zz file.

# generate table structure and data specified by zz in specified dsn
# dsns are split by comma
./go-randgen gendata --dsns "root:@tcp(127.0.0.1:3306)/randgen,root:@tcp(127.0.0.1:4000)/randgen"

gensql

parse yy to generate sqls by user specified dsn:

./go-randgen gensql -Y examples/functions.yy \  
             --dsn "root:@tcp(127.0.0.1:3306)/randgen" \ 
             -Q 100

Note that gensql assumes structures of tables in the dsn are the same (Because data genrated by randgen has this trait).

exec

User sepecifies two dsns, then exec command will execute generated sqls in two dsn and dump sqls with inconsistent execution result.

Example:

./go-randgen exec -Y examples/functions.yy \
             --dsn1 "root:@tcp(127.0.0.1:4000)/randgen" \
             --dsn2 "root:@tcp(127.0.0.1:3306)/randgen" \  
             -Q 100

In above command, go randgen will generate data in two dsns first, then execute 100 sqls generated according to functions.yy, if execution result of a sql is inconsistent, it will dump related info into dump directory. (you can change this directory by --dump)

Note that go randgen compare sql result without order in default. For example, go-randgen think that the following result is consistent:

Result1:

+------+------+
| p    | s    |
+------+------+
|    1 | aaa  |
|    2 | bbb  |
+------+------+

Result2:

+------+------+
| p    | s    |
+------+------+
|    2 | bbb  |
|    1 | aaa  |
+------+------+

If you want to compare in order byte by byte, you should add --order.

exec can also skip data generation by set --skip-zz, it will generate sqls just like gensql command.

Call It as a Library

Except packages under cmd directory, all exported functions is stateless, you can call them again and again safely.

The implementation under cmd directory is the best examples to use it as a library.

A simple example: get an iterator by yy and generate ten sqls

package main

import (
	"fmt"
	"github.com/pingcap/go-randgen/grammar"
	"github.com/pingcap/go-randgen/grammar/sql_generator"
	"log"
)

func main() {
	yy := `
{
i = 1
}

query: 
    create
    
create:
    CREATE TABLE 
    {print(string.format("table%d", i)); i = i+1}
    (a int)
`
	iterator, err := grammar.NewIter(yy, "query", 5, nil, false)
	if err != nil {
		log.Fatalf("get iter err %v\n", err)
	}

	iterator.Visit(sql_generator.FixedTimesVisitor(func(_ int, sql string) {
		fmt.Println(sql)
	}, 5))
}

Note that there is no problem to pass nil in the third param because of the absence of keyword in the example yy. But if you use yy keyword, you should use gendata.NewKeyfun() to create this param.

Print result:

CREATE TABLE table1 (a int)
CREATE TABLE table2 (a int)
CREATE TABLE table3 (a int)
CREATE TABLE table4 (a int)
CREATE TABLE table5 (a int)

Grammar Guide

zz Grammar

Quick Start

zz file is a lua script file, it will define the following three things:

  1. tables to generate
  2. fields in tables
  3. data in fields

Use built-in zz as an example:

-- table related defination
tables = {
    -- record number
    rows = {10, 20, 30, 90},
    -- table character set
    charsets = {'utf8', 'latin1', 'binary'},
    -- table partition number, 'undef' represents no partition
    partitions = {4, 6, 'undef'},
}

-- field related defination
fields = {
    -- field data types
    types = {'bigint', 'float', 'double', 'decimal(40, 20)',
        'char(20)', 'varchar(20)'},
    -- generate signed and unsigned field with above type 
    sign = {'signed', 'unsigned'}
}

-- data init related defination
data = {
    -- optional value of number type
    numbers = {'null', 'tinyint', 'smallint',
        '12.991', '1.009', '-9.183',
        'decimal',
    },
    -- optional value of char type
    strings = {'null', 'letter', 'english'},
}

As you can see, there must be three lua Table variables in zz file, which is tables, fields and data.

tables defines attributes on table, like rows, charsets and partitions in above example, these attributes' cartesian product will be evaluated,
every tuple in which will generate a table. tables definition in above example will generate 4(rows)*3(charsets)*3(partitions)=36 tables in total.

fields defines what fields in tables, these infos' cartesian product also will be evaluated, every tuple in which will generate a field, but the fields generated by above example is less than 6(types)*2(sign)=12, because sign attribute can only apply on numberic type, go-randgen will automatically ignore it for non-numberic types, so example definition will only genrate 4(number)*2(sign)+2(char)=10 fields. Note that tables generated by randgen all have the same fields.

data defines data in tables, key represents field types (see following for detail), value is an array, which are optional values of this type. For each genrated record, it will random choose a value for fields of corresponding type. Optional values include "literal" and "generator".Take the definition of numbers above as an example, null,12.991, etc. are literals, they will be take directly as a value, but when tinyint (it is a generator) is taken, it will generate a random value in -128~127 (to know more generators, see following for detail).

tables

keys mean options default
rows record number in table any positive number [0, 1, 2, 10, 100]
charsets table's character set 'utf8','utf8mb4','ascii','latin1','binary', 'undef' means not set charset explicitly ['undef']
partitions partition number of table any positive number or 'undef', 'undef' means no partition ['undef']

Related source code is tablesVars variable in gendata/tables.go

fields

keys mean options default
types field type any valid MySQL type ['int', 'varchar', 'date', 'time', 'datetime']
keys index or not 'key' means add index to the field, 'undef' means not ['undef', 'key']
sign unsigned or not 'signed', 'unsigned' ['signed']

Related source code is fieldVars variable in gendata/fields.go.

data

data definition in go-randgen is enhanced compared with mysql randgen. mysql randgen only support five summary types, which is numbers, blobs, temporals, enum and strings. go-randgen not only support the five summary types, but also support fine-grained definition.

For example:

data = {
    numbers = {'null', 'tinyint', 'smallint',
        '12.991', '1.009', '-9.183',
        'decimal',
    },
    bigint = {100, 10, 3},
}

For above data definition, it will take randomly from 100, 10 and 3 when go-randgen generate bigint field, instead from numbers definition, which is more rough.

The map from concrete data types to summary types is summaryType variable in gendata/data.go.

Among above numbers definition, 'tinyint', 'smallint', 'decimal' are go-randgen built-in generators.

To know all generators in go-randgen, you can see init function in gendata/generators/register.go

yy Grammar

Quick Start

A simple example:

# one line comment
/*
multiline comment
*/

query:
    select
    | select1

select:
    SELECT fields FROM _table
    
fields:
    _field
    | _field_int

The generated result of this yy maybe:

select1
SELECT "a random field" FROM "a random table in this db"
SELECT "a random int field" FROM "a random table in this db"
select1

Comment

  • one line comment: #
  • multiline comment: /**/

Classifications of Identifier

  • Non-Terminal: be composed of lowercase, number of underline, but can not begin with number
  • Terminal: be composed of uppercase, special character or number, but can not begin with underline
  • Keyword: begin with underline

For Non-Terminal in the right of production, if the corresponding production can not be found, the Non-Terminal degrade to Terminal

Keyword

Keyword begin with underline.

The keywords to get table names and field names:

  • _table: randomly get a table name from generated tables
  • _field: randomly get a field name from generated fields
  • _field_int: randomly get a field from generated int fields
  • _field_char: randomly get a field from generated char and varchar fields
  • _field_list: get all field split by comma in the table
  • _field_int_list: get all int field split by comma in the table
  • _field_char_list: get all char field split by comma in the table

Some sugars to randomly generate data(it will generate double quotes automatically in character related data):

  • _digit: random number in 0-9
  • _letter: random char from 'a' to 'z'
  • _english: random English word
  • _int: radom int value
  • _date: generate randomly yyyy-MM-dd formatted date
  • _year: random year
  • _time: random hh:mm:ss formatted time
  • _datetime: random yyyy-MM-dd hh:mm:ss formatted time

I only write a part of keywords. If you want to know all keywords, You can see the NewKeyfun function in gendata/gendata.go

Embedded Lua Code Block

You can write lua code in "{}", call print function to splice content to sql

query:{a = 1}
    CREATE TABLE 
    {print(string.format("t%d", a))} (a INT)

Above example will always generate sql CREATE TABLE t1 (a INT).

You can call any funcation in lua standard library, for example:

# random number from 10-20
query:
    {print(math.random(10,20))}

Normal code block will be executed every time the corresponding branch is selected.

go-randgen support insert code in the head of yy file, this code will only be executed once, called head code block, the main application of head code block is the declaration of variables or functions:

# head code block
{
i = 1
a = 100
function add(num1, num2)
    return num1 + num2    
end
}

query:
   select

select:
   SELECT * FROM _table WHERE where_clause
   
where_clause:
   _field_int > {print(i)}
   | _field_char > {print(a)}
   | _field_int + _field_int > {print(add(i, a))}
   

You may be afraid that the brace may be conflict with lua Table definition.In fact, you are worried for no thing, you are free to use lua Table in code block:

{
f={a=1, b=3}
arr={0,2,3,4}
}

query:
  {print(arr[f.a])} | {print(arr[f.b])}

Above example will only genrate "0" or "3" (Note that index of lua array begin from 1).

Another important feature is that you can call yy key word in lua code block by _xxx(), for example:

query:{table = _table()}
    BEGIN ; update ; select ; END

update:
    UPDATE {print(table)} SET _field_int = 10

select:
    SELECT * FROM {print(table)}

Above yy will ensure that random table in update and select is the same.

Frequent Pattern

  • recursive subquery
query:
    select

select:
    SELECT * FROM
    (select)
    WHERE _field_int > 10
    | SELECT * FROM _table WHERE _field_char = _english
  • empty rule
 order:
     ASC
     |DESC
     |    # order can product empty
     
 #...
  • generate a sequence of sqls

For exampe, we want to test Prepared statement

query:
	SET @stmt = {print('"')} select {print('"')};
	PREPARE stmt FROM @stmt_create ; 
	EXECUTE stmt ;
	
select:
    SELECT * FROM _table

If you generate 3 sqls with above yy(namely set -Q with 3), it will generate sqls as follow:

SET @stmt = " SELECT * FROM _table ";
PREPARE stmt FROM @stmt_create; 
EXECUTE stmt;

If you generate 6 sqls, it will generate above sqls twice.

If you generate 2 sqls, result is as follow:

SET @stmt = " SELECT * FROM _table ";
PREPARE stmt FROM @stmt_create; 

From above examples, we can see the mean of ;. The symbol extends from mysql randgen, represents generation of a sequence of sqls.

  • create unique table when test create statements

Program 1: insert lua script, apply head code block

# declare i to 1
{
i = 1
}

query: 
    create
    
create:
    CREATE TABLE 
    {print(string.format("table%d", i)); i = i+1}
    (a int)

Program 2: create table first, then drop it in next sql

query:
    create
    
create:
    CREATE TABLE t (a int); DROP TABLE t

Result:

CREATE TABLE t (a int);
DROP TABLE t;
CREATE TABLE t (a int);
DROP TABLE t;
...

How to Hack

Compared to mysql randgen, go-randgen is very easy to hack. If you want some feature it doesn't have yet, you can add it conveniently.

Hack zz data

If you think all built-in generators do not meet your demand, you can register new generator in init function in
gendata/generators/register.go

Assume that you add a aaa generator, you will not only can use "aaa" in zz data field, but alse can use _aaa key word in yy.

Hack yy Keyword

You can add your keyword in NewKeyfun function in gendata/gendata.go

Differences with mysql randgen

  • do not need to add ; at the end of bnf expression, which is the habit of mysql randgen. go-randgen do not depend on ; to recognize different bnf expression. Of course, there is no problem if you add ; at the end of bnf expression, because we are compatible with mysql randgen
  • go-randgen init zz data with insert statement, other than insert ignore. So if it meets unsigned field, it will try 10 times at most to generate positive value, if it fails in 10 times, 1 will be assigned to this field
  • data definition in zz can be more precise, other than the only four types in mysql randgen
  • support generate sqls without connection to db, which takes advantage of schema cache in ddls generation, it is very fast.