|-- SLSqliNet
|-- dataset
| |-- test
| |-- train
| |-- val
| |-- word_embedding
|-- detection
| |-- bilstm
| |-- bilstm_attention
| |-- double_bilstm
| |-- double_bilstm_attention
| |-- double_lstm
| |-- lstm
| |-- svm
| |-- utils
|-- src
| |-- config
| |-- generalize
| | |-- colx
| |-- middleware
| |-- utils
| |-- word_embedding
- dataset
    - word_embedding: sentence vectors pretrained with the Albert model
    - train/test/val: train/test/validation splits of the pretrained dataset, proportion 6:3:1
- detection dir
    - utils: splits the raw dataset into train/test/validation sets, loads the dataset during model training, and evaluates model performance
    - other folders: training and evaluation results for the different models
- src dir
    - middleware: middleware that listens to and forwards communication traffic between the Web application and the DB
    - generalize: parses and generalizes SQL statements based on the generalization rules and the parser
        - colx: a MySQL parser written in Golang
    - word_embedding: pretrains with the Albert model and generates sentence vectors
- Run Web Server
# src/
python3 server.py
Request the web application and perform SQL injection: call request() of src/request_api.py.
Perform a normal request to the Web API: call request_normal_api() of src/request_api.py.
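As a rough illustration of what these two helpers do, the sketch below sends one injected and one benign request with the requests library; the endpoint URL, parameter name, and payload are assumptions for illustration and are not the actual contents of src/request_api.py.

```python
"""Illustrative sketch only: the real request()/request_normal_api() live in
src/request_api.py; the URL, parameter name, and payloads below are assumptions."""
import requests

BASE_URL = "http://127.0.0.1:5000/query"   # hypothetical Web API endpoint

def request():
    """Send a request carrying a classic SQL injection payload."""
    payload = "1' OR '1'='1"                # example injection string, not from the repo
    return requests.get(BASE_URL, params={"id": payload})

def request_normal_api():
    """Send a benign request with a plain numeric parameter."""
    return requests.get(BASE_URL, params={"id": "1"})

if __name__ == "__main__":
    print(request().status_code, request_normal_api().status_code)
```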
- Run Middleware
Run the middleware to listen to and forward communication traffic between the Web app and the DB:
# src/middleware/
python3 middleware.py
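The sketch below shows one way such a middleware could be structured: a plain asyncio TCP proxy that accepts the Web app's DB connections and forwards every byte to MySQL, which is where traffic could be captured. The ports and host are placeholders; this is an assumption-based sketch, not the actual logic of src/middleware/middleware.py.

```python
"""Minimal sketch of a traffic-forwarding middleware, assuming a plain TCP proxy
between the Web application and MySQL; ports/host are illustrative only."""
import asyncio

LISTEN_PORT = 3307                      # hypothetical port the Web app connects to
DB_HOST, DB_PORT = "127.0.0.1", 3306    # hypothetical MySQL backend

async def pipe(reader, writer):
    # Copy bytes from one side to the other; captured traffic could be
    # logged here before being forwarded.
    while data := await reader.read(4096):
        writer.write(data)
        await writer.drain()
    writer.close()

async def handle(client_reader, client_writer):
    db_reader, db_writer = await asyncio.open_connection(DB_HOST, DB_PORT)
    await asyncio.gather(pipe(client_reader, db_writer),
                         pipe(db_reader, client_writer))

async def main():
    server = await asyncio.start_server(handle, "0.0.0.0", LISTEN_PORT)
    async with server:
        await server.serve_forever()

if __name__ == "__main__":
    asyncio.run(main())
```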
src/generalize/colx/main.go implements two Visitor structures to extract the column/table names of MySQL statements. colx and colx.exe are the executable files for Linux and Windows, respectively.
Parse and generalize MySQL statements:
# src/generalize/
python3 generalize.py
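For intuition, the following sketch applies one typical generalization rule, replacing string and numeric literals with placeholders; the real rule set and the interaction with the colx parser in generalize.py may differ.

```python
"""Illustrative sketch of statement generalization, assuming the rule
"replace concrete literals with placeholders"; not the repo's actual rules."""
import re

def generalize(sql: str) -> str:
    sql = re.sub(r"'[^']*'", "STR", sql)            # string literals -> STR
    sql = re.sub(r"\b\d+(\.\d+)?\b", "NUM", sql)    # numeric literals -> NUM
    return re.sub(r"\s+", " ", sql).strip()         # normalize whitespace

print(generalize("SELECT name FROM users WHERE id = 42 AND pwd = 'x' OR '1'='1'"))
# SELECT name FROM users WHERE id = NUM AND pwd = STR OR STR=STR
```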
# src/word_embedding/
python3 init.py
init.py calls the Pretrain class of albert_embedding.py to generate sentence vectors.
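A minimal sketch of how sentence vectors can be produced from an Albert model with the Hugging Face transformers library is shown below; the checkpoint name and mean pooling are assumptions and may not match what the Pretrain class actually does.

```python
"""Sketch of turning an SQL statement into a sentence vector with an Albert
model via Hugging Face transformers; checkpoint and pooling are assumptions."""
import torch
from transformers import AlbertTokenizer, AlbertModel

tokenizer = AlbertTokenizer.from_pretrained("albert-base-v2")
model = AlbertModel.from_pretrained("albert-base-v2")

def sentence_vector(text: str) -> torch.Tensor:
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state   # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)             # mean-pool to a 768-d vector

vec = sentence_vector("SELECT name FROM users WHERE id = NUM")
print(vec.shape)   # torch.Size([768])
```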
detection/utils/evaluate.py: evaluates model performance.
detection/utils/data.py: splits the raw dataset into train/test/validation sets with a 6:3:1 proportion.
detection/utils/config.py: configures the dataset path and other parameters.
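The following sketch shows a 6:3:1 split and basic metric computation with scikit-learn, as an approximation of what data.py and evaluate.py do; the function names, labels, and random seed here are assumptions.

```python
"""Sketch of the 6:3:1 split and evaluation metrics using scikit-learn;
the actual interfaces of detection/utils/data.py and evaluate.py may differ."""
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def split_dataset(samples, labels, seed=42):
    # 60% train, 40% held out
    x_train, x_rest, y_train, y_rest = train_test_split(
        samples, labels, test_size=0.4, random_state=seed, stratify=labels)
    # split the held-out 40% into 30% test and 10% validation (3:1)
    x_test, x_val, y_test, y_val = train_test_split(
        x_rest, y_rest, test_size=0.25, random_state=seed, stratify=y_rest)
    return (x_train, y_train), (x_test, y_test), (x_val, y_val)

def evaluate(y_true, y_pred):
    p, r, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="binary")
    return {"accuracy": accuracy_score(y_true, y_pred),
            "precision": p, "recall": r, "f1": f1}
```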
Every folder in detection/ except utils contains a model file and source code. You can run the .py file to train and evaluate the model; it outputs an accuracy/loss figure, a JSON file of the evaluation results, a JSON file of the training history, and a model folder.
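As an illustration of the kind of model trained in these folders, here is a small Keras BiLSTM binary classifier over pre-computed embeddings; the layer sizes, sequence length, and feature dimension are assumptions and do not reproduce the repo's exact architectures.

```python
"""Illustrative Keras sketch of a BiLSTM detector over pre-computed embeddings;
sequence length, embedding size, and layer widths are assumptions."""
import tensorflow as tf

SEQ_LEN, EMB_DIM = 64, 768   # hypothetical: tokens per statement, Albert hidden size

model = tf.keras.Sequential([
    tf.keras.Input(shape=(SEQ_LEN, EMB_DIM)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # SQLi vs. normal
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# history = model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=10)
# history.history can be dumped to the JSON training-history file mentioned above.
```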