param-opt is a training tool for machine learning and deep learning. Built on sklearn and pytorch, it supports regular training, cross-validation training, and Bayesian hyperparameter search, and it automatically saves trained models and logs.

Project Structure

Project Files

├─ optUtils (Tools directory)
 ├─ __init__.py (File reading and writing module)
 ├─ dataUtil.py (Data module)
 ├─ logUtil.py (Log module)
 ├─ metricsUtil.py (Evaluation metrics module)
 ├─ modelUtil.py (Model module)
 ├─ pytorchModel.py (Deep learning models)
 ├─ trainUtil.py (Training module)
├─ param.yaml (Configuration file)
├─ requirements.txt (Dependencies)

Additional Files

├─ example_dl_model.py (Deep learning model example)
├─ example_train.py (Regular training example)
├─ example_train_cv.py (Cross-validation training example)
├─ example_train_bys.py (Bayesian search example)

Getting Started

Installation

First, clone the project to your local machine.

$ git clone git@github.com:lyx199504/param-opt.git

Next, enter the project directory and install its dependencies. Note that pytorch may need to be installed separately, following its own installation instructions. Once pytorch is installed, install the remaining dependencies with the following commands.

$ cd param-opt/
$ pip install -r requirements.txt

Finally, running any of the training example files creates a log folder named "log" and a trained-model folder named "model".

Training Steps

Using this tool requires some prior experience with machine learning or deep learning. The tutorial below only outlines the basic training steps. To use or modify the tool in more depth, read the code in the optUtils folder.

Regular Training

Refer to example_train.py.

Machine learning regular training:
Step 1: Wrap the training data as numpy.ndarray, then use the stratified shuffling function in dataUtil.py to shuffle the data and split it into a training set and a test set;
Step 2: Pass the split data, the model name, the model parameters, and the list of evaluation metrics to the ml_train function in trainUtil.py to train the model.
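
Step 1 above can be pictured with the following dependency-free sketch of a stratified shuffle split. It is only an illustration of the idea: the function name stratified_shuffle_split and its parameters are hypothetical and are not the function provided by dataUtil.py, and plain lists are used here instead of numpy.ndarray to keep the example self-contained.

```python
import random
from collections import defaultdict


def stratified_shuffle_split(X, y, test_ratio=0.2, seed=0):
    """Shuffle samples within each class, then split so that the
    training and test sets keep roughly the same class proportions."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Take at least one sample per class for the test set.
        n_test = max(1, round(len(indices) * test_ratio))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])

    # Shuffle again so classes are not stored in contiguous blocks.
    rng.shuffle(train_idx)
    rng.shuffle(test_idx)

    def pick(data, idxs):
        return [data[i] for i in idxs]

    return (pick(X, train_idx), pick(X, test_idx),
            pick(y, train_idx), pick(y, test_idx))


# Toy data: 6 samples of class 0 and 4 samples of class 1.
X = [[i, i * 2] for i in range(10)]
y = [0] * 6 + [1] * 4
X_train, X_test, y_train, y_test = stratified_shuffle_split(X, y, test_ratio=0.5)
```

With a test ratio of 0.5, each class contributes half of its samples to the test set, so both sets keep the 3:2 class balance of the toy data.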

Available model names are listed in the __model_dict dictionary in modelUtil.py, and evaluation metrics can be taken directly from sklearn or from metricsUtil.py. If the model you want is not in the dictionary, build it yourself and pass it through the model parameter of the ml_train function. If you need a new evaluation metric, you can add it yourself.

Deep learning regular training:
Step 1: Same as Step 1 of machine learning regular training;
Step 2: Pick a model from pytorchModel.py or build your own (see example_dl_model.py), then pass in the hyperparameters, data, and evaluation metrics to train the model. Some features worth mentioning are as follows:

model.param_search = True
# Parameter search switch. Enabled by default; set it to False when training without parameter search (regular training).
model.save_model = False
# Model-saving switch. When enabled, the trained model and log of every epoch are saved. Disabled by default.
model.only_save_last_epoch = False
# Last-epoch switch. When enabled, only the trained model and log of the last epoch are saved. Disabled by default.
model.device = 'cuda:0'
# Setting device to "cuda:0" trains the model on GPU 0. If device is not set, training runs on the CPU by default.
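
To make the two saving switches concrete, here is a small sketch of one possible checkpoint policy. It is an assumption, not the project's implementation: the helper epochs_to_save is hypothetical, and the sketch assumes that only_save_last_epoch takes precedence over save_model and that either switch alone enables saving. The actual interaction is defined in the optUtils code.

```python
def epochs_to_save(n_epochs, save_model=False, only_save_last_epoch=False):
    """Return the 1-based epoch numbers whose model and log would be saved.

    Assumed policy (hypothetical, for illustration only):
    - only_save_last_epoch=True -> save the final epoch only
    - save_model=True           -> save every epoch
    - both False                -> save nothing
    """
    if n_epochs < 1:
        raise ValueError("n_epochs must be at least 1")
    if only_save_last_epoch:
        return [n_epochs]
    if save_model:
        return list(range(1, n_epochs + 1))
    return []


every_epoch = epochs_to_save(5, save_model=True)
last_only = epochs_to_save(5, save_model=True, only_save_last_epoch=True)
nothing = epochs_to_save(5)
```

Saving only the last epoch keeps disk usage low for long runs, while saving every epoch makes it possible to go back to an earlier checkpoint.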

Cross-validation Training

Refer to example_train_cv.py.

Cross-validation training does not distinguish between machine learning and deep learning: the same process is used for both.

The cross-validation training steps are similar to regular training, except that the ml_train function is replaced by the cv_train function. When the model to be used is not in modelUtil.py, you must register your own model before training.

In the registration call, rnn_clf is the model_name of the model and RNNClassifier is the class that implements the model.
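
The idea of registering a model under a name can be sketched as a simple name-to-class registry. Everything below is hypothetical and for illustration only: MODEL_REGISTRY, register_model, and build_model are invented names, and the RNNClassifier here is an empty placeholder. The project's real registration function lives in modelUtil.py and may look different.

```python
# Hypothetical stand-in for the name-to-class dictionary kept in modelUtil.py.
MODEL_REGISTRY = {}


def register_model(**name_to_class):
    """Register each keyword as model_name=ModelClass."""
    for name, cls in name_to_class.items():
        if name in MODEL_REGISTRY:
            raise KeyError(f"model name already registered: {name}")
        MODEL_REGISTRY[name] = cls


def build_model(model_name, **params):
    """Look up a registered class by name and instantiate it."""
    return MODEL_REGISTRY[model_name](**params)


class RNNClassifier:
    """Placeholder class standing in for a real deep learning model."""

    def __init__(self, hidden_size=32):
        self.hidden_size = hidden_size


# rnn_clf is the model_name, RNNClassifier is the model class.
register_model(rnn_clf=RNNClassifier)
model = build_model("rnn_clf", hidden_size=64)
```

A registry like this lets a training function accept a plain string such as "rnn_clf" and still construct the right class, which is what allows the same training call to serve both built-in and user-defined models.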


The number of folds and the number of worker processes are set by the following configuration entries:

  fold: 10  # number of training folds
  workers: 1  # number of processes used to run folds concurrently
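
What the fold setting means can be shown with a short sketch of k-fold index splitting. This is a generic illustration, not the code used by cv_train: the function k_fold_indices is hypothetical, and it splits indices into contiguous folds without shuffling.

```python
def k_fold_indices(n_samples, fold):
    """Yield (train_indices, valid_indices) pairs for `fold` contiguous folds.

    Each sample is used for validation exactly once. When n_samples is not
    divisible by fold, the first folds get one extra sample each.
    """
    if not 2 <= fold <= n_samples:
        raise ValueError("fold must be between 2 and n_samples")
    base, extra = divmod(n_samples, fold)
    start = 0
    for k in range(fold):
        size = base + (1 if k < extra else 0)
        valid = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, valid
        start += size


# 25 samples with fold: 10 gives ten train/validation splits.
splits = list(k_fold_indices(n_samples=25, fold=10))
valid_sizes = [len(valid) for _, valid in splits]
```

Each of the ten splits trains on the other nine folds and validates on the held-out one, so a larger fold value means more training runs on slightly larger training sets.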

Bayesian Search Training

Refer to example_train_bys.py.

The preparatory steps for Bayesian search are the same as for cross-validation training, including registering your own model if you use one.

For training, replace the cv_train function with the bayes_search_train function. The data does not need to be split into a training set and a validation set beforehand, because the split is performed automatically during training.


The search is configured with entries such as the following:

  n_iter: 10  # number of iterations, i.e. how many parameter combinations are trained
  fold: 3  
  workers: 1 
  - [lr_clf, {
      max_iter: !!python/tuple [50, 200],
      C: !!python/tuple [0.8, 1.2, 'uniform'],
      random_state: !!python/tuple [1, 500],

Here, each model is defined under "model", and "lr_clf" is the model name. For each entry, the name to the left of the colon ":" is the parameter name, and the value to the right is the range to be searched. To use Bayesian search more effectively, read about BayesSearchCV in the skopt package.
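
The following sketch illustrates how such range tuples can be read, by drawing candidate parameter sets from them. Note the substitutions: it uses plain random sampling, not Bayesian optimization, and it assumes that two integer bounds mean an inclusive integer range while float bounds mean a real-valued range, with an optional third element naming the prior. The helpers sample_value and sample_params are hypothetical and are not part of this project or of skopt.

```python
import random

# Ranges from the lr_clf example, written as Python tuples.
search_space = {
    "max_iter": (50, 200),
    "C": (0.8, 1.2, "uniform"),
    "random_state": (1, 500),
}


def sample_value(spec, rng):
    """Draw one value from a (low, high) or (low, high, prior) range."""
    low, high = spec[0], spec[1]
    prior = spec[2] if len(spec) > 2 else "uniform"
    if prior != "uniform":
        raise ValueError(f"prior not supported in this sketch: {prior}")
    if isinstance(low, int) and isinstance(high, int):
        return rng.randint(low, high)  # inclusive integer range
    return rng.uniform(low, high)  # real-valued range


def sample_params(space, rng):
    """Draw one full parameter combination."""
    return {name: sample_value(spec, rng) for name, spec in space.items()}


rng = random.Random(0)
# n_iter: 10 corresponds to trying ten parameter combinations.
candidates = [sample_params(search_space, rng) for _ in range(10)]
```

A Bayesian search differs from this sketch in how it picks the next combination: instead of sampling independently, it uses the scores of earlier combinations to choose promising regions of the same ranges.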

Project Statement

Authors and affiliation of this project:

Project Name: param-opt
Authors: Yixiang Lu, Shijie Xu, Chengxi Jiang, Xinquan Yang, Shihan Chen, Guanggang Geng, Dongjie Liu
Affiliation: College of Cyber Security, Jinan University

If you use this project in the experiments for a paper, you can cite it. The LaTeX version of the citation is as follows:

  author       = {Lu, Yixiang},
  title        = {param-opt: A machine learning training tool},
  year         = {2022},
  howpublished = {\url{https://github.com/lyx199504/param-opt}}

The Word version of the citation is as follows:

Y. Lu, param-opt: A machine learning training tool, https://github.com/lyx199504/param-opt (2022).

If you publish code based on this project, you must credit the original project author and source:

Author: Yixiang Lu
Project: [param-opt](https://github.com/lyx199504/param-opt)

License

MIT © 2022 Yixiang Lu - 夜光