Laozhongyi is a simple but effective automatic hyper-parameter tuning program based on grid search and simulated annealing.
- Hyperparameter tuning involves a lot of tedious work, e.g., configuring a bunch of values for hyperparameters, naming a bunch of log files, launching a bunch of processes, comparing results among log files, etc. Doing this by hand is both time-consuming and error-prone.
- Some existing automatic hyperparameter tuning programs rely on Python, while Laozhongyi only relies on string parsing. Thus you can use Laozhongyi with any programming language and any deep learning library.
The following sections illustrate how to use Laozhongyi.
lr,0.01,0.001,0.0001
dropout,0,0.1,0.2,0.3,0.4,0.5
batch_size,1,2,4,8,16,32,64
pretrained,/home/xxx/modelxxx
Each line gives a hyperparameter name followed by its candidate values, separated by commas. A hyperparameter with only one value is treated as fixed and is not tuned.
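To make the format concrete, here is a minimal Python sketch that reads such a file. It is only an illustration of the format described above, not Laozhongyi's own parser (which is written in Java); the function names `parse_scope` and `tunable` are hypothetical, and the sketch assumes that values never contain commas.

```python
def parse_scope(text):
    """Parse scope-file text into {name: [values...]}; values stay strings."""
    scope = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # First field is the hyperparameter name, the rest are its values.
        name, *values = [field.strip() for field in line.split(",")]
        scope[name] = values
    return scope


def tunable(scope):
    """Names with more than one value; single-valued names are fixed."""
    return [name for name, values in scope.items() if len(values) > 1]


example = """lr,0.01,0.001,0.0001
dropout,0,0.1,0.2,0.3,0.4,0.5
batch_size,1,2,4,8,16,32,64
pretrained,/home/xxx/modelxxx"""

scope = parse_scope(example)
print(tunable(scope))
```

With the example file above, `pretrained` has a single value and is therefore left out of the tunable list.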
- Your program should exit once its performance has stopped improving (e.g., the F1 value on the validation set has not improved for ten consecutive epochs). Otherwise, Laozhongyi will abort your program after it reaches the upper elapsed time limit, which hurts tuning efficiency.
- Your program should output the log to standard output (Laozhongyi will redirect the standard output to a corresponding log file). The log should contain strings such as laozhongyi_0.8, where 0.8 means the best performance on the validation set.
- Your program should parse the hyperparameter config file generated by Laozhongyi and take the path of the hyperparameter config file as the command line argument.
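The first two requirements can be sketched in Python as follows. This is a hedged illustration, not code from Laozhongyi: `train_with_early_stopping`, `evaluate_epoch`, and `patience` are hypothetical names, and printing the best score so far on every epoch is an assumption about what the log should look like beyond containing a `laozhongyi_<score>` string.

```python
def train_with_early_stopping(evaluate_epoch, max_epochs=1000, patience=10):
    """Stop once the validation score has not improved for `patience` epochs.

    `evaluate_epoch(epoch)` is a hypothetical callback that trains one epoch
    and returns the validation score (higher is better).
    """
    best = float("-inf")
    stale = 0  # consecutive epochs without improvement
    for epoch in range(max_epochs):
        score = evaluate_epoch(epoch)
        if score > best:
            best = score
            stale = 0
        else:
            stale += 1
        # Log to standard output; the laozhongyi_ prefix marks the best score.
        print(f"epoch {epoch} dev {score} laozhongyi_{best}", flush=True)
        if stale >= patience:
            break  # exit voluntarily instead of waiting for the time limit
    return best
```

Flushing after each line keeps the redirected log file current even if the process is aborted later.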
An example of the generated hyperparameter config file is as follows:
lr = 0.001
dropout = 0.1
batch_size = 64
pretrained = /home/xxx/modelxxx
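A program written in Python could read this file roughly as follows. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the `-hyper` option simply mirrors the example command line used in this README, and all values are returned as strings so that the caller decides how to convert them.

```python
import argparse


def parse_hyperparameters(text):
    """Parse 'name = value' lines into a dict of strings."""
    params = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # ignore blank or malformed lines
        key, _, value = line.partition("=")
        params[key.strip()] = value.strip()
    return params


def load_hyperparameters(path):
    """Read the config file whose path Laozhongyi passes on the command line."""
    with open(path, encoding="utf-8") as f:
        return parse_hyperparameters(f.read())


def parse_args(argv):
    """Accept the config path via a -hyper option (an assumed option name)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-hyper", required=True, help="hyperparameter config file")
    return parser.parse_args(argv)
```

A training script would then call something like `params = load_hyperparameters(parse_args(sys.argv[1:]).hyper)` and convert values as needed, e.g. `float(params["lr"])` and `int(params["batch_size"])`.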
This project is built with Java 8 and Maven, so you can run mvn clean package to generate the target folder, or download it from releases. The command line arguments of Laozhongyi are listed as follows:
usage: laozhonghi
-c <arg> cmd
-rt <arg> program runtime upper bound in minutes
-s <arg> scope file path
-sar <arg> simulated annealing ratio
-sat <arg> simulated annealing initial temperature
-strategy <arg> base or sa
-wd <arg> working directory
- -c means your program's command line, which should be quoted, e.g., "python3 train.py -train train.txt -dev dev.txt -test test.txt -hyper {}", where {} will be replaced with the hyperparameter config file's path by Laozhongyi.
- -rt means the upper limit of the running time of each process in minutes. For example, -rt 20 means that if a process does not exit after 20 minutes, it will be aborted.
- -s means the path to the config file for the hyperparameter tuning range.
- -strategy means the search strategy, with base referring to the coordinate descent method, and sa referring to the simulated annealing method.
- -sar means the temperature's decay rate.
- -sat means the initial temperature.
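To illustrate what the -c and -rt options describe, the sketch below substitutes a config path for {} and runs the resulting command with a time limit. It is a conceptual Python illustration only; Laozhongyi itself is a Java program and its actual implementation may differ. The names `build_command` and `run_with_limit` are hypothetical.

```python
import shlex
import subprocess


def build_command(template, config_path):
    """Split the quoted command line and replace {} with the config path."""
    return [arg.replace("{}", config_path) for arg in shlex.split(template)]


def run_with_limit(argv, log_path, limit_minutes):
    """Run argv with stdout redirected to a log file.

    Returns True if the process exited on its own, False if it was killed
    after exceeding the time limit.
    """
    with open(log_path, "w") as log:
        try:
            subprocess.run(argv, stdout=log, timeout=limit_minutes * 60)
            return True
        except subprocess.TimeoutExpired:
            return False


print(build_command("python3 train.py -hyper {}", "/tmp/hyper.txt"))
```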
A complete example is as follows:
cd target
java -cp "*:lib/*" com.hljunlp.laozhongyi.Laozhongyi -s /home/wqs/laozhongyi.config \
-c "python3 train.py -train train.txt -dev dev.txt -test test.txt -hyper {}" \
-sar 0.9 -sat 1 -strategy sa -rt 5
We recommend running Laozhongyi inside screen.
When Laozhongyi starts, it will generate a log directory with a timestamp suffix and a hyperparameter config directory in the home directory.
- Laozhongyi supports multi-process hyperparameter tuning, currently with up to 8 processes.
- Your program should exit when performance is unlikely to improve any further, or it will be killed when reaching the elapsed time limit.
The coordinate descent method tries all the hyperparameters in a loop. For each hyperparameter, Laozhongyi tries all of its values and selects the one that performs best. The algorithm stops when the selected values of all the hyperparameters no longer change.
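The loop described above can be sketched in Python as follows. This is an illustrative version, not Laozhongyi's Java implementation: `evaluate` is a hypothetical function that runs one trial and returns its validation score, and starting from the first value of each hyperparameter is an assumption.

```python
def coordinate_descent(scope, evaluate):
    """Tune one hyperparameter at a time until no selected value changes.

    `scope` maps each name to its list of candidate values; `evaluate(config)`
    returns a score for a full configuration (higher is better).
    """
    # Assumed starting point: the first listed value of every hyperparameter.
    current = {name: values[0] for name, values in scope.items()}
    best_score = evaluate(current)
    changed = True
    while changed:
        changed = False
        for name, values in scope.items():
            if len(values) < 2:
                continue  # a single value means the hyperparameter is fixed
            for value in values:
                if value == current[name]:
                    continue  # already evaluated as part of `current`
                candidate = {**current, name: value}
                score = evaluate(candidate)
                if score > best_score:
                    best_score = score
                    current = candidate
                    changed = True
    return current, best_score
```

Because a value is only replaced when the score strictly improves, the loop always terminates, although the result may be a local optimum.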
Because the coordinate descent method tends to converge to a local optimum, we introduce a simulated annealing strategy to alleviate this problem.
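The README does not spell out the exact acceptance rule Laozhongyi uses, so the following Python sketch shows the textbook form of simulated annealing as an assumption: a worse candidate is accepted with probability exp((new - old) / T), and the temperature starts at the initial value (-sat) and is multiplied by the decay rate (-sar) after each step. The function names are hypothetical.

```python
import math
import random


def accept(old_score, new_score, temperature, rng=random):
    """Metropolis rule: always accept improvements, sometimes accept worse.

    `temperature` must be positive; higher values accept worse candidates
    more often, which helps escape local optima early in the search.
    """
    if new_score >= old_score:
        return True
    return rng.random() < math.exp((new_score - old_score) / temperature)


def temperatures(initial, ratio, steps):
    """Yield a geometrically decaying temperature schedule."""
    t = initial
    for _ in range(steps):
        yield t
        t *= ratio


print(list(temperatures(1.0, 0.5, 3)))
```

As the temperature decays, the search accepts fewer worsening moves and gradually behaves like the greedy coordinate descent method.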
Email: chncwang@gmail.com