Pruned Search is an enhanced search algorithm for general optimization. It combines the concepts of importance ranking and dimension reduction, and generalizes them into meta-heuristics that guide the search toward optimal objectives.
Like any optimization tool, the algorithm requires as inputs a set of control variables x, an objective function of the form y = f(x), where y is the optimization target, and a set of constraints expressed as bounds and/or functions of x.
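For concreteness, a minimal set of such inputs might look like the following Python sketch; the toy objective, bounds, and constraint function are purely illustrative and are not part of this package:

```python
import numpy as np

# Hypothetical toy objective y = f(x), for illustration only.
def f(x):
    return np.sum((x - 0.5) ** 2)

# Constraints given as per-variable bounds (lower, upper) ...
bounds = [(0.0, 1.0)] * 4      # four control variables, each in [0, 1]

# ... and/or as constraint functions of x, e.g. g(x) <= 0.
def g(x):
    return np.sum(x) - 2.0
```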
Pruned Search performs four steps of work:
- First, it takes the objective function and variable constraints and generates a data set composed of "good" samples and "bad" samples. "Good" samples are those with high objective values if the target is to be maximized, and vice versa; "bad" samples are values at the other extreme, included to form an opposing class for the data mining algorithms. Each sample includes the x values and the y value, later simplified into labels 0 and 1 to denote "bad" and "good".
- Second, it takes the data samples generated in Step 1 and performs feature ranking to determine the best search order of variables, from the most important to the least.
- Third, in parallel, it takes the data samples generated in Step 1 and builds a classification tree to separate good and bad samples. After the classification tree is fully built, it parses the nodes to extract the paths that lead to the desired target. Variable regions can then be narrowed by taking into account the threshold values used along those paths.
- Finally, with a better variable order in which to perform the search and a reduced region for each variable, a regular pattern search algorithm is enhanced in both efficiency and accuracy.
- Ruoqian Liu, Abhishek Kumar, Zhengzhang Chen, Ankit Agrawal, Veera Sundararaghavan, and Alok Choudhary. A Predictive Machine Learning Approach for Microstructure Optimization and Materials Design. Nature Scientific Reports, 5, 11551; doi: 10.1038/srep11551. 2015.
- Ruoqian Liu, Ankit Agrawal, Wei-keng Liao, and Alok Choudhary. Search Space Preprocessing in Solving Complex Optimization Problems. In the IEEE International Conference on Big Data, pp. 1-5, Washington, DC, 2014.
- Ruoqian Liu, Ankit Agrawal, Wei-keng Liao, Zhengzhang Chen, and Alok Choudhary. Pruned Search: A Machine Learning Based Meta-heuristic Approach for Constrained Continuous Optimization. In the Eighth International Conference on Contemporary Computing (IC3), pp. 13-18, Noida, India, 2015.
All scripts are written in MATLAB and Python.
The four steps are implemented separately in three groups of programs, as follows:
- Step 1, data generation. Programs are included in source/datagen/
- Script names starting with "exp_" are those used to generate data, with choices of different randomization methods, and constraint functions.
- For example, exp_randEvery5Gen_E.m generates data using a method "Random Every Five" (described in [1]) with the objective and constraints given by materials property E.
- In it, the objective function SeparateOptE and the corresponding constraints (hardcoded) are used.
- Similarly, exp_randGreedyGen* implements the method "Random Greedy", exp_randIntervalGen* implements "Random Interval", and exp_randSmallPartition* implements "Random Small Partition" [1].
- The output of this step is the generated data. The end of each exp_* script specifies where the data is saved, usually as a .mat file under data/ (a minimal sketch of the underlying labeling idea follows this list).
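The actual data generation programs are the MATLAB exp_* scripts above; the Python sketch below only illustrates the labeling idea of Step 1 under simplifying assumptions: plain uniform sampling stands in for the randomization methods of [1], the target is taken to be maximization, and the kept fraction is arbitrary. The function name is hypothetical.

```python
import numpy as np

def generate_labeled_samples(f, bounds, n_samples=1000, frac=0.3, seed=0):
    """Illustrative stand-in for the exp_* scripts: draw random x within
    bounds, evaluate y = f(x), and keep only the two extremes, labeled
    1 ("good", high y when maximizing) and 0 ("bad")."""
    rng = np.random.RandomState(seed)
    lo = np.array([b[0] for b in bounds], dtype=float)
    hi = np.array([b[1] for b in bounds], dtype=float)
    X = lo + rng.rand(n_samples, len(bounds)) * (hi - lo)
    y = np.array([f(x) for x in X])

    order = np.argsort(y)                 # ascending objective values
    k = int(frac * n_samples)
    bad, good = order[:k], order[-k:]     # the two opposite extremes of y
    X_kept = np.vstack([X[bad], X[good]])
    labels = np.concatenate([np.zeros(k), np.ones(k)])
    return X_kept, labels
```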
- Step 2 and Step 3, meta-heuristics generation. The program that performs these steps is source/train_model.py
- The function feature_selection performs Step 2, the importance ranking of features.
- The function calc_feature_ranges performs Step 3, the search range reduction (a hedged sketch of both steps follows this list).
- The output of this program is saved in a .mat file, usually under model_output/, that contains two variables:
- 'sorted_feature_ids': an array of feature ids from the most important to the least important. For example, [3, 5, 4, 2, ...] would mean that feature 3 is the most important feature (to be searched first).
- 'feature_ranges': a 2D array of feature ranges to search from. For example, [[0.1, 0.3], [0.25, 0.36], ...] means that the first feature is best searched within the value range between 0.1 and 0.3.
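The code below is not the contents of train_model.py; it is a hedged sketch of how Steps 2 and 3 could be realized with scikit-learn, assuming that the feature importances of a single DecisionTreeClassifier provide the ranking, that "good" leaves are those whose majority class is 1, and that the reduced range of each feature is the envelope of the boxes reached by good leaves. The function name rank_and_reduce is illustrative, and feature_ranges here is indexed by original feature id.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def rank_and_reduce(X, labels, bounds):
    """Sketch of Steps 2 and 3: rank features by importance and shrink
    each feature's search range using decision tree split thresholds.
    Assumes at least one leaf predicts the "good" class."""
    clf = DecisionTreeClassifier(random_state=0).fit(X, labels)

    # Step 2: order features from most to least important.
    sorted_feature_ids = np.argsort(clf.feature_importances_)[::-1]

    # Step 3: walk every root-to-leaf path; for leaves predicting the
    # "good" class, tighten the original bounds with the thresholds seen
    # along the path, then take the envelope of all "good" boxes.
    tree = clf.tree_
    lo0 = np.array([b[0] for b in bounds], dtype=float)
    hi0 = np.array([b[1] for b in bounds], dtype=float)
    good_lo, good_hi = [], []

    def walk(node, lo, hi):
        if tree.children_left[node] == -1:           # leaf node
            if np.argmax(tree.value[node][0]) == 1:  # majority class "good"
                good_lo.append(lo.copy())
                good_hi.append(hi.copy())
            return
        i, t = tree.feature[node], tree.threshold[node]
        left_hi = hi.copy(); left_hi[i] = min(hi[i], t)    # branch x[i] <= t
        walk(tree.children_left[node], lo.copy(), left_hi)
        right_lo = lo.copy(); right_lo[i] = max(lo[i], t)  # branch x[i] > t
        walk(tree.children_right[node], right_lo, hi.copy())

    walk(0, lo0, hi0)
    feature_ranges = np.column_stack([np.min(good_lo, axis=0),
                                      np.max(good_hi, axis=0)])
    return sorted_feature_ids, feature_ranges

# To save in the .mat layout described above (illustrative path):
#   import scipy.io as sio
#   sio.savemat('model_output/demo_modelout.mat',
#               {'sorted_feature_ids': sorted_feature_ids,
#                'feature_ranges': feature_ranges})
```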
- Step 4, enhanced optimization. The programs that perform this step are source/patternSearch_min.m and source/trustRegion_min.m
- The two scripts each contain a function to be called with an objective function handle.
- For example, in a MATLAB shell, call patternSearch_min(@SeparateOptY)
- The output of this program is the best value found, along with intermediate printouts indicating running progress (an illustrative sketch of such a search loop follows this list).
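The enhanced search itself is implemented in the MATLAB scripts above; the Python sketch below only illustrates how the two meta-heuristics plug into a basic coordinate pattern search for minimization. The function name, polling scheme, and parameters are illustrative, and feature_ranges is assumed to be indexed by original feature id, matching the sketch above.

```python
import numpy as np

def pruned_pattern_search(f, sorted_feature_ids, feature_ranges,
                          step=0.25, shrink=0.5, tol=1e-4, max_iter=200):
    """Illustrative pattern search (minimization): variables are polled in
    importance order and kept inside their reduced ranges."""
    lo, hi = feature_ranges[:, 0], feature_ranges[:, 1]
    x = (lo + hi) / 2.0                      # start at the center of the box
    best = f(x)
    for _ in range(max_iter):
        improved = False
        for i in sorted_feature_ids:         # most important variable first
            for direction in (1.0, -1.0):
                trial = x.copy()
                trial[i] = np.clip(x[i] + direction * step * (hi[i] - lo[i]),
                                   lo[i], hi[i])
                value = f(trial)
                if value < best:
                    x, best, improved = trial, value, True
                    break
        if not improved:
            step *= shrink                    # no move helped: refine the mesh
            if step < tol:
                break
    return x, best
```

With the hypothetical helpers sketched earlier, the whole pipeline would read generate_labeled_samples, then rank_and_reduce, then pruned_pattern_search (with the "good" label assigned to the low-y extreme when the target is minimization).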
The following software is required:
- MATLAB 7.6 (or higher)
- Python 2.7
- NumPy 1.4 (or higher)
- SciPy 0.7 (or higher)
- scikit-learn 0.14 (or higher, to support more feature selection methods)
Three corresponding demo scripts are included in demo/
- Step 1: directly running demo/exp_rand_demo.m will generate data and save it in data/data_demo.mat
- First, specify your local MATLAB location, like this:
MATLAB=/usr/local/MATLAB/R2012a/bin/glnxa64/MATLAB
- Run the MATLAB demo code for Step 1, like this:
$MATLAB -r "run source/datagen/exp_rand_demo"
- Sample output:
Program Step 1: Random data generation.
Property function used: @SeparateOptE
Specified size of data to be generated: 1000
Data of size 600 (rows) x 77 (column) are generated, and saved at ../../data/data_demo.mat
- Step 2 and Step 3: run source/train_model.py and specify --input_data as the data generated in Step 1. It will generate the variable order and the reduced ranges and save them in model_output/modelout.mat
- Run like this:
python source/train_model.py --input_data data/data_demo.mat --output_data model_output/demo_modelout.mat
- Sample output:
Feature selection cost 0.0032 seconds
Fitting the decision tree and compute feature range cost 0.1450 seconds
- Step 4: run source/patternSearch_demo.m. It will take the output generated in Steps 2 and 3 and print out the best value found at the end of the program.
- Run like this:
$MATLAB -r "run source/patternSearch_demo"
- Sample output (This optimization search could take a while):
#### Current Varialbe --------------- 34
#### Current Varialbe --------------- 57
#### Current Varialbe --------------- 49
#### Current Varialbe --------------- 69
...
#### Program ends. Best value found: -3.430972
Example output printouts from directly running the three demo scripts are included in the files demo_screen_1.log, demo_screen_2.log, and demo_screen_3.log
A bash script incorporating all of the above steps is provided, namely run_demo.sh. To run all three steps at once: ./run_demo.sh
- SeparateOptY.m: an example of an objective function to minimize.
This work is supported by AFOSR (Air Force Office of Scientific Research), Department of Defense (DOD), under Award No. FA9550-12-1-0458.
- Rosanne Liu (mimosavvy@gmail.com)
- Ankit Agrawal (ankitag@eecs.northwestern.edu)
- Alok Choudhary (choudhar@eecs.northwestern.edu)