Pruned Search is an enhanced search algorithm for general optimization. It combines the concepts of importance ranking and dimension reduction, and generalizes them into meta-heuristics that guide the search toward optimal objectives.
Like any optimization tool, the algorithm requires as inputs a set of control variables x, an objective function of the form y = f(x), where y is the optimization target, and a set of constraints expressed as bounds and/or functions of x.
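For concreteness, a minimal set of such inputs might look like the following Python sketch; the toy objective, bounds, and constraint function are purely illustrative and are not part of this package:

```python
import numpy as np

# Hypothetical toy objective y = f(x), for illustration only.
def f(x):
    return np.sum((x - 0.5) ** 2)

# Constraints given as per-variable bounds (lower, upper) ...
bounds = [(0.0, 1.0)] * 4      # four control variables, each in [0, 1]

# ... and/or as constraint functions of x, e.g. g(x) <= 0.
def g(x):
    return np.sum(x) - 2.0
```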
Pruned Search performs four steps of work:
- First, it takes the objective function and variable constraints and generates a data set composed of "good" samples and "bad" samples. "Good" samples are those with high objective values if the target is to be maximized, and vice versa; "bad" samples are values at the other extreme, included to form an opposing class for the data mining algorithms. Each sample includes the x values and the y value, later simplified into labels 0 and 1 to denote "bad" and "good".
- Second, it takes the data samples generated in Step 1 and performs feature ranking to determine the best search order of variables, from the most important to the least.
- Third, in parallel, it takes the data samples generated in Step 1 and builds a classification tree to separate good and bad samples. After the classification tree is fully built, it parses the nodes to extract the paths that lead to the desired target. Variable regions can then be narrowed by taking into account the threshold values used along those paths.
- Finally, with a better variable order in which to perform the search and a reduced region for each variable, a regular pattern search algorithm is enhanced in both efficiency and accuracy.
- Ruoqian Liu, Abhishek Kumar, Zhengzhang Chen, Ankit Agrawal, Veera Sundararaghavan, and Alok Choudhary. A Predictive Machine Learning Approach for Microstructure Optimization and Materials Design. Nature Scientific Reports, 5, 11551; doi: 10.1038/srep11551. 2015.
- Ruoqian Liu, Ankit Agrawal, Wei-keng Liao, and Alok Choudhary. Search Space Preprocessing in Solving Complex Optimization Problems. In the IEEE International Conference on Big Data, pp. 1-5, Washington, DC, 2014.
- Ruoqian Liu, Ankit Agrawal, Wei-keng Liao, Zhengzhang Chen, and Alok Choudhary. Pruned Search: A Machine Learning Based Meta-heuristic Approach for Constrained Continuous Optimization. In the Eighth International Conference on Contemporary Computing (IC3), pp. 13-18, Noida, India, 2015.
All scripts are written in MATLAB and Python.
The four steps are implemented separately in three groups of programs, as follows:
- Step 1, data generation. Programs are included in source/datagen/
- Script names starting with "exp_" are those used to generate data, with choices of different randomization methods, and constraint functions.
- For example, exp_randEvery5Gen_E.m generates data using a method "Random Every Five" (described in [1]) with the objective and constraints given by materials property E.
- In it, the objective function SeparateOptE and the corresponding constraints (hardcoded) are used.
- Similarly, exp_randGreedyGen* implements the method "Random Greedy", exp_randIntervalGen* implements "Random Interval", and exp_randSmallPartition* implements "Random Small Partition" [1].
- The output of this step is the generated data. The end of each exp_* script specifies where the data is saved, usually as a .mat file under data/ (a minimal sketch of the underlying labeling idea follows this list).
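The actual data generation programs are the MATLAB exp_* scripts above; the Python sketch below only illustrates the labeling idea of Step 1 under simplifying assumptions: plain uniform sampling stands in for the randomization methods of [1], the target is taken to be maximization, and the kept fraction is arbitrary. The function name is hypothetical.

```python
import numpy as np

def generate_labeled_samples(f, bounds, n_samples=1000, frac=0.3, seed=0):
    """Illustrative stand-in for the exp_* scripts: draw random x within
    bounds, evaluate y = f(x), and keep only the two extremes, labeled
    1 ("good", high y when maximizing) and 0 ("bad")."""
    rng = np.random.RandomState(seed)
    lo = np.array([b[0] for b in bounds], dtype=float)
    hi = np.array([b[1] for b in bounds], dtype=float)
    X = lo + rng.rand(n_samples, len(bounds)) * (hi - lo)
    y = np.array([f(x) for x in X])

    order = np.argsort(y)                 # ascending objective values
    k = int(frac * n_samples)
    bad, good = order[:k], order[-k:]     # the two opposite extremes of y
    X_kept = np.vstack([X[bad], X[good]])
    labels = np.concatenate([np.zeros(k), np.ones(k)])
    return X_kept, labels
```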
- Step 2 and Step 3, meta-heuristics generation. The program that performs these steps is source/train_model.py
- The function feature_selection performs Step 2, the importance ranking of features.
- The function calc_feature_ranges performs Step 3, the search range reduction (a hedged sketch of both steps follows this list).
- The output of this program is saved in a .mat file, usually under model_output/, that contains two variables:
- 'sorted_feature_ids': an array of feature ids from the most important to the least important. For example, [3, 5, 4, 2, ...] would mean that feature 3 is the most important feature (to be searched first).
- 'feature_ranges': a 2D array of feature ranges to search from. For example, [[0.1, 0.3], [0.25, 0.36], ...] means that the first feature is best searched within the value range between 0.1 and 0.3.
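The code below is not the contents of train_model.py; it is a hedged sketch of how Steps 2 and 3 could be realized with scikit-learn, assuming that the feature importances of a single DecisionTreeClassifier provide the ranking, that "good" leaves are those whose majority class is 1, and that the reduced range of each feature is the envelope of the boxes reached by good leaves. The function name rank_and_reduce is illustrative, and feature_ranges here is indexed by original feature id.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def rank_and_reduce(X, labels, bounds):
    """Sketch of Steps 2 and 3: rank features by importance and shrink
    each feature's search range using decision tree split thresholds.
    Assumes at least one leaf predicts the "good" class."""
    clf = DecisionTreeClassifier(random_state=0).fit(X, labels)

    # Step 2: order features from most to least important.
    sorted_feature_ids = np.argsort(clf.feature_importances_)[::-1]

    # Step 3: walk every root-to-leaf path; for leaves predicting the
    # "good" class, tighten the original bounds with the thresholds seen
    # along the path, then take the envelope of all "good" boxes.
    tree = clf.tree_
    lo0 = np.array([b[0] for b in bounds], dtype=float)
    hi0 = np.array([b[1] for b in bounds], dtype=float)
    good_lo, good_hi = [], []

    def walk(node, lo, hi):
        if tree.children_left[node] == -1:           # leaf node
            if np.argmax(tree.value[node][0]) == 1:  # majority class "good"
                good_lo.append(lo.copy())
                good_hi.append(hi.copy())
            return
        i, t = tree.feature[node], tree.threshold[node]
        left_hi = hi.copy(); left_hi[i] = min(hi[i], t)    # branch x[i] <= t
        walk(tree.children_left[node], lo.copy(), left_hi)
        right_lo = lo.copy(); right_lo[i] = max(lo[i], t)  # branch x[i] > t
        walk(tree.children_right[node], right_lo, hi.copy())

    walk(0, lo0, hi0)
    feature_ranges = np.column_stack([np.min(good_lo, axis=0),
                                      np.max(good_hi, axis=0)])
    return sorted_feature_ids, feature_ranges

# To save in the .mat layout described above (illustrative path):
#   import scipy.io as sio
#   sio.savemat('model_output/demo_modelout.mat',
#               {'sorted_feature_ids': sorted_feature_ids,
#                'feature_ranges': feature_ranges})
```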
- Step 4, enhanced optimization. The programs that perform this step are source/patternSearch_min.m and source/trustRegion_min.m
- The two scripts each contain a function to be called with an objective function handle.
- For example, in a MATLAB shell, call patternSearch_min(@SeparateOptY)
- The output of this program is the best value found, along with intermediate printouts indicating running progress (an illustrative sketch of such a search loop follows this list).
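The enhanced search itself is implemented in the MATLAB scripts above; the Python sketch below only illustrates how the two meta-heuristics plug into a basic coordinate pattern search for minimization. The function name, polling scheme, and parameters are illustrative, and feature_ranges is assumed to be indexed by original feature id, matching the sketch above.

```python
import numpy as np

def pruned_pattern_search(f, sorted_feature_ids, feature_ranges,
                          step=0.25, shrink=0.5, tol=1e-4, max_iter=200):
    """Illustrative pattern search (minimization): variables are polled in
    importance order and kept inside their reduced ranges."""
    lo, hi = feature_ranges[:, 0], feature_ranges[:, 1]
    x = (lo + hi) / 2.0                      # start at the center of the box
    best = f(x)
    for _ in range(max_iter):
        improved = False
        for i in sorted_feature_ids:         # most important variable first
            for direction in (1.0, -1.0):
                trial = x.copy()
                trial[i] = np.clip(x[i] + direction * step * (hi[i] - lo[i]),
                                   lo[i], hi[i])
                value = f(trial)
                if value < best:
                    x, best, improved = trial, value, True
                    break
        if not improved:
            step *= shrink                    # no move helped: refine the mesh
            if step < tol:
                break
    return x, best
```

With the hypothetical helpers sketched earlier, the whole pipeline would read generate_labeled_samples, then rank_and_reduce, then pruned_pattern_search (with the "good" label assigned to the low-y extreme when the target is minimization).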
The following software is required:
- MATLAB 7.6 (or higher)
- Python 2.7
- NumPy 1.4 (or higher)
- SciPy 0.7 (or higher)
- scikit-learn 0.14 (or higher, to support more feature selection methods)
Three corresponding demo scripts are included in demo/
- Step 1: directly running demo/exp_rand_demo.m will generate data and save it in data/data_demo.mat
- First, specify your local MATLAB location, like this:
MATLAB=/usr/local/MATLAB/R2012a/bin/glnxa64/MATLAB
- Run the MATLAB demo code for Step 1, like this:
$MATLAB -r "run source/datagen/exp_rand_demo"
- Sample output:
Program Step 1: Random data generation.
Property function used: @SeparateOptE
Specified size of data to be generated: 1000
Data of size 600 (rows) x 77 (column) are generated, and saved at ../../data/data_demo.mat
- Step 2 and Step 3: run source/train_model.py and specify --input_data as the data generated in Step 1. It will generate the variable order and the reduced ranges and save them in model_output/modelout.mat
- Run like this:
python source/train_model.py --input_data data/data_demo.mat --output_data model_output/demo_modelout.mat
- Sample output:
Feature selection cost 0.0032 seconds
Fitting the decision tree and compute feature range cost 0.1450 seconds
- Step 4: run source/patternSearch_demo.m. It will take the output generated in Steps 2 and 3 and print out the best value found at the end of the program.
- Run like this:
$MATLAB -r "run source/patternSearch_demo"
- Sample output (This optimization search could take a while):
#### Current Varialbe --------------- 34
#### Current Varialbe --------------- 57
#### Current Varialbe --------------- 49
#### Current Varialbe --------------- 69
...
#### Program ends. Best value found: -3.430972
Example output printouts from directly running the three demo scripts are included in the files demo_screen_1.log, demo_screen_2.log, and demo_screen_3.log
A bash script incorporating all of the above steps is provided, namely run_demo.sh. To run all three steps at once: ./run_demo.sh
- SeparateOptY.m: an example of an objective function to minimize.
This work is supported by AFOSR (Air Force Office of Scientific Research), Department of Defense (DOD), under Award No. FA9550-12-1-0458.
- Rosanne Liu (mimosavvy@gmail.com)
- Ankit Agrawal (ankitag@eecs.northwestern.edu)
- Alok Choudhary (choudhar@eecs.northwestern.edu)