Transdimensional Predictive Least Squares for Online Feature Selection

Transdimensional Predictive Least Squares (TPLS)

Note: a Python implementation is available at tpls_python.

This code is a MATLAB implementation of the TPLS algorithm proposed in our paper "Transdimensional Model Learning with Online Feature Selection based on Predictive Least Squares". We provide example code showing how to run TPLS, as well as scripts to reproduce the results and figures in the paper.

Introduction

TPLS is a distribution-free online feature selection algorithm built entirely on least squares (LS). As new data arrive, TPLS recursively updates the parameter estimate not only in value but also in dimension. What makes TPLS unique is its ability to recursively move up and down in model dimension, and the fact that it uses the predictive (rather than the fitting) error as the criterion for deciding whether to add or remove features. Specifically, the foundations of TPLS are recursive LS (RLS), order-recursive LS (ORLS) and predictive LS (PLS).
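To illustrate the first of these building blocks, here is a minimal sketch of an RLS time update. This is a Python illustration of the general technique, not the repository's MATLAB code; all variable names are our own:

```python
import numpy as np

def rls_time_update(theta, P, x, y):
    """One RLS time update: fold a new data point (x, y) into the
    running least squares estimate without refitting from scratch.

    theta : (p,) current parameter estimate
    P     : (p, p) current inverse of the Gram matrix X'X
    """
    Px = P @ x
    k = Px / (1.0 + x @ Px)        # gain vector
    e = y - x @ theta              # one-step-ahead prediction error
    theta_new = theta + k * e
    P_new = P - np.outer(k, Px)    # rank-one update of the inverse
    return theta_new, P_new, e
```

Note that the error e is computed before the estimate is updated; accumulating e**2 over time is precisely the predictive least squares criterion that TPLS uses to compare models of different dimension.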

Code

How to run TPLS

Required files:
example_code.m
util/
tpls/

Script to run:
example_code.m - a script that demonstrates how to call and run TPLS. It produces feature bar plots and the predictive error.

tpls/

ls_updates/ - a folder that contains the least squares update algorithms:
time_update.m - implements RLS (updates the estimate when a new data point arrives)
ols_ascend.m - implements ascending ORLS (increases model dimension by 1 --> adds 1 feature)
ols_descend.m - implements descending ORLS (decreases model dimension by 1 --> removes 1 feature)
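For intuition, the ascending ORLS step can be sketched via block inversion of the augmented Gram matrix: given the p-dimensional LS solution, the (p+1)-dimensional one follows without refitting from scratch. The sketch below is an illustrative Python version of this standard identity, under our own naming, not the repository's MATLAB implementation:

```python
import numpy as np

def orls_ascend(theta, P, X, y, x_new):
    """Ascending order-recursive LS: extend the LS fit of y on X
    (solution theta, with P = inv(X'X)) by one new regressor x_new,
    using the block inverse of the augmented Gram matrix.
    """
    b = X.T @ x_new
    Pb = P @ b
    s = x_new @ x_new - b @ Pb           # Schur complement (scalar)
    alpha = (x_new @ y - b @ theta) / s  # coefficient of the new feature
    theta_new = np.append(theta - Pb * alpha, alpha)
    P_new = np.block([
        [P + np.outer(Pb, Pb) / s, -Pb[:, None] / s],
        [-Pb[None, :] / s,          np.array([[1.0 / s]])],
    ])
    return theta_new, P_new
```

The descending step works analogously in reverse, recovering the smaller model's solution from the larger one's.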

jumps_up.m and jumps_down.m - functions that compute the predictive error for adding/removing each candidate feature to/from the current model.
pred_error.m - a function that computes the predictive error recursively in dimension.
expectations.m - a function that computes the MSE at single time instants for regret analysis.
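The predictive-error bookkeeping can be illustrated with a small sketch (again Python, with hypothetical names of our own): the PLS criterion accumulates squared one-step-ahead errors, each computed before the estimate is updated, so that models of different dimension can be compared on out-of-sample terms.

```python
import numpy as np

def pls_criterion(X, y, delta=1e8):
    """Predictive least squares (PLS) criterion for regressing y on X:
    the sum of squared one-step-ahead prediction errors, computed
    online with RLS (diffuse prior P0 = delta * I)."""
    n, p = X.shape
    theta = np.zeros(p)
    P = delta * np.eye(p)
    J = 0.0
    for t in range(n):
        x = X[t]
        e = y[t] - x @ theta    # predict BEFORE updating with y[t]
        J += e ** 2
        Px = P @ x
        k = Px / (1.0 + x @ Px)
        theta = theta + k * e
        P = P - np.outer(k, Px)
    return J
```

Comparing this criterion across neighbouring feature subsets is, conceptually, what the jump functions do: the subset with the smallest accumulated predictive error wins.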

Reproduce Results

Scripts to run:
reproduce_fig1_fig4_fig5.m
reproduce_fig2_fig3.m

These scripts run TPLS against the baseline methods OLinLASSO and rjMCMC, as well as the ground truth. The results in Figs. 1, 4 and 5 are from single runs, whereas those in Figs. 2 and 3 are statistical, averaged over R = 100 runs.