BarsNWrapper

Python wrapper for Wallstrom et al.'s implementation of Bayesian adaptive (free-knot) regression splines.

This is a Python wrapper for BarsN. BarsN is described in the original paper:

Wallstrom, Garrick, Jeffrey Liebner, and Robert E. Kass. "An
implementation of Bayesian adaptive regression splines (BARS) in C
with S and R wrappers." Journal of Statistical Software 26.1 (2008):
1.

To use the software, follow the instructions in INSTRUCTIONS to
compile a barsN executable. This is usually just a case of running

    $ make barsN.out

Put the Python wrapper (wrapper.py) in the current working directory.
It then provides an interface for running the executable. Note
that the wrapper hard-codes the location of the executable. If
the executable is placed elsewhere relative to the wrapper, this path
must be changed, or the code will not work. To change the executable
path, modify line 8 of wrapper.py.
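
One way to make the path less fragile is to resolve it relative to the
wrapper file instead of the working directory. The following is only a
minimal sketch: `resolve_executable` is a hypothetical helper, not a
function from wrapper.py, and the file name barsN.out is assumed from
the make target above.

```python
import os


def resolve_executable(wrapper_file, name="barsN.out"):
    """Return an absolute path to `name` next to `wrapper_file`.

    Hypothetical helper: it only builds the path and does not check
    that the executable exists.
    """
    wrapper_dir = os.path.dirname(os.path.abspath(wrapper_file))
    return os.path.join(wrapper_dir, name)


# Inside wrapper.py, the hard-coded path could then be replaced by
# something like:
#     executable = resolve_executable(__file__)
```

With this approach the executable is found as long as it sits in the
same directory as the wrapper, whatever directory Python is started from.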

The wrapper is used by calling

wrapper.barsN(xs, ys, prior_param, iknots, burnin, sims, tau, c)

    xs: iterable of scalar float-likes. x values of training
	datapoints.

    ys: iterable of scalar float-likes. y values of training
	datapoints.

    iknots: int > 0, initial number of knots to use.

    priorparam: float>0, tuple, or list of tuples
        The parameter for the prior belief on number of knots. The
        prior type is automatically inferred, and is one of

            uniform : uniform distribution over the number of knots;
            poisson : poisson distribution over the number of knots;
            user : custom distribution over the number of knots.

        The format of the priorparam is used to automatically
        determine which of these priors to use:

            a) if using Poisson, prior_param is a float lambda
               (poisson mean);
            b) if using Uniform, prior_param is a tuple of length 2,
                which gives the minimum number of knots, followed by
                the maximum number of knots. Lower and upper limits
                work the same way as in range(lower,upper).
            c) if using user-defined prior, prior_param is a list of
                tuples. The first entry of each tuple should be the
                number of knots, and the second entry should be the
                probability of obtaining this number of knots. For
                example:

                   [
                    (2, 0.05),
                    (3, 0.15),
                    (4, 0.30),
                    (5, 0.30),
                    (6, 0.10),
                    (7, 0.10),
                   ]

        Default behaviour is a uniform prior with (0,60) knots.

    burnin : int > 0
        The desired length of the burn-in for the MCMC chain

    sims : int > 0
        The number of simulations desired for the MCMC chain

    tau : float > 0
        Parameter that controls the spread for the knot proposal
        distribution

    c : float > 0
        Parameter that controls the probability of birth and death
        candidates
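
The way the prior type follows from the format of the prior parameter
can be sketched as below. This is an illustration of the rules listed
above, not the code in wrapper.py: `infer_prior_type` is a hypothetical
name, and the check that user-defined probabilities sum to 1 is an
added assumption.

```python
from numbers import Real


def infer_prior_type(prior_param):
    """Guess the prior type from the shape of prior_param (hypothetical)."""
    # A list of (number_of_knots, probability) tuples: user-defined prior.
    if isinstance(prior_param, list):
        total = sum(prob for _, prob in prior_param)
        # Assumption: a user-defined prior should be a proper distribution.
        if abs(total - 1.0) > 1e-9:
            raise ValueError("probabilities sum to %g, not 1" % total)
        return "user"
    # A (lower, upper) tuple: uniform prior over the number of knots.
    if isinstance(prior_param, tuple) and len(prior_param) == 2:
        return "uniform"
    # A single positive number: Poisson prior with that mean.
    if (isinstance(prior_param, Real)
            and not isinstance(prior_param, bool)
            and prior_param > 0):
        return "poisson"
    raise ValueError("unrecognised prior parameter: %r" % (prior_param,))
```

For instance, `infer_prior_type((0, 60))` returns "uniform", which
corresponds to the default described above.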

A ModelSet object is returned. To estimate the posterior p(y|x),
call the returned ModelSet on the test data, e.g.

    model = barsN(data_x, data_y, priorparam=(20,30))
    ys = model(xs)
    	  

The original BarsN implementation is redistributed here, as per the
GNU GPL v2. Original README below.

--------------------------------------------------------------------------


                        BarsN 1.0. 

     Copyright (C) 2003 Garrick Wallstrom, Charles Kooperberg and
     Robert E. Kass.


 This program is free software; you can distribute it and/or modify it
 under the terms of the GNU General Public License as published by the
 Free Software Foundation; either version 2, or (at your option) any
 later version.
  
 These functions are distributed in the hope that they will be useful,
 but WITHOUT ANY WARRANTY; without even the implied warranty of
 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 GNU General Public License for more details.

 The text of the GNU General Public License, version 2, is available
 as http://www.gnu.org/copyleft or by writing to the Free Software
 Foundation, 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.



 Credits and Acknowledgements:

   ** The general method of BARS is described in
       DiMatteo, I., Genovese, C.R. and Kass, R.E. (2001). ``Bayesian
       curve fitting with free-knot splines,'' Biometrika, 88, 1055-1073.


   ** BarsN uses Hansen and Kooperberg's Logspline as the default
   method for selecting initial knots. Stand-alone software for
   Logspline is available from statlib: http://lib.stat.cmu.edu 
   For details see
       Hansen, M.H. and Kooperberg, C. (2002), ``Spline adaptation in
   extended linear models (with discussion),'' Statistical Science,
   17, 2 - 51.
       
       Stone, C.J., Hansen, M., Kooperberg, C., and Truong,
   Y.K. (1997), ``The use of polynomial splines and their tensor
   products in extended linear modeling (with discussion),'' Annals of
   Statistics, 25, 1371-1470.
    

   ** BarsN uses Bates and Venables' routines for manipulating B-splines, (C)
   1998. The routines were released under the same GNU General Public
   License referred to above.