/rgf_python

Python Wrapper of Regularized Greedy Forest.

Primary LanguagePython

rgf_python

Machine learning Regularized Greedy Forest wrapper for Python.

Feature

Scikit learn like interface and Multi classification problem is OK.

Original rgf implement is only available for regression and binary classification, but rgf_sklearn is also available for Multi classification by "One-or-Rest" method.

ex.

from sklearn import datasets
from sklearn.utils.validation import check_random_state
from sklearn.cross_validation import StratifiedKFold
from rgf.lib.rgf import RGFClassifier

iris = datasets.load_iris()
rng = check_random_state(0)
perm = rng.permutation(iris.target.size)
iris.data = iris.data[perm]
iris.target = iris.target[perm]

rgf = RGFClassifier(max_leaf=400,
                    algorithm="RGF_Sib",
                    test_interval=100,)

# cross validation
rgf_score = 0
n_folds = 3

for train_idx, test_idx in StratifiedKFold(iris.target, n_folds):
    xs_train = iris.data[train_idx]
    y_train = iris.target[train_idx]
    xs_test = iris.data[test_idx]
    y_test = iris.target[test_idx]

    rgf.fit(xs_train, y_train)
    rgf_score += rgf.score(xs_test, y_test)

rgf_score /= n_folds
print('score: {0}'.format(rgf_score))

Software Requirement

Installation

git clone https://github.com/fukatani/rgf_python.git
python setup.py install

And you need to edit rgf/lib/rgf.py

## Edit this ##################################################

#Location of the RGF executable
loc_exec = 'C:\\Users\\rf\\Documents\\python\\rgf1.2\\bin\\rgf.exe'
loc_temp = 'temp/'

## End Edit

You need to direct appropriate location of rgf exec file to 'loc_exe'. 'loc_temp' is directory for placing temp file.

##Tuning hyper parameter You can tuning parameter as follows.

max_leaf: Appropriate values are data-dependent and in varied from 1000 to 10000.

test_interval: For efficiency, it must be either multiple or divisor of 100 (default of the optimization interval).

algorithm: You can select "RGF", "RGF Opt" or "RGF Sib"

loss: "LS", "Log" or "Expo".

reg_depth: Must be no smaller than 1. Meant for being used with algorithm = "RGF Opt" or "RGF Sib". 

l2: Either 1, 0.1, or 0.01 often produces good results though with exponential loss (loss=Expo) and logistic loss (loss=Log) some data requires smaller values such as 1e-10 or 1e-20 Either 1, 0.1, or 0.01 often produces good results though with exponential loss (loss=Expo) and logistic loss (loss=Log) some data requires smaller values such as 1e-10 or 1e-20

sl2: Default is equal to ls. On some data, λ/100 works well.

Detail of tuning parameter is here. http://stat.rutgers.edu/home/tzhang/software/rgf/rgf1.2-guide.pdf

Other

Shamelessly, many part of the implementation is based on the following. Thanks! https://github.com/MLWave/RGF-sklearn