
Simplified interface for TensorFlow (mimicking Scikit Learn)


Scikit Flow

This is a simplified interface for TensorFlow, to get people started on predictive analytics and data mining.

Why TensorFlow?

  • TensorFlow provides a good backbone for building different shapes of machine learning applications.
  • It will continue to evolve both in the distributed direction and as general pipelining machinery.

Why Scikit Flow?

  • To smooth the transition from the Scikit Learn world of one-liner machine learning into the more open world of building different shapes of ML models. You can start with fit/predict and move to the TensorFlow APIs as you get comfortable.
  • To provide a set of reference models that would be easy to integrate with existing code.

Installation

First, make sure you have TensorFlow and Scikit Learn installed, then just run:

pip install git+https://github.com/google/skflow.git

Tutorial

Usage

Below are a few simple examples of the API. For more examples, please see examples.

General tips

  • It's useful to rescale the dataset to zero mean and unit standard deviation before passing it to an estimator. Stochastic gradient descent doesn't always do the right thing when variables are on very different scales.

  • Categorical variables should be handled before passing input to the estimator. I'll write a tutorial in the coming days on how to handle categorical variables deep-learning style.
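
As a minimal sketch of both tips, the snippet below standardizes numeric columns and one-hot encodes a categorical column with plain NumPy. The helper names `standardize` and `one_hot` and the toy data are hypothetical and are not part of skflow. One-hot encoding is only one common way to handle categorical variables, not necessarily the approach the planned tutorial will take.

```python
import numpy as np

def standardize(X):
    """Rescale each column of X to zero mean and unit standard deviation."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0.0] = 1.0  # avoid dividing by zero; constant columns become all zeros
    return (X - mean) / std

def one_hot(labels):
    """Encode categorical labels as one-hot rows, with categories in sorted order."""
    categories = sorted(set(labels))
    index = {category: i for i, category in enumerate(categories)}
    encoded = np.zeros((len(labels), len(categories)))
    for row, label in enumerate(labels):
        encoded[row, index[label]] = 1.0
    return encoded, categories

# Toy data: two numeric features on very different scales and one categorical column.
numeric = [[1.0, 1000.0], [2.0, 3000.0], [3.0, 5000.0]]
colors = ["red", "green", "red"]

scaled = standardize(numeric)
encoded, categories = one_hot(colors)

# Stack everything into one feature matrix that could be passed to an estimator's fit().
features = np.hstack([scaled, encoded])
```

In practice, `sklearn.preprocessing.StandardScaler` does the same rescaling and also remembers the training mean and scale so they can be reused on new data.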

Linear Classifier

Simple linear classification:

import skflow
from sklearn import datasets, metrics

iris = datasets.load_iris()
classifier = skflow.TensorFlowLinearClassifier(n_classes=3)
classifier.fit(iris.data, iris.target)
score = metrics.accuracy_score(iris.target, classifier.predict(iris.data))
print("Accuracy: %f" % score)
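
To illustrate what a linear classifier with a fit/predict interface does, here is a rough NumPy sketch of softmax regression trained with full-batch gradient descent. The class name `SoftmaxRegression`, its parameters, and the synthetic data are all hypothetical. This is not skflow's implementation, which builds a TensorFlow graph instead.

```python
import numpy as np

class SoftmaxRegression:
    """Toy linear classifier with a scikit-learn style fit/predict interface."""

    def __init__(self, n_classes, learning_rate=0.1, steps=500, seed=0):
        self.n_classes = n_classes
        self.learning_rate = learning_rate
        self.steps = steps
        self.seed = seed

    @staticmethod
    def _softmax(logits):
        # Subtract the row maximum for numerical stability.
        shifted = logits - logits.max(axis=1, keepdims=True)
        exp = np.exp(shifted)
        return exp / exp.sum(axis=1, keepdims=True)

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        n_samples, n_features = X.shape
        rng = np.random.default_rng(self.seed)
        self.weights = rng.normal(scale=0.01, size=(n_features, self.n_classes))
        self.bias = np.zeros(self.n_classes)
        targets = np.eye(self.n_classes)[y]  # one-hot encode integer labels
        for _ in range(self.steps):
            probs = self._softmax(X @ self.weights + self.bias)
            # Gradient of the mean cross-entropy loss with respect to the logits.
            grad = (probs - targets) / n_samples
            self.weights -= self.learning_rate * (X.T @ grad)
            self.bias -= self.learning_rate * grad.sum(axis=0)
        return self

    def predict(self, X):
        logits = np.asarray(X, dtype=float) @ self.weights + self.bias
        return np.argmax(logits, axis=1)

# Synthetic data: three well-separated 2-D clusters, 20 points each.
rng = np.random.default_rng(0)
centers = np.array([[-2.0, -2.0], [2.0, 2.0], [-2.0, 2.0]])
X = np.vstack([center + rng.normal(scale=0.3, size=(20, 2)) for center in centers])
y = np.repeat(np.arange(3), 20)

model = SoftmaxRegression(n_classes=3).fit(X, y)
accuracy = np.mean(model.predict(X) == y)
print("Accuracy: %f" % accuracy)
```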

Linear Regressor

Simple linear regression:

import skflow
from sklearn import datasets, metrics, preprocessing

# Note: load_boston was removed in scikit-learn 1.2, so this example needs an older release.
boston = datasets.load_boston()
X = preprocessing.StandardScaler().fit_transform(boston.data)
regressor = skflow.TensorFlowLinearRegressor()
regressor.fit(X, boston.target)
score = metrics.mean_squared_error(boston.target, regressor.predict(X))
print("MSE: %f" % score)
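
For intuition about what is being fit and scored here, the following NumPy sketch trains a linear model by gradient descent on standardized features and computes the mean squared error by hand. The function names and the synthetic, noiseless data are hypothetical. skflow's regressor is built on TensorFlow and may differ in its optimizer and defaults.

```python
import numpy as np

def fit_linear_regression(X, y, learning_rate=0.1, steps=500):
    """Fit weights and bias by full-batch gradient descent on the mean squared error."""
    n_samples, n_features = X.shape
    weights = np.zeros(n_features)
    bias = 0.0
    for _ in range(steps):
        residual = X @ weights + bias - y
        # Gradients of mean((X @ w + b - y) ** 2) with respect to w and b.
        weights -= learning_rate * 2.0 * (X.T @ residual) / n_samples
        bias -= learning_rate * 2.0 * residual.mean()
    return weights, bias

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

# Synthetic features on very different scales, standardized before fitting.
rng = np.random.default_rng(0)
raw = rng.normal(loc=[10.0, 500.0], scale=[2.0, 100.0], size=(200, 2))
X = (raw - raw.mean(axis=0)) / raw.std(axis=0)

# Noiseless linear target, so a perfect fit is possible.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.0

weights, bias = fit_linear_regression(X, y)
mse = mean_squared_error(y, X @ weights + bias)
print("MSE: %f" % mse)
```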

Deep Neural Network

Example of a 3-layer network with 10, 20, and 10 hidden units respectively:

import skflow
from sklearn import datasets, metrics

iris = datasets.load_iris()
classifier = skflow.TensorFlowDNNClassifier(hidden_units=[10, 20, 10], n_classes=3)
classifier.fit(iris.data, iris.target)
score = metrics.accuracy_score(iris.target, classifier.predict(iris.data))
print("Accuracy: %f" % score)
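
To make the layer sizes concrete, this NumPy sketch builds weight matrices for a network with hidden_units=[10, 20, 10] on 4 input features and 3 classes, then runs one forward pass. The helper names, the ReLU activation, and the random initialization are assumptions made for illustration. The source does not say which activation or initializer skflow uses.

```python
import numpy as np

def init_layers(n_inputs, hidden_units, n_classes, seed=0):
    """Create a (weights, bias) pair for each hidden layer plus the output layer."""
    rng = np.random.default_rng(seed)
    sizes = [n_inputs] + list(hidden_units) + [n_classes]
    return [
        (rng.normal(scale=0.1, size=(fan_in, fan_out)), np.zeros(fan_out))
        for fan_in, fan_out in zip(sizes[:-1], sizes[1:])
    ]

def forward(X, layers):
    """Run a batch through the network and return class probabilities."""
    activations = np.asarray(X, dtype=float)
    for weights, bias in layers[:-1]:
        # Hidden layer: affine transform followed by ReLU (an assumed activation).
        activations = np.maximum(0.0, activations @ weights + bias)
    weights, bias = layers[-1]
    logits = activations @ weights + bias
    # Softmax over classes, shifted by the row maximum for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)

# Iris has 4 features and 3 classes; use a dummy batch of 5 rows.
layers = init_layers(n_inputs=4, hidden_units=[10, 20, 10], n_classes=3)
batch = np.ones((5, 4))
probabilities = forward(batch, layers)

shapes = [weights.shape for weights, _ in layers]
print(shapes)
```

Each entry in hidden_units adds one weight matrix, and the final matrix maps the last hidden layer to the class scores.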

Custom model

Example of how to pass a custom model to the TensorFlowEstimator:

import skflow
from sklearn import datasets, metrics

iris = datasets.load_iris()

def my_model(X, y):
    """This is a DNN with hidden layers of 10, 20, and 10 units and dropout with keep probability 0.5."""
    layers = skflow.ops.dnn(X, [10, 20, 10], keep_prob=0.5)
    return skflow.models.logistic_regression(layers, y)

classifier = skflow.TensorFlowEstimator(model_fn=my_model, n_classes=3)
classifier.fit(iris.data, iris.target)
score = metrics.accuracy_score(iris.target, classifier.predict(iris.data))
print("Accuracy: %f" % score)
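
The keep_prob argument controls dropout. Assuming it has the same meaning as keep_prob in TensorFlow's tf.nn.dropout, each unit is kept with that probability and the kept units are scaled by 1 / keep_prob so the expected activation is unchanged. A minimal NumPy sketch of that behavior, with a hypothetical `dropout` helper:

```python
import numpy as np

def dropout(activations, keep_prob, rng):
    """Zero each unit with probability 1 - keep_prob and rescale the survivors."""
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
hidden = np.ones((1000, 10))  # stand-in for a layer's activations
dropped = dropout(hidden, keep_prob=0.5, rng=rng)

# Roughly half of the units survive, and each survivor is doubled.
kept_fraction = np.mean(dropped != 0)
print("Kept fraction: %f" % kept_fraction)
```

Dropout is normally applied only during training. At prediction time the activations are used as they are.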

Custom model with multiple GPUs

To build a custom model that uses multiple GPUs, keep everything the same as in the example above, except that the model definition must specify the device for each part of the graph:

import skflow
import tensorflow as tf

def my_model(X, y):
    """
    This is a DNN with hidden layers of 10, 20, and 10 units and dropout with keep probability 0.5.

    Note: To run this example with multiple GPUs, CUDA Toolkit 7.0 and
    cuDNN 6.5 v2 from NVIDIA need to be installed beforehand.
    """
    # TensorFlow numbers GPUs from 0, so '/gpu:1' and '/gpu:2' are the second and third GPUs.
    with tf.device('/gpu:1'):
        # Build the hidden layers on one GPU.
        layers = skflow.ops.dnn(X, [10, 20, 10], keep_prob=0.5)
    with tf.device('/gpu:2'):
        # Build the logistic regression output on another GPU.
        return skflow.models.logistic_regression(layers, y)

Coming soon

  • Easy way to handle categorical variables
  • Text categorization
  • Images (CNNs)
  • More & deeper