ACKERAS

AutoML library for Accurat, based on AutoKeras and Scikit-Learn.

Installation

The library is pip-installable, so just go ahead and type:

$ pip install ackeras

Note: like AutoKeras, the library is only compatible with Python 3.6.

Second note: I refer to the standard Keras setup (see the Keras documentation) and suggest the Theano backend, but you can also use TensorFlow.
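
Since only Python 3.6 is supported, it can help to install the library in its own environment rather than into your system interpreter. One possible setup (conda is just an example here; any Python 3.6 virtual environment works):

$ conda create -n ackeras python=3.6
$ conda activate ackeras
$ pip install ackeras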

Disclaimer

The project has just started, so most things do not work properly or still need to be fixed; there are plenty of #TODOs inside, but feel free to use it and to open pull requests.

Scope

The idea is to be able to input a file in CSV or JSON format and, after selecting a few parameters (see below), get your data cleaned and clustered automatically, ready to be analyzed. This can be useful for preliminary analysis and for feeding outputs into visualizations (e.g. a clustering in a scatterplot, or the class probabilities of a decision tree).

The implementations are (a rough sketch of the cleaning and clustering steps in plain scikit-learn follows the list):

  • Data cleaning: NaN filling with various methods, label encoding and one-hot encoding, flagging of categorical features and dropping redundant features (almost);
  • Dimensionality reduction: PCA and UMAP;
  • Clustering: k-means, with silhouette-analysis optimization, and DBSCAN;
  • Logistic and linear regression, with k-fold cross-validation;
  • Random Forests and Support Vector Machines, with genetic-algorithm optimization;
  • Outlier detection with Random Forests;
  • Neural networks, with Auto-Keras;
  • ML visualizations with Seaborn and Lime.
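
Ackeras' own API is covered in the docs below; purely as orientation, this is roughly what the cleaning, dimensionality-reduction and clustering steps look like when written directly with pandas and scikit-learn. The file name and the median/mode fill strategies are illustrative assumptions, not the library's exact behaviour:

# Rough pandas/scikit-learn sketch of the cleaning -> PCA -> k-means pipeline.
# "data.csv" is a placeholder; the fill strategies are illustrative only.
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

df = pd.read_csv("data.csv")

for col in df.columns:
    if df[col].dtype.kind in "if":                   # numeric column
        df[col] = df[col].fillna(df[col].median())
    else:                                            # categorical column
        df[col] = df[col].fillna(df[col].mode().iloc[0])
        df[col] = LabelEncoder().fit_transform(df[col])

X = StandardScaler().fit_transform(df.values)        # scale before projecting
X_2d = PCA(n_components=2).fit_transform(X)          # UMAP would be a drop-in alternative

# pick k by silhouette analysis, then cluster
scores = {k: silhouette_score(X_2d, KMeans(n_clusters=k).fit_predict(X_2d))
          for k in range(2, 8)}
best_k = max(scores, key=scores.get)
labels = KMeans(n_clusters=best_k).fit_predict(X_2d)
print("best k:", best_k, "silhouette:", round(scores[best_k], 3))

DBSCAN, the regressions and the forest/SVM models listed above follow the same pattern with the corresponding scikit-learn estimators.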

Usage with Python

Head over to the docs to see:

  • Basic usage example
  • More complex analysis and use cases
  • Integration with autosklearn
  • Integration with autokeras (the plain AutoKeras quick-start pattern is sketched below for reference)
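
The ackeras integration layer is documented separately; for reference, this is the usual quick-start pattern of AutoKeras 0.x itself, following its own MNIST example. The ackeras wrappers may expose a different interface:

# Plain AutoKeras (0.x) quick start on MNIST - shown for orientation only,
# not the ackeras API.
from keras.datasets import mnist
import autokeras as ak

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(x_train.shape + (1,))    # add channel dimension
x_test = x_test.reshape(x_test.shape + (1,))

clf = ak.ImageClassifier(verbose=True)             # neural architecture search
clf.fit(x_train, y_train, time_limit=60 * 60)      # search budget in seconds
clf.final_fit(x_train, y_train, x_test, y_test, retrain=True)
print(clf.evaluate(x_test, y_test))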

Usage from frontend (not ideal)

You should now be able to interact with the dataset through a simple server, which for now only runs on my machine in the local network. Fixes are on the way, so stay tuned. To test it yourself, just try:

$ cd ackeras
$ python server.py

then head over to localhost:5000 in your browser. Upload a CSV and you should see something like this:

[screenshot of the upload page]

Be sure to tick the "Drop_rest" option (at this stage), because it ensures that any data you push in that is not understood gets excluded. Then go ahead, submit the query, head over to the link provided and enjoy everything breaking down. Keep an eye on the console, because we tried to log most errors.

Other interesting libraries to add in the pipeline

  • Awesome Dash, Python + React.js + Flask
  • Bokeh, interactive web plotting
  • Dask, multiprocessing with Pandas, NumPy and scikit-learn