/gdayf-core

DayF (Decision at your Fingertips) is an AutoML opensource project

Primary LanguagePython

DayF Core Development Framework

DayF (Decision at your Fingertips) is an AutoML GPL3 opensource development framework that let developers works with Machine Learning models without any idea of AI, simply taking a .csv dataset and the objective column.

gDayF Framework make all transformations (Normalization, cleaning, etc ) and choose the best model and parametrization selection for you stroing all dataset and model execution parameters in a .json file.

Getting Started

Clone Git repository: https://github.com/e2its/gdayf-core.git

Prerequisites

Create a virtual env (gdaf-core):

  • python (3.7)
  • activate gdayf-core

Install package dependencies:

  • pip install h2o==3.30.0.1
  • pip install pyspark==2.4.5
  • pip install pandas
  • pip install hdfs
  • pip install pymongo

Docker images for ML frameworks and mongodb

  • e2its/ubuntu-spark:2.4.5
  • e2its/ububtu-h2o:3.30.0.1
  • e2its/mongodb:latest

Define storage parameters [Configuration can be changed on config.json]:

  • MongoDB: installed on 0.0.0.0:27017:

    • "mongoDB": { "value": "gdayf-v1", "url": "0.0.0.0", "port": "27017", "type":"mongoDB", "hash_value": null, "hash_type":"MD5" }
  • HDFS:

    • "hdfs": {"value": "/gdayf-v1/experiments" , "type":"hdfs", "url":"http://0.0.0.0:50070", "uri":"hdfs:/<<namenode_ip>>:8020", "hash_value": null, "hash_type":"MD5" }
  • LocalFS:

    • "localfs": {"value": "/Data/gdayf-v1/experiments" , "type":"localfs", "hash_value": null, "hash_type":"MD5" }
  • Define primary path to be used:

    • "primary_path": "localfs"
  • Establish different levels of storage based on Storage engines configured:

    • "load_path": [ {"value": "models" , "type":"localfs", "hash_value": null, "hash_type":"MD5" } ]
    • "log_path" : [ {"value": "log" , "type":"localfs", "hash_value": null, "hash_type":"MD5" } ]
    • "json_path" : [ {"value": "json" , "type":"mongoDB", "hash_value": null, "hash_type":"MD5" } ]
    • "prediction_path" : [ {"value": "prediction" , "type":"mongoDB", "hash_value": null, "hash_type":"MD5" } ]

Documentation

A doxygen graphviz technical documentation can be located on doc folder in the project

Running the tests

Test.py scripts can be found on test/src folder in the project

Built With

  • H2o.ai - a Machine Learning engine working on Hadoop/Yarn, Spark, or your laptop.
  • Apache Spark MLlib - is a fast and general-purpose cluster computing for machine learning.
  • mongoDB - NoSQL, Json based database.
  • Apache HDFS - is a distributed file system designed to run on commodity hardware.
  • Pandas - is an open source Python Data Analysis Library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

Authors

  • Jose L. Sanchez del Coso - e2its - Linkedin

LICENSE

This project is licensed under the GPL3 License - see the LICENSE.md for details