/auto-ml

product-logo

Need

Not everyone has the machine learning (ML) knowledge needed to recognize patterns in a large dataset. For an ordinary user, it is difficult to train a model on a dataset, choose an algorithm that can learn from that data, and carry out supporting operations such as preprocessing, imputation, and visualization. A system that accepts data and returns an appropriate diagnosis along with an automatically selected model would be useful, especially in modern industrial sectors. This platform lets users with no background in ML, its models, or its operations predict from and analyze data with ease.

Description

A platform that helps data and business analysts draw inferences from data without writing code. It will provide end-to-end data handling: the ETL (Extract, Transform, Load) pipelines needed to build a dataset, automated data profiling and cleaning, analysis of how the data changes over time, and data quality improvement. As the next step, neural networks will be generated automatically to perform classification or regression on any target variable in the dataset. The platform will also create and suggest visualizations for the dataset that help users understand the data and drive decisions.
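As a rough illustration of what automated data profiling can involve, the sketch below reads CSV text and reports, for each column, an inferred type, the number of missing values, and basic statistics. It is a minimal example using only the Python standard library; the function name `profile_csv`, the type labels, and the sample data are assumptions made for illustration and are not taken from the project's code.

```python
import csv
import io
import statistics


def profile_csv(text):
    """Return per-column type, missing count, and basic stats for CSV text.

    Hypothetical helper for illustration; not the platform's actual profiler.
    """
    rows = list(csv.DictReader(io.StringIO(text)))
    columns = list(rows[0].keys()) if rows else []
    profile = {}
    for column in columns:
        values = [row[column] for row in rows]
        # Empty strings and None (short rows) both count as missing.
        present = [v for v in values if v not in ("", None)]
        summary = {"missing": len(values) - len(present)}
        if not present:
            summary["type"] = "empty"
        else:
            try:
                numbers = [float(v) for v in present]
            except ValueError:
                # At least one value is not a number: treat the column as text.
                summary["type"] = "categorical"
                summary["unique"] = len(set(present))
            else:
                summary["type"] = "numeric"
                summary["mean"] = statistics.mean(numbers)
                summary["min"] = min(numbers)
                summary["max"] = max(numbers)
        profile[column] = summary
    return profile


# Made-up sample data: one missing age, one repeated city.
sample = "age,city\n30,Pune\n,Mumbai\n40,Pune\n"
report = profile_csv(sample)
for name, summary in report.items():
    print(name, summary)
```

The missing-value counts produced here are the kind of information an imputation step would need, for example to decide which numeric columns to fill with their mean.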

Features

  • Allow the user to upload a dataset in CSV format.
  • Perform basic operations on the input dataset, such as identifying columns and their data types, computing statistics like mean, min, and max, and grouping.
  • Impute missing values in a given dataset.
  • Prepare the input dataset by applying preprocessing techniques such as outlier handling, one-hot encoding, and feature scaling.
  • Select a model automatically, using a genetic approach that efficiently finds the most suitable neural network model for a given dataset.
  • Visualize data automatically, using an algorithm that shows the top k visualizations for a given dataset.
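To make the genetic approach to model selection more concrete, here is a minimal sketch of a genetic search over neural network configurations. The README does not describe the encoding, operators, or fitness measure the platform uses, so everything below is an assumption: the search space, the names `evolve` and `toy_fitness`, truncation selection, uniform crossover, and the mutation rate are all illustrative. In a real system the fitness function would train each candidate network and return a validation score; here a toy stand-in is used so the example runs instantly.

```python
import random

# Hypothetical search space: each candidate network is described by three genes.
SEARCH_SPACE = {
    "layers": [1, 2, 3, 4],
    "units": [8, 16, 32, 64],
    "activation": ["relu", "tanh", "sigmoid"],
}


def random_candidate(rng):
    """Sample one configuration uniformly from the search space."""
    return {gene: rng.choice(options) for gene, options in SEARCH_SPACE.items()}


def crossover(parent_a, parent_b, rng):
    """Uniform crossover: each gene is copied from one parent at random."""
    return {gene: rng.choice([parent_a[gene], parent_b[gene]]) for gene in SEARCH_SPACE}


def mutate(candidate, rng, rate=0.2):
    """Re-sample each gene from the search space with probability `rate`."""
    return {
        gene: rng.choice(SEARCH_SPACE[gene]) if rng.random() < rate else value
        for gene, value in candidate.items()
    }


def evolve(fitness, generations=10, population_size=12, seed=0):
    """Return the fittest configuration found after a fixed number of generations."""
    rng = random.Random(seed)
    population = [random_candidate(rng) for _ in range(population_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: population_size // 2]  # truncation selection
        children = []
        while len(parents) + len(children) < population_size:
            parent_a, parent_b = rng.sample(parents, 2)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        population = parents + children  # parents survive unchanged (elitism)
    return max(population, key=fitness)


def toy_fitness(candidate):
    # Stand-in for "train the network and return its validation score".
    # It simply prefers 2 layers of 32 units; activation is ignored.
    return -abs(candidate["layers"] - 2) - abs(candidate["units"] - 32) / 8


best = evolve(toy_fitness)
print(best)
```

Because the top half of each generation is carried over unchanged, the best score never gets worse from one generation to the next. The trade-off is cost: with a real fitness function, every evaluation means training a network, so population size and generation count would need to be kept small or the scores cached.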

Application Screenshots

Home Page

product-home

Signup

product-signup

Login

product-signin

Dataset Input

product-datasetinput

Displaying all the datasets created by the user

product-datasetlist

Data Imputation

product-datasetimp

Dataset Details

product-datasetinfo1

Edit Dataset Details

product-datasetdesp

Dataset Catalog

product-datasetinfo2

Data Visualisation

product-visualisation

Model Selection

product-modelselection

After the model selection task is completed:

product-jobdetails

How To Use

Software Requirements

  • VSCode
  • MongoDB

Installation

Clone the repo

git clone https://github.com/deepanshu2506/auto-ml.git

Install the dependencies by running:

pip3 install -r requirements.txt
yarn install

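Run using Command Prompt (start the backend and the frontend in separate terminals, because flask run keeps running in the foreground)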

flask run
cd ./frontend
yarn start

Tech stack

Frontend: React
Backend: Flask (Python)
Database: MongoDB