Big-Data-Pipeline

This repository contains the source code for all the applications involved in our big data pipeline project.

Big Data Pipeline

Business Logic

The goal of this project is to build a machine learning pipeline that helps a bank executive decide whether a loan application should be accepted or rejected.

Model selection

During model creation, a logistic regression classifier and a decision tree classifier were trained, and the one with the higher precision was saved for later use.
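The selection rule can be sketched as follows. This is a minimal illustration in TypeScript, not the repository's Spark code: the `EvaluationCounts` type, the function names, and the counts are all hypothetical, and it assumes precision is computed as TP / (TP + FP) on a held-out test set.

```typescript
// Hypothetical container for one classifier's results on a held-out test set.
interface EvaluationCounts {
  name: string;
  truePositives: number;
  falsePositives: number;
}

// Precision = TP / (TP + FP); treated as 0 when the model predicted no positives.
function precision(counts: EvaluationCounts): number {
  const predictedPositives = counts.truePositives + counts.falsePositives;
  return predictedPositives === 0 ? 0 : counts.truePositives / predictedPositives;
}

// Return the candidate with the highest precision (the first one wins on ties).
function selectByPrecision(candidates: EvaluationCounts[]): EvaluationCounts {
  if (candidates.length === 0) {
    throw new Error("at least one candidate model is required");
  }
  return candidates.reduce((best, current) =>
    precision(current) > precision(best) ? current : best
  );
}

// Made-up counts, for illustration only; they are not results from this project.
const candidates: EvaluationCounts[] = [
  { name: "logistic-regression", truePositives: 80, falsePositives: 20 },
  { name: "decision-tree", truePositives: 70, falsePositives: 10 },
];

const selected = selectByPrecision(candidates);
console.log(`${selected.name} would be saved`);
```

Precision is a reasonable criterion here if approving a loan is treated as the positive class, since it penalizes approvals that should have been rejections; that reading of the positive class is an assumption.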

Pipeline architecture

[Figure: architecture design]

Technologies involved

  • Spark (MLlib and Spark SQL)
  • Hadoop HDFS
  • Spring Boot
  • Angular
  • MongoDB

Features Completed

  • Batch process: Model generation
  • API for prediction requesting and prediction history
  • Front-end application

Uncompleted projects

  • Streaming process

Missing configs in this repository and future features to add

  • Docker Compose configuration for containerizing the applications
  • The pipeline is missing the project where data gathering and preprocessing from the original source happen

Reference

https://insatunisia.github.io/TP-BigData/