/bankMarketing

Bank Marketing Data Set from UCI

Primary LanguageJupyter Notebook

UCL Bank marketing

  • Created: August 26, 2021 12:55 AM
  • Last Edited Time: August 6, 2022 1:01 AM
  • Status: In Progress 🙌
  • Type: Technical Spec

Summary

This aim of this project is to build a simple credit model using bank data from UCI [1]. The goal is to get to a good model which runs end-to-end and play with some considerations as if we where to put it into production. This is a fun exercise and is more about skills development than "real ML" so we do not focus on the modelling but more the thought process behind building it.

Data source

The original data is Bank Marketing Data Set from UCI. This was added to this git repo in order to create an end-point to query throughout development.

Background

What is the motivation for these changes? What problems will this solve? Include graphs, metrics, etc. if relevant.

This was originally a case study given in an unsuccessful interview, but I enjoyed working on it so after gaining more experience, I have chosen to return to it a year later.

This is a binary classification with very imbalanced data.

Goals

What are the outcomes that will result from these changes? How will we evaluate success for the proposed changes?

  • Deal with class imbalance (SMOTE, undersampling, oversampling)

Non-Goals

To narrow the scope of what we're working on, outline what this proposal will not accomplish.

  • Split into Train-Validation-Test (our data is too small for this)
  • Containerize our application

Proposed Solution

Describe the solution to the problems outlined above. Include enough detail to allow for productive discussion and comments from readers.

  • API that takes through data and outputs scores.
graph TD;
  A-->B;
  A-->C;
  B-->D;
  C-->D;

Notebooks

Highlight risks so your reviewers can direct their attention here.

graph TD
    A[Exploratory Data Analysis `EDA`] -->|Gain some direction| B(Feature Engineering)
    B --> C(Resample)
    C -->|To deal with class imbalance| D[Feature Selection]
    D --> E[Model training]
    E -->F[Obtaining predictions]
    F -->G[Scoring]

Milestones

Break down the solution into key tasks and their estimated deadlines.

Open Questions

Ask any unresolved questions about the proposed solution here.

Follow-up Tasks

What needs to be done next for this proposal?

References:

[1] UCI Machine Learning Repository Bank Marketing Data Set

[2] When is resampling beneficial for feature selection with imbalanced wide data?