DA350

Advanced Methods for Data Analytics


Introduction

This repository collects the work completed for the course DA350: Advanced Methods for Data Analytics. It includes several labs that apply data analytics methods to practical problems, a community-based data challenge that contributed to Etna Township's comprehensive plan, and a final project on credit card default prediction.

Table of Contents

  1. Introduction
  2. Labs
  3. Data Challenge: Etna Township Comprehensive Plan
  4. Final Project: Investigation of Factors for Credit Card Default Prediction
  5. Usage

Labs

Lab 6: Decision Boundary Analysis

This lab compares the decision boundaries produced by several classification algorithms: KNN, Logistic Regression, LDA, Decision Tree, Bagging, Random Forest, Boosting, and SVM.
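To make the idea of a decision boundary concrete, here is a minimal plain-Python sketch (not code from the lab) that labels every point of a grid with a hand-written k-nearest-neighbors rule; the places where the predicted label flips trace the boundary. The training points, the `knn_predict` and `decision_map` helpers, and the choice `k=3` are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy training set: (x, y) points labeled "A" or "B".
TRAIN = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
    ((4.0, 4.0), "B"), ((4.5, 3.5), "B"), ((3.5, 4.5), "B"),
]

def knn_predict(point, train, k=3):
    """Classify `point` by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda item: math.dist(point, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def decision_map(train, k=3, lo=0.0, hi=5.0, steps=11):
    """Predict a label at each node of a steps x steps grid.
    Rows are returned top (largest y) to bottom; label changes mark the boundary."""
    step = (hi - lo) / (steps - 1)
    rows = []
    for j in reversed(range(steps)):
        y = lo + j * step
        rows.append("".join(knn_predict((lo + i * step, y), train, k) for i in range(steps)))
    return rows

if __name__ == "__main__":
    for row in decision_map(TRAIN):
        print(row)
```

Running it prints an 11 x 11 map of A/B labels. Replacing `knn_predict` with a different classifier and comparing the resulting maps mirrors, in miniature, the kind of comparison this lab makes across algorithms.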

Lab 7: Recommender Systems

This lab builds a movie recommender system from a Netflix dataset and explores the methodologies behind real-world recommender systems at companies such as Amazon, Spotify, and Netflix, which use them to personalize what each user sees.
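This README does not say which algorithm the lab used, so the sketch below shows one common approach, user-based collaborative filtering, under stated assumptions: the ratings, the movie titles, and the `cosine` and `recommend` helpers are hypothetical, and similarity is computed on raw ratings over the movies two users have both rated.

```python
import math

# Hypothetical toy ratings: user -> {movie: rating on a 1-5 scale}.
RATINGS = {
    "alice": {"Heat": 5, "Alien": 4, "Up": 1},
    "bob":   {"Heat": 4, "Alien": 5, "Up": 2, "Jaws": 5, "Coco": 2},
    "carol": {"Heat": 1, "Alien": 2, "Up": 5, "Jaws": 2, "Coco": 5},
}

def cosine(u, v):
    """Cosine similarity between two users over the movies both have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[m] * v[m] for m in common)
    norm_u = math.sqrt(sum(u[m] ** 2 for m in common))
    norm_v = math.sqrt(sum(v[m] ** 2 for m in common))
    return dot / (norm_u * norm_v)

def recommend(user, ratings, top_n=1):
    """Rank movies the user has not seen by a similarity-weighted
    average of other users' ratings."""
    seen = ratings[user]
    scores, weights = {}, {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        sim = cosine(seen, theirs)
        if sim <= 0:
            continue  # ignore users with no positive similarity
        for movie, rating in theirs.items():
            if movie not in seen:
                scores[movie] = scores.get(movie, 0.0) + sim * rating
                weights[movie] = weights.get(movie, 0.0) + sim
    ranked = sorted(scores, key=lambda m: scores[m] / weights[m], reverse=True)
    return ranked[:top_n]

if __name__ == "__main__":
    print(recommend("alice", RATINGS))
```

With this toy data alice's tastes are closest to bob's, so `recommend("alice", RATINGS)` returns `["Jaws"]`, the unseen movie bob rated highly. Mean-centering each user's ratings before computing similarity (Pearson correlation) is a common refinement.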

Lab 9: Facial Recognition

This lab explores facial recognition technology, including image rotation, eigenface computation for dimensionality reduction, and identifying the owner of a face from a database of known individuals.
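The steps described above (rotate, reduce with eigenfaces, match against known faces) can be sketched in plain Python as follows. This is an illustrative assumption, not the lab's code: the 3 x 3 "faces", the names `ann` and `ben`, and every helper function are hypothetical. The principal components are found by power iteration with deflation on the pixel covariance matrix, which is only practical for tiny images; eigenface code for real photos usually relies on an SVD or on the much smaller n x n matrix built from the n training images.

```python
import math

def rotate_cw(image, quarter_turns=1):
    """Rotate a 2-D list of pixels clockwise by 90 degrees, `quarter_turns` times."""
    for _ in range(quarter_turns % 4):
        image = [list(row) for row in zip(*image[::-1])]
    return image

def flatten(image):
    """Turn a 2-D image into a 1-D vector of floats (row-major order)."""
    return [float(p) for row in image for p in row]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def eigenfaces(vectors, k, iters=200):
    """Mean face plus the top-k principal components ("eigenfaces"), found by
    power iteration with deflation on the pixel covariance matrix."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[i] for v in vectors) / n for i in range(d)]
    centered = [[v[i] - mean[i] for i in range(d)] for v in vectors]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)] for i in range(d)]
    components = []
    for _ in range(k):
        vec = [float(i + 1) for i in range(d)]  # asymmetric start vector
        for _ in range(iters):
            nxt = [dot(row, vec) for row in cov]
            norm = math.sqrt(dot(nxt, nxt))
            if norm == 0.0:
                break
            vec = [x / norm for x in nxt]
        lam = dot(vec, [dot(row, vec) for row in cov])  # Rayleigh quotient
        components.append(vec)
        # Deflate so the next pass converges to the next-largest component.
        cov = [[cov[i][j] - lam * vec[i] * vec[j] for j in range(d)] for i in range(d)]
    return mean, components

def project(vector, mean, components):
    """Coordinates of a face vector in eigenface space."""
    centered = [x - m for x, m in zip(vector, mean)]
    return [dot(centered, pc) for pc in components]

def identify(query, gallery, mean, components):
    """Label of the gallery face nearest to `query` in eigenface space."""
    q = project(flatten(query), mean, components)
    nearest = min(gallery, key=lambda item: math.dist(q, project(flatten(item[0]), mean, components)))
    return nearest[1]

# Hypothetical 3 x 3 grayscale "faces" (0-9) of two known people.
GALLERY = [
    ([[9, 9, 0], [9, 9, 0], [9, 9, 0]], "ann"),
    ([[8, 9, 1], [9, 8, 0], [9, 9, 1]], "ann"),
    ([[0, 9, 9], [0, 9, 9], [0, 9, 9]], "ben"),
    ([[1, 9, 8], [0, 8, 9], [1, 9, 9]], "ben"),
]
MEAN, PCS = eigenfaces([flatten(img) for img, _ in GALLERY], k=2)

if __name__ == "__main__":
    query = [[9, 8, 1], [8, 9, 0], [9, 9, 0]]  # unseen, ann-like face
    arrived_rotated = rotate_cw(query)         # e.g. a sideways scan
    upright = rotate_cw(arrived_rotated, 3)    # three more quarter turns
    print(identify(upright, GALLERY, MEAN, PCS))
```

Four quarter turns restore the original image, and the query is matched to `ann` because it lies closest to her images in the two-dimensional eigenface space.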

Data Challenge: Etna Township Comprehensive Plan

The Etna Township Trustees convened a Community Advisory Committee to help develop the township's upcoming comprehensive plan. A public survey was conducted to gather a diverse range of opinions on current services, future development, and other key issues shaping Etna Township's future.

Final Project: Investigation of Factors for Credit Card Default Prediction

The final project predicts the likelihood of credit card default from a dataset of client features, including demographics, credit limit, and payment history. Three predictive models (Decision Tree, K-Nearest Neighbors (kNN), and Logistic Regression) are built and compared to identify the key factors influencing default risk, and the performance and limitations of each model are assessed to show how well each one predicts default.
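To illustrate how a fitted logistic regression can point to key factors, the sketch below trains one from scratch with batch gradient descent on a tiny, made-up dataset. The two features (a standardized payment-delay score and a standardized feature unrelated to default), the learning rate, and the epoch count are assumptions for illustration; the project's data and code are not reproduced here. On standardized inputs, a coefficient's sign and magnitude give a rough sense of that feature's influence on predicted default risk.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic regression by batch gradient descent on the mean log-loss.
    Returns (bias, weights)."""
    n, d = len(X), len(X[0])
    b, w = 0.0, [0.0] * d
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * d
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return b, w

def predict_proba(b, w, x):
    """Estimated probability of default for one client."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

# Hypothetical standardized features: [payment_delay, unrelated_feature];
# label 1 = default, 0 = no default.
X = [[-1.0, 1.0], [-1.0, -1.0], [-0.5, 0.0], [1.0, 1.0], [1.0, -1.0], [0.5, 0.0]]
y = [0, 0, 0, 1, 1, 1]

if __name__ == "__main__":
    b, w = fit_logistic(X, y)
    print("coefficients:", [round(c, 2) for c in w])
```

Here the `payment_delay` coefficient grows clearly positive while the unrelated feature's coefficient stays at zero, the kind of signal used to rank factors. In practice a library implementation such as scikit-learn's `LogisticRegression`, with regularization, would normally be preferred.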

Usage

The repository contains the documentation and results of each project in both HTML and PDF formats for easy viewing and sharing. Here's how you can access them:

Online Viewing

  • You can view the HTML files directly on GitHub by navigating to the respective project directory.
  • The PDF files can be downloaded and viewed on any PDF reader.

Local Viewing

  1. Clone the repository to your local machine:
git clone https://github.com/diii99/DA350.git