Risk-Prediction-Project

This project uses data mining techniques to predict high-risk mortgage transactions. Beyond predictive modeling with K-Nearest Neighbors (KNN) and decision tree algorithms, it covers data visualization and efficient data management practices.

RStudio_risk_prediction_project

BUAN 4310 PROJECT

Welcome to the BUAN 4310 Project! This project is a significant part of your course, designed to challenge your data mining skills and analytical abilities. This README gives an overview of the project: its objectives, deliverables, expectations, and grading criteria.

Project Overview

In the BUAN 4310 Project, you will work in groups of 2-4 students, simulating a consultancy team. Your mission is to tackle real-world business problems by using data mining techniques to extract insights. The project consists of two problems, Problem 1 and Problem 2, and each group is responsible for solving both:

Problem 1

  • Data Set: Problem 1 involves a larger dataset with approximately 550,000 records and 27 variables. The data is provided in multiple files named loan_1.csv, loan_2.csv, and so on.
  • Nature: Problem 1 is more straightforward than Problem 2.
  • Deliverable: You are required to deliver your findings for Problem 1 in the form of either a video presentation or a deck of annotated presentation slides. These annotations should serve as your presentation script. Not every group member needs to speak during the presentation, but workload distribution should be equitable.
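
Because the Problem 1 data arrives split across loan_1.csv, loan_2.csv, and so on, the first step is usually to stack the files into one table. The following is a minimal sketch in Python using only the standard library; it is an illustration of the step rather than the project's own code (the project itself is an RStudio project). It assumes every file shares the same header row, and the function names and the `directory` parameter are hypothetical.

```python
import csv
import glob
import os
import re


def _file_index(path):
    """Return the number in a name like loan_12.csv, or -1 if there is none (hypothetical helper)."""
    match = re.search(r"loan_(\d+)\.csv$", path)
    return int(match.group(1)) if match else -1


def load_loan_files(directory="."):
    """Read every loan_<n>.csv in `directory` into one list of row dicts.

    Assumption: all files have identical column headers.
    """
    pattern = os.path.join(directory, "loan_*.csv")
    # Sort numerically so loan_10.csv comes after loan_2.csv, not before it.
    paths = sorted(glob.glob(pattern), key=_file_index)
    rows = []
    for path in paths:
        with open(path, newline="") as handle:
            rows.extend(csv.DictReader(handle))
    return rows
```

Calling `load_loan_files("data")` would return one combined list of rows, with every value still stored as a string, so numeric columns need converting before analysis.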

Problem 2

  • Data Set: Problem 2 features a smaller dataset.
  • Nature: Problem 2 is less straightforward than Problem 1.
  • Deliverable: For Problem 2, you will create a detailed report summarizing your analysis and findings.

Key Points to Note

  • Original Work: All work and analyses must be original and produced by your group. Plagiarism is strictly prohibited and will be dealt with according to the university's policies. You may use external sources as references or guides, but the deliverables must demonstrate your own data mining skills.

  • Intra-Group Dynamics: You will have the freedom to select your group members. Intra-group issues, such as workload distribution, commitment, or disputes, should be resolved internally. This course is at the 4000 level, and it's expected that you can work effectively with team members, even those you may not know well.

  • Peer Evaluation: At the end of the quarter, there will be a peer evaluation. Each group member will rate their fellow group members on a scale of 1 to 5, and the average score will contribute to each member's total project score.

  • Professionalism: All project deliverables should be prepared professionally. While grammatical and typographical errors are not the primary focus, the language should be professional, coherent, and easily comprehensible. You can utilize the writing center if necessary.

Problem 1 (22 points)

Problem 1 is built around a scenario in which Stark Enterprises has entered the financial industry and needs help identifying high-risk customers. Your task is to work with a substantial dataset and build at least two appropriate models to address this problem.
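
As an illustration of one of the model families named in the introduction, here is a minimal K-Nearest Neighbors sketch in standard-library Python. It is not the project's implementation (the project itself is an RStudio project); the function names are hypothetical, and it assumes each customer is already represented as a tuple of numeric features with a known risk label.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training rows."""
    # Pair each training row's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(row, query), label) for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k=3):
    """Share of held-out rows whose predicted label matches the true label."""
    hits = sum(
        knn_predict(train_X, train_y, x, k) == y for x, y in zip(test_X, test_y)
    )
    return hits / len(test_y)
```

Because KNN relies on distances, features measured on very different scales should be rescaled first, and computing `accuracy` on a held-out set gives a common yardstick for comparing this model with a second one, such as a decision tree.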

Deliverable for Problem 1

Your group will present its Problem 1 findings professionally, as either a video recording or a deck of annotated presentation slides, accompanied by a slide deck or HTML Markdown document that provides explanations. The presentation should include the following elements:

  1. Problem: Clearly define the problem you are addressing.
  2. Objective: Describe the project's objectives.
  3. Data Purpose: Explain why this data was collected initially.
  4. Data Exploration: Provide an overview of the data, including data types, scale, values, and data quality (missing values, outliers, etc.). Conduct basic descriptive analyses.
  5. Visualizations: Create at least 6 meaningful visualizations, explaining why you chose each one and what insights it provides.
  6. Relationships: Explore relationships between attributes through scatter plots, correlation, cross-tabulation, and more.
  7. Outcome Variable: Define and measure the outcome variable you are trying to predict.
  8. Models: Present at least 2 models, detailing their specifications and results.
  9. Final Model: Explain why you selected the final model.
  10. Recommendations: Provide recommendations based on your analysis and research on the problem domain.
  11. Sustainability: Consider the practicality of collecting these data in the long run and suggest additional variables if needed.
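
The data quality checks in the Data Exploration item (missing values and outliers) can be sketched as follows. This is a hedged illustration in standard-library Python, not the project's code: the function names are hypothetical, rows are assumed to be dictionaries keyed by column name, and the 1.5 × IQR rule is one common outlier convention that the project brief does not prescribe.

```python
import statistics


def missing_counts(rows):
    """Count empty or absent values per column across a list of row dicts."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            if value is None or str(value).strip() == "":
                counts[column] = counts.get(column, 0) + 1
            else:
                # Make sure fully populated columns still appear with a count of 0.
                counts.setdefault(column, 0)
    return counts


def iqr_outliers(values, factor=1.5):
    """Return the values lying outside [Q1 - factor*IQR, Q3 + factor*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - factor * spread, q3 + factor * spread
    return [v for v in values if v < low or v > high]
```

Running `missing_counts` on the loaded rows shows which columns need imputation or removal, and `iqr_outliers` applied to a numeric column flags values worth inspecting before modeling.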

Problem 1 Rubric

Your group's performance on Problem 1 will be assessed based on the following criteria:

  • Data Visualizations (5 points): The quality and effectiveness of your visualizations in conveying insights.
  • Data Understanding (5 points): How well you understand the data and its implications for business.
  • Business Understanding (5 points): The depth of your discussion regarding the business problem and data's purpose.
  • Presentation Quality (4 points): The professionalism and coherence of your presentation.
  • Exceptional Work (3 points): Any exceptional work that goes beyond expectations.

Approximately 10% of the grade is allocated to truly exceptional work, which includes substantial effort, additional explorations, and manipulations.

Conclusion

The BUAN 4310 Project is a challenging but rewarding opportunity to apply your data mining skills to real-world business problems. Your success in this project will depend on your group's ability to work collaboratively, analyze data effectively, and communicate your findings professionally. We encourage you to embrace this learning experience and make the most of it.

If you have any questions or need guidance throughout the project, don't hesitate to reach out to your instructor or TA. Good luck, and may the data be with you!