2021_Statistical-Modelling

[R | Multiple Regression | Logistic Regression | Time series]

Welcome to the "Statistical Modelling" repository! This project contains all the reports, data, and code from the course "Statistics for Data Analytics" of the MSc in Data Analytics. The repository is divided into two folders, each focusing on different statistical modelling techniques.

Folder 1: Multiple Regression Analysis

Abstract: The Multiple Regression Analysis folder examines the factors that determine the amount of credit card debt, based on the financial behavior of 687 customers. The report shows how multiple regression can predict a numerical value from a set of other variables. The final model is a parsimonious subset of four variables that explains 73% of the variance in the dependent variable. The model satisfies all Ordinary Least Squares (OLS) assumptions, which supports the validity of its estimates.
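As a rough illustration of the workflow described in this abstract, the sketch below fits a multiple regression in base R and runs two common OLS checks. It is not the report's model: the data are simulated, and the variable names (`income`, `age`, `cards`, `debt`) and coefficients are hypothetical placeholders.

```r
# Simulated, hypothetical data; NOT the course dataset of 687 customers.
set.seed(42)
n <- 200
income <- rnorm(n, mean = 50, sd = 10)
age    <- rnorm(n, mean = 40, sd = 8)
cards  <- rpois(n, lambda = 2)
debt   <- 0.5 + 0.04 * income + 0.3 * cards + rnorm(n, sd = 0.5)
customers <- data.frame(debt, income, age, cards)

# Fit the multiple regression by ordinary least squares.
fit <- lm(debt ~ income + age + cards, data = customers)
r_squared <- summary(fit)$r.squared   # share of variance explained

# Normality of residuals (one of the OLS assumptions).
resid_normality <- shapiro.test(residuals(fit))

# Multicollinearity: variance inflation factor for each predictor,
# computed by regressing it on the remaining predictors.
predictors <- c("income", "age", "cards")
vif <- sapply(predictors, function(v) {
  others <- setdiff(predictors, v)
  r2 <- summary(lm(reformulate(others, response = v), data = customers))$r.squared
  1 / (1 - r2)
})

print(summary(fit))
print(resid_normality)
print(vif)
```

Calling `plot(fit)` afterwards shows the standard diagnostic plots (residuals vs fitted, Q-Q, scale-location, leverage), which help with a visual check of linearity and homoscedasticity.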

Folder 2: Time Series and Logistic Regression

Abstract: The Time Series and Logistic Regression folder applies time series analysis and logistic regression to two different datasets. The time series analysis examines which techniques are appropriate for the data and identifies the SARIMA (Seasonal Autoregressive Integrated Moving Average) model as the optimal choice. Regular logistic regression models are also fitted and shown to be effective in contrast with a model that uses Principal Component Analysis (PCA).
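For readers new to SARIMA, a minimal sketch in base R follows. It uses the built-in `AirPassengers` series as a stand-in, not the dataset from this repository, and the model orders are illustrative assumptions rather than the ones selected in the report.

```r
# Built-in monthly series used as a stand-in; not the repository's data.
y <- log(AirPassengers)

# SARIMA(0,1,1)(0,1,1)[12]; these orders are illustrative only.
fit <- arima(y,
             order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

# Forecast the next 12 months and return to the original scale.
fc <- predict(fit, n.ahead = 12)
forecast_passengers <- exp(fc$pred)

print(fit)
print(round(forecast_passengers))
```

In practice, candidate orders are usually compared with `AIC(fit)` and by inspecting `acf(residuals(fit))` for leftover autocorrelation.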

Repository Contents:

  1. Multiple Regression Analysis Folder:
  • Reports: Detailed reports providing insights into the criteria influencing credit card debt based on financial behavior.
  • Data: Dataset containing information on 687 customers used for multiple regression analysis.
  • Code: R scripts showcasing the implementation of multiple regression models and the validation of Ordinary Least Squares assumptions.
  2. Time Series and Logistic Regression Folder:
  • Reports: Comprehensive reports outlining the application of time series analysis and logistic regression models.
  • Data: Datasets used for time series analysis and logistic regression tasks.
  • Code: R scripts demonstrating the implementation of SARIMA models and logistic regression models.
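
The comparison between a regular logistic regression and a PCA-based one, mentioned for the second folder, could look roughly like the sketch below. The data are simulated and the names (`x1`, `x2`, `x3`, `y`) are hypothetical; the repository's scripts may use different predictors, a different number of components, and different evaluation metrics.

```r
# Simulated, hypothetical data; NOT the dataset used in the reports.
set.seed(1)
n  <- 300
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.3)   # deliberately correlated with x1
x3 <- rnorm(n)
y  <- rbinom(n, 1, plogis(-0.5 + 1.2 * x1 + 0.8 * x3))
d  <- data.frame(y, x1, x2, x3)

# Regular logistic regression on the original predictors.
full_fit <- glm(y ~ x1 + x2 + x3, data = d, family = binomial)

# Logistic regression on the first two principal components.
pca     <- prcomp(d[, c("x1", "x2", "x3")], center = TRUE, scale. = TRUE)
scores  <- data.frame(y = d$y, pca$x[, 1:2])
pca_fit <- glm(y ~ PC1 + PC2, data = scores, family = binomial)

# In-sample accuracy at a 0.5 threshold, plus AIC, for both models.
accuracy <- function(fit) mean((fitted(fit) > 0.5) == (d$y == 1))
comparison <- data.frame(
  model    = c("regular", "pca"),
  accuracy = c(accuracy(full_fit), accuracy(pca_fit)),
  aic      = c(AIC(full_fit), AIC(pca_fit))
)
print(comparison)
```

In-sample accuracy is used here only to keep the sketch short; a held-out test set or cross-validation gives a fairer comparison.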

How to Explore:

  1. Access the repository to explore the reports, data, and code related to statistical modelling in the field of data analytics. Navigate to the Multiple Regression Analysis folder to gain insights into the criteria affecting credit card debt and the implementation of multiple regression models.
  2. Review the provided reports and datasets to understand the methodologies and findings of the multiple regression analysis.
  3. Examine the code scripts to observe the implementation of multiple regression models and the validation of Ordinary Least Squares assumptions.
  4. Explore the Time Series and Logistic Regression folder to delve into the application of time series analysis and logistic regression methods.
  5. Refer to the reports, datasets, and code scripts within the Time Series and Logistic Regression folder to gain a comprehensive understanding of the analysis and findings.

This repository serves as a valuable resource for individuals interested in statistical modelling techniques in data analytics. By providing reports, datasets, and code scripts, this project enables exploration and understanding of multiple regression analysis, time series analysis, and logistic regression. Join us on this statistical modelling journey and uncover the insights that statistical techniques can offer in the realm of data analytics.