/RStatMod

Prediction and classification modelling through RStudio

Primary language: R. License: MIT.

Regression Analysis and Results Visualization

This project focuses on performing regression analysis and visualizing the results using RStudio. Regression analysis is a statistical technique used to model the relationship between a dependent variable and one or more independent variables. It helps in understanding and predicting the impact of independent variables on the dependent variable.

The purpose of this project is to showcase the application of regression analysis in a practical context and demonstrate how to interpret and visualize the results using RStudio. By using various regression techniques, we aim to uncover relationships, make predictions, and gain insights into the data.

Project Features

Perform data preprocessing: Cleaning, handling missing values, and transforming variables if necessary.
Explore and analyze the dataset: Examine the distribution of variables, identify outliers, and assess correlations.
Build regression models: Utilize different regression techniques such as linear regression, logistic regression, or polynomial regression.
Assess model performance: Evaluate the accuracy and goodness of fit of the regression models.
Visualize results: Generate visualizations such as scatter plots, line plots, or bar graphs to interpret and communicate the regression results effectively.
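
The features above can be illustrated with a minimal sketch. It is not the project's code: the built-in `mtcars` dataset stands in for the project's data, and the variable names (`lin_fit`, `log_fit`, and so on) are chosen here purely for illustration.

```r
# Illustrative only: mtcars is a stand-in for the project's dataset.
data(mtcars)
df <- na.omit(mtcars)                  # preprocessing: drop rows with missing values

# Explore: distribution of the response and its correlation with a predictor
summary(df$mpg)
cor(df$mpg, df$wt)

# Linear regression: fuel economy as a function of weight and horsepower
lin_fit <- lm(mpg ~ wt + hp, data = df)
lin_mse <- mean(residuals(lin_fit)^2)  # mean squared error on the training data
lin_r2  <- summary(lin_fit)$r.squared  # goodness of fit

# Logistic regression: transmission type (am is coded 0/1) as a function of weight
log_fit  <- glm(am ~ wt, data = df, family = binomial)
log_pred <- as.integer(predict(log_fit, type = "response") > 0.5)
log_acc  <- mean(log_pred == df$am)    # classification accuracy on the training data

# Visualize: scatter plot with the fitted simple regression line
plot(df$wt, df$mpg, xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
abline(lm(mpg ~ wt, data = df))
```

The error and accuracy figures here are computed on the training data, so they are optimistic; a held-out set or cross-validation gives a fairer estimate of predictive performance.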

Getting Started

To get started with this project, follow the steps below:

Install RStudio on your machine (if not already installed).
Clone or download the project repository from GitHub.
Open the project in RStudio. Running the .Rmd file is suggested for the best examination of the code style. If R Markdown is not installed, the auto-generated .R file delivers the same output.
Modify the path to the 'Data' folder as necessary in line 41 (for the .R file) or line 63 (for the .Rmd file).
Implement the regression models using the appropriate techniques based on your project requirements.
Evaluate the performance of the methods and code styles.
The results and analysis are contained in the PDF attached with the files.
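
When adjusting the data path, it can help to confirm that the folder resolves correctly before running the full script. The sketch below assumes, hypothetically, that the 'Data' folder sits inside the current working directory; `data_dir` is a name chosen here and may not match the variable used in the project files.

```r
# Hypothetical layout: a 'Data' folder inside the current working directory.
data_dir <- file.path(getwd(), "Data")

if (dir.exists(data_dir)) {
  # List the CSV files found so the right one can be picked by name
  csv_files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)
  print(csv_files)
} else {
  message("Data folder not found at: ", data_dir)
}
```

Building the path with `file.path()` rather than pasting strings keeps the separator portable across operating systems.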

Notably, the best-performing models for both prediction and classification were submitted to a community Kaggle competition held by the course convenor. The final prediction model (bagging regression trees) achieved an MSE of 0.303 and ranked 1/38 among the participating teams. The final classification model (bagging regression trees) ranked 23/35 with a final score of 0.927.
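
For readers unfamiliar with bagging, the idea can be sketched as follows: fit many regression trees on bootstrap resamples of the training data and average their predictions. This is a generic illustration on the built-in `mtcars` dataset using the `rpart` package, not the competition model; the split sizes, the number of trees, and the tree settings are assumptions made for the example.

```r
library(rpart)  # recommended package, bundled with standard R installations

set.seed(1)
data(mtcars)

# Hold out a small test set (sizes chosen arbitrarily for illustration)
train_idx <- sample(nrow(mtcars), 24)
train <- mtcars[train_idx, ]
test  <- mtcars[-train_idx, ]

n_trees <- 100
preds <- matrix(NA_real_, nrow = nrow(test), ncol = n_trees)

for (b in seq_len(n_trees)) {
  # Bootstrap resample: draw training rows with replacement
  boot <- train[sample(nrow(train), replace = TRUE), ]
  # Grow a deep, unpruned regression tree on the resample
  tree <- rpart(mpg ~ ., data = boot, method = "anova",
                control = rpart.control(minsplit = 5, cp = 0))
  preds[, b] <- predict(tree, newdata = test)
}

# Bagged prediction: average over all trees, then score on the held-out rows
bagged_pred <- rowMeans(preds)
bag_mse <- mean((test$mpg - bagged_pred)^2)
```

Averaging over many high-variance trees reduces the variance of the combined prediction, which is why bagging often outperforms a single tree on held-out data.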