This project is part of the Statistical Learning course.

DSE Statistical Learning Project

This project applies unsupervised and supervised learning techniques to Women’s Tennis Association (WTA) data.

Overview

The project aims to draw insights from WTA data in two ways: unsupervised methods group players into segments, and supervised methods predict match outcomes.

Key Components:

👉 Presentation: Provides an overview of the project and its findings, offering a high-level perspective.

👉 Report: Offers a detailed analysis of the methodologies, results, and interpretations.

👉 Notebook: Contains the R code used for analysis, allowing for transparency and reproducibility.

Methodology

The unsupervised part clusters the top 30 players, while the supervised part builds predictive models for match outcomes. The dataset combines player performance metrics, player attributes, and match characteristics.

Techniques Used:

  • Unsupervised Learning: Uses principal component analysis (PCA), k-medoids, and hierarchical clustering to uncover player segments and patterns within the data.
  • Supervised Learning: Uses logistic regression, classification tree, and random forest models to predict match outcomes from player and match attributes.
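
To make the two stages concrete, here is a minimal base-R sketch of one technique from each group: PCA followed by hierarchical clustering, and a logistic regression for match outcome. It is not the code from the Notebook. The data are simulated, every column name (ace_rate, first_serve_in, break_pts_won, rank_diff, ace_diff) is hypothetical, and the choices of Ward linkage and three clusters are assumptions made only for illustration.

```r
# Illustrative sketch on synthetic data; all column names and values are hypothetical.
set.seed(42)

# --- Unsupervised: PCA followed by hierarchical clustering of 30 players ---
players <- data.frame(
  ace_rate       = rnorm(30, mean = 0.05, sd = 0.02),
  first_serve_in = rnorm(30, mean = 0.62, sd = 0.04),
  break_pts_won  = rnorm(30, mean = 0.45, sd = 0.05)
)

pca    <- prcomp(players, center = TRUE, scale. = TRUE)  # standardize, then rotate
scores <- pca$x[, 1:2]                                   # keep the first two components

tree    <- hclust(dist(scores), method = "ward.D2")      # Ward linkage on Euclidean distances
segment <- cutree(tree, k = 3)                           # 3 segments (k chosen arbitrarily here)

# --- Supervised: logistic regression for match outcome ---
n <- 200
matches <- data.frame(
  rank_diff = rnorm(n),  # hypothetical standardized ranking difference
  ace_diff  = rnorm(n)   # hypothetical difference in ace rate
)
# Simulate outcomes whose win probability depends on both predictors
p_win       <- plogis(1.2 * matches$rank_diff + 0.5 * matches$ace_diff)
matches$win <- rbinom(n, size = 1, prob = p_win)

fit      <- glm(win ~ rank_diff + ace_diff, data = matches, family = binomial)
pred     <- predict(fit, newdata = matches, type = "response")  # predicted win probabilities
accuracy <- mean((pred > 0.5) == (matches$win == 1))            # in-sample accuracy
```

The accuracy computed here is in-sample and therefore optimistic; a held-out test set or cross-validation would give a fairer estimate. The tree and random forest models listed above would be fitted to the same kind of data frame using add-on packages.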

Keywords

Women’s Tennis Association (WTA), Hierarchical Clustering, K-means, PCA, Classification Tree, Random Forest, Logistic Regression.