MATLAB-Clustering-and-Classification-Projects---Numerical-Methods

Data analysis projects in MATLAB covering k-means, k-medoids, and LDA. Each project comes with a PDF report, typeset in LaTeX, that presents the findings for its dataset.

Primary language: MATLAB. License: GNU General Public License v3.0 (GPL-3.0).

This repository contains MATLAB projects built around custom implementations of clustering and classification algorithms, applied to several datasets. Each project pairs the underlying mathematics with a MATLAB implementation and covers K-means, K-medoids, and the evaluation of clustering results. The full description is in the PDF report.

📂 Project Overview

  1. Iris Dataset Clustering with K-means and K-medoids
    • Objective: Group the well-known Iris dataset into three clusters, one per species, using custom implementations of the K-means and K-medoids algorithms.
    • Methodology: Both algorithms were implemented from scratch. K-means minimizes the within-cluster sum of squared distances to each centroid, while K-medoids (here with the L1 distance) restricts cluster centers to actual data points, which makes it less sensitive to outliers.


    • Evaluation: Results are evaluated using confusion matrices and misclassification counts across multiple runs, highlighting each algorithm’s stability and accuracy.
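
The K-means iteration described above can be sketched as follows. This is a minimal illustration of the standard assignment/update loop, not the repository's actual code: the function name `kmeans_sketch`, the `maxIter` parameter, and the choice to initialise centroids from random data points are all assumptions made for the example. It uses base MATLAB only.

```matlab
function [labels, centroids] = kmeans_sketch(X, k, maxIter)
% KMEANS_SKETCH  Minimal K-means loop (illustrative; hypothetical name).
%   X       : n-by-d data matrix, one observation per row
%   k       : number of clusters
%   maxIter : iteration cap (assumed parameter)
%
%   Example:
%       X = [0 0; 0 1; 10 10; 10 11];
%       [labels, C] = kmeans_sketch(X, 2, 100);
    n = size(X, 1);
    centroids = X(randperm(n, k), :);   % ASSUMPTION: start from k distinct data points
    labels = zeros(n, 1);
    for iter = 1:maxIter
        % Squared Euclidean distance from every point to every centroid (n-by-k)
        D = zeros(n, k);
        for j = 1:k
            diffs = X - repmat(centroids(j, :), n, 1);
            D(:, j) = sum(diffs.^2, 2);
        end
        [~, newLabels] = min(D, [], 2);  % assignment step: nearest centroid
        if isequal(newLabels, labels)
            break;                       % assignments unchanged: converged
        end
        labels = newLabels;
        for j = 1:k                      % update step: centroid = cluster mean
            members = (labels == j);
            if any(members)
                centroids(j, :) = mean(X(members, :), 1);
            end
        end
    end
end
```

Because the starting centroids are random, repeated calls can return different partitions, which is why results are compared across multiple runs.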


  2. Breast Cancer Biopsy Data Analysis with K-medoids
    • Objective: Cluster biopsy data to distinguish between benign and malignant samples, focusing on maximizing accuracy and robustness.
    • Methodology: Missing entries are handled before clustering. K-medoids is then applied, and sensitivity and specificity measure how well the method identifies malignant cases.


    • Evaluation: Sensitivity and specificity summarize how accurately the clusters separate malignant from benign samples, which is the property that matters for diagnostic use. The algorithm is run multiple times to check that the results are robust.
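
For reference, sensitivity and specificity can be computed from true and predicted labels as in the short script below. The label vectors are made-up toy values, not results from the biopsy data, and the variable names are assumptions for the example. It assumes each cluster has already been mapped to "malignant" or "benign".

```matlab
% Hypothetical toy labels (not from the biopsy dataset): 1 = malignant, 0 = benign.
truth = logical([1 1 1 0 0 0 0 1]);
pred  = logical([1 1 0 0 0 1 0 1]);   % cluster assignments mapped to malignant/benign

TP = sum( truth &  pred);   % malignant, labelled malignant
FN = sum( truth & ~pred);   % malignant, labelled benign
TN = sum(~truth & ~pred);   % benign, labelled benign
FP = sum(~truth &  pred);   % benign, labelled malignant

sensitivity = TP / (TP + FN);   % fraction of malignant cases found: 3 / 4 = 0.75
specificity = TN / (TN + FP);   % fraction of benign cases kept benign: 3 / 4 = 0.75
```
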
  3. 1984 Congressional Voting Records Analysis
    • Objective: Investigate partisan voting behavior in the 1984 U.S. Congress by clustering representatives based on their voting patterns.
    • Methodology: A dissimilarity matrix is computed using a custom dissimilarity index for "yes"/"no" votes, with missing votes assigned a neutral score. K-medoids clustering is then applied to group representatives by voting alignment.
    • Evaluation: Confusion matrices reveal a clear partisan split, showing the effectiveness of K-medoids in political data clustering. Additional analysis assesses the voting consistency within each cluster.
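
One way such a dissimilarity matrix could be built is sketched below. The README does not specify the index, so the encoding here is an assumption made purely for illustration: "yes" = 1, "no" = 0, a missing vote is replaced by the neutral value 0.5, and the dissimilarity between two representatives is the mean absolute difference over all votes. The function name is hypothetical.

```matlab
function D = vote_dissimilarity_sketch(V)
% VOTE_DISSIMILARITY_SKETCH  Pairwise dissimilarity between voting records.
%   V : n-by-m matrix, 1 = "yes", 0 = "no", NaN = missing vote
%   D : n-by-n symmetric matrix with zeros on the diagonal
%
%   Example:
%       V = [1 1 0; 1 NaN 0; 0 0 1];
%       D = vote_dissimilarity_sketch(V);   % D(1,2) = 1/6, D(1,3) = 1
    V(isnan(V)) = 0.5;          % ASSUMPTION: neutral score for a missing vote
    n = size(V, 1);
    D = zeros(n);
    for i = 1:n
        for j = i+1:n
            % ASSUMPTION: mean absolute difference as the dissimilarity index
            D(i, j) = mean(abs(V(i, :) - V(j, :)));
            D(j, i) = D(i, j);  % keep the matrix symmetric
        end
    end
end
```

With this encoding, a missing vote contributes 0.5 against a definite "yes" or "no" and 0 against another missing vote.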


  4. Wine Classification Using Chemical Properties
    • Objective: Classify Italian wines from three cultivars based on 13 chemical attributes, comparing the performance of K-means and K-medoids algorithms.
    • Methodology: Both K-means and K-medoids algorithms are applied to the wine dataset, with clusters evaluated for accuracy against the true wine cultivars.


    • Evaluation: Confusion matrices are generated to measure clustering accuracy. This project highlights the clustering challenges posed by similar chemical profiles across cultivars and compares the robustness of each algorithm.
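
Cluster indices are arbitrary, so comparing them with true classes requires matching clusters to classes first. The sketch below builds a confusion matrix and tries every cluster-to-class permutation, keeping the one with the most agreements. This is one plausible approach rather than the repository's method; the function name is hypothetical, and labels are assumed to be integers in 1..k. The exhaustive search is practical only for small k such as 3.

```matlab
function [C, nWrong] = confusion_sketch(trueLabels, clusterLabels, k)
% CONFUSION_SKETCH  Confusion matrix with clusters aligned to true classes.
%   trueLabels, clusterLabels : vectors of integers in 1..k (assumed encoding)
%   C      : k-by-k matrix, rows = true class, columns = matched cluster
%   nWrong : number of points off the diagonal after alignment
%
%   Example:
%       t = [1 1 1 2 2 2 3 3 3];
%       c = [2 2 2 3 3 1 1 1 1];
%       [C, nWrong] = confusion_sketch(t, c, 3);   % nWrong is 1
    % Count co-occurrences of (true class, cluster index)
    C = accumarray([trueLabels(:), clusterLabels(:)], 1, [k, k]);
    P = perms(1:k);              % all k! candidate matchings
    best = -1;
    for p = 1:size(P, 1)
        % Agreements if true class i is matched with cluster P(p, i)
        idx = sub2ind([k, k], 1:k, P(p, :));
        matched = sum(C(idx));
        if matched > best
            best = matched;
            bestPerm = P(p, :);
        end
    end
    C = C(:, bestPerm);          % reorder columns so matches sit on the diagonal
    nWrong = numel(trueLabels) - best;
end
```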


  5. Cardiac SPECT Data Clustering for Patient Classification
    • Objective: Classify cardiac patients as normal or abnormal using binary data from SPECT images.
    • Methodology: Dissimilarity matrices are created using binary metrics, followed by K-medoids clustering to separate patients based on diagnostic markers.
    • Evaluation: Confusion matrices and a custom 2x2 classification matrix evaluate the clustering accuracy, reflecting how well attributes correspond to patient health status.
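
A minimal sketch of this pipeline is shown below: a dissimilarity matrix is computed from binary features, then K-medoids alternates between assigning points to the nearest medoid and re-picking each cluster's medoid. The README does not name the binary metric, so the simple matching dissimilarity (fraction of features that differ) is an assumption, as are the function names, the `maxIter` parameter, and the random initialisation.

```matlab
function [labels, medoids] = binary_kmedoids_sketch(B, k, maxIter)
% BINARY_KMEDOIDS_SKETCH  K-medoids on binary data (illustrative; hypothetical name).
%   B       : n-by-m matrix of 0/1 features, one patient per row
%   k       : number of clusters
%   maxIter : iteration cap (assumed parameter)
%   labels  : n-by-1 cluster index; medoids : 1-by-k row indices into B
%
%   Example:
%       B = [1 1 0 0; 1 1 1 0; 0 0 1 1; 0 0 0 1];
%       [labels, medoids] = binary_kmedoids_sketch(B, 2, 50);
    D = simple_matching_dissimilarity(B);
    n = size(D, 1);
    medoids = randperm(n, k);                      % ASSUMPTION: random initial medoids
    for iter = 1:maxIter
        [~, labels] = min(D(:, medoids), [], 2);   % assign to nearest medoid
        newMedoids = medoids;
        for j = 1:k
            members = find(labels == j);
            if isempty(members), continue; end     % keep old medoid for an empty cluster
            % New medoid: member with the smallest total dissimilarity to its cluster
            [~, best] = min(sum(D(members, members), 1));
            newMedoids(j) = members(best);
        end
        if isequal(newMedoids, medoids)
            break;                                 % medoids unchanged: converged
        end
        medoids = newMedoids;
    end
    [~, labels] = min(D(:, medoids), [], 2);       % final assignment for returned medoids
end

function D = simple_matching_dissimilarity(B)
% Fraction of features on which two binary rows disagree (assumed metric).
    B = double(B);
    m = size(B, 2);
    D = (B * (1 - B)' + (1 - B) * B') / m;         % counts of 1/0 plus 0/1 mismatches
end
```

The alternating scheme can settle in a local optimum that depends on the initial medoids, so in practice it is run from several random starts.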


🔧 Technology Stack

  • Language: MATLAB
  • Techniques: K-means clustering, K-medoids clustering, Confusion matrix analysis, Sensitivity and specificity scoring, Robustness testing
  • Datasets: Iris, Biopsy, Congressional Voting, Wine, and Cardiac SPECT datasets

💡 Key Highlights

  • Custom Implementations: All algorithms are coded from scratch to deepen understanding of clustering mechanics.
  • Robust Evaluations: Each project includes evaluation metrics like confusion matrices and error analyses to measure clustering quality.
  • Data Insights: These projects emphasize understanding algorithm strengths and limitations, particularly in handling outliers and noisy data.

📊 Visualizations

  • Each project is accompanied by detailed plots and figures, including:
    • Cluster assignments with centroid/medoid visualizations
    • Confusion matrices for classification accuracy
    • Performance graphs illustrating algorithm convergence

🚀 Getting Started

To explore these projects:

  1. Clone this repository:
       git clone https://github.com/yourusername/MATLAB-Clustering-Classification.git
       cd MATLAB-Clustering-Classification
  2. Open MATLAB and navigate to the project folder.
  3. Run each script to reproduce the results, visualizations, and metrics outlined above.