mphilli/r-news-clustering

Data Mining - Google News (R)

Exploring a news corpus in R

This repository houses the data files and code for my final project in IST 565 - Data Mining.

The analysis was performed in R on a self-collected set of 151 news articles from Google News.

The news corpus was analyzed using:

- Word clouds based on token frequency
- Association rules to discover important relationships among words
- k-means clustering to explore article similarity
  - For this task, each article was assigned a unique number so that it could be located in a 2D plot (using PCA for dimensionality reduction) and in a cluster dendrogram.
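
The clustering steps listed above can be sketched roughly as follows. This is a minimal illustration in base R, not the project's actual code: the six toy sentences are invented stand-ins for the 151 articles, all variable names are hypothetical, and the dendrogram is built with `hclust` (hierarchical clustering), which is an assumption about how the cluster dendrogram was produced.

```r
# Toy corpus standing in for the news articles (invented for illustration).
docs <- c(
  "stocks rally as markets rise",
  "markets fall and stocks slide",
  "investors watch stocks and markets",
  "team wins the final game",
  "coach praises team after game",
  "fans cheer as team wins game"
)

# Tokenize: lowercase, split on non-letters, drop empty strings.
tokens <- lapply(strsplit(tolower(docs), "[^a-z]+"), function(x) x[nzchar(x)])

# Token frequencies across the corpus (the kind of input a word cloud uses).
freq <- sort(table(unlist(tokens)), decreasing = TRUE)

# Document-term matrix: one row per article, one column per token.
vocab <- names(freq)
dtm <- t(vapply(
  tokens,
  function(tk) tabulate(match(tk, vocab), nbins = length(vocab)),
  integer(length(vocab))
))
# Each article is identified by a unique number (its row name).
dimnames(dtm) <- list(seq_along(docs), vocab)

# k-means with k = 2 (k is an assumption chosen for this toy data).
set.seed(42)
km <- kmeans(dtm, centers = 2, nstart = 10)

# PCA to reduce the term space to two dimensions for plotting.
pca <- prcomp(dtm, center = TRUE, scale. = FALSE)
coords <- pca$x[, 1:2]

# Hierarchical clustering on Euclidean distances, for the dendrogram.
hc <- hclust(dist(dtm), method = "complete")
groups <- cutree(hc, k = 2)

# When run as a script, use a null device so no Rplots.pdf is written.
if (!interactive()) pdf(NULL)

# 2D plot: article numbers positioned by PCA, colored by k-means cluster.
plot(coords, type = "n", main = "Articles in PCA space")
text(coords, labels = rownames(dtm), col = km$cluster)

# Cluster dendrogram labeled with the same article numbers.
plot(hc, main = "Cluster dendrogram")
```

Labeling points with article numbers rather than titles keeps both plots readable, and the same numbers can be looked up in either view to compare how the two methods group an article.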