#How to Best Visualize a Dataset Easily
#Overview
This is the code for this video by Siraj Raval on Youtube. The human activities dataset contains 5 classes (sitting-down, standing-up, standing, walking, and sitting) collected on 8 hours of activities of 4 healthy subjects. The data set is downloaded from here. This code downloads the dataset, cleans it, creates feature vectors, then uses T-SNE to reduce the dimensionality of the feature vectors to just 2. Then, we use matplotlib to visualize the data.
##Dependencies
- pandas(http://pandas.pydata.org/)
- numpy (http://www.numpy.org/)
- scikit-learn (http://scikit-learn.org/)
- matplotlib (http://matplotlib.org/)
Install dependencies via 'pip install'. (i.e pip install pandas).
##Usage
To run this code, just run the following in terminal:
python data_visualization.py
##Challenge
The challenge for this video is to visualize this Game of Thrones dataset. Use T-SNE to lower the dimensionality of the data and plot it using matplotlib. In your README, write our 2-3 sentences of something you discovered about the data after visualizing it. This will be great practice in understanding why dimensionality reduction is so important and analyzing data visually.
##Due Date is December 29th 2016
##Credits
The credits for this code go to Yifeng-He. I've merely created a wrapper around the code to make it easy for people to get started.