hairy-octo-hipster
Data & Code for the Global Big Data Conference R Workshop
Conference Speaker URL : http://globalbigdataconference.com/43/santa-clara/big-data-bootcamp/speaker-details/567/krishna-sankar.html
Slides at Slideshare http://goo.gl/2rijLz
Abstract :
A hands-on workshop with R, Data Science, Algorithms and interesting data. We might even try a couple of Kaggle competitions.
Outline :
- Intro, Goals, Logistics, Setup [10] [9:00-9:10)
- Introduction to R [30] [9:10-9:40)
- Who will win Super Bowl XLIX? The Art of Elo Ranking [30] [9:40-10:10)
- Anatomy of a Kaggle Competition [40] [10:10-10:50)
  - Competition Mechanics
  - Register, download data, create subdirectories
  - Trial Run : Submit Titanic
- Break [20] [10:50-11:10)
- Algorithms for the Amateur Data Scientist [20] [11:10-11:30)
  - Algorithms, Tools & frameworks in perspective
  - “Folk Wisdom”
- Model Evaluation & Interpretation [30] [11:30-12:00)
  - Confusion Matrix, ROC Graph
- Questions/Discussions
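
As a taste of the Elo ranking item in the outline, here is a minimal sketch of the standard Elo update in R. The function names, the starting ratings of 1500 and the K-factor of 20 are illustrative assumptions, not necessarily what the workshop code uses.

```r
# Expected score of team A against team B under the standard Elo model.
elo_expected <- function(rating_a, rating_b) {
  1 / (1 + 10^((rating_b - rating_a) / 400))
}

# Update both ratings after one game.
# score_a is 1 if A wins, 0 if A loses and 0.5 for a tie.
# k = 20 is an assumed K-factor chosen only for illustration.
elo_update <- function(rating_a, rating_b, score_a, k = 20) {
  expected_a <- elo_expected(rating_a, rating_b)
  delta <- k * (score_a - expected_a)
  c(a = rating_a + delta, b = rating_b - delta)
}

# Two hypothetical teams start level at 1500 and team A wins.
elo_update(1500, 1500, score_a = 1)
# a = 1510, b = 1490
```

The winner gains exactly what the loser gives up, and an upset (a win by the lower-rated team) moves the ratings more than an expected result does.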
Requirements for the attendees:
This will be a hands-on workshop, so it will be most beneficial if you come prepared.
- Install R
- Install RStudio
- Clone this GitHub repository (and update before going to bed on Saturday ;o) for the latest files)
- Download data from Kaggle. I cannot distribute Kaggle data.
  - Set up an account on Kaggle (www.kaggle.com)
  - We will be using the data from 2 Kaggle competitions
    - Titanic: Machine Learning from Disaster
      - Download data from http://www.kaggle.com/c/titanic-gettingStarted
      - Directory ~/hairy-octo-hipster/titanic-r
    - Predicting Bike Sharing @ Washington DC
      - Download data from http://www.kaggle.com/c/bike-sharing-demand/data
      - Directory ~/hairy-octo-hipster/bike
- We will also be using the 2014 NFL box score data
  - Download data from http://www.pro-football-reference.com/years/2014/games.htm
  - Directory ~/hairy-octo-hipster/nfl
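
If you would rather set up the directories from R, the following is a rough sketch. It assumes the repository was cloned into your home directory, and `prepare_data_dirs` is a hypothetical helper name, not something shipped in this repository.

```r
# Create the three data directories listed above (if missing) and
# report how many files each one currently holds.
# Note: the count includes any files already in the repository,
# not only the data you downloaded.
prepare_data_dirs <- function(repo_dir = "~/hairy-octo-hipster") {
  repo_dir <- path.expand(repo_dir)
  data_dirs <- file.path(repo_dir, c("titanic-r", "bike", "nfl"))

  for (d in data_dirs) {
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  }

  file_counts <- vapply(data_dirs, function(d) length(list.files(d)), integer(1))
  data.frame(directory = basename(data_dirs), files = unname(file_counts))
}
```

Calling `prepare_data_dirs()` with no arguments uses the default location. Pass another path if you cloned the repository elsewhere.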