Getting Started with Data Science

A rough guide to getting started with data science

History

A Taxonomy of Data Science

  • Obtain
  • Scrub
  • Explore
  • Model
  • iNterpret
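
The five steps above can be sketched as a tiny pipeline. This is only an illustration using the Python standard library: the inline data, the column names (hours, score), and the function names are all made up for the example, and a straight-line fit stands in for whatever model a real project would use.

```python
import csv
import io
import statistics

# Hypothetical inline data standing in for a real source (file, API, database).
RAW_CSV = """hours,score
1,52
2,55
3,
4,61
5,64
"""

def obtain(text):
    """Obtain: read raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))

def scrub(rows):
    """Scrub: drop rows with missing values and convert strings to numbers."""
    return [(float(r["hours"]), float(r["score"]))
            for r in rows if r["hours"] and r["score"]]

def explore(pairs):
    """Explore: compute simple summary statistics."""
    scores = [s for _, s in pairs]
    return {"n": len(pairs), "mean_score": statistics.mean(scores)}

def model(pairs):
    """Model: fit a least-squares line, score = slope * hours + intercept."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in pairs)
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def interpret(slope, intercept):
    """iNterpret: turn the fitted numbers into a plain-language statement."""
    return (f"Starting from about {intercept:.0f} points, each extra hour "
            f"is associated with about {slope:.1f} more points.")

pairs = scrub(obtain(RAW_CSV))
summary = explore(pairs)
slope, intercept = model(pairs)
print(summary)
print(interpret(slope, intercept))
```

Each step is a separate function so that any one of them can be swapped out (a different source, a stricter cleaning rule, another model) without touching the rest.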

A Very Short History of Data Science

A Very Short History of Big Data

Introduction

Big Data vs. Analytics

  • Big Data is a description of an engineering problem
  • Analytics is a description of a way of generating value from data

Look at the question you are trying to answer

  1. The objective you are trying to drive (e.g., maximize profits)
  2. The levers (what are the things that I can actually change, that are going to impact the objective?)
  • Data Science is about the people, not the tools
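
As a rough sketch of the two questions above: the objective is written as a function, and the one thing we can change (here, price) is tried at several values to see which drives the objective highest. The demand model, the unit cost, and every name in the code are assumptions invented for illustration, not taken from any real dataset.

```python
def expected_profit(price, unit_cost=4.0):
    """Objective: profit at a given price (what we want to maximize)."""
    # Hypothetical linear demand model: units sold fall as price rises.
    units_sold = max(0.0, 100.0 - 10.0 * price)
    return (price - unit_cost) * units_sold

# The thing we can actually change: price, tried from 4.0 to 10.0 in steps of 0.5.
candidate_prices = [p / 2 for p in range(8, 21)]

# Pick the setting that gives the highest value of the objective.
best_price = max(candidate_prices, key=expected_profit)
print(best_price, expected_profit(best_price))
```

In practice the demand model is the part that has to be learned from data; the search over settings stays this simple only when there are few things to change.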

Skills

  • Finding rich data sources.
  • Working with large volumes of data despite hardware, software, and bandwidth constraints.
  • Cleaning the data and making sure that data is consistent.
  • Melding multiple datasets together.
  • Visualizing that data.
  • Building rich tooling that enables others to work with data effectively.

(from https://www.kaggle.com/wiki/WhatIsDataScience - original source: http://radar.oreilly.com/2011/09/building-data-science-teams.html)
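
Two of the skills listed above, cleaning data and melding datasets, can be illustrated with a minimal standard-library sketch. The records, field names, and helper functions below are hypothetical; a real project would more likely use a dataframe library or a database join.

```python
# Hypothetical customer records with stray whitespace, inconsistent casing,
# and one verbatim duplicate.
customers = [
    {"id": " 001", "name": "Ada "},
    {"id": "002", "name": "grace"},
    {"id": "002", "name": "grace"},
]

# Hypothetical orders; the last one has no matching customer.
orders = [
    {"customer_id": "001", "total": "19.99"},
    {"customer_id": "002", "total": "5.00"},
    {"customer_id": "003", "total": "7.50"},
]

def clean_customers(rows):
    """Strip whitespace, normalize name casing, and keep the first row per id."""
    by_id = {}
    for row in rows:
        customer_id = row["id"].strip()
        if customer_id not in by_id:
            by_id[customer_id] = row["name"].strip().title()
    return by_id

def meld(customers_by_id, order_rows):
    """Inner-join orders to customers on the customer id."""
    return [
        {"name": customers_by_id[o["customer_id"]], "total": float(o["total"])}
        for o in order_rows
        if o["customer_id"] in customers_by_id
    ]

merged = meld(clean_customers(customers), orders)
print(merged)
```

Note that the join only works because the ids were cleaned first: " 001" and "001" would otherwise be treated as different customers.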

Tutorials

Kaggle's Tutorials