Resources for Data Science meetups at MEST

Doing Data Science at MEST

The main goal of this course is to introduce students to data science techniques that allow them to build production-level data products that solve problems. In practice, this means we will aim to incorporate and deploy our data products into a web or mobile application for users to interact with. I hope that by the end of this course, students will be able to apply core data science principles to build data-driven products and organizations.

This is an introductory Data Science course that aims to introduce students to a breadth of concepts in Data Science. It presents the OSEMN (Obtain, Scrub, Explore, Model and iNterpret) process of data science, with a view to developing skills that foster data-driven thinking and products. This course is intended for anyone interested in learning more about the data science process and applying it to their everyday lives, projects or organisations.
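The five OSEMN steps can be sketched in a few lines of Python. This is only a minimal illustration, not course material: the inline dataset, its column names (`hours_studied`, `quiz_score`) and the function names are all made up for the example, and it uses only the standard library so it runs without any extra installation.

```python
import csv
import io
from statistics import mean

# Hypothetical inline dataset standing in for a real source (file, API, database).
RAW_CSV = """hours_studied,quiz_score
1,52
2,58
3,
4,71
5,77
"""


def obtain(text):
    """Obtain: read raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def scrub(rows):
    """Scrub: drop rows with missing values and convert strings to numbers."""
    clean = []
    for row in rows:
        if all(value is not None and value.strip() for value in row.values()):
            clean.append((float(row["hours_studied"]), float(row["quiz_score"])))
    return clean


def explore(points):
    """Explore: compute simple summary statistics."""
    return {
        "n": len(points),
        "mean_hours": mean(x for x, _ in points),
        "mean_score": mean(y for _, y in points),
    }


def model(points):
    """Model: fit y = slope * x + intercept by ordinary least squares."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in points) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return slope, intercept


def interpret(slope, intercept):
    """Interpret: turn the fitted numbers into a plain-language statement."""
    return (
        f"Each extra hour of study is associated with about {slope:.1f} "
        f"more quiz points (baseline {intercept:.1f})."
    )


if __name__ == "__main__":
    points = scrub(obtain(RAW_CSV))
    print(explore(points))
    print(interpret(*model(points)))
```

In a real project each step would be far larger (for example, Pandas for scrubbing and scikit-learn for modelling), but the order of the steps stays the same.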

Session Facilitator

Important Resources

  • Google+ MEST Data Science Community
  • Mailing list: data[at]meltwater.org
  • Session times: Fridays 1 - 3pm
  • Help Sessions: Mondays, Tuesdays & Thursdays, 6 - 7pm

Session Outline [Work in Progress]

These sessions will be made up of two main components:

  • Book discussion sessions

    The book discussion will normally take place during the first half hour, when we discuss the data science book we are currently reading. This is to help build your qualitative understanding and knowledge of the questions and developments in the field.

  • Quantitative sessions

    These sessions will build the quantitative and technical skill set needed to do Data Science. They will span Python tutorials, statistics, Machine Learning and Visualisation, amongst others. We will be using a combination of well-prepared courses, books and material provided by leading experts and organisations in the field.

Outline

Milestone 1: Core Skills

  • Outcome 1: Python installation

    • Install Essential libraries
  • Outcome 2: Intro to Python

  • Outcome 3: Intro to Data Science

  • Outcome 4: Intro to Statistics [SKIPPED]

    • Udacity Intro to Statistics
    • Personal Project: Answer a question of interest using statistics and data of your choice. Extra points for MEST-related projects.
  • Outcome 5: Team Formation & Specialisation

    • Choose one area of Specialisation
    • Choose teams for final team project. Must have at least one member from each specialisation.

Milestone 2: Specialisations

Milestone 3: Team Data Product

Prerequisites

Students interested in taking this course should be comfortable with, or eager to start, asking questions, experimenting with new ideas and building products. Basic familiarity with a Unix/Linux command line is recommended.

Data Science Toolbox

Mastering the Data Science process requires a set of basic tools to process your data, test your hypotheses and extract meaningful insights. For this course, we will show examples primarily using Python and its libraries, and sometimes R, JavaScript or Java for specific topics. Students who have suggestions or preferences for other tools should feel free to use them and share with the community. The ball is in your court. See below for the tool set for this course:

Tools                          | Python               | R                   | JavaScript
IDE                            | IPython              | RStudio             | X
Data Processing                | Pandas, Scipy, Numpy | Core libraries      | X
Machine Learning & Data Mining | scikit-learn         | Several R libraries | X
Data Graphics                  | Matplotlib, ggplot2  | ggplot2             | D3
Interactive Visualization      | Plotly               | Shiny               | D3, Dimple.js
Big Data                       | Hadoop, Spark, Storm | Read this           | X
Web Development                | web2py, flask        | X                   | MeteorJS
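
A quick way to see which of the Python tools in the tool set above are already installed is to ask Python whether it can find each module. The sketch below is a suggestion rather than part of the course setup: the `PYTHON_TOOLS` mapping and the `check_tools` helper are hypothetical names, and the import names are the ones these packages are commonly imported under.

```python
import importlib.util

# Display name -> import name. The import names are assumptions based on
# how these packages are usually imported (e.g. scikit-learn as "sklearn").
PYTHON_TOOLS = {
    "IPython": "IPython",
    "Pandas": "pandas",
    "Scipy": "scipy",
    "Numpy": "numpy",
    "scikit-learn": "sklearn",
    "Matplotlib": "matplotlib",
    "Plotly": "plotly",
    "flask": "flask",
}


def check_tools(tools):
    """Return {display name: True/False} for whether each module can be found."""
    return {
        name: importlib.util.find_spec(module) is not None
        for name, module in tools.items()
    }


if __name__ == "__main__":
    for name, installed in check_tools(PYTHON_TOOLS).items():
        status = "installed" if installed else "missing"
        print(f"{name:<14}{status}")
```

Anything reported as missing can then be installed with your usual package manager before the first hands-on session.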

Hands-on Work

Reading List

  1. February-March: Doing Data Science

Projects

  • Personal Data Project
  • Team Data Project

Resources

Books

Articles

Videos

Data Sources

Blogs