Data Science and Big Data

Repo for coursework from "Data Science and Big Data" with Raja Sooriamurthi.

This course included lectures, homework, and projects covering topics such as:

  • pandas, NumPy, seaborn, Matplotlib
  • SciPy, Scrapy, Beautiful Soup
  • Statistical analysis and data visualization
  • Machine learning tools and methods
    • scikit-learn
    • cross-validation
    • sentiment analysis
    • recommenders
    • classification, regression, clustering
  • Genetic algorithms
  • Apache Spark
  • MapReduce
  • Apache Pig

Projects were self-directed and intended to exercise our newfound skills and provide practice in turning data into useful insights. The two main projects were:

  • Project 1, an analysis of how cost-effective a CMU education is (video here).
  • Project 2, a supervised learning classification task to identify failing water pumps in rural Tanzania (video here).