This document forms the basis of several workshops and talks on everyday programming with R. It also includes mirrored Python code as Jupyter notebooks.

Practical Data Science

The focus of this document is on using R for data processing, programming, modeling, visualization, and presentation of results. It contains exercises for additional practice, and most of the content has been translated to Python and is available via Jupyter notebooks.

Outline

Part 1: Information Processing

  • Understanding Basic R Approaches to Gathering and Processing Data

    • Overview of Data Structures
    • Getting data in and out
    • Indexing
  • Getting Acquainted with Other Approaches to Data Processing

    • Pipes and how to use them
    • tidyverse
    • data.table
    • Misc.

Part 2: Programming Basics

  • Using R more fully

    • Dealing with objects
    • Iterative programming
    • Writing functions
  • Going further

    • Code style
    • Vectorization
    • Regular expressions

Part 3: Modeling

  • Model Exploration

    • Key concepts
    • Understanding and fitting models
    • Overview of extensions
  • Model Criticism

    • Model Assessment
    • Model Comparison
  • Machine Learning

    • Concepts
    • Demonstration of techniques

Part 4: Visualization

  • Thinking Visually

    • Visualizing Information
    • Color
    • Contrast
    • and more...
  • Using ggplot2

    • Aesthetics
    • Layers
    • Themes
    • and more...
  • Adding Interactivity

    • Package demos
    • Shiny

Part 5: Presentation

  • Building Better Data-Driven Products

    • Reproducibility concepts
  • Starting out with R Markdown

    • Standard documents
  • Customization and more

    • Themes, CSS, etc.