Data Science Basics for Scientists in Parks

Overview

This workshop is built from the Carpentries lessons and designed for the Scientists in Parks program. The Carpentries' aim is to teach researchers basic concepts, skills, and tools for working with data so that they can get more done in less time and with less pain. This workshop uses a tabular ecology dataset and teaches data cleaning, management, analysis, and visualization. There are no prerequisites, and the materials assume no prior knowledge of the tools.

The same tabular dataset is used throughout the workshop. It contains observations of adorable small mammals recorded over a long period of time in Arizona.

Setup and installation

Overview of the lessons:

  • Data organization in spreadsheets and data cleaning with OpenRefine
  • Introduction to R
  • Data analysis and visualization in R

Detailed structure

Day 1: Data organization & cleaning

There are two lessons in this section. The first is a spreadsheet lesson that teaches good data organization, along with some data cleaning and quality control checks in a spreadsheet program.

The second lesson uses a spreadsheet-like program called OpenRefine to teach data cleaning and filtering.

Day 2: Data analysis & visualization

These lessons include a basic introduction to R syntax, importing CSV data, and subsetting and merging data. They finish with calculating summary statistics and creating simple plots.