Research Computing Center, University of Chicago
November 17, 2016
2:00 pm - 4:00 pm
Instructor: Peter Carbonetto
Helpers: Will Graybeal
Register here.
Following on from Analysis of Genetic Data Part 1, in this 2-hour workshop we will use PLINK and R to draw biological insights from large-scale genetic data. We will also use online databases such as the UCSC Genome Browser to interpret the output of the data analyses. Although no background in genetics is required to follow the examples, participants with exposure to concepts in modern quantitative genetics will be in a better position to benefit from this workshop. Because data sharing restrictions prevent us from working with human data, we will download and investigate data from a mouse genetics study.
Level: Intermediate
Prerequisites: This workshop assumes some experience performing simple tasks in a UNIX-like shell environment, as well as basic familiarity with R. Participants must be able to log in to the RCC compute cluster, although experience using the RCC cluster is not required. All participants must bring a laptop running a Mac, Linux, or Windows operating system on which they have administrative privileges. Note: Attending Part 1 is not required for Part 2.
Where: Kathleen A. Zar Room, John Crerar Library, University of Chicago (OpenStreetMap).
Additional info: This workshop is an attempt to apply elements of the Software Carpentry approach (see also this article) to interactive instruction for computing/quantitative sciences.
Please also take a look at the Code of Conduct and the Software License, which applies to all the scripts and code examples in this repository. All instructional material contained in this repository is made available under the Creative Commons Attribution license (CC BY 4.0).
- Explore the application of numeric techniques for identifying the genetic factors that contribute to a measured trait.
- Understand how large-scale genetic data sets are commonly represented in computer files.
- Use command-line tools to manipulate genetic data.
- Use R to summarize and visualize the results of a genetic data analysis.
- Practice using the RCC shell environment (midway) for large-scale data processing and analysis.
Episode | Concepts |
---|---|
1. Setup | How do I set up my shell environment on midway for an analysis of genetic data? |
2. Inspecting the data | How are genetic data commonly represented in computer files? How can I use standard shell commands to explore and summarize genetic data? |
3. Mapping genetic associations | How do I prepare genetic data for mapping associations? How do I use PLINK to assess support for genetic associations? |
4. Visualizing and interpreting the association analysis | How can I use R to visualize and interpret the PLINK results? How can I cross-reference these results against genetic databases to understand the biological significance of these results? |
5. Refining the association analysis | What are some factors that could invalidate an association analysis? How can we use R to look for common problems in phenotype data? How can we use R to manipulate data for an improved association analysis? |