This class provides an introduction to the applied data science skills needed by bioinformatics professionals. The course focuses on reproducible bioinformatics research and covers the following topics and tools: beginning to intermediate use of the Unix command line, working with remote computing resources, version tracking, R and Bioconductor, tools for manipulating sequence data, and creation of pipelines.
- Instructor: Randall Johnson, PhD
- Office Hours: In-person office hours will be held Thursdays immediately after class, and online office hours will be held Monday evenings from 8:30 to 9:30 PM. During online office hours, the Blackboard discussion thread titled "Office Hours" will be actively monitored.
- Prerequisites: BIFX 503
- Textbook: Bioinformatics Data Skills: Reproducible and Robust Research with Open Source Tools, 1st Edition, by Vince Buffalo, O'Reilly Media (2015)
- Communications: All course communications will be posted on Blackboard. In order to receive timely notifications, it is recommended that you do one or more of the following:
  - Check Blackboard often
  - Set your Blackboard email notifications to alert you when something is posted
  - Download the phone app and enable push notifications (this may not be the best option this term, as the app was just released and seems to be a little limited).
On completion of this course, students should be comfortable with the following:
- Use of the Unix command line to manipulate data and perform bioinformatic analysis tasks
- Logging into and using remote computing resources
- Working with version-controlled code repositories in a collaborative work environment
- Use of R and Bioconductor to perform bioinformatic analysis tasks
- Stitching a series of commands and/or programs together into a reusable pipeline
In addition to completing the weekly reading assignments, students will need to view Blackboard modules containing instructional vignettes. Each module will be followed by a short quiz to gauge understanding prior to class. Students will be given a score for each quiz, but only participation will be tracked for the purpose of grading (i.e., if you complete both the module and the quiz, full points will be awarded for grading purposes).
Grades will be based on completion of homework, in-class participation, and two exams.
- Homework - 30%
- In-class participation - 30%
- Mid-term - 20%
- Final exam - 20%
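
As a rough illustration of how the weights above combine, the following Python sketch computes a weighted course percentage. The function name `course_grade` and the example scores are hypothetical and are not part of the course policy; it also assumes each component is reported as a percentage from 0 to 100.

```python
# Weights taken from the grading breakdown above.
WEIGHTS = {
    "homework": 0.30,
    "participation": 0.30,
    "midterm": 0.20,
    "final_exam": 0.20,
}


def course_grade(scores):
    """Return the weighted course percentage.

    `scores` maps each component name in WEIGHTS to a percentage (0-100).
    This helper is a hypothetical illustration, not an official calculator.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example = {"homework": 90, "participation": 100, "midterm": 80, "final_exam": 85}
print(round(course_grade(example), 1))
```

With these made-up scores, the components contribute 27, 30, 16, and 17 points respectively, for a course percentage of 90.
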
In the event of severe weather resulting in the closure of Hood College and the cancellation of a regularly scheduled class, the material from the missed class will be posted on Blackboard, and at least two live chat sessions will be held to work through the material and answer questions.
Reading assignments are from Buffalo's Bioinformatics Data Skills unless otherwise specified, and they should be read prior to class. More details on reading assignments will be given on Blackboard.
| Week | Date | Topics | Reading |
|---|---|---|---|
| 1 | Aug 24 | Class intro; Unix command line | |
| 2 | Aug 31 | Intro to R | Ch 8 selections |
| 3 | Sep 7 | R Scripting; flow control | Ch 8 selections |
| 4 | Sep 14 | Advanced R topics | |
| 5 | Sep 21 | Project organization; Git | Ch 2; Ch 5 selections |
| 6 | Sep 28 | Markdown; Advanced Git | Ch 5 selections |
| 7 | Oct 6 | Advanced Unix tricks | Ch 7 |
| 8 | Oct 12 | Mid Term Exam | |
| 9 | Oct 19 | Bioinformatics data | Ch 6 |
| 10 | Oct 26 | Genomic Ranges | Ch 9 |
| 11 | Nov 2 | FASTA and FASTQ data | Ch 10 |
| 12 | Nov 9 | Sequence alignment | Ch 11 |
| 13 | Nov 16 | Shell scripting | Ch 12 |
| | Nov 23 | Thanksgiving! | |
| 14 | Nov 30 | Pipelining with Snakemake | |
| 15 | Dec 7 | Containers; Review | |
| 16 | Dec 14 | Final Exam | |