Instructor: Natalie Stanley
Email: natalies@cs.unc.edu
Time: Tuesday/Thursday 9:30am-10:45am, Spring Semester 2021
Office Hours: Thursday 11am-noon via our class Zoom link, and by appointment.
Location: (Update, Jan 18) If you have not received my welcome email, please email me for the Zoom link.
Modern high-throughput assays allow us to efficiently profile a variety of biological processes at a systems level across a set of patient samples. As a result, these technologies generate an abundance of detailed information that needs to be extracted, analyzed, and interpreted. In this course, we will discuss the methodology used to analyze (process, engineer features from, combine, etc.) data generated by some of the most cutting-edge technologies in biomedicine, such as proteomics, single-cell assays, and imaging. We will further discuss how numerical linear algebra techniques and modern machine learning approaches can be applied to effectively extract information from these assays for an improved understanding of human health and disease. While computational biology is a very broad field, we will focus here on the analysis of data generated by single-cell technologies (e.g., mass cytometry), multiomics/multi-modality analysis, systems immunology, and benchmarking. For each class of algorithms we introduce for a task on biological data, we will also cover the necessary theory and mathematical intuition.
Strong programming skills. Comfort with linear algebra and basic probability. Please do not worry if you don't have any background in biology; any relevant concepts will be introduced. Please feel free to talk to me about any of these prerequisites.
This course will be mostly lecture-based, with two homework assignments and a course project. I will provide ideas for several publicly available biological datasets and open problems for you to work on for these projects. Overall, the project is intended to give you an opportunity to implement or apply methodology from the papers that we will discuss together. The final project writeup will also give you practice writing up results and communicating ideas. You are welcome to work in teams for this project.
Most lectures will be based around several papers. To benefit your own understanding, I will provide a set of questions to answer for one of the papers discussed in each lecture.
Note that this schedule is preliminary. Some topics may take (on average) 1 day longer than planned. I reserve the right to correct typos in the notes up to 1 hour before our class meeting.
Date | Topic | Reading | Notes | Code |
---|---|---|---|---|
Jan 19, 2021 | Intro, bioinformatics vs comp bio, challenges and modality-specific advancements | [Systems Immunology, Just Getting Started] | Lecture 1 Notes | |
Jan 21, 2021 | Linear Algebra Review, Low Rank Approximations, Building graphs from data, Graph Laplacian | [SLMP, pages 10-22], [Data Matrices + Low Rank], [Random Projection Trees], [LargeVis] (no reading summary) | Lecture 2 Notes | [LargeVis], [graph tools for python] |
January 26, 2021 | Graph Partitioning | [Module Detection Benchmarking in Biological Data], [BigClam]. for fun: [Stochastic Block Model + Single Cell] | Lecture 3 Notes | [SNAP], [Louvain], [Leiden], [graph-tool (SBM)] |
January 28, 2021 | Graph partitioning (overflow slides), Graph Embeddings, Graph Signal Processing | [Node2Vec], [Representation Learning on Graphs], [Review: Graph Embedding in Comp Bio], for fun: [Review on GSP], [Low Pass Filtering on Graphs], [Vicus], [Mashup] | Lecture 4 Notes | [node2vec] |
February 2, 2021 | Single Cell Day 1: Intro to single-cell profiling, mass cytometry bioinformatics | [Single-Cells, Many Features], [Spade] | Lecture 5 Notes | [FCS file tutorial], [Spade] |
February 4, 2021 : [HW 1 Assigned] | Single Cell Day 2: Graph-based automated gating, imputation in single-cell data, branch-point preserving visualization | [phenograph], [PHATE], [MAGIC] | Lecture 6 Notes | [phenograph], [FastPG], [MAGIC], [Phate] |
February 9, 2021 | Single Cell Day 3: Feature Engineering from single-cell data and linking to external variables | Citrus, [MELD] | ||
February 11, 2021 | Single Cell Day 4: Differential Analysis of Cell Populations | Diffcyt, Cydar | ||
February 16, 2021 | Wellness Day (no class) | |||
February 18, 2021 : Homework 1 Due | Single Cell Day 5: Graph-based matching of single-cell data | Conos, LIGER | ||
February 23, 2021 | Single Cell Day 7: Guest lecture by Maria Brbic (Stanford CS) : Semi-Supervised Automated Cell-Population Discovery | MARS | ||
February 25, 2021 | Single Cell Day 6: Deep Learning for Single Cell Tasks | SAUCIE, CellCNN | ||
March 2, 2021 | Single Cell Day 7: Trajectory Inference | |||
March 4, 2021 | Single Cell Day 8: Benchmarking in Trajectory Inference | |||
March 9, 2021 : Project Proposals Due | Presentations of Project Proposals Day 1 | |||
March 11, 2021 | Wellness Day (no class) | |||
March 16, 2021 | Project Proposal Presentation Day 2 | |||
March 18, 2021 | Single Cell Day 9: Benchmarking in Single-Cell Analysis | Aghaeepour et al | ||
March 23, 2021 | Single Cell Day 10: Imaging Proteomics + Spatial Regularization: computational challenges in combining tissue images and protein expression | |||
March 25, 2021 | Multiomics Day 1: Constructing a joint embedding of samples according to multiple modalities, subspace merging | SNF, grassmann embed | ||
March 30, 2021 | Multiomics Day 2: MOFA-1 and MOFA-2: Multiomics Factor Analysis | MOFA-1, MOFA-2 | ||
April 1, 2021 : HW 2 Assigned | Multiomics Day 3: Uncovering Relationships Between Modalities | mmvec | ||
April 6, 2021 | Multiomics Day 4: Stacked Generalization and CCA in multiomics studies | Ghaemi | ||
April 8, 2021 | Multiomics Day 5: Benchmarking in multiomics studies | |||
April 13, 2021 | Incorporating Prior Biological Knowledge into Analysis | |||
April 15, 2021 : Homework 2 Due | Systems Immunology Topic: TCR/BCR (T- and B-cell receptor repertoire analysis) | |||
April 20, 2021 | Partial Correlation, Thresholding, etc. for Identifying Meaningful Interactions | |||
April 22, 2021 | Enrichment Analysis, writing for an interdisciplinary audience | |||
April 27, 2021 | Project Presentations Day 1 | |||
April 29, 2021 | Project Presentations Day 2 | |||
Final Exam Day | Project papers due | | | |
There will be two homework assignments to practice implementing particular concepts. Concepts often become easier to understand and use once you have implemented them yourself. I will be happy to read and run code written in Python, R, Julia, or Matlab. Please submit your homework writeup as a PDF.
Most of what we discuss in class will come from papers. However, I suggest the following textbooks as background references. Conveniently, they are also available for free.
- [PRML] Pattern Recognition and Machine Learning -- Chris Bishop [Link]
- [SLMP] Spectral Learning on Matrices and Tensors -- Majid Janzamin et al. [Link]
- The Matrix Cookbook [Link]
- [PML] Probabilistic Machine Learning: An Introduction -- Kevin Murphy [Link]
For each class, I will update the papers that we will go over in the table above. You will only be required to write a summary of one of the (potentially multiple) papers assigned for that day.
On weeks when reading summaries are due, please choose one paper and email your summary to natalies+comp790@cs.unc.edu before our 9:30 am class meeting. Your summary should answer the following questions:
- In two sentences or fewer, explain the problem being solved.
- What were the main contributions of the authors in this work? (You can answer in a few bullet points.)
- Describe 1-2 computational experiments that the authors implemented to test their method.
- Were the authors the first to attempt this particular problem? If not, did they compare their results to other baselines? Do you think that their evaluation was objective?
- Do you think that the authors provided enough evidence for why their method is an important contribution? If yes, please describe their reasoning here. If you do not think they adequately justified why they worked on this particular problem, please describe your thoughts here.
- What is one follow-up idea or extension of this work?
I will provide you with several examples of publicly available biological datasets and problems (https://github.com/stanleyn/Comp790-166-Comp-Bio/blob/main/Datasets.md). Halfway through the semester, you will submit your project proposal and present your idea to the class. The proposal will be a short document describing 1) the problem, 2) background on other people's attempts to solve this problem, 3) background on your proposed solution, and 4) the data you will use to test your method. At the end of the semester, you will write a short paper explaining your method and results, and present your results to the class.
Grading will be based on the following:
- Reading questions: 20% (over the entire semester)
- Homework 1: 20%
- Homework 2: 20%
- Project proposal: 10%
- Project final writeup: 30%
The University of North Carolina at Chapel Hill facilitates the implementation of reasonable accommodations, including resources and services, for students with disabilities, chronic medical conditions, a temporary disability or pregnancy complications resulting in barriers to fully accessing University courses, programs and activities. Accommodations are determined through the Office of Accessibility Resources and Service (ARS) for individuals with documented qualifying disabilities in accordance with applicable state and federal laws. See the ARS Website for contact information: https://ars.unc.edu or email ars@unc.edu.
(source: https://ars.unc.edu/faculty-staff/syllabus-statement)
I value the perspectives of individuals from all backgrounds, reflecting the diversity of our students. I broadly define diversity to include race, gender identity, national origin, ethnicity, religion, social class, age, sexual orientation, political background, and physical and learning ability. I strive to make this classroom an inclusive space for all students. Please let me know if there is anything I can do to improve; I appreciate suggestions.