Computational Biology- Spring 2021

Comp790-166: Computational Biology

Details

Instructor: Natalie Stanley

Email: natalies@cs.unc.edu

Time: Tuesday/Thursday 9:30am-10:45am, Spring Semester 2021

Office Hours: Thursday 11am-noon on our class Zoom link and by appointment.

Location: Update Jan 18: If you have not received my welcome email, please send me an email for the Zoom link.

Description

Modern, high-throughput assays allow us to efficiently profile a variety of biological processes to gain a systems-level understanding of health and disease. Recent technologies and experimental assays generate an abundance of detailed information that needs to be extracted, summarized, and interpreted. In this course we will discuss the methodology used to extract signal from (e.g. process, engineer features from, combine, etc.) data generated by some of the most cutting-edge experimental paradigms, such as single-cell assays and imaging. We will go into detail about the methods and theory underlying bioinformatics algorithms, originating from numerical linear algebra, graph-signal processing, and machine learning. While computational biology is a very broad field, we will focus here on applications in single-cell biology (CyTOF, single-cell RNA sequencing), multiomics/multi-modal analysis, systems immunology, and benchmarking. For each class of algorithms introduced for some task on biological data, we will also go over necessary theory and mathematical intuition. The course covers the foundations for biomedical data science and does not assume any biological knowledge.

Prerequisites

Strong programming skills, and comfort with linear algebra and basic probability. Please do not worry if you don't have any background in biology; any relevant concepts will be introduced. Please feel free to talk to me about any of these prerequisites.

Course Structure

This course will be mostly lecture-based with two homework assignments and a course project. I will provide ideas for several publicly available biological datasets and open problems for you to work on for these projects. Overall, the project is intended to give you an opportunity to implement or apply the methodology from the papers that we will discuss together. The final project writeup will also give you practice writing up results and communicating ideas. You are welcome to work in teams for this project.

Most of the lectures will be based around several papers. To benefit your own understanding, I will provide a set of questions that should be answered for one of the papers discussed in each lecture.

Schedule

Note that this schedule is preliminary. Some topics may take about one day longer than planned. I reserve the right to correct typos in the notes up to 1 hour before our class meeting.

| Date | Topic | Reading | Notes | Code |
| January 19, 2021 | Intro, bioinformatics vs comp bio, challenges and modality-specific advancements | [Systems Immunology, Just Getting Started] | Lecture 1 Notes | |
| January 21, 2021 | Linear Algebra Review, Low Rank Approximations, Building graphs from data, Graph Laplacian | [SLMP. pages 10-22], [Data Matrices + Low Rank], [Random Projection Trees], [LargeVis] (no reading summary) | Lecture 2 Notes | [LargeVis], [graph tools for python] |
| January 26, 2021 | Graph Partitioning | [Module Detection Benchmarking in Biological Data], [BigClam]. For fun: [Stochastic Block Model + Single Cell] | Lecture 3 Notes | [SNAP], [Louvain], [Leiden], [graph-tool (SBM)] |
| January 28, 2021 | Graph partitioning (overflow slides), Graph Embeddings, Graph Signal Processing | [Node2Vec], [Representation Learning on Graphs], [Review: Graph Embedding in Comp Bio]. For fun: [Review on GSP], [Low Pass Filtering on Graphs], [Vicus], [Mashup] | Lecture 4 Notes | [node2vec] |
| February 2, 2021 | Single Cell Day 1: Intro to single-cell profiling, mass cytometry bioinformatics | [Single-Cells, Many Features], [Spade] | Lecture 5 Notes | [FCS file tutorial], [Spade] |
| February 4, 2021: [HW 1 Assigned] | Single Cell Day 2: Graph-based automated gating, imputation in single-cell data, branch-point preserving visualization | [phenograph], [PHATE], [MAGIC] | Lecture 6 Notes | [phenograph], [FastPG], [MAGIC], [Phate] |
| February 9, 2021 | Single Cell Day 3: Geometry Based Data Generation, Denoising, Data Augmentation + Linking Single-Cell Data to External Variables | [SUGAR], [MELD] | Lecture 7 Notes | [Meld], [sugar] |
| February 11, 2021 | Single Cell Day 4: Graph Fourier Transform, Low Pass Filtering, Finish up MELD, Differential Abundance Analysis of Cell Populations | For fun: [Low Pass Filtering on Graphs] | Lecture 8 Notes | [GSP toolbox] |
| February 16, 2021 | Wellness day, no class | | | |
| February 18, 2021 | Class canceled due to weather | | | |
| February 23, 2021: Homework 1 Due, [Project Proposal Template] | Single Cell Day 5: Guest lecture by Maria Brbic (Stanford CS): Semi-Supervised Automated Cell-Population Discovery | [MARS] | No slides | [MARS] |
| February 25, 2021 | Single Cell Day 6: Differential Analysis of Cell Populations + Projecting Data According to Background Variance | [Contrastive PCA], [Cydar], [Milo] | Lecture 9 Notes | [cPCA], [Milo], [cydar] |
| March 2, 2021: Reading Summary 2 Due by Today | Single Cell Day 7: Combining Multiple Single-Cell Datasets | [Conos], [CytofMerge], [Harmony], [SAUCIE] | Lecture 10 Notes | [Conos], [CyTOFMerge] |
| March 4, 2021 | Single Cell Day 8: Combining Multiple Panels, Starting Pseudotime and Mapping Cellular Differentiation | [Diffusion Maps for Differentiation], [SLICER-developed at UNC], [Original Diffusion Maps (Coifman)] | Lecture 11 Notes | [Diffusion Maps -Scanpy], [SLICER] |
| March 9, 2021: Project Proposals Due, Sign up for a presentation slot! | Combining Multiple Modalities for a common set of patients | [Subspace Merging on Grassmann Manifold], [Rayleigh Ritz Business (Spectral Clustering...] | Lecture 12 Notes | [Grassmann Cluster] |
| March 11, 2021 | Wellness day, no class | | | |
| March 16, 2021 | Project Proposal Presentations Day 1 | | | |
| March 18, 2021 | Project Proposal Presentations Day 2 | | | |
| March 23, 2021 | Finish Grassmann Embedding, Longitudinal Multimodal Data Integration | [Longitudinal Multimodal Data Integration on ADNI] | Lecture 13 Notes | |
| March 25, 2021 | Finish Longitudinal Multimodal Data Integration, Brief Journey in Convex Optimization, ADMM | [ADMM by Stephen Boyd] | Lecture 14 Notes | [CVX] |
| March 30, 2021 | MOFA methods for multiomics integration (both multiple modalities and multiple single-cell!) | [MOFA], [MOFA+] | Lecture 15 Notes | [MOFA] |
| April 1, 2021 | Integrating multiple heterogeneous graphs (e.g. multiple relational definitions). Start network alignment. | [Mashup], [REGAL (network alignment)] | Lecture 16 Notes | [REGAL] |
| April 6, 2021 | Refining Graph Alignments, Graph Summarization | [Refining Network Alignment], [Bridging Network Alignment and Summarization] | Lecture 17 Notes | [RefiNA] |
| April 8, 2021: [HW 2 Assigned] | Harmonic alignment for multimodal single cell data, graph neural networks vs label propagation + correction | [Harmonic Alignment + Single Cell Multimodal], [Learning on Graphs- Correct and Smooth] | Lecture 18 Notes | [Harmonic Alignment] |
| April 13, 2021 | Computational Neuroscience + Time Varying Analysis of Brain Connectivity | [review on chronnectome], [edge functional connectivity], [Edge Communities + Brain Connectivity], [Edge-Level Features], [Predictive Subnetwork Extraction] | Lecture 19 Notes | |
| April 15, 2021 | Imaging Proteomics and Genomics + Spatial Regularization | [LEAPH], [MIBI + TB granuloma], [SpiceMix], [Histocat] | Lecture 20 Notes | |
| April 20, 2021: [Project Presentation Sign-up], [Project write-up template, TeX available in Projects Folder too] | Sketching Single-Cell and Biological Datasets | [Geometric Sketching], [Hopper] | Lecture 21 Notes | [Geometric Sketching], [Hopper] |
| April 22, 2021: Homework 2 Due on April 23 | Technical Writing in Comp Bio, Opportunities for Learning on Graphs in Biology, Summary of What we Covered | [Representation Learning for Networks in Biology], [Watch - How to Be a Machine Learning Biologist by Quaid Morris] | Lecture 22 Notes | |
| April 27, 2021 | Project Presentations Day 1 [Final Rubric] | | | |
| April 29, 2021 | Project Presentations Day 2 | | | |
| May 4, 2021 | Project Presentations Day 3 | | | |
| May 11, 2021 | Project papers due | | | |

Homework, Project, Reading, Grading, Etc

Homework

There will be two homework assignments to practice implementing particular concepts. Often, things can become a bit easier to understand and use when they are implemented by you. I will be happy to read/run code written in Python, R, Julia, or Matlab. Please submit your homework writeup as a PDF.

Background Resources

Most of what we discuss in class will come from papers. However, I suggest the following textbooks as background references. Conveniently, they are also available for free.

  • [PRML] Pattern Recognition and Machine Learning -- Chris Bishop [Link]

  • [SLMP] Spectral Learning on Matrices and Tensors -- Majid Janzamin et al. [Link]

  • The Matrix Cookbook [Link]

  • [PML] Probabilistic Machine Learning: An Introduction -- Kevin Murphy [Link]

Readings

For each class, I will list the papers that we will go over in the table above. When several papers are assigned for a day, you will only be required to write a summary of one of them.

Reading Questions

Please choose one paper per week on the weeks when reading summaries are due, and email your summary to natalies+comp790@cs.unc.edu before our 9:30 am class meeting.

  1. Please explain in two sentences or fewer what problem is being solved.

  2. What were the main contributions of the authors in this work? (You can answer in a few bullet points).

  3. Please describe 1-2 computational experiments that the authors implemented to test their method.

  4. Were the authors the first to attempt this particular problem? If not, did they compare their results to other baselines? Do you think that their evaluation was objective?

  5. Do you think that the authors provided enough evidence for why their developed method is an important contribution? If yes, please describe their reasoning here. If you do not think they adequately justified why they worked on this particular problem, please describe your thoughts on that here.

  6. What is one follow-up idea or extension from this work?

Final Project

I will provide you with several examples of publicly available biological datasets and problems (https://github.com/stanleyn/Comp790-166-Comp-Bio/blob/main/Datasets.md). Halfway through the semester, you will submit your project proposal and present your idea to the class. The proposal will be a short document describing 1) the problem, 2) background on other people's attempts to solve this problem, 3) your idea for a solution, and 4) the data you will use to test your method. At the end of the semester, you will write a short paper explaining your method and results, and present your results.

Grading

Grading will be based on the following

  1. Reading Questions: 20% over the entire semester
  2. Homework 1: 20%
  3. Homework 2: 20%
  4. Project Proposal: 10%
  5. Project final writeup: 30%
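
As a rough illustration of how these weights combine, here is a minimal Python sketch. It assumes each component is scored on a 0-100 scale; the function name `final_grade` and the example scores are hypothetical and only for illustration, not an official grade calculator.

```python
# Component weights copied from the grading list above (fractions of the final grade).
WEIGHTS = {
    "reading_questions": 0.20,
    "homework_1": 0.20,
    "homework_2": 0.20,
    "project_proposal": 0.10,
    "project_writeup": 0.30,
}


def final_grade(scores):
    """Return the weighted average of per-component scores (assumed 0-100 each)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example_scores = {
    "reading_questions": 90,
    "homework_1": 80,
    "homework_2": 85,
    "project_proposal": 100,
    "project_writeup": 95,
}
print(round(final_grade(example_scores), 1))
```

Because the project writeup carries the largest weight, a change of 10 points there moves the final grade by 3 points, compared with 2 points for either homework.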

Accessibility Statement

The University of North Carolina at Chapel Hill facilitates the implementation of reasonable accommodations, including resources and services, for students with disabilities, chronic medical conditions, a temporary disability or pregnancy complications resulting in barriers to fully accessing University courses, programs and activities. Accommodations are determined through the Office of Accessibility Resources and Service (ARS) for individuals with documented qualifying disabilities in accordance with applicable state and federal laws. See the ARS Website for contact information: https://ars.unc.edu or email ars@unc.edu.

(source: https://ars.unc.edu/faculty-staff/syllabus-statement)

Diversity Statement

I value the perspectives of individuals from all backgrounds reflecting the diversity of our students. I broadly define diversity to include race, gender identity, national origin, ethnicity, religion, social class, age, sexual orientation, political background, and physical and learning ability. I strive to make this classroom an inclusive space for all students. Please let me know if there is anything I can do to improve; I appreciate suggestions.