Use an autoencoder neural network model to create a visually interpretable latent space.
- Get hands-on experience with building and training models with PyTorch
- Play with autoencoders
The dataset used here is the “PBMC3k” single-cell RNA-seq dataset: 3k peripheral blood mononuclear cells from a healthy donor, acquired from scanpy datasets. You will find the following related files in `data`:

- `pbmc3k_raw_var_genes.tsv`: processed dataset keeping filtered cells and genes
- `pbmc3k_SeuratMetadata.tsv`: cell type labels
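If you want to peek at these files outside the notebooks, here is a minimal loading sketch with pandas. The matrix orientation and column names are assumptions, not guaranteed by this README; check the preprocessing notebook for the actual layout.

```python
import pandas as pd

# Processed expression matrix; whether cells are rows or columns is an
# assumption here -- verify against Collect_Datasets_and_Preprocess.ipynb.
expr = pd.read_csv("data/pbmc3k_raw_var_genes.tsv", sep="\t", index_col=0)

# Cell type labels keyed by cell barcode (column names are assumptions).
meta = pd.read_csv("data/pbmc3k_SeuratMetadata.tsv", sep="\t", index_col=0)

print(expr.shape)
print(meta.head())
```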
Notebooks should be executed in the order listed:

- `Collect_Datasets_and_Preprocess.ipynb` (already run)
- `Collect_Cell_Type_Labels.ipynb` (already run)
- `Train_Autoencoder_Tutorial.ipynb` (main notebook; this is what you will use to train your autoencoder)
To keep the notebooks easy to read, functions are stored in the `scripts` folder. You will find:

- `autoencoders.py`: where we define some basic autoencoder model architectures in PyTorch (a sketch of this kind of architecture follows this list)
- `train.py`: utility functions for training the autoencoder model
- `utils.py`: miscellaneous utility functions, including `visualize()` for visualizing the autoencoder latent embedding layer
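For orientation, here is a minimal sketch of the kind of autoencoder `autoencoders.py` defines. The class name and layer sizes are illustrative assumptions, not the repo's actual code:

```python
import torch
import torch.nn as nn

class SketchAutoencoder(nn.Module):
    """Illustrative only: names and layer sizes are assumptions, not the
    architectures actually defined in autoencoders.py."""

    def __init__(self, n_genes: int, latent_dim: int = 2):
        super().__init__()
        # Encoder: compress a per-cell gene expression vector to a small latent code
        self.encoder = nn.Sequential(
            nn.Linear(n_genes, 128),
            nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        # Decoder: reconstruct the expression vector from the latent code
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128),
            nn.ReLU(),
            nn.Linear(128, n_genes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.encoder(x)     # the latent embedding that visualize() would plot
        return self.decoder(z)  # reconstruction, scored against x by the training loss
```

A two-dimensional `latent_dim` is what makes the latent space directly plottable; the dimensionality used in the tutorial itself may differ.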
We will use NRNB compute resources for this workshop.

- Start a JupyterLab instance.
- Activate the prepared shared `conda` environment on NRNB (no installs needed). Then register the environment as a new kernel in your JupyterLab install:

  ```
  source activate /path/to/env
  python -m ipykernel install --user --name ml_env --display-name "ml_env"
  ```

- Refresh the JupyterLab interface page. You should now be able to access the `ml_env` kernel for the notebooks.
- Let's get a copy of the repo:

  ```
  git clone https://github.com/adamklie/CLAIM-scAEs.git
  ```
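Once cloned, the notebooks pull their functions from `scripts`. If you explore outside the notebooks, a sketch of how the layout maps to imports (the `sys.path` line assumes you launch Python from the repo root; the notebooks may already handle this):

```python
import sys
sys.path.append("scripts")  # assumes the working directory is the repo root

import autoencoders  # autoencoder model architectures
import train         # training utilities
import utils         # includes visualize() for the latent embedding
```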