During clathrin-mediated endocytosis (CME), clathrin surrounds molecules awaiting transport, forming a spherical coat. Our goal was to pick out clathrin undergoing this process. This repository uses semi-supervised learning methods to classify "cup-like" clathrin structures in STORM microscopy data for proteins of interest. See our presentation slides or full report for the problem formulation and the specifics of our approach.
The clathrin data was provided by the Ke Xu lab in UC Berkeley's College of Chemistry, whose research work we are supporting. If you find this work useful for your research, please consider citing:
@citation{storm,
Author = {Alvin Wan and Allen Guo},
Title = {Semi-Supervised Deep Learning for Molecular Structures},
Year = {2017}
}
This project requires Python3. We begin by navigating to the root of the repository, which we will call $STORM.
cd $STORM
(Optional) We recommend setting up a Python3 virtual environment first.
virtualenv ../env --python=python3
source ../env/bin/activate
Install all Python requirements.
pip install -r requirements.txt
You can experiment with various hyperparameters and train the models yourself. We approached the problem using a two-step pipeline. First, find a latent representation of the data in a lower-dimensional space. Then, run a simple classifier on the encoded data.
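The two-step idea can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the code behind storm.sh: it runs on synthetic data rather than STORM data, uses PCA as the encoder, and swaps in a nearest-centroid classifier as a lightweight stand-in for the SVM. All function names here are hypothetical.

```python
import numpy as np


def fit_pca(X, n_components):
    """Step 1 (fit): learn a lower-dimensional projection with PCA via SVD."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]


def encode(X, mean, components):
    """Step 1 (apply): project samples into the latent space."""
    return (X - mean) @ components.T


def fit_centroids(Z, y):
    """Step 2 (fit): a simple classifier, one centroid per class.

    Assumption: nearest-centroid stands in for the SVM used by this project.
    """
    classes = np.unique(y)
    centroids = np.stack([Z[y == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict(Z, classes, centroids):
    """Step 2 (apply): assign each sample to its closest class centroid."""
    dists = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]


# Synthetic stand-in data: two well-separated Gaussian blobs in 20 dimensions.
rng = np.random.default_rng(0)


def make_split(n_per_class):
    X = np.vstack([
        rng.normal(0.0, 1.0, size=(n_per_class, 20)),
        rng.normal(3.0, 1.0, size=(n_per_class, 20)),
    ])
    y = np.array([0] * n_per_class + [1] * n_per_class)
    return X, y


X_train, y_train = make_split(100)
X_test, y_test = make_split(50)

# Fit the encoder on the training split only, then encode both splits.
mean, components = fit_pca(X_train, n_components=2)
Z_train = encode(X_train, mean, components)
Z_test = encode(X_test, mean, components)

classes, centroids = fit_centroids(Z_train, y_train)
accuracy = (predict(Z_test, classes, centroids) == y_test).mean()
print(f"test accuracy: {accuracy:.2f}")
```

Note that the encoder is fit on the training split and then applied unchanged to the test split, so the classifier sees both splits in the same latent space.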
If your data is located at data/train_molecules.mat and data/test_molecules.mat, the <data_class> mentioned below would be molecules.
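The naming convention above can be made explicit with a small helper. This is a sketch that assumes the pattern data/train_<data_class>.mat and data/test_<data_class>.mat generalizes from the single molecules example; the function names are hypothetical and not part of this repository.

```python
def data_paths(data_class, data_dir="data"):
    """Return the (train, test) .mat paths assumed for a given data class."""
    return (
        f"{data_dir}/train_{data_class}.mat",
        f"{data_dir}/test_{data_class}.mat",
    )


def data_class_from_path(path):
    """Recover the data class from a train_*.mat or test_*.mat path."""
    filename = path.rsplit("/", 1)[-1]
    if not filename.endswith(".mat"):
        raise ValueError(f"expected a .mat file, got {path!r}")
    stem = filename[: -len(".mat")]
    for prefix in ("train_", "test_"):
        if stem.startswith(prefix):
            return stem[len(prefix):]
    raise ValueError(f"expected a train_ or test_ prefix, got {path!r}")


train_path, test_path = data_paths("molecules")
print(train_path, test_path)
print(data_class_from_path(test_path))
```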
Start by picking a featurization technique.
cd $STORM
bash storm.sh encode_(ae|kmeans|pca) <data_class>
We then train a support vector machine (SVM) using the featurizations. Before running the command below, make sure to featurize both the train.mat and test.mat datasets specified above.
bash storm.sh svm <data_class>