TODO Future projects
mmp2 opened this issue
Write to @jmcq89 or @mmp2 if you would like to contribute.
This list is only very loosely prioritized (i.e., we think the first two tasks are currently the most important). There are almost no dependencies between tasks, so any task can be undertaken at any time.
- lazy R-metric evaluation (small to moderate) (Xiao Wang, UW)
- selecting the neighborhood radius (moderate to large) (@jmcq89)
  Requires: notions of high-dimensional geometry, local weighted PCA, writing some visualization tools.
  Resources: Matlab code already written, which can be "directly" ported to Python (potential for a publication too).
- directed graph embedding (moderate, or possibly small if no visualization is included)
  This is a fun project, especially if you add the specific visualization tools that would make the results shine. There is a Matlab implementation and a published paper. All the tools needed are already in megaman.
- Riemannian relaxation (moderate). Matlab code exists (@yuchaz).
- principal curves and surfaces, following Ozertem and Erdogmus (JMLR) (moderate if scalability is not required, large otherwise). Matlab code exists.
- applications of manifold learning to various data sets and problems (small, moderate or large). Below is a sample list.
  - spectra of galaxies
  - representations obtained by deep neural networks
  - musical recordings
  - brain activity recordings
  - hand movement data (possibly other robotic data)
  - outputs of MCMC runs
  - BYOD (bring your own data)
  - applications related to the other tasks in this list, e.g., GP regression, directed embedding, spectral clustering of networks
- Nystrom extension embedding new points into the existing coordinate system (moderate) (Xiao Wang)
  Requires: linear algebra, some reading.
- dimension estimation (moderate to large)
  This is more than an implementation task, although just implementing existing methods is a possibility. Best done in conjunction with reading research papers. High potential for resulting in a publication.
- manifold represented by *patches* (large, probably a research project)
- implement distance and area computations (moderate)
  Shortest-path distances on the graph, corrected by the R-metric. Some Matlab code exists. Some independence and experimentation are required, as there are subtle aspects to this shortest-path problem.
  Area computation would be nice but is secondary; it could be a separate project.
- implement Gaussian process regression on a manifold (large)
  Matlab code exists. A good understanding of math and computational linear algebra is necessary, as well as some basics of machine learning, e.g., semi-supervised learning; the latter could be acquired.
  To investigate: whether one can use an existing GP package (george) or should implement from scratch (using computational linear algebra tools).
- spectral clustering for millions of points (moderate) (Xiao Wang)
  Requires: using k-means (from sklearn), some understanding of spectral clustering (there are tutorials) and of k-means. (Possible extension, not done yet: build a small library of similarity functions.)
- k-means initializations: K-log-K initialization, k-means++ (Hui Pang)
- visualization tools (some are related to various tasks above); small unless otherwise noted
  - `covar_plotter3`: a 3D `covar_plotter` to display the R-metric with 3D embeddings
  - locally isometric visualization (rescale the data so that the R-metric is the identity at one fixed point, and display it)
  - display a vector field on a manifold
  - display a point cloud without outliers
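
For the shortest-path part of the distance-computation task above, here is a minimal pure-Python sketch of one possible starting point. It assumes each embedded point `Y[i]` comes with a symmetric positive-definite R-metric matrix `G[i]`, and it takes the length of a graph edge (i, j) to be sqrt(d^T ((G_i + G_j) / 2) d) with d = Y[j] - Y[i]. Averaging the two endpoint metrics is our own simplifying assumption (not megaman's method, and it ignores the subtle aspects mentioned above); the function names `edge_length` and `rmetric_shortest_paths` are hypothetical and do not use megaman's API.

```python
import heapq
import math


def edge_length(y_i, y_j, G_i, G_j):
    """Length of the segment y_i -> y_j under the average of the two endpoint metrics.

    Assumption: averaging G_i and G_j is only one simple way to approximate
    the metric along the edge.
    """
    d = [b - a for a, b in zip(y_i, y_j)]
    s = len(d)
    quad = 0.0
    for r in range(s):
        for c in range(s):
            quad += d[r] * 0.5 * (G_i[r][c] + G_j[r][c]) * d[c]
    return math.sqrt(quad)


def rmetric_shortest_paths(Y, G, edges, source):
    """Dijkstra from `source` on an undirected neighborhood graph.

    Y: list of embedding coordinates; G: list of per-point metric matrices;
    edges: iterable of (i, j) index pairs. Returns a list of distances,
    with math.inf for points not reachable from `source`.
    """
    n = len(Y)
    adj = [[] for _ in range(n)]
    for i, j in edges:
        w = edge_length(Y[i], Y[j], G[i], G[j])
        adj[i].append((j, w))
        adj[j].append((i, w))

    dist = [math.inf] * n
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d_u, u = heapq.heappop(heap)
        if d_u > dist[u]:
            continue  # stale heap entry, a shorter path was already found
        for v, w in adj[u]:
            if d_u + w < dist[v]:
                dist[v] = d_u + w
                heapq.heappush(heap, (dist[v], v))
    return dist
```

With identity metrics at every point this reduces to ordinary Euclidean shortest paths on the graph; multiplying every metric by 4 doubles all distances.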
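
For the k-means initialization task, here is a minimal pure-Python sketch of standard k-means++ seeding (D² sampling): the first center is drawn uniformly, and each further center is drawn with probability proportional to its squared distance to the nearest center chosen so far. It covers only k-means++, not the K-log-K variant, and the names `sq_dist` and `kmeanspp_init` are hypothetical. For comparison, sklearn's `KMeans` already supports `init="k-means++"`.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_init(points, k, seed=None):
    """Pick k initial centers from `points` using k-means++ (D^2) sampling."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    rng = random.Random(seed)
    # First center: chosen uniformly at random.
    centers = [points[rng.randrange(len(points))]]
    # d2[i] = squared distance from points[i] to its nearest chosen center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        total = sum(d2)
        if total == 0.0:
            # Every point coincides with an existing center; fall back to uniform.
            idx = rng.randrange(len(points))
        else:
            # Sample index i with probability d2[i] / total.
            idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
        centers.append(points[idx])
        # Update nearest-center distances with the newly added center.
        d2 = [min(old, sq_dist(p, points[idx])) for old, p in zip(d2, points)]
    return centers
```

Because already-chosen points have weight zero, distinct input points always yield distinct centers; passing a fixed `seed` makes the result reproducible.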
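
For the locally isometric visualization item, here is a minimal pure-Python sketch, assuming the embedding is a list of s-dimensional points `Y` and `G0` is the (symmetric positive-definite) R-metric at a chosen reference point `Y[i0]`. With the Cholesky factorization G0 = L L^T, mapping each centered row v = y - Y[i0] to v L makes the metric at the reference point the identity, because |L^T v|^2 = v^T L L^T v = v^T G0 v. The names `cholesky` and `locally_isometric` are hypothetical; any other square root of G0 (e.g., the symmetric one) gives the same picture up to an orthogonal transformation.

```python
import math


def cholesky(A):
    """Lower-triangular L with A = L L^T, for a small symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                diag = A[i][i] - s
                if diag <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(diag)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def locally_isometric(Y, G0, i0):
    """Center Y at Y[i0] and map each row v to v L, where G0 = L L^T.

    In the new coordinates the metric at the reference point is the identity;
    elsewhere it is only approximately so.
    """
    L = cholesky(G0)
    s = len(G0)
    y0 = Y[i0]
    out = []
    for y in Y:
        v = [a - b for a, b in zip(y, y0)]
        # Row vector times matrix: w[c] = sum_r v[r] * L[r][c].
        out.append([sum(v[r] * L[r][c] for r in range(s)) for c in range(s)])
    return out
```

The rescaled coordinates can then be passed to any existing scatter-plot routine; the reference point sits at the origin.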