Recommended architecture for high-dimensional data
manuelknott opened this issue · 3 comments
I hope it is fine to ask a more conceptual question.
My data has 151296 dimensions (not image data), which is slightly more than a 224x224 RGB image (150,528 values).
I couldn't even make the simple flows shown in the examples work with that, due to memory issues. Is there any way to deal with such a high-dimensional input? If not, what are the limits in your experience?
Thanks in advance!
I think that this repository is abandoned
It is very difficult to work directly with such high-dimensional data unless you know something about the structure of your data that allows you to reduce the dimensionality in some way (e.g. using convolutions on images). The other option is to first reduce the dimensionality of your data, e.g. with PCA or some other dimension-reduction algorithm. In my experience, it is difficult to train fully-connected flows with more than a couple of thousand dimensions.
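For the PCA route, something along these lines is usually enough. This is only a sketch using scikit-learn and PyTorch; the 64-component target and the random stand-in data are arbitrary examples, not recommendations:

```python
import numpy as np
import torch
from sklearn.decomposition import PCA

# Stand-in for your data matrix of shape (n_samples, 151296)
X = np.random.randn(256, 151296).astype(np.float32)

# Project down to something a fully-connected flow can handle;
# 64 components is purely illustrative, tune it to the explained
# variance you need.
pca = PCA(n_components=64)
Z = pca.fit_transform(X)                    # shape (256, 64)

# Train the flow on Z instead of X
Z = torch.from_numpy(Z.astype(np.float32))

# Samples drawn from the flow can be mapped back to data space with
# pca.inverse_transform(samples.numpy())
```

Whatever flow library you use then only ever sees the low-dimensional vectors, which keeps the coupling or autoregressive layers small enough to fit in memory.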
From our experience, people use all kinds of techniques to reduce the dimensionality of the problem.
The most popular are:
- Use a CNN (on images, or on any dense representation where neighborhoods carry important information), trained with unsupervised or self-supervised techniques.
- Another very common approach is to train a (V)AE on your data in an unsupervised fashion, using an MSE loss or an equivalent (depending on which features you want to emphasize). When you want to use the flow, you cut the AE in half and use the encoder's latent space as the embedding; see the sketch after this list.
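A minimal PyTorch sketch of that second approach with a plain autoencoder. The layer widths, the 64-dimensional bottleneck, and the random tensor standing in for your data are all placeholder choices:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

INPUT_DIM, LATENT_DIM = 151296, 64          # bottleneck size is a placeholder choice

class AE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(INPUT_DIM, 256), nn.ReLU(),
            nn.Linear(256, LATENT_DIM),
        )
        self.decoder = nn.Sequential(
            nn.Linear(LATENT_DIM, 256), nn.ReLU(),
            nn.Linear(256, INPUT_DIM),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Stand-in for your data, shape (n_samples, 151296)
data = torch.randn(256, INPUT_DIM)
loader = DataLoader(TensorDataset(data), batch_size=32, shuffle=True)

model = AE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(3):
    for (x,) in loader:
        opt.zero_grad()
        loss = loss_fn(model(x), x)         # plain MSE reconstruction loss
        loss.backward()
        opt.step()

# "Cut the AE in half": keep only the encoder and train the flow
# on the low-dimensional embeddings it produces.
with torch.no_grad():
    z = model.encoder(data)                 # shape (256, LATENT_DIM)
```

In practice you would train this on your real data (or a VAE with a KL term on the latent instead), freeze the encoder, and then fit the flow on the encoded vectors.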