nomic-ai/nomic

Visualize non-Euclidean embeddings

Closed this issue · 3 comments

ez2rok commented

I'm trying to visualize 128-dimensional embeddings in hyperbolic space using Atlas. However, I noticed that Atlas's create_index function includes the line

'nearest_neighbor_index_hyperparameters': json.dumps({'space': 'l2', 'ef_construction': 100, 'M': 16})

in the build_template here. This line uses the l2 distance, whereas I need a hyperbolic distance function because my embeddings live in hyperbolic space. I know your dimensionality reduction algorithm is closed source, but I was wondering:

  • Is there a way to specify a custom distance function when creating embeddings? This could be a helpful feature for users looking for more customization.
  • Is the l2 space used when performing dimensionality reduction or is it perhaps only used for nearest neighbor search?

Any answers to this would be greatly appreciated. Thanks!
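For concreteness, the kind of distance in question can be sketched as below. This assumes the Poincaré ball model with curvature -1 (other models of hyperbolic space, such as the hyperboloid, use different formulas). The name `poincare_distance` and the `eps` guard are illustrative only and are not part of the Atlas API.

```python
import math


def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between two points of the open unit ball (curvature -1).

    Hypothetical helper for illustration; not an Atlas or nomic function.
    """
    sq_u = sum(a * a for a in u)
    sq_v = sum(b * b for b in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # eps keeps the denominator away from zero for points near the boundary.
    denom = max((1.0 - sq_u) * (1.0 - sq_v), eps)
    return math.acosh(1.0 + 2.0 * sq_diff / denom)


# Two points at the same Euclidean separation: the pair near the boundary
# is much farther apart hyperbolically than the pair near the origin.
near_origin = poincare_distance([0.0, 0.0], [0.1, 0.0])
near_boundary = poincare_distance([0.85, 0.0], [0.95, 0.0])
print(near_origin, near_boundary)
```

The same Euclidean gap maps to very different hyperbolic distances depending on how close the points are to the boundary, which is why an l2 index may pick different neighbors than a hyperbolic one would.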

There is not currently a way to specify a custom distance function, and there most likely won't be one in the short term. But it couldn't hurt to just upload the high-dimensional hyperbolic embeddings as they are and see if the result looks OK?

ez2rok commented

I tried visualizing 128-dimensional points on the Poincaré model of hyperbolic geometry. The data possesses some structure but not a ton. It is unclear whether this is because the dimensionality reduction violates the assumptions of the manifold or because the data is just inherently noisy. So to answer your question, the data does look OK, but that is not entirely satisfactory for me.
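One rough way to probe how much the Euclidean assumption matters is to compare nearest neighbors under the two metrics. The sketch below does this on synthetic points in the unit ball, not on the actual embeddings. All function names are hypothetical, the Poincaré ball with curvature -1 is assumed, and the brute-force search is only practical for small samples.

```python
import math
import random


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def poincare(u, v):
    # Poincare ball distance, curvature -1; points must have norm < 1.
    sq_u = sum(a * a for a in u)
    sq_v = sum(b * b for b in v)
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


def knn(points, dist, k):
    """Brute-force k nearest neighbors (as index sets) for every point."""
    result = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: dist(p, points[j]))
        result.append(set(others[:k]))
    return result


def mean_neighbor_overlap(points, dist_a, dist_b, k=10):
    """Average fraction of k-NN shared between two metrics (1.0 = identical)."""
    nn_a = knn(points, dist_a, k)
    nn_b = knn(points, dist_b, k)
    return sum(len(a & b) for a, b in zip(nn_a, nn_b)) / (k * len(points))


def random_ball_point(rng, dim, max_radius=0.95):
    # Random direction scaled to a random radius strictly inside the ball.
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in direction))
    radius = rng.uniform(0.0, max_radius)
    return [radius * x / norm for x in direction]


rng = random.Random(0)
points = [random_ball_point(rng, dim=8) for _ in range(200)]
overlap = mean_neighbor_overlap(points, euclidean, poincare, k=10)
print(f"mean 10-NN overlap, Euclidean vs. Poincare: {overlap:.2f}")
```

A low overlap on a sample of the real embeddings would suggest that the l2 neighbor graph differs substantially from the hyperbolic one. A high overlap would point more toward noise in the data itself. Either way, this only checks the neighbor search, not the closed-source projection step.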

Also, I just wanted to clarify: does the dimensionality reduction algorithm assume we are in Euclidean space? Or is it just the nearest neighbor search which assumes we are in Euclidean space?

Thanks!

Yes, the dimensionality reduction assumes a kernel that is Euclidean.