cidm-ph/distmat

from_pw_distances_with from pre-calculated distances?

Hi. Thanks for writing this nice crate. If I have pre-calculated distances in a vector of structs like so...

struct Dist {
    a: String,
    b: String,
    distance: u32,
}

Is it possible to use SquareMatrix::from_pw_distances_with to get matrix format? Or is there a better way to handle that?

Hi! SquareMatrix also implements FromIterator, so if your vector stores the Dists in row-major order, you should be able to construct the matrix with something like let matrix: SquareMatrix<u32> = dists.iter().map(|x| x.distance).collect(); (EDIT: was .into()). The type annotation is needed because collect() has to be told its target type. If the order isn't guaranteed to be row-major, you could build an index to use with from_pw_distances_with().

The tabular format parser internally already does essentially this, so it might be worth refactoring that into a constructor that handles this kind of layout more conveniently.
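For anyone landing here with unordered n * n data, the "build an index" idea can be sketched in plain Rust without touching the crate at all. This is only an illustration under some assumptions: to_row_major is a hypothetical helper (not part of distmat), labels are placed in sorted order, and every one of the n * n cells (diagonal included) is expected exactly once.

```rust
use std::collections::BTreeMap;

// Mirrors the struct from the question.
struct Dist {
    a: String,
    b: String,
    distance: u32,
}

/// Hypothetical helper (not part of distmat): returns the sorted labels and
/// the n * n distances in row-major order, or None if any cell is missing
/// or supplied more than once.
fn to_row_major(dists: &[Dist]) -> Option<(Vec<String>, Vec<u32>)> {
    // Index: label -> row/column position, assigned in sorted label order.
    let mut index: BTreeMap<&str, usize> = BTreeMap::new();
    for d in dists {
        index.insert(d.a.as_str(), 0);
        index.insert(d.b.as_str(), 0);
    }
    for (i, pos) in index.values_mut().enumerate() {
        *pos = i;
    }

    let n = index.len();
    let mut cells: Vec<Option<u32>> = vec![None; n * n];
    for d in dists {
        let (i, j) = (index[d.a.as_str()], index[d.b.as_str()]);
        let cell = &mut cells[i * n + j];
        if cell.is_some() {
            return None; // the same (a, b) cell was given twice
        }
        *cell = Some(d.distance);
    }

    let labels = index.keys().map(|s| s.to_string()).collect();
    // Collapses to None if any cell was never filled.
    let flat = cells.into_iter().collect::<Option<Vec<u32>>>()?;
    Some((labels, flat))
}
```

The flat vector that comes back is in row-major order, so its iterator is the kind of input the collect() suggestion above expects.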

It turns out my data contain n * (n - 1) / 2 entries rather than n * n, so I need to use DistMatrix, though I'd like to convert to SquareMatrix afterwards. Giving that a try:

error[E0277]: the trait bound `DistMatrix<u32>: From<Map<std::slice::Iter<'_, Dist>, [closure@src/main.rs:321:53: 321:56]>>` is not satisfied
   --> src/main.rs:321:36
    |
321 |     let dmatrix: DistMatrix<u32> = dists.iter().map(|x| x.distance).into();
    |                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ---- required by a bound introduced by this call
    |                                    |
    |                                    the trait `From<Map<std::slice::Iter<'_, Dist>, [closure@src/main.rs:321:53: 321:56]>>` is not implemented for `DistMatrix<u32>`
    |
    = note: required for `Map<std::slice::Iter<'_, Dist>, [closure@src/main.rs:321:53: 321:56]>` to implement `Into<DistMatrix<u32>>`

Probably I'm just not seeing something simple with the invocation here.

That was my typo: it should have been collect() rather than into(). That still won't do any checking that the order of elements lines up with the order DistMatrix expects them to be stored in, so you might want to sanity check the matrix you build.
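One cheap sanity check that can be done before collecting is to confirm that the number of distances really is n * (n - 1) / 2 for some n. A minimal sketch follows; labels_for_pair_count is a hypothetical helper, not something the crate provides, and it says nothing about whether the elements are in the order DistMatrix stores them.

```rust
/// Hypothetical helper: given the number of pairwise distances, returns n
/// such that len == n * (n - 1) / 2, or None if no such n exists.
fn labels_for_pair_count(len: usize) -> Option<usize> {
    let mut n: usize = 1;
    let mut pairs: usize = 0; // equals n * (n - 1) / 2 for the current n
    while pairs < len {
        pairs += n; // going from n to n + 1 labels adds n new pairs
        n += 1;
    }
    (pairs == len).then_some(n)
}
```

If this returns None for dists.len(), the input cannot be a complete set of pairwise distances, whatever its order.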

The next version of the crate will have constructors specifically for this scenario, where you have labelled distances. They check that the correct entries are provided and can handle entries that are out of order. It currently works like this (but I will probably tweak it slightly):

let m = DistMatrix::<u32>::from_labelled_dists(dists.into_iter().map(|x| (x.a, x.b, x.distance))).unwrap();
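To make "the correct entries are provided" concrete, here is a rough stand-alone sketch of that kind of check. It is not the crate's implementation: covers_each_pair_once is a hypothetical function, and rejecting self-distances is an assumption made to match the n * (n - 1) / 2 layout discussed above.

```rust
use std::collections::BTreeSet;

/// Hypothetical check: true if `entries` holds each unordered pair of
/// distinct labels exactly once, in any order.
fn covers_each_pair_once(entries: &[(&str, &str, u32)]) -> bool {
    let mut labels = BTreeSet::new();
    let mut seen = BTreeSet::new();
    for &(a, b, _) in entries {
        if a == b {
            return false; // assumption: self-distances are not part of the input
        }
        labels.insert(a);
        labels.insert(b);
        // Normalise so that (a, b) and (b, a) count as the same pair.
        let key = if a < b { (a, b) } else { (b, a) };
        if !seen.insert(key) {
            return false; // the same pair was given twice
        }
    }
    let n = labels.len();
    // All pairs are distinct and none is a self-pair, so matching the
    // expected count means every pair is present.
    seen.len() == n * n.saturating_sub(1) / 2
}
```

Because the pairs are normalised before being counted, the order of the entries and the order of the two labels within an entry do not matter.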

Indeed, that solved it. Good to know about the alternate/upcoming ways to do it. Thanks for your help!