Make nbl.smat.LookupNdBuilder more generic
clbarnes opened this issue · 2 comments
At present it's annotated to take Dotprops and functions which operate on Dotprops. Synblast doesn't use Dotprops but would benefit from a LookupNdBuilder, and I don't think the builder actually uses any dotprops functionality as it just uses the passed-in function to operate on them. Any object with length N and any function which takes 2 of those objects to produce an array of length N should work, so this may just be a case of adding a generic type annotation.
Requested by @swilson27
Happy to pick this up if you don't have the bandwidth at the moment, @clbarnes.
At first glance, the only problem appears to be that LookupNdBuilder
uses the length of individual neurons to calculate the n_matching_qual_vals
variable. See the len(self.dotprops[q_idx])
in this method, for example:
def _get_pairs(self):
matching_pairs = list(set(self._yield_matching_pairs()))
# need to know the eventual distdot count
# so we know how many non-matching pairs to draw
q_idx_count = Counter(p[0] for p in matching_pairs)
n_matching_qual_vals = sum(
len(self.dotprops[q_idx]) * n_reps for q_idx, n_reps in q_idx_count.items()
)
nonmatching_pairs = self._pick_nonmatching_pairs(n_matching_qual_vals)
return matching_pairs, nonmatching_pairs
If I understand this correctly, you're using it to match the number of points (i.e. size) rather than the number of neurons between matching and non-matching pairs? Currently only Dotprops
implements a __len__
method though. In addition, the similarity metric might not be using whatever __len__
measures (synBLAST being an example for that).
I can see three solutions:
- Implement
__len__
methods for all neuron types. - Use
len(object)
if possible and, failing that, fall back to the total number - Let the user define a function for that. Default would be
len_func=len
but could also be e.g.len_func=lambda x: len(x.connectors)
for synBLAST orlen_func=lambda x: 1
for when we don't care about size, only counts.
I'd favour option 3. Thoughts?
@swilson27 I pushed some changed to the generic_lookupnd
branch. I haven't tried this with syNBLAST yet but perhaps you want to take it for a spin?