navis-org/connectomes

Interfaces

For docs purposes...

While different data sources have different information available and access it in different ways, it would be good to converge on a minimal subset which could be extended so people know approximately what they're getting.

We discussed using CloudVolume's types for Skeleton and Mesh outputs, possibly with subclasses to add extra information where we have it (e.g. a CatmaidSkeleton which optionally includes a skeleton ID and an array of treenode IDs).
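To make the subclassing idea concrete, here is a rough sketch. It deliberately uses a stand-in `BaseSkeleton` dataclass rather than CloudVolume's actual Skeleton class, and the field names (`vertices`, `edges`, `skeleton_id`, `treenode_ids`) are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class BaseSkeleton:
    """Stand-in for a generic skeleton type (NOT CloudVolume's actual class)."""

    vertices: np.ndarray  # (N, 3) coordinates
    edges: np.ndarray  # (M, 2) indices into `vertices`


@dataclass
class CatmaidSkeleton(BaseSkeleton):
    """Adds CATMAID-specific information where it is available."""

    skeleton_id: Optional[int] = None
    treenode_ids: Optional[np.ndarray] = None  # (N,), aligned with `vertices`

    def __post_init__(self):
        # If treenode IDs are given, there must be exactly one per vertex.
        if self.treenode_ids is not None and len(self.treenode_ids) != len(self.vertices):
            raise ValueError("treenode_ids must have one entry per vertex")


skel = CatmaidSkeleton(
    vertices=np.zeros((3, 3)),
    edges=np.array([[0, 1], [1, 2]]),
    skeleton_id=16,
    treenode_ids=np.array([101, 102, 103]),
)
```

Code written against the base type keeps working, while CATMAID-aware code can check for the extra fields.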

Where it could go either way, I have chosen to require an iterable of e.g. skeleton IDs rather than accepting either one or many. Supporting both becomes a development headache as more interfaces get built on top, whereas writing fn([arg]) costs the user nothing and the need for it is immediately clear from type annotations etc. The return is therefore also always an iterable. I am leaning towards lazy iteration where it makes sense (e.g. where each skeleton is a separate request), but it could just be annotated as Iterable.
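A minimal sketch of the "iterable in, lazy iterable out" pattern, assuming one request per skeleton. `get_skeletons`, `fetch_one` and `fake_fetch` are hypothetical names; the fake fetcher just records which IDs were requested so the laziness is visible.

```python
from typing import Callable, Iterable, Iterator


def get_skeletons(
    skeleton_ids: Iterable[int],
    fetch_one: Callable[[int], dict],
) -> Iterator[dict]:
    """Yield one skeleton per ID; each is fetched only when the caller asks for it."""
    for skid in skeleton_ids:
        yield fetch_one(skid)


requested = []


def fake_fetch(skid: int) -> dict:
    # Stand-in for a network request; records the ID it was asked for.
    requested.append(skid)
    return {"id": skid}


skeletons = get_skeletons([11, 12, 13], fake_fetch)
first = next(skeletons)  # only the first ID has been requested at this point
```

A caller who wants everything at once can still wrap the call in `list(...)`, and a single ID is passed as `get_skeletons([11], fake_fetch)`.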

My signatures for connectivity are below. They are expressed with DataFrameBuilder, a utility I added here (https://github.com/navis-org/connectomes/pull/2/files#diff-5db922df8581ff8e33d10f319f95ae35a99391be48a02b61319812f53d1e3685R231) for building up dataframes row-wise:

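import pandas as pd

# DataFrameBuilder is the row-wise utility from the PR linked above.
# Only the output schema (column names, dtypes) is shown; filling and
# returning the frame is omitted.

class ConnectivitySource:
    def get_edges(self, source_skeleton_ids: list[int], target_skeleton_ids: list[int]) -> pd.DataFrame:
        # One row per (source, target) pair.
        builder = DataFrameBuilder(
            ["source_skeleton_id", "target_skeleton_id", "count"],
            ["uint64", "uint64", "uint32"],
        )

    def get_partners(self, skeleton_ids: list[int]) -> pd.DataFrame:
        # One row per (skeleton, partner) pair and direction.
        builder = DataFrameBuilder(
            ["skeleton_id", "partner_id", "count", "is_outgoing"],
            ["uint64", "uint64", "uint32", "bool"],
        )

    def get_synapses(self, skeleton_ids: list[int]) -> pd.DataFrame:
        # One row per synapse, with its location and direction.
        builder = DataFrameBuilder(
            ["skeleton_id", "x", "y", "z", "is_outgoing"],
            ["uint64", "float64", "float64", "float64", "bool"],
        )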

Adding the partner to get_synapses would also be useful; it just wasn't in the first relevant CATMAID endpoint I saw...
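For readers without the PR open, this is roughly what a row-wise builder could look like. `RowBuilder` below is a hypothetical stand-in written for this comment, not the DataFrameBuilder from the PR, and its method names (`append_row`, `build`) are assumptions.

```python
import pandas as pd


class RowBuilder:
    """Hypothetical row-wise builder; not the DataFrameBuilder from the PR."""

    def __init__(self, columns, dtypes):
        if len(columns) != len(dtypes):
            raise ValueError("need exactly one dtype per column")
        self.columns = list(columns)
        self.dtypes = list(dtypes)
        self.rows = []

    def append_row(self, row):
        # Reject rows that do not match the declared schema.
        if len(row) != len(self.columns):
            raise ValueError(f"expected {len(self.columns)} values, got {len(row)}")
        self.rows.append(tuple(row))

    def build(self) -> pd.DataFrame:
        # Build the frame once, then cast every column to its declared dtype.
        df = pd.DataFrame(self.rows, columns=self.columns)
        return df.astype(dict(zip(self.columns, self.dtypes)))


builder = RowBuilder(
    ["source_skeleton_id", "target_skeleton_id", "count"],
    ["uint64", "uint64", "uint32"],
)
builder.append_row((16, 23, 5))
builder.append_row((16, 42, 1))
edges = builder.build()
```

Declaring the dtypes up front means an empty result still comes back with the same columns and types as a populated one.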

On different information per data source: yes, that's definitely something to consider. My minimal subset would be:

  1. Meshes (if available)
  2. Skeletons (if available)
  3. Synaptic partners (up- and downstream) as a CATMAID-like table
  4. Edges between given sources and targets
  5. Synapses themselves - i.e. x/y/z coordinates plus type (pre or post)
  6. Finding neurons by some search string
  7. Fetching metadata (annotations) for given neurons
  8. Query image data (if available)
  9. Query segmentation data (if available)

The above should mostly work for the datasets I can think of. Importantly, the function/method signatures should look similar across sources and the output should be normalised somehow. For example, I would try to consistently use id in place of root ID, body ID and skeleton ID.
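As a sketch of what that normalisation could mean for tabular output: rename whichever backend-specific identifier column is present to `id`. The alias names in `ID_ALIASES` and the helper `normalise_id_column` are assumptions, not existing code.

```python
import pandas as pd

# Assumed backend-specific names for the neuron identifier column.
ID_ALIASES = ("root_id", "body_id", "skeleton_id")


def normalise_id_column(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `df` whose identifier column is called `id`."""
    present = [name for name in ID_ALIASES if name in df.columns]
    if len(present) > 1:
        raise ValueError(f"ambiguous identifier columns: {present}")
    if not present:
        return df.copy()
    # rename() returns a new frame, so the input is left untouched.
    return df.rename(columns={present[0]: "id"})


catmaid_like = pd.DataFrame({"skeleton_id": [16, 23], "name": ["a", "b"]})
normalised = normalise_id_column(catmaid_like)
```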

On Skeleton and Mesh subclassing: yes, very much in favour of that solution. We could then also make it easier to construct navis neurons from these cloudvolume objects.

On the input parameters: I think requiring and returning lists is reasonable for now. That said, I'm also lazy, and being able to pass just a single ID would be great in the future.
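If single IDs were supported later, one option would be a small normalising helper at the public boundary, so the internals keep working with lists only. `as_id_list` is a hypothetical name for such a helper.

```python
from numbers import Integral
from typing import Iterable, Union


def as_id_list(ids: Union[int, Iterable[int]]) -> list[int]:
    """Wrap a bare ID in a list; turn any iterable of IDs into a list of ints."""
    if isinstance(ids, Integral):
        return [int(ids)]
    return [int(i) for i in ids]


single = as_id_list(16)
many = as_id_list((16, 23))
```

Using `numbers.Integral` rather than `int` means integer types registered with that ABC are also treated as a single ID.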