Interfaces
For docs purposes...
While different data sources have different information available and access it in different ways, it would be good to converge on a minimal common subset of functionality, which individual sources could extend, so people know approximately what they're getting.
We discussed using CloudVolume's types for `Skeleton` and `Mesh` outputs, possibly with subclasses to add extra information where we have it (e.g. a `CatmaidSkeleton` which optionally includes a skeleton ID and an array of treenode IDs).
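A minimal sketch of the subclassing idea, assuming only that the base type exposes vertices and edges. To keep it self-contained, `Skeleton` below is a hypothetical stand-in dataclass, not CloudVolume's actual class, and the extra field names on `CatmaidSkeleton` are illustrative:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Vertex = Tuple[float, float, float]
Edge = Tuple[int, int]  # pair of indices into `vertices`


@dataclass
class Skeleton:
    """Hypothetical stand-in for CloudVolume's Skeleton (vertices + edges only)."""
    vertices: List[Vertex] = field(default_factory=list)
    edges: List[Edge] = field(default_factory=list)


@dataclass
class CatmaidSkeleton(Skeleton):
    """Skeleton with optional CATMAID-specific extras."""
    skeleton_id: Optional[int] = None
    # One treenode ID per vertex, in the same order as `vertices`.
    treenode_ids: Optional[List[int]] = None

    def __post_init__(self) -> None:
        if self.treenode_ids is not None and len(self.treenode_ids) != len(self.vertices):
            raise ValueError("treenode_ids must have exactly one entry per vertex")


sk = CatmaidSkeleton(
    vertices=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    edges=[(0, 1)],
    skeleton_id=42,
    treenode_ids=[100, 101],
)
```

Code that only needs geometry can keep accepting the base `Skeleton`, while CATMAID-aware code checks for the extra fields.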
I have chosen, where it could go either way, to enforce taking an iterable of e.g. skeleton IDs rather than taking either one or multiple. Supporting both is a development headache as more interfaces get built on top, whereas writing `fn([arg])` costs the user nothing and its necessity is immediately clear from type annotations etc. The return value is therefore also always an iterable. I am leaning towards lazy iteration where it makes sense (e.g. where each skeleton is a separate request), but it could just be annotated as `Iterable`.
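A sketch of list-in, lazy-iterator-out, assuming one request per skeleton; `iter_skeletons`, `fetch_one` and `fake_fetch` are hypothetical names. It also works around one caveat of laziness: a plain generator function would only reject a bad argument on first iteration, so this version materialises the IDs eagerly and returns a lazy generator over them:

```python
from typing import Callable, Dict, Iterable, Iterator, List


def iter_skeletons(
    skeleton_ids: Iterable[int],
    fetch_one: Callable[[int], Dict],
) -> Iterator[Dict]:
    """Return a lazy iterator yielding one fetched result per ID."""
    # Materialise the IDs now so a bad argument (e.g. a bare int) raises TypeError
    # at call time, not on first iteration as it would in a plain generator function.
    ids = list(skeleton_ids)
    # The generator expression defers every fetch until the consumer asks for it.
    return (fetch_one(skid) for skid in ids)


calls: List[int] = []


def fake_fetch(skid: int) -> Dict:
    # Hypothetical stand-in for one network request per skeleton.
    calls.append(skid)
    return {"id": skid}


results = iter_skeletons([1, 2, 3], fake_fetch)
assert calls == []  # nothing has been requested yet
first = next(results)
assert calls == [1]  # only the first skeleton has been fetched
```

A single skeleton is then just `iter_skeletons([42], fetch_one)`, and wrapping the call in `list(...)` forces every request up front.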
My signatures for connectivity (expressed with `DataFrameBuilder`, a utility I added in https://github.com/navis-org/connectomes/pull/2/files#diff-5db922df8581ff8e33d10f319f95ae35a99391be48a02b61319812f53d1e3685R231 for building up dataframes row-wise) are:
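import pandas as pd

# DataFrameBuilder comes from the PR linked above.


class ConnectivitySource:
    def get_edges(self, source_skeleton_ids: list[int], target_skeleton_ids: list[int]) -> pd.DataFrame:
        # one row per (source, target) pair with its connection count
        builder = DataFrameBuilder(
            ["source_skeleton_id", "target_skeleton_id", "count"],
            ["uint64", "uint64", "uint32"],
        )
        ...  # fill `builder` from the data source, then return the resulting DataFrame

    def get_partners(self, skeleton_ids: list[int]) -> pd.DataFrame:
        # one row per (skeleton, partner, direction) with its connection count
        builder = DataFrameBuilder(
            ["skeleton_id", "partner_id", "count", "is_outgoing"],
            ["uint64", "uint64", "uint32", "bool"],
        )
        ...

    def get_synapses(self, skeleton_ids: list[int]) -> pd.DataFrame:
        # one row per synapse: location plus whether it is outgoing (pre) or incoming (post)
        builder = DataFrameBuilder(
            ["skeleton_id", "x", "y", "z", "is_outgoing"],
            ["uint64", "float64", "float64", "float64", "bool"],
        )
        ...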
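For readers without the PR open, a row-wise builder along these lines is one plausible shape. This is a hypothetical sketch, not the implementation from the linked PR; the method names `append_row` and `build` are assumptions:

```python
from typing import Any, List, Sequence

import pandas as pd


class DataFrameBuilder:
    """Hypothetical row-wise DataFrame builder (not the code from the linked PR)."""

    def __init__(self, columns: Sequence[str], dtypes: Sequence[str]) -> None:
        if len(columns) != len(dtypes):
            raise ValueError("need exactly one dtype per column")
        self.columns = list(columns)
        self.dtypes = list(dtypes)
        # Values are collected column-wise in plain Python lists until build().
        self._data: List[List[Any]] = [[] for _ in self.columns]

    def append_row(self, row: Sequence[Any]) -> None:
        """Add one row; values must be in the same order as `columns`."""
        if len(row) != len(self.columns):
            raise ValueError(f"expected {len(self.columns)} values, got {len(row)}")
        for column_values, value in zip(self._data, row):
            column_values.append(value)

    def build(self) -> pd.DataFrame:
        """Create the DataFrame, casting every column to its declared dtype."""
        return pd.DataFrame(
            {
                name: pd.Series(values, dtype=dtype)
                for name, values, dtype in zip(self.columns, self._data, self.dtypes)
            }
        )


# Usage: the get_edges table, filled with made-up counts.
builder = DataFrameBuilder(
    ["source_skeleton_id", "target_skeleton_id", "count"],
    ["uint64", "uint64", "uint32"],
)
builder.append_row((1, 2, 5))
builder.append_row((1, 3, 2))
edges = builder.build()
```

Collecting values in plain lists and creating the DataFrame once avoids repeated appends to a DataFrame, and casting each column in `build()` means an empty result still carries the agreed dtypes.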
Adding the partner to `get_synapses` would also be useful; it just wasn't in the first relevant CATMAID endpoint I saw...
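If each synapse row also carried its partner, the `get_partners` table could be derived from `get_synapses` instead of being fetched separately. A sketch with pandas, assuming a hypothetical `partner_id` column and one row per synapse-partner pair (so a presynaptic site with several postsynaptic partners contributes one row per partner); the data values are made up:

```python
import pandas as pd

# Hypothetical extended get_synapses output with a partner_id column.
synapses = pd.DataFrame(
    {
        "skeleton_id": pd.Series([1, 1, 1, 2], dtype="uint64"),
        "partner_id": pd.Series([2, 2, 3, 1], dtype="uint64"),
        "x": [0.0, 1.0, 2.0, 3.0],
        "y": [0.0, 0.0, 0.0, 0.0],
        "z": [0.0, 0.0, 0.0, 0.0],
        "is_outgoing": [True, True, False, False],
    }
)


def partners_from_synapses(synapses: pd.DataFrame) -> pd.DataFrame:
    """Collapse per-synapse rows into one row per (skeleton, partner, direction)."""
    partners = (
        synapses.groupby(["skeleton_id", "partner_id", "is_outgoing"], as_index=False)
        .size()  # with as_index=False this returns a DataFrame with a "size" column
        .rename(columns={"size": "count"})
    )
    partners["count"] = partners["count"].astype("uint32")
    # Same column order as the get_partners signature above.
    return partners[["skeleton_id", "partner_id", "count", "is_outgoing"]]


partners = partners_from_synapses(synapses)
```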
On different information per data source: yes, that's definitely something to consider. My minimal subset would be:
- Meshes (if available)
- Skeletons (if available)
- Synaptic partners (up- and downstream) as a CATMAID-like table
- Edges between given sources and targets
- Synapses themselves, i.e. x/y/z coordinates plus type (pre or post)
- Finding neurons by some search string
- Fetching meta data (annotations) for given neurons
- Query image data (if available)
- Query segmentation data (if available)
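One way to pin that subset down is an abstract base class, sketched here with hypothetical method names and a hypothetical bounding-box argument for the image and segmentation queries. The always-available queries are abstract; the "if available" ones default to raising `NotImplementedError`, so a source only overrides what it actually has:

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import TYPE_CHECKING, Any, Iterator, Sequence, Tuple

if TYPE_CHECKING:  # pandas is only needed for the annotations
    import pandas as pd

# Hypothetical bounding box: ((x0, y0, z0), (x1, y1, z1)).
BBox = Tuple[Tuple[int, int, int], Tuple[int, int, int]]


class DataSource(ABC):
    """Hypothetical minimal interface: collections of IDs in, collections out."""

    # Required for every data source.
    @abstractmethod
    def get_partners(self, ids: Sequence[int]) -> pd.DataFrame:
        """CATMAID-like table of up- and downstream partners."""

    @abstractmethod
    def get_edges(self, source_ids: Sequence[int], target_ids: Sequence[int]) -> pd.DataFrame:
        """Edges between the given sources and targets."""

    @abstractmethod
    def get_synapses(self, ids: Sequence[int]) -> pd.DataFrame:
        """x/y/z coordinates plus pre/post type."""

    @abstractmethod
    def find_neurons(self, query: str) -> list[int]:
        """IDs of neurons matching a search string."""

    @abstractmethod
    def get_annotations(self, ids: Sequence[int]) -> pd.DataFrame:
        """Metadata (annotations) for the given neurons."""

    # Optional: sources without this kind of data keep the default.
    def get_meshes(self, ids: Sequence[int]) -> Iterator[Any]:
        raise NotImplementedError(f"{type(self).__name__} provides no meshes")

    def get_skeletons(self, ids: Sequence[int]) -> Iterator[Any]:
        raise NotImplementedError(f"{type(self).__name__} provides no skeletons")

    def get_image(self, bbox: BBox) -> Any:
        raise NotImplementedError(f"{type(self).__name__} provides no image data")

    def get_segmentation(self, bbox: BBox) -> Any:
        raise NotImplementedError(f"{type(self).__name__} provides no segmentation")
```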
The above should mostly work for the datasets I can think of. Importantly, the function/method signatures should look similar and the output should be normalised somehow. For example, I would try to consistently use `id` instead of root ID, body ID and skeleton ID.
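A sketch of that normalisation as a post-processing step on output tables, assuming hypothetical source-specific column names `root_id`, `body_id` and `skeleton_id`; prefixed columns such as `source_skeleton_id` become `source_id`:

```python
import pandas as pd

# Hypothetical source-specific ID column names; the real ones depend on each backend.
ID_ALIASES = ("root_id", "body_id", "skeleton_id")


def normalise_id_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Rename source-specific ID columns (optionally prefixed) to plain `id`."""
    mapping = {}
    for col in df.columns:
        for alias in ID_ALIASES:
            if col == alias or col.endswith("_" + alias):
                # Keep any prefix ("source_"), swap the alias for "id".
                mapping[col] = col[: len(col) - len(alias)] + "id"
                break
    new_columns = [mapping.get(c, c) for c in df.columns]
    if len(set(new_columns)) != len(new_columns):
        raise ValueError("normalising would create duplicate column names")
    return df.rename(columns=mapping)
```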
On `Skeleton` and `Mesh` subclassing: yes, very much in favour of that solution. We could then also make it easier to construct navis neurons from these CloudVolume objects.
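Much of the work in turning a CloudVolume-style skeleton into a tree-structured neuron is orienting its undirected edges into parent links. A sketch, assuming vertices as (x, y, z) tuples, edges as vertex-index pairs and a single connected component; the output rows follow the SWC convention of `parent_id` -1 for the root, and `to_node_table` is a hypothetical helper, not a navis function:

```python
from collections import deque
from typing import Dict, List, Sequence, Tuple


def to_node_table(
    vertices: Sequence[Tuple[float, float, float]],
    edges: Sequence[Tuple[int, int]],
    root: int = 0,
) -> List[Dict[str, float]]:
    """Orient an undirected vertex/edge skeleton into SWC-style node rows."""
    neighbours: Dict[int, List[int]] = {i: [] for i in range(len(vertices))}
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)

    parent = {root: -1}
    queue = deque([root])
    # Breadth-first search from the root: each vertex's parent is the
    # neighbour it was first reached from (extra edges in cycles are ignored).
    while queue:
        node = queue.popleft()
        for nb in neighbours[node]:
            if nb not in parent:
                parent[nb] = node
                queue.append(nb)

    if len(parent) != len(vertices):
        raise ValueError("skeleton is disconnected; only one component is supported")

    return [
        {"node_id": i, "parent_id": parent[i], "x": x, "y": y, "z": z}
        for i, (x, y, z) in enumerate(vertices)
    ]
```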
On the input parameters: I think requiring and returning lists is reasonable for now. That said, I'm also lazy, and being able to pass just a single ID would be great in the future.
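If single IDs are supported later, one option is to normalise the argument once at the public boundary so that everything underneath still only sees lists. A sketch; `as_id_list` is a hypothetical helper:

```python
from numbers import Integral
from typing import Iterable, List, Union

IdArg = Union[int, Iterable[int]]


def as_id_list(ids: IdArg) -> List[int]:
    """Normalise a single ID or an iterable of IDs to a list of ints."""
    if isinstance(ids, Integral):
        return [int(ids)]
    if isinstance(ids, (str, bytes)):
        # Strings are iterable too: "123" would otherwise become [1, 2, 3].
        raise TypeError("pass IDs as integers, not strings")
    return [int(i) for i in ids]
```

Public methods would call this first, so internal helpers can keep the strict list-based signatures from the first post.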