clintval/sample-sheet

Appetite for looking up samples by Sample_ID


nh13 commented

I wanted to gauge your appetite for being able to look up samples by Sample_ID.

  1. get_sample(id: str) -> Optional[Sample]

In the simple case where no lane is defined, it will return None if the sample is not found and the Sample if it is found. If there is more than one sample with the same Sample_ID, an exception is raised. This could happen if the samples have different Library_IDs or different Lane values, which leads to a second function:
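A minimal sketch of the `get_sample` behavior described above, assuming a plain iterable of samples rather than a method on the sample sheet. The `Sample` dataclass here is a hypothetical stand-in, not sample-sheet's actual `Sample` class, and `ValueError` is an assumed choice for "an exception is raised".

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Sample:
    # Hypothetical stand-in for sample-sheet's Sample, holding only the
    # fields this sketch needs.
    sample_id: str
    library_id: Optional[str] = None
    lane: Optional[str] = None


def get_sample(samples: Iterable[Sample], id: str) -> Optional[Sample]:
    """Return the only sample whose Sample_ID equals `id`, or None if absent.

    Raises ValueError (an assumed choice of exception) when several samples
    share the Sample_ID, e.g. one per lane or one per Library_ID.
    """
    matches: List[Sample] = [s for s in samples if s.sample_id == id]
    if not matches:
        return None
    if len(matches) > 1:
        raise ValueError(f"{len(matches)} samples share Sample_ID {id!r}")
    return matches[0]


samples = [Sample("S1", lane="1"), Sample("S1", lane="2"), Sample("S2")]
print(get_sample(samples, "S2"))  # the single S2 sample
print(get_sample(samples, "S3"))  # None
```

Calling `get_sample(samples, "S1")` here raises, because S1 appears on two lanes; that is exactly the case the second function is meant to handle.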

  2. get_samples(id: str, library_id: Optional[str] = None, lane: Optional[str] = None) -> Optional[List[Sample]]

This one is more complicated, as its behavior depends on whether Library_ID and Lane are specified, both in the function call and in the samples. Fundamentally, though, it will return None if there are no samples with exactly the values given, and otherwise all the samples with exactly those values.
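One possible reading of `get_samples`, sketched over a plain iterable. The key assumption (not settled in the proposal) is that a `library_id` or `lane` left as `None` means "match any value" for that field; `Sample` is again a hypothetical stand-in.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Sample:
    # Hypothetical stand-in for sample-sheet's Sample.
    sample_id: str
    library_id: Optional[str] = None
    lane: Optional[str] = None


def get_samples(
    samples: Iterable[Sample],
    id: str,
    library_id: Optional[str] = None,
    lane: Optional[str] = None,
) -> Optional[List[Sample]]:
    """Return every sample matching the given values, or None if none match.

    Assumption: a library_id or lane left as None means "match any value"
    for that field rather than "the field must be unset".
    """
    matches = [
        s
        for s in samples
        if s.sample_id == id
        and (library_id is None or s.library_id == library_id)
        and (lane is None or s.lane == lane)
    ]
    # An empty list is falsy, so this returns None when nothing matched.
    return matches or None


samples = [
    Sample("S1", library_id="L1", lane="1"),
    Sample("S1", library_id="L1", lane="2"),
    Sample("S1", library_id="L2", lane="1"),
]
print(len(get_samples(samples, "S1", lane="1")))  # 2
```

If the intent is instead that `None` must match an unset field exactly, drop the `is None or` guards and compare with `==` directly.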

My biggest concern is how to implement this. If we store a samples map (e.g. self._samples: Dict[Tuple[str, Optional[str], Optional[str]], List[Sample]] keyed on Sample_ID, Library_ID, and Lane), how does it get updated when a Sample is changed? Alternatively, we can just iterate and filter, which isn't so bad since there are typically not many samples; if folks want something faster, they can build the map themselves.
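A sketch of the "build the map yourself" option, keyed on (Sample_ID, Library_ID, Lane). To illustrate the staleness concern, the hypothetical `Sample` below is a frozen dataclass, so in-place edits raise instead of silently invalidating the index; the real `Sample` class may well be mutable, in which case the map is only a snapshot.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple

# (Sample_ID, Library_ID, Lane)
Key = Tuple[str, Optional[str], Optional[str]]


@dataclass(frozen=True)
class Sample:
    # Hypothetical stand-in; frozen=True makes attribute assignment raise
    # dataclasses.FrozenInstanceError, so a cached index cannot go stale
    # through in-place edits to a Sample.
    sample_id: str
    library_id: Optional[str] = None
    lane: Optional[str] = None


def build_index(samples: Iterable[Sample]) -> Dict[Key, List[Sample]]:
    """Group samples by (Sample_ID, Library_ID, Lane) in a single pass."""
    index: Dict[Key, List[Sample]] = defaultdict(list)
    for s in samples:
        index[(s.sample_id, s.library_id, s.lane)].append(s)
    # Return a plain dict so missing keys raise KeyError instead of
    # silently inserting empty lists.
    return dict(index)


samples = [Sample("S1", "L1", "1"), Sample("S1", "L1", "2"), Sample("S2")]
index = build_index(samples)
```

Freezing only protects against edits to existing samples; adding or removing samples still means rebuilding the map, which is part of why plain iterate-and-filter is attractive.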

I am sure folks will want a map on a different key (e.g. folks sometimes use Sample_Name as a unique key, but that's not strictly valid), but that's out of scope.
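For users who want such a map anyway, a small generic helper would do, assuming they are happy to fail loudly when the chosen field turns out not to be unique. `unique_index` and the `sample_name` field are hypothetical names for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, Iterable, TypeVar

T = TypeVar("T")
K = TypeVar("K", bound=Hashable)


@dataclass
class Sample:
    # Hypothetical stand-in for sample-sheet's Sample.
    sample_id: str
    sample_name: str


def unique_index(items: Iterable[T], key: Callable[[T], K]) -> Dict[K, T]:
    """Map key(item) -> item, raising ValueError if two items share a key.

    Useful for fields such as Sample_Name that are often, but not
    guaranteed to be, unique within a sample sheet.
    """
    index: Dict[K, T] = {}
    for item in items:
        k = key(item)
        if k in index:
            raise ValueError(f"duplicate key {k!r}")
        index[k] = item
    return index


samples = [Sample("S1", "tumor"), Sample("S2", "normal")]
by_name = unique_index(samples, key=lambda s: s.sample_name)
```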

Thoughts?

I regret the overwhelming mutability of everything in Python.

I do think more ergonomics around grabbing samples by ID is a great idea!