clintval/sample-sheet

Experimental support for SampleCollections and Loading_ID

clintval opened this issue · 1 comment

A SampleCollection is a container for samples which may have originated from multiple sample sheets / flow cells / lanes.

A SampleCollection will make it easier to organize samples by their Sample_Name or Library_ID. A few methods will help with merge strategies for identical samples that have either been topped off (same library, sequenced on different flow cells or lanes) or re-prepared (different library, which can exist on the same flow cell or lane).

>>> from sample_sheet import SampleCollection
>>> collection = SampleCollection(samples)
>>> collection.visualize()
"""
collection(n=4)
├─ sample1
│  ├─ library1
│  │  ├─ loading1
│  │  └─ loading2
│  └─ library2
│     └─ loading1
└─ sample2
   └─ library1
      └─ loading1
"""
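
To make the nesting above concrete, here is a minimal sketch in plain Python that does not use the `sample_sheet` package at all. The record dicts, `build_tree` and `count_leaves` are hypothetical names for illustration only. The sketch assumes each sample carries `Sample_Name`, `Library_ID` and `Loading_ID` values, and that `n` counts the distinct sample / library / loading combinations.

```python
from collections import defaultdict

# Hypothetical records; the keys mirror the columns named in this issue.
samples = [
    {"Sample_Name": "sample1", "Library_ID": "library1", "Loading_ID": "loading1"},
    {"Sample_Name": "sample1", "Library_ID": "library1", "Loading_ID": "loading2"},
    {"Sample_Name": "sample1", "Library_ID": "library2", "Loading_ID": "loading1"},
    {"Sample_Name": "sample2", "Library_ID": "library1", "Loading_ID": "loading1"},
]


def build_tree(records):
    """Nest records as sample -> library -> loading -> list of records."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for record in records:
        sample = record["Sample_Name"]
        library = record["Library_ID"]
        loading = record["Loading_ID"]
        tree[sample][library][loading].append(record)
    return tree


def count_leaves(tree):
    """Count the distinct (sample, library, loading) combinations."""
    return sum(
        len(loadings)
        for libraries in tree.values()
        for loadings in libraries.values()
    )


tree = build_tree(samples)
leaf_count = count_leaves(tree)
print(f"collection(n={leaf_count})")
```

Under these assumptions the leaf count matches the `n=4` shown in the tree.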

Grouping samples by loading returns a new collection in which the loading level has been collapsed. Samples that can be merged at this level will be equivalent (see L261-L265).

>>> collection = collection.group_by_loading(attr='Loading_ID')
>>> collection.visualize()
"""
collection(n=3)
├─ sample1
│  ├─ library1
│  └─ library2
└─ sample2
   └─ library1
"""

Grouping samples by library returns a final collection with one entry per sample.

>>> collection = collection.group_by_library(attr='Library_ID')
>>> collection.visualize()
"""
collection(n=2)
├─ sample1
└─ sample2
"""
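
The two grouping steps can be sketched without the proposed API, using plain dicts. `collapse` is a hypothetical helper, not a method of `SampleCollection`, and the records are made up to match the trees in this issue. The sketch assumes merging loadings means keeping one group per (sample, library) pair, and merging libraries means keeping one group per sample.

```python
# Hypothetical records matching the trees shown in this issue.
records = [
    {"Sample_Name": "sample1", "Library_ID": "library1", "Loading_ID": "loading1"},
    {"Sample_Name": "sample1", "Library_ID": "library1", "Loading_ID": "loading2"},
    {"Sample_Name": "sample1", "Library_ID": "library2", "Loading_ID": "loading1"},
    {"Sample_Name": "sample2", "Library_ID": "library1", "Loading_ID": "loading1"},
]


def collapse(records, keys):
    """Group records that share the same values for every name in `keys`."""
    groups = {}
    for record in records:
        group_key = tuple(record[key] for key in keys)
        groups.setdefault(group_key, []).append(record)
    return groups


# Merging loadings: topped-off loadings of one library land in the same group.
by_library = collapse(records, ("Sample_Name", "Library_ID"))

# Merging libraries: re-prepared libraries of one sample land in the same group.
by_sample = collapse(records, ("Sample_Name",))

print(f"after merging loadings:  collection(n={len(by_library)})")
print(f"after merging libraries: collection(n={len(by_sample)})")
```

With these records the group counts are 3 and then 2, matching the `n` values in the trees above. How the merged records inside each group should be combined into a single sample is the open design question and is left out of the sketch.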

Cool thought, maybe another time.