Merge singletons into larger clusters
Closed this issue · 2 comments
Compare singletons to larger clusters to identify those that were split off due to sequencing errors. Most of the benefit will come from singletons that are merged with other small clusters, so it makes sense to focus on those to keep run time down.
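To make the idea concrete, here is a rough sketch of how a singleton could be matched against small clusters only. Everything in it is an assumption for illustration: the names `hamming` and `find_merge_target`, the `max_size` and `max_mismatches` thresholds, the use of a plain mismatch count as the similarity measure, and the representation of clusters as a dict from representative sequence to read count. None of this is taken from the existing package.

```python
def hamming(a, b):
    """Count mismatching positions; return None if the lengths differ."""
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))


def find_merge_target(singleton_seq, clusters, max_size=3, max_mismatches=1):
    """Return the representative sequence of the closest small cluster, or None.

    `clusters` maps a representative sequence to the number of reads in
    that cluster (hypothetical layout). Clusters larger than `max_size`
    are skipped to keep the number of comparisons down.
    """
    best_seq, best_dist = None, None
    for seq, size in clusters.items():
        # Skip the singleton itself and anything that is not "small".
        if seq == singleton_seq or size > max_size:
            continue
        dist = hamming(singleton_seq, seq)
        if dist is None or dist > max_mismatches:
            continue
        # Keep the first cluster seen at the smallest distance.
        if best_dist is None or dist < best_dist:
            best_seq, best_dist = seq, dist
    return best_seq
```

With `clusters = {"ACGT": 1, "ACGA": 2, "ACGG": 10}`, calling `find_merge_target("ACGT", clusters)` picks `"ACGA"`, because `"ACGG"` is above the size cut-off. Whether the cut-off and mismatch limit should be fixed or configurable is an open question.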
Now that the consensus calling code has been wrapped in a package, it should be possible to add this without too much overhead.
I would suggest adding a function to consensus.py that decides whether or not two groups should be merged, and adding the logic that drives the processing of groups for merging to cmd_consensus.py. Now that HealthHack is over, I don't think there is a pressing need to write the fastq file to disk before merging. It seems better to do the merging in memory and write the final product out at the end.
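As a starting point for discussion, the split could look roughly like the sketch below: a decision function that might live in consensus.py, and a driver plus a single final write that might live in cmd_consensus.py. The function names (`should_merge`, `merge_singletons`, `write_fastq`), the group layout (a dict with a `key` string and a list of `(name, seq, qual)` reads), and the mismatch threshold are all hypothetical and not the package's actual API. The writer simply dumps the reads of each group as a stand-in for whatever the final product turns out to be.

```python
def should_merge(group_a, group_b, max_mismatches=1):
    """Decide whether two groups look like the same molecule.

    Assumes each group carries a `key` string (hypothetical) and treats
    groups as mergeable when the keys differ in at most `max_mismatches`
    positions.
    """
    key_a, key_b = group_a["key"], group_b["key"]
    if len(key_a) != len(key_b):
        return False
    return sum(x != y for x, y in zip(key_a, key_b)) <= max_mismatches


def merge_singletons(groups, max_mismatches=1):
    """Fold single-read groups into the first compatible group, in memory."""
    singletons = [g for g in groups if len(g["reads"]) == 1]
    # Copy the larger groups so the input list is left untouched.
    merged = [{"key": g["key"], "reads": list(g["reads"])}
              for g in groups if len(g["reads"]) != 1]
    for single in singletons:
        for target in merged:
            if should_merge(single, target, max_mismatches):
                target["reads"].extend(single["reads"])
                break
        else:
            # No match found: keep the singleton as its own group.
            merged.append({"key": single["key"], "reads": list(single["reads"])})
    return merged


def write_fastq(groups, handle):
    """Write every read of every group to an open text handle, once, at the end."""
    for group in groups:
        for name, seq, qual in group["reads"]:
            handle.write(f"@{name}\n{seq}\n+\n{qual}\n")
```

Because `write_fastq` takes any open text handle, nothing touches disk until the caller decides to open the output file after merging is done.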
@davidkohn, I wasn't quite sure how far you got with this on the day. The script doesn't look like it is writing any output (which is fine). Does it otherwise do everything you intended? Either way, I'm happy to port it over to the package, but I would avoid hooking it up to the CLI in the main branch if you'd rather do some more work on it.
I've started work on integrating the code to merge singletons into the package.