strand-aware closest
Opened this issue · 1 comments
gfudenberg commented
Example task:
Select CTCF peaks that are upstream of genes (in a strand-aware fashion). These peaks don't necessarily need to be on the same strand as the gene (hence this would not be resolved just by adding `on=[]` to `closest()`).
Solving this task currently requires multiple calls to closest.
Potential solutions:
- a convenience function that makes multiple calls to closest and merges the outputs.
- passing a column that defines which direction counts as upstream/downstream for each interval. The default would be relative to the reference genome (i.e. non-stranded closest).
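To make the intended behaviour concrete, here is a minimal plain-Python sketch of what "strand-aware upstream" could mean. It does not use bioframe's API. The `Interval` type and the `closest_upstream` helper are hypothetical names invented for illustration, intervals are assumed to be half-open, and treating `.` like `+` is only one possible answer to the mixed-strand question.

```python
from typing import NamedTuple, Optional, Sequence


class Interval(NamedTuple):
    """Hypothetical half-open interval; strand is "+", "-", or "." (unstranded)."""
    chrom: str
    start: int
    end: int
    strand: str = "."


def closest_upstream(gene: Interval, peaks: Sequence[Interval]) -> Optional[Interval]:
    """Return the nearest peak lying entirely upstream of `gene`, or None.

    The peak's own strand is ignored; only the gene's strand sets the direction.
    """
    best, best_dist = None, None
    for peak in peaks:
        if peak.chrom != gene.chrom:
            continue
        if gene.strand == "-":
            # Upstream of a minus-strand gene means higher coordinates.
            dist = peak.start - gene.end
        else:
            # Assumption: "." falls back to the reference orientation, like "+".
            dist = gene.start - peak.end
        if dist < 0:
            continue  # peak overlaps the gene or lies downstream of it
        if best_dist is None or dist < best_dist:
            best, best_dist = peak, dist
    return best


genes = [Interval("chr1", 100, 200, "+"), Interval("chr1", 500, 600, "-")]
peaks = [Interval("chr1", 50, 60), Interval("chr1", 450, 460), Interval("chr1", 700, 710)]
upstream = [closest_upstream(g, peaks) for g in genes]
```

For the minus-strand gene, the peak at 450-460 is the nearest one on the low-coordinate side, so a non-stranded "upstream" search would pick it. The strand-aware version skips it and returns the peak at 700-710 instead.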
Questions:
- if we pass this extra column, how to deal with a mix of stranded & non-stranded intervals? (where non-stranded is specified by '.': https://bioframe.readthedocs.io/en/latest/guide-specifications.html)
- what should this column be called? `directionality_col`, `strand_col`, `stream_col`
gfudenberg commented
Is this issue closed, @agalitsyna?