open2c/bioframe

Merging based on unrelated intervals possible?

Closed this issue · 2 comments

Hi. Thanks for this excellent package. I want to do a merge that depends on a second, unrelated set of intervals: if both the r- and q-intervals would merge, go ahead and merge the rows, but if only one of them would merge, do nothing. I think this might be possible with bioframe, but I haven't found the right recipe yet. Have you encountered this use case?

With min_dist=100:

Don't merge:
qstart qend rstart rend
2756014 2756066 54079 54131
55662 55787 54096 54221

Merge:
qstart qend rstart rend
54358 54543 54629 54814
54147 54332 54840 55025

Hi @marade, I think this might be possible with bioframe.cluster: first cluster on qstart and qend, then cluster a second time using rstart and rend as the interval columns, passing the cluster IDs from the first pass to the 'on' argument.

Let us know if this works!
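The two-pass idea suggested above can be sketched roughly as follows. This is a pure-Python stand-in and does not call bioframe.cluster itself, so it runs without bioframe installed. The helper names `cluster_ids` and `merge_if_both` are hypothetical. The sketch assumes all rows lie on the same chromosome and that intervals separated by `min_dist` or less belong to one cluster.

```python
from collections import defaultdict


def cluster_ids(intervals, min_dist=0):
    """Give each (start, end) interval a cluster id.

    Intervals are chained into one cluster when the gap to the
    running cluster end is <= min_dist (hypothetical helper).
    """
    order = sorted(range(len(intervals)), key=lambda i: intervals[i][0])
    ids = [None] * len(intervals)
    current, reach = -1, None
    for i in order:
        start, end = intervals[i]
        if reach is None or start - reach > min_dist:
            current += 1          # gap too large: open a new cluster
            reach = end
        else:
            reach = max(reach, end)
        ids[i] = current
    return ids


def merge_if_both(rows, min_dist=0):
    """Merge (qstart, qend, rstart, rend) rows in two passes.

    Pass 1 clusters on the q-intervals; pass 2 clusters on the
    r-intervals separately inside each q-cluster.
    """
    q_ids = cluster_ids([(r[0], r[1]) for r in rows], min_dist)
    by_q = defaultdict(list)
    for idx, qid in enumerate(q_ids):
        by_q[qid].append(idx)

    merged = []
    for qid in sorted(by_q):
        members = by_q[qid]
        r_ids = cluster_ids([(rows[i][2], rows[i][3]) for i in members], min_dist)
        by_r = defaultdict(list)
        for i, rid in zip(members, r_ids):
            by_r[rid].append(rows[i])
        for rid in sorted(by_r):
            block = by_r[rid]
            merged.append((
                min(b[0] for b in block), max(b[1] for b in block),
                min(b[2] for b in block), max(b[3] for b in block),
            ))
    return merged


# The two examples from this issue, with min_dist=100.
dont_merge = [(2756014, 2756066, 54079, 54131), (55662, 55787, 54096, 54221)]
do_merge = [(54358, 54543, 54629, 54814), (54147, 54332, 54840, 55025)]

print(merge_if_both(dont_merge, min_dist=100))  # two rows, left unmerged
print(merge_if_both(do_merge, min_dist=100))    # [(54147, 54543, 54629, 55025)]
```

One caveat of the two-pass approach: rows in the same final group are chained in r and share a q-cluster, but they may be linked in q only through an intermediate row that ends up in a different r-cluster. If that matters for your data, a stricter pairwise check would be needed.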

It sounds like it should work. I found a different way to do it and became busy with other matters, so I don't need bioframe for this now. I'll go ahead and close this issue with your suggestion, and hopefully anyone else who needs the same thing will see it. Thanks!