Overlapping intervals in a single BED file
Hi,
I have a single BED file (made by merging lots of BED files) that looks like this:
chr1 13332 13701 sample1 0 + 13332 13701 255,128,128
chr1 13338 13695 sample2 0 + 13338 13695 255,128,128
chr1 13330 13710 sample3 0 + 13330 13710 128,179,255
chr1 13320 13690 sample4 0 + 13320 13690 128,179,255
My goal is to find regions that overlap by at least 90% of both intervals (a reciprocal overlap) and merge them into a single region, so that the final output looks like this:
chr1 13320 13710 merged_4_samples 0 + 13320 13710 255,128,128 sample1, sample2, sample3, sample4
Or something like this, i.e. find the overlapping intervals and add a column with their IDs; afterwards I can merge the rows to get the widest range:
chr1 13332 13701 sample1 0 + 13332 13701 255,128,128 sample1, sample2, sample3, sample4
chr1 13338 13695 sample2 0 + 13338 13695 255,128,128 sample1, sample2, sample3, sample4
chr1 13330 13710 sample3 0 + 13330 13710 128,179,255 sample1, sample2, sample3, sample4
chr1 13320 13690 sample4 0 + 13320 13690 128,179,255 sample1, sample2, sample3, sample4
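Reading the 90% goal as a reciprocal overlap (the shared stretch covers at least 90% of each of the two intervals), the grouping can be sketched in plain Python. This is a minimal illustration, not bioframe code: it assumes half-open BED coordinates, uses hypothetical helper names (reciprocal_overlap, group_intervals), compares all pairs (so it is quadratic and only suited to small inputs), and merges groups with union-find. Because union-find is transitive, a chain of passing pairs ends up in one group even if the two ends of the chain do not overlap each other by 90%.

```python
from itertools import combinations

# Hypothetical helpers for illustration only.
# Intervals are (chrom, start, end, name) tuples in half-open BED coordinates.


def reciprocal_overlap(a, b):
    """Return the smaller of the two overlap fractions (0.0 if no overlap)."""
    if a[0] != b[0]:
        return 0.0
    ov = min(a[2], b[2]) - max(a[1], b[1])
    if ov <= 0:
        return 0.0
    return min(ov / (a[2] - a[1]), ov / (b[2] - b[1]))


def group_intervals(intervals, min_frac=0.9):
    """Group intervals whose pairwise reciprocal overlap is >= min_frac.

    Returns one (chrom, start, end, names) tuple per group, spanning the
    widest range of its members. Grouping is transitive (union-find).
    """
    parent = list(range(len(intervals)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(intervals)), 2):
        if reciprocal_overlap(intervals[i], intervals[j]) >= min_frac:
            parent[find(i)] = find(j)

    groups = {}
    for i, iv in enumerate(intervals):
        groups.setdefault(find(i), []).append(iv)

    return [
        (
            members[0][0],
            min(m[1] for m in members),
            max(m[2] for m in members),
            [m[3] for m in members],
        )
        for members in groups.values()
    ]


intervals = [
    ("chr1", 13332, 13701, "sample1"),
    ("chr1", 13338, 13695, "sample2"),
    ("chr1", 13330, 13710, "sample3"),
    ("chr1", 13320, 13690, "sample4"),
]
print(group_intervals(intervals))
# [('chr1', 13320, 13710, ['sample1', 'sample2', 'sample3', 'sample4'])]
```

For the four example rows every pair passes the 90% test, so they collapse into the single chr1:13320-13710 region from the desired output.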
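I tried the coverage function, but it needs two inputs:
import bioframe as bf
df = bf.coverage(df1, df2)  # adds a "coverage" column: bases of each df1 interval covered by df2
df = df[df["coverage"] / (df["end"] - df["start"]) >= 0.50]  # keep df1 intervals at least 50% covered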
Could you please tell me whether this is possible at all? Do there always have to be two inputs?
Thanks in advance for your help.
Best,
Anna
If I understood you correctly, bioframe.cluster could be a starting point for you: https://bioframe.readthedocs.io/en/latest/api-intervalops.html#bioframe.ops.cluster
Resolving for now, but feel free to reopen if you have additional questions!