czbiohub-sf/cerebra

How does utils.get_best_overlap deal with genome positions that have multiple hits?

Closed this issue · 1 comment

  • this seems to be most of them... the COSMIC database has TONS of entries that share the same genome position
  • specifically, the problem is with SNP calling in get_aa_mutations:
    - I want to match CDSs against entries in the COSMIC db in order to determine the AA substitution, but what happens if a position has multiple COSMIC entries, each with a different CDS?
    - utils.get_best_overlap is probably just returning the first matching entry, so its CDS does not necessarily match the one being looked up

Fixed with utils.get_all_overlaps!
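
To illustrate the difference described in this issue, here is a minimal sketch in plain Python. It is not cerebra's actual code: `CosmicEntry`, `first_overlap`, `all_overlaps`, `aa_for_cds`, and the toy records in `ENTRIES` are all hypothetical names and data made up for this example, and the real `utils.get_best_overlap` / `utils.get_all_overlaps` may have different signatures and behavior. The idea is that when several entries share a position, taking only the first hit can return the wrong CDS, whereas collecting every hit lets you filter on the CDS you actually want.

```python
from typing import List, NamedTuple, Optional


class CosmicEntry(NamedTuple):
    """Hypothetical, simplified stand-in for one COSMIC record."""
    chrom: str
    start: int  # inclusive
    end: int    # inclusive
    cds: str    # coding-sequence change, e.g. "c.35G>T"
    aa: str     # amino-acid change, e.g. "p.G12V"


# Toy records (not copied from COSMIC): three entries at the same position,
# each with a different CDS change.
ENTRIES = [
    CosmicEntry("12", 100, 100, "c.35G>A", "p.G12D"),
    CosmicEntry("12", 100, 100, "c.35G>T", "p.G12V"),
    CosmicEntry("12", 100, 100, "c.35G>C", "p.G12A"),
]


def overlaps(entry: CosmicEntry, chrom: str, start: int, end: int) -> bool:
    """True if the closed interval [start, end] on chrom intersects the entry."""
    return entry.chrom == chrom and entry.start <= end and start <= entry.end


def first_overlap(entries: List[CosmicEntry], chrom: str,
                  start: int, end: int) -> Optional[CosmicEntry]:
    """Return only the first overlapping entry (the suspected old behavior)."""
    for entry in entries:
        if overlaps(entry, chrom, start, end):
            return entry
    return None


def all_overlaps(entries: List[CosmicEntry], chrom: str,
                 start: int, end: int) -> List[CosmicEntry]:
    """Return every overlapping entry, in input order."""
    return [e for e in entries if overlaps(e, chrom, start, end)]


def aa_for_cds(entries: List[CosmicEntry], chrom: str,
               start: int, end: int, cds: str) -> List[str]:
    """Collect AA changes from all overlapping entries whose CDS matches."""
    return [e.aa for e in all_overlaps(entries, chrom, start, end)
            if e.cds == cds]


if __name__ == "__main__":
    # The first hit has CDS c.35G>A, which is wrong if we observed c.35G>T.
    print(first_overlap(ENTRIES, "12", 100, 100))
    # Filtering over all hits picks the entry whose CDS actually matches.
    print(aa_for_cds(ENTRIES, "12", 100, 100, "c.35G>T"))
```

A linear scan is used here only to keep the sketch short; with a database the size of COSMIC, an interval tree or a similar index would be the more realistic choice for the overlap lookup.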