czbiohub-sf/dashit

remove GGGGGGGGGGGGGGGGGGGG

emilydc opened this issue · 1 comment

Hi David - could you please alter optimize_guides so that it IGNORES guides with the sequence GGGGGGGGGGGGGGGGGGGG? This sequence is not real; it is what the sequencer reports when a read is short and it gets all the way to the end with nothing else to read. It often appears in the top 200 or so guides. So far I have removed it manually, but we should just avoid it entirely. Thanks!

After thinking about this, dashit-reads-filter already does this: by default it filters out any site that contains a homopolymer of more than 5 consecutive nucleotides. You can adjust this threshold via the --homopolymer command line option.
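
For anyone who wants to apply the same rule to a guide list by hand, here is a minimal Python sketch of a homopolymer check. It is not the dashit-reads-filter code: the function names `has_long_homopolymer` and `filter_guides` are made up for illustration, and it assumes "more than 5" means a run of 6 or more identical bases.

```python
import re


def has_long_homopolymer(seq, max_run=5):
    """Return True if seq contains more than max_run identical bases in a row.

    Hypothetical helper, not part of DASHit. max_run=5 mirrors the default
    described in the comment above (assumed to mean runs of 6+ are rejected).
    """
    # One base followed by at least max_run copies of itself,
    # i.e. a run of max_run + 1 or more identical characters.
    pattern = r"([ACGTN])\1{%d,}" % max_run
    return re.search(pattern, seq.upper()) is not None


def filter_guides(guides, max_run=5):
    """Keep only guides that do not contain a long homopolymer run."""
    return [g for g in guides if not has_long_homopolymer(g, max_run)]


guides = [
    "GGGGGGGGGGGGGGGGGGGG",  # sequencer artifact: 20 Gs
    "ACGTACGTACGTACGTACGT",  # no repeats
    "ACGTTTTTTACGTACGTACG",  # run of 6 Ts
    "ACGTTTTTACGTACGTACGT",  # run of 5 Ts, at the limit
]
kept = filter_guides(guides)
```

With the default threshold, the all-G guide and the guide with six Ts are dropped, while the guide with exactly five Ts is kept.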