beliveau-lab/OligoMiner

Does the outputClean LDA model generalize to other species?


Hi,

Do you anticipate that the trained LDA model in outputClean will work well for non-human species? I am testing it on some mouse probes and getting strange results. In particular, for probes that bowtie2 reports as having several alignments, outputClean passes 100% of the candidates, when I would expect fewer to pass. This happens regardless of how I set the -p flag (even -p 1, which seems strange). Maybe there are some caveats I need to be aware of when using the model with outputClean. I'm attaching my input FASTA file in case that sheds some light.

Thanks!
mouse.txt
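One quick way to confirm how many alignments bowtie2 actually reported for each probe is to tally the SAM records per read name. The following is a minimal sketch, assuming the SAM file came from bowtie2 run with -k so that extra hits appear as additional records sharing the same QNAME. `count_alignments` is a hypothetical helper for illustration and is not part of OligoMiner.

```python
from collections import Counter


def count_alignments(sam_lines):
    """Count reported alignments per read name (QNAME) in SAM text.

    Hypothetical diagnostic helper, not part of OligoMiner.
    Header lines (starting with '@') and unmapped records (FLAG bit 0x4)
    are skipped, so a probe aligned with bowtie2 -k N can show up to N hits.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip SAM header and blank lines
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & 0x4:
            continue  # unmapped record: nothing to count
        counts[qname] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python count_alignments.py probes.sam
    with open(sys.argv[1]) as handle:
        for name, n in count_alignments(handle).most_common():
            print(name, n, sep="\t")
```

Probes with counts above 1 are the multi-mapping candidates one would expect outputClean to filter more aggressively.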

Hi Brian,

Thanks for your help! Does the model actually use the chrom:start-stop information as features for the classification, or only as a way to parse the SAM file? I tried again with the header ">Tubb3 chr8:123411553-123422015", but I guess I'm still using the wrong header format. Looking at the accession on NCBI (https://www.ncbi.nlm.nih.gov/nuccore/NC_000074.6?report=genbank&from=123411553&to=123422015), I thought my header was correct. Sorry if this is out of scope, but do you know the best way to find the correct "chrom:start-stop" metadata when retrieving new sequences?

Best,
David
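To illustrate why the exact header text can matter, here is a minimal sketch of how a "chrom:start-stop" name could be parsed and compared with a SAM alignment's RNAME and POS. It is an assumption-laden illustration, not OligoMiner's actual logic: the regex, `first_token`, `parse_coords`, and `is_on_target` are hypothetical names, and it assumes the name is read only up to the first whitespace (as many FASTA/FASTQ tools do) and that coordinates are 1-based inclusive, like SAM POS and NCBI ranges.

```python
import re

# Hypothetical pattern for a name like "chr8:123411553-123422015".
COORD_RE = re.compile(r"^(?P<chrom>[^:\s]+):(?P<start>\d+)-(?P<stop>\d+)$")


def first_token(header):
    """Return the sequence name as many tools read it: text up to the first whitespace."""
    return header.lstrip(">").split()[0]


def parse_coords(name):
    """Split a 'chrom:start-stop' name into (chrom, start, stop); raise if it doesn't match."""
    m = COORD_RE.match(name)
    if m is None:
        raise ValueError(f"not a chrom:start-stop name: {name!r}")
    return m.group("chrom"), int(m.group("start")), int(m.group("stop"))


def is_on_target(probe_name, rname, pos):
    """True if an alignment start (SAM RNAME, 1-based POS) falls inside the named interval.

    Only the alignment start is checked; probe length is ignored in this sketch.
    """
    chrom, start, stop = parse_coords(probe_name)
    return rname == chrom and start <= pos <= stop


if __name__ == "__main__":
    header = ">Tubb3 chr8:123411553-123422015"
    name = first_token(header)
    print(name)                          # the gene symbol, not the coordinates
    print(COORD_RE.match(name) is None)  # so coordinate parsing would fail
```

Under these assumptions, a header whose first token is a gene symbol would never yield coordinates, whereas a header that begins directly with "chr8:123411553-123422015" would.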

Ok, I'm able to get it working, thanks!