sanger-pathogens/circlator

fixstart stringency

rjsorr opened this issue · 1 comments

Hi, I want to run fixstart so that each mitochondrial genome starts at the coi gene. I have provided 40 coi genes as nucleotide sequences, broadly representative of my phylum of interest. When I run fixstart, it finds the provided coi gene in only about 10% of the input contigs before defaulting to prodigal, even though all contigs are complete mitochondria from the same phylum and therefore contain coi. Basically, I want it to start from the coi gene or nothing, even dropping prodigal if need be. So, two questions:

  1. How do I lower the stringency so that it finds the coi gene using the coi nucleotide (database) sequences provided, and what are good settings? I have tried a range of --min_id and --mincluster values, down to 10 for both, with no real improvement (this is not easy to quantify with an input of 400 mitochondria). I could of course increase the size of the coi database provided, but this seems defeatist. Would it not be more logical to provide a protein sequence for stringency purposes?
  2. Can prodigal be turned off? Or would it be possible, at a later date, to provide the program with a choice of mt genes (most people seem to choose coi as the default start gene for mitogenomes)?

cheers

Replying to myself: after a bit of playing around, --min_id 25 and --mincluster 20 (default) turned out to be the least stringent settings for my dataset, but this is not easy to establish fully because the results aren't consistent. If I execute the same command multiple times, the program gives different results. The conclusion, at least, is that the database needs to be larger... shame!
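
In case it helps anyone else quantify the run-to-run differences on a few hundred contigs, here is a rough sketch that compares the FASTA output of two fixstart runs and lists the contigs whose sequence differs. It assumes that contig names are the same in both outputs and that a different start position (or orientation) shows up as a different sequence string. The file names in the usage line are hypothetical, and the script does not use any circlator API.

```python
import sys


def read_fasta(path):
    """Return {contig_name: sequence} from a FASTA file.

    The contig name is the first whitespace-delimited token of the header.
    """
    seqs = {}
    name = None
    chunks = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    seqs[name] = "".join(chunks)
                name = line[1:].split()[0]
                chunks = []
            else:
                chunks.append(line.upper())
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def diff_runs(run_a, run_b):
    """Sorted names of contigs whose sequence differs, or is missing, between two runs."""
    names = set(run_a) | set(run_b)
    return sorted(n for n in names if run_a.get(n) != run_b.get(n))


# Only runs when two FASTA paths are given on the command line.
if __name__ == "__main__" and len(sys.argv) == 3:
    a, b = read_fasta(sys.argv[1]), read_fasta(sys.argv[2])
    changed = diff_runs(a, b)
    print(f"{len(changed)} of {len(set(a) | set(b))} contigs differ between runs")
    for contig in changed:
        print(contig)
```

Usage would be something like `python compare_runs.py run1.fasta run2.fasta`, where both files are outputs of the same fixstart command.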

I still need an answer to the last part of 1 and 2.

regards