No output in PlaScope_predictions
nickp60 opened this issue · 2 comments
Hi,
I am trying to test PlaScope (version 1.3) using this O104:H4 assembly (stored as O104H4.fasta) and the provided database (stored in ~/dbs/plascope/), and am getting the following warnings:
$ plaScope.sh --fasta ~/2018-08-selecting_plasmid_finder/O104H4.fasta -o test --db_dir ~/dbs/plascope --db_name chromosome_plasmid_db --sample name_of_my_sample -n
Mode 2
Step 1: Contigs classification with Centrifuge and custom database
Centrifuge log can be found here: test/name_of_my_sample_PlaScope/Centrifuge_results/centrifuge.log
Step 2: Extraction of plasmid, chromosome and unclassified predictions
Warning: >CP003289.1 Escherichia coli O104:H4 str. 2011C-3493, complete genome not classified.
Warning: >CP003291.1 Escherichia coli O104:H4 str. 2011C-3493 plasmid pAA-EA11, complete sequence not classified.
Warning: >CP003290.1 Escherichia coli O104:H4 str. 2011C-3493 plasmid pESBL-EA11, complete sequence not classified.
Warning: >CP003292.1 Escherichia coli O104:H4 str. 2011C-3493 plasmid pG-EA11, complete sequence not classified.
If you use PlaScope please cite: ...
Here are the contents of test/name_of_my_sample_PlaScope/Centrifuge_results/name_of_my_sample_extendedresult:
readID seqID taxID score 2ndBestScore hitLength queryLength numMatches
CP003292.1 NC_022740.1 3 1290496 0 1151 1549 1
CP003290.1 species 3 1761963080 0 54346 88544 1
CP003291.1 species 3 376760648 0 42917 74217 1
CP003289.1 NZ_CP025401.1 2 3187944114 0 224663 5273097 1
And the contents of test/name_of_my_sample_PlaScope/Centrifuge_results/name_of_my_sample_summary:
name taxID taxRank genomeSize numReads numUniqueReads abundance
2 2 species 1722383768 1 1 0.0
3 3 species 241331578 3 3 0.0
Once it finishes, the PlaScope_predictions subdirectory is empty. How can I check the predictions for each of the contigs? Why are these warnings being thrown?
Thanks in advance!
Hi nickp60!
The current version of PlaScope requires SPAdes-formatted headers (e.g. ">NODE_36_length_43824_cov_77.8425"). This allows us to sort contigs according to their coverage (SPAdes coverage > 2), which is generally more relevant, as low-coverage contigs are frequently low-quality or contaminated.
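To illustrate the header format described above, here is a minimal Python sketch of how a SPAdes-style header could be parsed and checked against the coverage > 2 threshold. This is not PlaScope's actual code; the function names and the regular expression are assumptions made for illustration.

```python
import re

# SPAdes-style contig header, e.g. ">NODE_36_length_43824_cov_77.8425".
# The pattern is an assumption based on that example, not taken from PlaScope.
SPADES_HEADER = re.compile(r"^>?NODE_(\d+)_length_(\d+)_cov_(\d+(?:\.\d+)?)")


def parse_spades_header(header):
    """Return (node, length, coverage), or None if the header is not SPAdes-style."""
    match = SPADES_HEADER.match(header.strip())
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)), float(match.group(3))


def passes_coverage(header, min_cov=2.0):
    """True only for SPAdes-style headers whose coverage is strictly above min_cov."""
    parsed = parse_spades_header(header)
    return parsed is not None and parsed[2] > min_cov
```

A GenBank-style header such as ">CP003289.1 Escherichia coli O104:H4 str. 2011C-3493, complete genome" does not match this pattern, so no coverage value can be read from it.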
However, as shown in the "name_of_my_sample_extendedresult" file, the three plasmids (CP003292.1, CP003290.1, CP003291.1) are correctly classified (value in third column = 3), as is the chromosome (CP003289.1, value in third column = 2).
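Until the extraction step supports other header formats, the predictions can be read directly from the extendedresult file. The following is a rough Python sketch, assuming whitespace-separated columns and the taxID meaning given above (2 = chromosome, 3 = plasmid). The "unclassified" fallback for any other taxID and the function name are assumptions, not PlaScope behaviour.

```python
# taxID meaning as described in this thread: 2 = chromosome, 3 = plasmid.
LABELS = {"2": "chromosome", "3": "plasmid"}


def classify(lines):
    """Map each contig ID (first column) to a label based on taxID (third column)."""
    predictions = {}
    for line in lines:
        fields = line.split()
        # Skip blank or short lines and the header row.
        if len(fields) < 3 or fields[0] == "readID":
            continue
        # Any other taxID falls back to "unclassified" (an assumption).
        predictions[fields[0]] = LABELS.get(fields[2], "unclassified")
    return predictions


# Rows copied from the extendedresult file shown in this issue.
rows = """readID seqID taxID score 2ndBestScore hitLength queryLength numMatches
CP003292.1 NC_022740.1 3 1290496 0 1151 1549 1
CP003290.1 species 3 1761963080 0 54346 88544 1
CP003291.1 species 3 376760648 0 42917 74217 1
CP003289.1 NZ_CP025401.1 2 3187944114 0 224663 5273097 1"""

for contig, label in classify(rows.splitlines()).items():
    print(contig, label)
```

To run it on a real file, pass an open file handle instead of rows.splitlines().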
I plan to release a version that does not depend on the contig header format soon. For now, if you use contigs with an unsuitable header format, you will get these warnings and the contigs will not be extracted.
Hi @GuilhemRoyer, thanks for the update! I will be using SPAdes input for the actual analysis, so that works out well, though it might be nice to have the option to turn off this behaviour for use with non-SPAdes input. Thanks for clearing this up!