SilasK/Krak

Error in rule braken

animesh opened this issue · 10 comments

I am trying to run the workflow and am running into this issue:
2021-08-07T100724.612348.snakemake.log
More specifically, one of the per-sample logs, S2_QUALITY_PASSED_species.log, says "IndexError: string index out of range". Any ideas how to get past this?

My config looks like this:
config.zip
The specific changes I made are db_name: "Kraken_dbs/UHGG" and genome_folder: fasta. Not sure if this is relevant here?

Samples table
samples.zip

Check kraken_results/db_name/reports/S2_QUALITY_PASSED.txt: does it look like a Kraken report? My guess is that it's empty. Maybe something went wrong during the Kraken run.

The rest seems all ok.
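
A quick way to test the "empty report" guess for every sample at once is to list the zero-byte files in the reports folder. This is only a minimal sketch: the helper name `empty_reports` is hypothetical and not part of the workflow, and the directory is the one quoted above.

```python
from pathlib import Path


def empty_reports(report_dir, pattern="*.txt"):
    """Return the sorted names of zero-byte files matching pattern in report_dir."""
    return sorted(
        path.name
        for path in Path(report_dir).glob(pattern)
        if path.is_file() and path.stat().st_size == 0
    )


if __name__ == "__main__":
    # Directory taken from the thread; adjust it to your own db_name.
    for name in empty_reports("kraken_results/db_name/reports"):
        print("empty report:", name)
```

If this prints every sample, the problem is in the Kraken step itself rather than in the rule that reads the reports.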

Yes, it looks like all of them are empty. Should I just rerun the pipeline? Is there something else I can check before going that route?


(kraken) animeshs@DMED7596:/mnt/z/Kraken$ ls kraken_results/db_name/reports/*
kraken_results/db_name/reports/S20_2_QUALITY_PASSED.txt  kraken_results/db_name/reports/S33_QUALITY_PASSED.txt  kraken_results/db_name/reports/S46_QUALITY_PASSED.txt
kraken_results/db_name/reports/S23_QUALITY_PASSED.txt    kraken_results/db_name/reports/S35_QUALITY_PASSED.txt  kraken_results/db_name/reports/S47_QUALITY_PASSED.txt
kraken_results/db_name/reports/S27_QUALITY_PASSED.txt    kraken_results/db_name/reports/S36_QUALITY_PASSED.txt  kraken_results/db_name/reports/S48_QUALITY_PASSED.txt
kraken_results/db_name/reports/S28_QUALITY_PASSED.txt    kraken_results/db_name/reports/S37_QUALITY_PASSED.txt  kraken_results/db_name/reports/S5_QUALITY_PASSED.txt
kraken_results/db_name/reports/S2_QUALITY_PASSED.txt     kraken_results/db_name/reports/S3_QUALITY_PASSED.txt   kraken_results/db_name/reports/s13_QUALITY_PASSED.txt
kraken_results/db_name/reports/S30_QUALITY_PASSED.txt    kraken_results/db_name/reports/S40_QUALITY_PASSED.txt
kraken_results/db_name/reports/S31_QUALITY_PASSED.txt    kraken_results/db_name/reports/S44_QUALITY_PASSED.txt
(kraken) animeshs@DMED7596:/mnt/z/Kraken$ cat kraken_results/db_name/reports/*txt

Check the "log/kraken/{db_name}/{sample}.log"

They seem to be complaining about gzip?

gzip: /mnt/z/ayu/reads/S20-2-QUALITY_PASSED_R1.fastq: not in gzip format

gzip: /mnt/z/ayu/reads/S20-2-QUALITY_PASSED_R2.fastq: not in gzip format
Loading database information... done.
0 sequences (0.00 Mbp) processed in 0.003s (0.0 Kseq/m, 0.00 Mbp/m).
  0 sequences classified (-nan%)
  0 sequences unclassified (-nan%)

gzip: /mnt/z/ayu/reads/S23-QUALITY_PASSED_R2.fastq: not in gzip format

gzip: /mnt/z/ayu/reads/S23-QUALITY_PASSED_R1.fastq: not in gzip format
Loading database information... done.
0 sequences (0.00 Mbp) processed in 0.012s (0.0 Kseq/m, 0.00 Mbp/m).
  0 sequences classified (-nan%)
  0 sequences unclassified (-nan%)

gzip: /mnt/z/ayu/reads/S27-QUALITY_PASSED_R1.fastq: not in gzip format

....

gzip: /mnt/z/ayu/reads/S27-QUALITY_PASSED_R2.fastq: not in gzip format
Loading database information... done.
0 sequences (0.00 Mbp) processed in 0.024s (0.0 Kseq/m, 0.00 Mbp/m).
  0 sequences classified (-nan%)
  0 sequences unclassified (-nan%)

I think this is the error:

Change kraken_run_extra: "--gzip-compressed " to kraken_run_extra: "" in the config.yaml and start anew.
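
To decide which value to use without trial and error, you can check whether the input reads really are gzip-compressed by looking at their first two bytes (every gzip stream starts with 0x1f 0x8b). The sketch below is an illustration only: `is_gzipped` and `suggest_kraken_extra` are hypothetical helpers, not part of the workflow, and the returned strings simply mirror the two config values mentioned above.

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzipped(path):
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def suggest_kraken_extra(fastq_paths):
    """Suggest a kraken_run_extra value for a list of FASTQ files.

    Assumption: all input files should be either gzipped or plain text.
    """
    states = {is_gzipped(path) for path in fastq_paths}
    if not states:
        raise ValueError("no input files given")
    if states == {True}:
        return "--gzip-compressed"
    if states == {False}:
        return ""
    raise ValueError("mix of gzipped and plain FASTQ files")
```

For plain-text files such as the `.fastq` reads in the log above, `suggest_kraken_extra` returns the empty string.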

Thanks @SilasK, that seems to have worked
2021-08-10T115315.338986.snakemake.log 👍🏽

with >85% reads classified:

cat log/kraken/*/*.log
Loading database information... done.
13080534 sequences (2616.11 Mbp) processed in 238.715s (3287.7 Kseq/m, 657.55 Mbp/m).
  11311337 sequences classified (86.47%)
  1769197 sequences unclassified (13.53%)
Loading database information... done.
2060576 sequences (412.12 Mbp) processed in 37.320s (3312.8 Kseq/m, 662.56 Mbp/m).
  1803574 sequences classified (87.53%)
  257002 sequences unclassified (12.47%)
...
Loading database information... done.
4282843 sequences (856.57 Mbp) processed in 79.849s (3218.2 Kseq/m, 643.64 Mbp/m).
  3834677 sequences classified (89.54%)
  448166 sequences unclassified (10.46%)

(kraken) animeshs@DMED7596:/mnt/z/Kraken$ ls -ltrh kraken_results/db_name/reports/*txt
-rwxrwxrwx 1 animeshs animeshs 307K Aug 10 12:01 kraken_results/db_name/reports/S27_QUALITY_PASSED.txt
-rwxrwxrwx 1 animeshs animeshs 295K Aug 10 12:01 kraken_results/db_name/reports/S27_QUALITY_PASSED_bracken_species.txt
-rwxrwxrwx 1 animeshs animeshs 254K Aug 10 12:03 kraken_results/db_name/reports/S5_QUALITY_PASSED.txt
-rwxrwxrwx 1 animeshs animeshs 244K Aug 10 12:03 kraken_results/db_name/reports/S5_QUALITY_PASSED_bracken_species.txt
-rwxrwxrwx 1 animeshs animeshs 304K Aug 10 12:06 kraken_results/db_name/reports/S31_QUALITY_PASSED.txt
-rwxrwxrwx 1 animeshs animeshs 293K Aug 10 12:06 kraken_results/db_name/reports/S31_QUALITY_PASSED_bracken_species.txt
-rwxrwxrwx 1 animeshs animeshs 313K Aug 10 12:13 kraken_results/db_name/reports/S33_QUALITY_PASSED.txt
-rwxrwxrwx 1 animeshs animeshs 301K Aug 10 12:13 kraken_results/db_name/reports/S33_QUALITY_PASSED_bracken_species.txt
-rwxrwxrwx 1 animeshs animeshs 294K Aug 10 12:16 kraken_results/db_name/reports/S3_QUALITY_PASSED.txt
-rwxrwxrwx 1 animeshs animeshs 283K Aug 10 12:16 kraken_results/db_name/reports/S3_QUALITY_PASSED_bracken_species.txt
-rwxrwxrwx 1 animeshs animeshs 311K Aug 10 12:22 kraken_results/db_name/reports/S47_QUALITY_PASSED.txt
-rwxrwxrwx 1 animeshs animeshs 299K Aug 10 12:22 kraken_results/db_name/reports/S47_QUALITY_PASSED_bracken_species.txt
-rwxrwxrwx 1 animeshs animeshs 312K Aug 10 12:27 kraken_results/db_name/reports/S37_QUALITY_PASSED.txt
-rwxrwxrwx 1 animeshs animeshs 300K Aug 10 12:27 kraken_results/db_name/reports/S37_QUALITY_PASSED_bracken_species.txt
-rwxrwxrwx 1 animeshs animeshs 313K Aug 10 12:33 kraken_results/db_name/reports/S23_QUALITY_PASSED.txt
-rwxrwxrwx 1 animeshs animeshs 301K Aug 10 12:33 kraken_results/db_name/reports/S23_QUALITY_PASSED_bracken_species.txt
-rwxrwxrwx 1 animeshs animeshs 313K Aug 10 12:39 kraken_results/db_name/reports/S40_QUALITY_PASSED.txt
-rwxrwxrwx 1 animeshs animeshs 300K Aug 10 12:39 kraken_results/db_name/reports/S40_QUALITY_PASSED_bracken_species.txt
-rwxrwxrwx 1 animeshs animeshs 308K Aug 10 12:44 kraken_results/db_name/reports/S48_QUALITY_PASSED.txt
-rwxrwxrwx 1 animeshs animeshs 296K Aug 10 12:44 kraken_results/db_name/reports/S48_QUALITY_PASSED_bracken_species.txt
-rwxrwxrwx 1 animeshs animeshs 314K Aug 10 12:51 kraken_results/db_name/reports/S44_QUALITY_PASSED.txt
-rwxrwxrwx 1 animeshs animeshs 302K Aug 10 12:52 kraken_results/db_name/reports/S44_QUALITY_PASSED_bracken_species.txt
-rwxrwxrwx 1 animeshs animeshs 277K Aug 10 12:54 kraken_results/db_name/reports/S46_QUALITY_PASSED.txt
-rwxrwxrwx 1 animeshs animeshs 266K Aug 10 12:54 kraken_results/db_name/reports/S46_QUALITY_PASSED_bracken_species.txt
-rwxrwxrwx 1 animeshs animeshs 289K Aug 10 12:57 kraken_results/db_name/reports/S36_QUALITY_PASSED.txt
-rwxrwxrwx 1 animeshs animeshs 278K Aug 10 12:57 kraken_results/db_name/reports/S36_QUALITY_PASSED_bracken_species.txt
-rwxrwxrwx 1 animeshs animeshs 296K Aug 10 13:02 kraken_results/db_name/reports/S30_QUALITY_PASSED.txt
-rwxrwxrwx 1 animeshs animeshs 284K Aug 10 13:02 kraken_results/db_name/reports/S30_QUALITY_PASSED_bracken_species.txt
-rwxrwxrwx 1 animeshs animeshs 314K Aug 10 13:09 kraken_results/db_name/reports/S35_QUALITY_PASSED.txt
-rwxrwxrwx 1 animeshs animeshs 302K Aug 10 13:09 kraken_results/db_name/reports/S35_QUALITY_PASSED_bracken_species.txt
-rwxrwxrwx 1 animeshs animeshs 281K Aug 10 13:11 kraken_results/db_name/reports/S2_QUALITY_PASSED.txt
-rwxrwxrwx 1 animeshs animeshs 270K Aug 10 13:11 kraken_results/db_name/reports/S2_QUALITY_PASSED_bracken_species.txt
-rwxrwxrwx 1 animeshs animeshs 310K Aug 10 13:16 kraken_results/db_name/reports/S28_QUALITY_PASSED.txt
-rwxrwxrwx 1 animeshs animeshs 298K Aug 10 13:16 kraken_results/db_name/reports/S28_QUALITY_PASSED_bracken_species.txt
-rwxrwxrwx 1 animeshs animeshs 277K Aug 10 13:17 kraken_results/db_name/reports/S20_2_QUALITY_PASSED.txt
-rwxrwxrwx 1 animeshs animeshs 266K Aug 10 13:17 kraken_results/db_name/reports/S20_2_QUALITY_PASSED_bracken_species.txt
-rwxrwxrwx 1 animeshs animeshs 286K Aug 10 13:20 kraken_results/db_name/reports/s13_QUALITY_PASSED.txt
-rwxrwxrwx 1 animeshs animeshs 275K Aug 10 13:20 kraken_results/db_name/reports/s13_QUALITY_PASSED_bracken_species.txt

Is this a reasonable number to expect from a typical experiment? Also, it would be nice to know if there are any downstream tools you suggest for this workflow 💯

@animesh For ways to analyze the counts output of Kraken, I suggest ALDEx2 in R or this Jupyter notebook:
https://github.com/SilasK/CMGM/blob/main/notebooks/Analyze-cold-adapted-microbiota.ipynb

The counts there come from Kraken. You only need to replace the mouse database with the human one.

However, if you want to use the functional inference, it is better to have relative abundance, which you ideally get from coverage and not counts.

Thanks @SilasK :) I tried to run it on Google Colab and I think it went fine with a git pull of your repo and some installs (I just made a pull request if you want to check?).

In general, what do you think about the 85-90% mapping, is this fine?

Thank you. I'll check the PR. An 80-90% mapping rate is good.
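
For anyone who wants to check the classification rate of all samples at once instead of reading the logs by eye, the summary lines shown earlier in this thread can be parsed with a small script. This is a rough sketch that assumes the log format quoted above; `classified_rates` is a hypothetical helper, and runs that report "-nan%" (no reads processed) are deliberately skipped.

```python
import re

# Matches lines such as "  11311337 sequences classified (86.47%)".
CLASSIFIED = re.compile(
    r"^\s*(\d+) sequences classified \(([\d.]+)%\)", re.MULTILINE
)


def classified_rates(log_text):
    """Return (n_classified, percent) for each Kraken summary found in log_text."""
    return [(int(count), float(pct)) for count, pct in CLASSIFIED.findall(log_text)]


if __name__ == "__main__":
    import glob

    # Log location taken from the thread: log/kraken/{db_name}/{sample}.log
    for path in sorted(glob.glob("log/kraken/*/*.log")):
        with open(path) as handle:
            for count, pct in classified_rates(handle.read()):
                print(f"{path}\t{count}\t{pct:.2f}%")
```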