fabou-uobaf/VaQuERo

no output with vcf2tsv_long.py

Opened this issue · 5 comments

Hello,

I am trying to run vcf2tsv_long.py but I am not getting any output with my vcf file. I've attached my vcf file. The script only outputs the header. I've tried the script with different thresholds of min alleles but there is no output.

Thank you,
Lenore

sample40_variants.txt

Also I would like to know how the snpEff sarsCoV2 database was built. Thank you.

Dear Lenore,

Sorry, that part is not well documented.
In our calling pipeline we add a format and sample column to the vcf created by lofreq, as this allows us to merge multiple vcfs.
I adapted the script, so that you can directly use it with the output from lofreq now. As there is not sample name in the vcf, it will just take the filename without the .vcf/.vcf.gz extension as a sample id - I hope that works for you.
As for snpEff we just created a database from the official genbank file early on in the pandemic, but you can just use one of the prebuilt ones - the one for the genome version you use is just called MN908947.3, so you should be able to download it with java -jar snpEff.jar download MN908947.3
Vaquero does not use the annotations as far as I know, though.
All the best
Lukas

Ok thank you for the updates and information. I just tried running the VaQuero.r script but I did not get any output. All of the output files were blank. This was the command I used

Rscript VaQuERo.r --data=sample40_vaquero.txt --metadata=metaDataSub.txt --marker=mutations_list.csv --smarker=mutations_special.csv --pmarker=mutations_problematic_all.csv

The output I got was

##------ Thu Jan 26 13:15:07 2023 ------##
[1] "##~LOG~PARAMETERS~####################"
$dir
[1] "ExampleOutput"

$country
[1] "Austria"

$bbsouth
[1] 46.38

$bbnorth
[1] 49.01

$bbwest
[1] 9.53

$bbeast
[1] 17.15

$metadata
[1] "data/metaDataSub.tsv"

$marker
[1] "resources/mutations_list.csv"

$smarker
[1] "resources/mutations_special.csv"

$pmarker
[1] "resources/mutations_problematic_all.csv"

$data2
[1] "data/mutationDataSub_sparse.tsv"

$data
[1] "/space/s1/lenore/virus_database/run_usher/sample40_vaquero.tsv"

$inputformat
[1] "tidy"

$plotwidth
[1] 8

$plotheight
[1] 4.5

$ninconsens
[1] 0.4

$zero
[1] 0.02

$depth
[1] 75

$recent
[1] 99

$plottp
[1] 3

$minuniqmark
[1] 1

$minuniqmarkfrac
[1] 0.4

$minqmark
[1] 3

$minmarkfrac
[1] 0.4

$smoothingsamples
[1] 2

$smoothingtime
[1] 2

$voi
[1] "BA.4,BA.5,BA.1,BA.2,B.1.617.2,B.1.1.7"

$highlight
[1] "BA.4,BA.5,BA.1,BA.2,B.1.617.2,B.1.1.7"

$debug
[1] FALSE

$help
[1] FALSE

[1] "##~LOG~PARAMETERS~####################"




[1] "PROGRESS: create directory "
##------ Thu Jan 26 13:15:07 2023 ------##
[1] "PROGRESS: read mutation files "
[1] "PROGRESS: read and process meta data "
[1] "LOG: last_BSF_run_id BSF_0886"
[1] "LOG: print STATUS counts: "
< table of extent 0 >
[1] "PROGRESS: print STATUS map"
##------ Thu Jan 26 13:15:07 2023 ------##
Saving 7 x 7 in image
[1] "PROGRESS: read AF data "
##------ Thu Jan 26 13:15:09 2023 ------##
[1] "LOG: current run ID: BSF_0886"
[1] "LOG: earliest sample in current run: 2020-12-16"
[1] "LOG: latest sample in current run: 2020-12-16"
[1] "PROGRESS: start to loop over WWTP"
##------ Thu Jan 26 13:15:09 2023 ------##
[1] "PROGRESS: NA NA"
[1] "     PROGRESS: start plotting  NA"
[1] "PROGRESS: loop over WWTP finished"
[1] "PROGRESS: writing result tables"
[1] "PROGRESS: start plotting overviews"
[1] "PROGRESS: start plotting maps"
[1] "     PROGRESS: considering BA.4,BA.5,BA.1,BA.2,B.1.617.2,B.1.1.7"
[1] "     PROGRESS: considering BA.4"
[1] "     PROGRESS: considering BA.5"
[1] "     PROGRESS: considering BA.1"
[1] "     PROGRESS: considering BA.2"
[1] "     PROGRESS: considering B.1.617.2"
[1] "     PROGRESS: considering B.1.1.7"
##------ Thu Jan 26 13:15:10 2023 ------##

mutations_problematic_all.csv
mutations_special.csv
mutations_list.csv
metaDataSub.txt
sample40_vaquero.txt

I tried running the same command on the version 2 and got this error

Rscript /space/s1/lenore/software/VaQuERo/scripts/VaQuERo_v2.r --data=/space/s1/lenore/virus_database/run_usher/sample40_vaquero.tsv --metadata=data/metaDataSub.tsv --marker=resources/mutations_list.csv --smarker=resources/mutations_special.csv --pmarker=resources/mutations_problematic_all.csv 
ℹ Google's Terms of Service: <https://mapsplatform.google.com>
ℹ Please cite ggmap if you use it! Use `citation("ggmap")` for details.
##------ Thu Jan 26 13:40:46 2023 ------##
[1] "##~LOG~PARAMETERS~####################"
$dir
[1] "ExampleOutput"

$country
[1] "Austria"

$bbsouth
[1] 46.38

$bbnorth
[1] 49.01

$bbwest
[1] 9.53

$bbeast
[1] 17.15

$metadata
[1] "data/metaDataSub.tsv"

$marker
[1] "resources/mutations_list.csv"

$smarker
[1] "resources/mutations_special.csv"

$pmarker
[1] "resources/mutations_problematic_all.csv"

$data2
[1] "data/mutationDataSub_sparse.tsv"

$data
[1] "/space/s1/lenore/virus_database/run_usher/sample40_vaquero.tsv"

$inputformat
[1] "tidy"

$plotwidth
[1] 8

$plotheight
[1] 4.5

$ninconsens
[1] 0.4

$zero
[1] 0.02

$depth
[1] 75

$recent
[1] 99

$plottp
[1] 3

$minuniqmark
[1] 1

$minuniqmarkfrac
[1] 0.4

$minqmark
[1] 3

$minmarkfrac
[1] 0.4

$smoothingsamples
[1] 2

$smoothingtime
[1] 2

$voi
[1] "BA.4,BA.5,BA.1,BA.2,B.1.617.2,B.1.1.7"

$highlight
[1] "BA.4,BA.5,BA.1,BA.2,B.1.617.2,B.1.1.7"

$colorBase
[1] "B.1.617.2,BA.1,BA.2,BA.4,BA.5"

$debug
[1] FALSE

$help
[1] FALSE

[1] "##~LOG~PARAMETERS~####################"




[1] "PROGRESS: create directory "
##------ Thu Jan 26 13:40:46 2023 ------##
[1] "PROGRESS: read mutation files "
[1] "PROGRESS: read and process meta data "
[1] "LOG: last_BSF_run_id BSF_0886"
[1] "LOG: print STATUS counts: "
< table of extent 0 >
[1] "PROGRESS: print STATUS map"
##------ Thu Jan 26 13:40:46 2023 ------##
Saving 7 x 7 in image
[1] "PROGRESS: read AF data "
##------ Thu Jan 26 13:40:49 2023 ------##
[1] "LOG: current run ID: BSF_0886"
[1] "LOG: earliest sample in current run: 2020-12-16"
[1] "LOG: latest sample in current run: 2020-12-16"
[1] "PROGRESS: start to loop over WWTP"
##------ Thu Jan 26 13:40:49 2023 ------##
[1] "PROGRESS: NA NA"
[1] "     PROGRESS: start plotting  NA"
[1] "PROGRESS: loop over WWTP finished"
[1] "PROGRESS: writing result tables"
[1] "PROGRESS: start plotting overviews"
[1] "PROGRESS: start plotting maps"
[1] "     PROGRESS: considering BA.4,BA.5,BA.1,BA.2,B.1.617.2,B.1.1.7"
Error in `left_join()`:
! Can't join on `x$sample_date` x `y$sample_date` because of incompatible types.
ℹ `x$sample_date` is of type <character>>.
ℹ `y$sample_date` is of type <date>>.
Backtrace:
    ▆
 1. ├─dplyr::left_join(...)
 2. └─dplyr:::left_join.data.frame(...)
 3.   └─dplyr:::join_mutate(...)
 4.     └─dplyr:::join_rows(...)
 5.       └─base::tryCatch(...)
 6.         └─base (local) tryCatchList(expr, classes, parentenv, handlers)
 7.           └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
 8.             └─value[[3L]](cond)
 9.               └─rlang::abort(bullets, call = error_call)
Execution halted

Dear Lenore

There were two issues. For one, we use internally a two different identifier, partly a little inconsistantly. I will need to thouroughly go through the code and fix and test to resolve this properly. for the moment there is a workaround. please indicate in the metaData file in the column BSF_sample_name and RNA_ID_int the same sample name. Please find attached a fixed version. This should work.
The second issue was a problem with back compatibility. in the meantime you can also specify the state of each sample in the code. this is unfortunately not yet documented. if you specify it a few extra plots will be generating, aggregating the results over each state. I just pushed a fix so that you can also run the code without a state specification.
Please consider to pull the newest version and give it another try. For me
Rscript VaQuERo/scripts/VaQuERo_v2.r --data=sample40_vaquero.txt --metadata=metaDataSub_alt.txt --marker=mutations_list.csv --smarker=mutations_special.csv --pmarker=mutations_problematic_all.csv
works now.

bw, Fabian

metaDataSub_alt.txt