plotting fails with dplyr < 1.0.0

Question

Closed this issue 2 years ago · 0 comments

The pipeline.plotting.R script can fail to join the LOCO and INBRED mappings when the strain column of the INBRED mapping is read as class logical.
This happens when the INBRED mapping does not identify any QTL.

We should consider adding dplyr >= 1.0.0 to the requirements to avoid this issue. Earlier versions of dplyr throw an error when the classes of the join columns do not match.
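If we go the requirements route, a version guard near the top of the script would at least fail with a clearer message. This is only a sketch: `meets_min_version` is a hypothetical helper name, not something that exists in the pipeline, and where the check should live is an open question.

```r
# Hypothetical helper (not part of the pipeline): TRUE if the installed
# version of `pkg` is at least `required`.
meets_min_version <- function(pkg, required) {
  utils::packageVersion(pkg) >= required
}

# Stop early with a readable message instead of failing inside the join.
if (!meets_min_version("dplyr", "1.0.0")) {
  stop(
    "pipeline.plotting.R needs dplyr >= 1.0.0, found ",
    as.character(utils::packageVersion("dplyr")),
    call. = FALSE
  )
}
```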

Alternatively, the column classes can be set explicitly when the INBRED file is read, with the following code:

INBRED <- data.table::fread(
  args[2],
  colClasses = c(
    aboveBF = "integer", strain = "character", value = "double",
    allele = "integer", var.exp = "double", startPOS = "integer",
    peakPOS = "integer", endPOS = "integer", peak_id = "integer",
    interval_size = "integer"
  )
) %>%
  dplyr::mutate(algorithm = "INBRED")
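Another option that should not depend on the dplyr version is to coerce strain to character right before the join. Below is a minimal sketch on toy data. The marker and strain values are made up, and it assumes the logical class comes from the strain column being entirely NA when no QTL is found.

```r
library(dplyr)

# Toy stand-ins for the two mapping tables (hypothetical values).
loco <- data.frame(
  marker = c("I_100", "I_200"),
  strain = c("N2", "CB4856"),
  stringsAsFactors = FALSE
)

# Assumption: with no QTL, strain is all NA, and an all-NA column is logical.
inbred <- data.frame(
  marker = c("I_100", "I_200"),
  strain = c(NA, NA),
  stringsAsFactors = FALSE
)

# Coerce before joining so both strain columns are character.
inbred <- dplyr::mutate(inbred, strain = as.character(strain))

combined <- dplyr::full_join(loco, inbred, by = c("marker", "strain"))
```

The same `mutate` call could be added to the INBRED pipe in pipeline.plotting.R instead of listing every column class.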

Here's the error I received before upgrading:

[qnode5147::/projects/b1059/projects/Tim/NemaScan] 🏀  nextflow main.nf -profile mappings --vcf input_data/elegans/genotypes/WI.20201117.hard-filter.ref_strain.vcf.gz --traitfile input_data/elegans/phenotypes/20201205_HI_median_continuou_env_traits.tsv --sthresh EIGEN -resume
N E X T F L O W  ~  version 19.10.0
Launching `main.nf` [wise_bell] - revision: 00e7c07bfb
O~~~     O~~                                   O~~ ~~
O~ O~~   O~~                                 O~~    O~~
O~~ O~~  O~~   O~~    O~~~ O~~ O~~    O~~     O~~         O~~~   O~~    O~~ O~~
O~~  O~~ O~~ O~   O~~  O~~  O~  O~~ O~~  O~~    O~~     O~~    O~~  O~~  O~~  O~~
O~~   O~ O~~O~~~~~ O~~ O~~  O~  O~~O~~   O~~       O~~ O~~    O~~   O~~  O~~  O~~
O~~    O~ ~~O~         O~~  O~  O~~O~~   O~~ O~~    O~~ O~~   O~~   O~~  O~~  O~~
O~~      O~~  O~~~~   O~~~  O~  O~~  O~~ O~~~  O~~ ~~     O~~~  O~~ O~~~O~~~  O~~
Trait File                              = RUN
VCF                                     = null
Significance Threshold                  = EIGEN
Result Directory                        = Analysis_Results-20201205
Eigen Memory allocation                 = 100 GB
WARN: Access to undefined parameter `annotate` -- Initialise it to a default value eg. `params.annotate = some_value`
executor >  slurm (12)
[5e/e38d0c] process > fix_strain_names_bulk (1)  [100%] 1 of 1, cached: 1 ✔
[22/b9ffb8] process > vcf_to_geno_matrix (1)     [100%] 1 of 1, cached: 1 ✔
[1c/c2a3c3] process > chrom_eigen_variants (X)   [100%] 6 of 6, cached: 6 ✔
[34/903e13] process > collect_eigen_variants     [100%] 1 of 1, cached: 1 ✔
[18/12b3d3] process > prepare_gcta_files (6)     [100%] 6 of 6, cached: 6 ✔
[fa/48fa19] process > gcta_grm (6)               [100%] 6 of 6, cached: 6 ✔
[20/d5cadd] process > gcta_lmm_exact_mapping (6) [100%] 6 of 6, cached: 6 ✔
[88/67ee8c] process > gcta_intervals_maps (2)    [100%] 6 of 6 ✔
[58/b7126d] process > generate_plots (6)         [100%] 6 of 6, failed: 6
WARN: Killing pending tasks (5)
Error executing process > 'generate_plots (1)'
Caused by:
  Process `generate_plots (1)` terminated with an error exit status (1)
Command executed:
  Rscript --vanilla `which pipeline.plotting.R` processed_ambient_humidity_LMM_EXACT_LOCO_mapping.tsv processed_ambient_humidity_LMM_EXACT_INBRED_mapping.tsv `which sweep_summary.tsv`
Command exit status:
  1
Command output:
  (empty)
Command error:
  ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
  ✔ ggplot2 3.2.1     ✔ purrr   0.3.4
  ✔ tibble  3.0.3     ✔ dplyr   0.8.3
  ✔ tidyr   1.1.2     ✔ stringr 1.4.0
  ✔ readr   1.3.1     ✔ forcats 0.5.0
  ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
  ✖ dplyr::filter() masks stats::filter()
  ✖ dplyr::lag()    masks stats::lag()
  Attaching package: 'data.table'
  The following objects are masked from 'package:dplyr':
      between, first, last
  The following object is masked from 'package:purrr':
      transpose
  Joining, by = c("CHROM", "marker", "POS", "A1", "A2", "AF1", "BETA", "SE", "P", "log10p", "trait", "BF", "aboveBF", "strain", "value", "allele", "var.exp", "startPOS", "peakPOS", "endPOS", "peak_id", "interval_size", "algorithm")
  Error: Can't join on 'strain' x 'strain' because of incompatible types (logical / character)
  Execution halted
Work dir:
  /projects/b1042/AndersenLab/work/tim/f0/8961c2d7a7a3df3f8b903ac6630289
Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
[qnode5147::/projects/b1059/projects/Tim/NemaScan] 🏀