nf-core/pangenome

Won't accept sample sheet.csv

SarahBeecroft opened this issue · 4 comments

Description of the bug

No matter what format I use in my samplesheet.csv, I get the following error:

ERROR: Validation of pipeline parameters failed!

* --input: string [samplesheet.csv] does not match pattern ^\S+\.fn?a(sta)?(\.gz)?$ (samplesheet.csv)

I copied the exact format from usage.md, as well as from the Python script that parses the sample sheet. It still fails when I create dummy files that match the names in the sample sheet.

Steps to reproduce

Steps to reproduce the behaviour:

  1. Command line:
nextflow run nf-core/pangenome -r a_brave_new_world -profile singularity --input samplesheet.csv --n_haplotypes 12 --wfmash_map_pct_id 70 --wfmash_segment_length 2000 --smoothxg_poa_length '700,900,1100' --outdir outdir
  2. See error:
ERROR: Validation of pipeline parameters failed!

* --input: string [samplesheet.csv] does not match pattern ^\S+\.fn?a(sta)?(\.gz)?$ (samplesheet.csv)

Expected behaviour

I would expect it to read my sample sheet, since it should follow the expected format.

Log files

Mar-24 12:20:11.820 [main] DEBUG nextflow.cli.Launcher - $> nextflow run nf-core/pangenome -r a_brave_new_world -profile singularity --input samplesheet.csv --n_haplotypes 12 --wfmash_map_pct_id 70 --wfmash_segment_length 2000 --smoothxg_poa_length 700,900,1100 --outdir outdir
Mar-24 12:20:11.989 [main] INFO  nextflow.cli.CmdRun - N E X T F L O W  ~  version 22.10.6
Mar-24 12:20:12.020 [main] DEBUG nextflow.plugin.PluginsFacade - Setting up plugin manager > mode=prod; embedded=false; plugins-dir=/scratch/pawsey0012/sbeecroft/pangenome/plugins; core-plugins: nf-amazon@1.11.3,nf-azure@0.14.2,nf-codecommit@0.1.2,nf-console@1.0.4,nf-ga4gh@1.0.4,nf-google@1.4.5,nf-tower@1.5.6,nf-wave@0.5.3
Mar-24 12:20:12.036 [main] INFO  org.pf4j.DefaultPluginStatusProvider - Enabled plugins: []
Mar-24 12:20:12.037 [main] INFO  org.pf4j.DefaultPluginStatusProvider - Disabled plugins: []
Mar-24 12:20:12.045 [main] INFO  org.pf4j.DefaultPluginManager - PF4J version 3.4.1 in 'deployment' mode
Mar-24 12:20:12.070 [main] INFO  org.pf4j.AbstractPluginManager - No plugins
Mar-24 12:20:12.091 [main] DEBUG nextflow.scm.ProviderConfig - Using SCM config path: /scratch/pawsey0012/sbeecroft/pangenome/scm
Mar-24 12:20:13.450 [main] DEBUG nextflow.scm.AssetManager - Git config: /scratch/pawsey0012/sbeecroft/pangenome/assets/nf-core/pangenome/.git/config; branch: master; remote: origin; url: https://github.com/nf-core/pangenome.git
Mar-24 12:20:13.469 [main] DEBUG nextflow.scm.RepositoryFactory - Found Git repository result: [RepositoryFactory]
Mar-24 12:20:13.478 [main] DEBUG nextflow.scm.AssetManager - Git config: /scratch/pawsey0012/sbeecroft/pangenome/assets/nf-core/pangenome/.git/config; branch: master; remote: origin; url: https://github.com/nf-core/pangenome.git
Mar-24 12:20:15.132 [main] DEBUG nextflow.config.ConfigBuilder - Found config base: /scratch/pawsey0012/sbeecroft/pangenome/assets/nf-core/pangenome/nextflow.config
Mar-24 12:20:15.132 [main] DEBUG nextflow.config.ConfigBuilder - Found config local: /scratch/pawsey0012/sbeecroft/pangenome/nextflow.config
Mar-24 12:20:15.133 [main] DEBUG nextflow.config.ConfigBuilder - Parsing config file: /scratch/pawsey0012/sbeecroft/pangenome/assets/nf-core/pangenome/nextflow.config
Mar-24 12:20:15.134 [main] DEBUG nextflow.config.ConfigBuilder - Parsing config file: /scratch/pawsey0012/sbeecroft/pangenome/nextflow.config
Mar-24 12:20:15.153 [main] DEBUG nextflow.config.ConfigBuilder - Applying config profile: `singularity`
Mar-24 12:20:16.437 [main] DEBUG nextflow.config.ConfigBuilder - Applying config profile: `singularity`
Mar-24 12:20:16.849 [main] DEBUG nextflow.config.ConfigBuilder - Available config profiles: [cfc_dev, ifb_core, denbi_qbic, genotoul, alice, mjolnir_globe, uppmax, abims, janelia, nihbiowulf, nu_genomics, oist, sahmri, mpcdf, leicester, lugh, vsc_ugent, sage, cambridge, unibe_ibu, vai, podman, czbiohub_aws, jax, cheaha, xanadu, ccga_med, test, scw, tigem, google, computerome, ipop_up, seg_globe, sanger, dkfz, pasteur, test_full, eddie, medair, azurebatch, bi, hki, bigpurple, sbc_sharc, adcra, crukmi, cedars, docker, engaging, gis, psmn, eva, ucl_myriad, utd_ganymede, charliecloud, fgcz, conda, crg, singularity, icr_davros, ceres, munin, arm, rosalind, prince, hasta, cfc, utd_sysbio, uzh, debug, genouest, cbe, ebc, ku_sund_dangpu, ccga_dx, crick, marvin, phoenix, gitpod, biohpc_gen, seawulf, shifter, mana, mamba, wehi, awsbatch, uct_hpc, imperial, maestro, aws_tower, binac]
Mar-24 12:20:16.896 [main] DEBUG nextflow.cli.CmdRun - Applied DSL=2 from script declararion
Mar-24 12:20:16.897 [main] INFO  nextflow.cli.CmdRun - Launching `https://github.com/nf-core/pangenome` [cheesy_stonebraker] DSL2 - revision: 2a17b1c088 [a_brave_new_world]
Mar-24 12:20:16.897 [main] DEBUG nextflow.plugin.PluginsFacade - Plugins default=[]
Mar-24 12:20:16.897 [main] DEBUG nextflow.plugin.PluginsFacade - Plugins resolved requirement=[]
Mar-24 12:20:16.902 [main] DEBUG nextflow.secret.LocalSecretsProvider - Secrets store: /scratch/pawsey0012/sbeecroft/pangenome/secrets/store.json
Mar-24 12:20:16.906 [main] DEBUG nextflow.secret.SecretsLoader - Discovered secrets providers: [nextflow.secret.LocalSecretsProvider@59fea5f5] - activable => nextflow.secret.LocalSecretsProvider@59fea5f5
Mar-24 12:20:16.975 [main] DEBUG nextflow.Session - Session UUID: ec5c62f7-f7e1-4fec-849b-a400dd875127
Mar-24 12:20:16.975 [main] DEBUG nextflow.Session - Run name: cheesy_stonebraker
Mar-24 12:20:16.983 [main] DEBUG nextflow.Session - Executor pool size: 256
Mar-24 12:20:16.995 [main] DEBUG nextflow.util.ThreadPoolBuilder - Creating thread pool 'FileTransfer' minSize=10; maxSize=768; workQueue=LinkedBlockingQueue[10000]; allowCoreThreadTimeout=false
Mar-24 12:20:17.045 [main] DEBUG nextflow.cli.CmdRun - 
  Version: 22.10.6 build 5843
  Created: 23-01-2023 23:20 UTC (24-01-2023 07:20 AWDT)
  System: Linux 5.3.18-150300.59.87_11.0.78-cray_shasta_c
  Runtime: Groovy 3.0.13 on OpenJDK 64-Bit Server VM 11.0.13+7-b1751.21
  Encoding: UTF-8 (UTF-8)
  Process: 47930@setonix-01 [146.118.12.22]
  CPUs: 256 - Mem: 251.2 GB (192.2 GB) - Swap: 9.3 GB (8.5 GB)
Mar-24 12:20:17.084 [main] DEBUG nextflow.Session - Work-dir: /scratch/pawsey0012/sbeecroft/pangenome/work [lustre]
Mar-24 12:20:17.084 [main] DEBUG nextflow.Session - Script base path does not exist or is not a directory: /scratch/pawsey0012/sbeecroft/pangenome/assets/nf-core/pangenome/bin
Mar-24 12:20:17.102 [main] DEBUG nextflow.executor.ExecutorFactory - Extension executors providers=[]
Mar-24 12:20:17.117 [main] DEBUG nextflow.Session - Observer factory: DefaultObserverFactory
Mar-24 12:20:17.248 [main] DEBUG nextflow.cache.CacheFactory - Using Nextflow cache factory: nextflow.cache.DefaultCacheFactory
Mar-24 12:20:17.270 [main] DEBUG nextflow.util.CustomThreadPool - Creating default thread pool > poolSize: 257; maxThreads: 1000
Mar-24 12:20:17.445 [main] DEBUG nextflow.Session - Session start
Mar-24 12:20:17.452 [main] DEBUG nextflow.trace.TraceFileObserver - Workflow started -- trace file: /scratch/pawsey0012/sbeecroft/pangenome/outdir/pipeline_info/execution_trace_2023-03-24_12-20-16.txt
Mar-24 12:20:17.480 [main] DEBUG nextflow.Session - Using default localLib path: /scratch/pawsey0012/sbeecroft/pangenome/assets/nf-core/pangenome/lib
Mar-24 12:20:17.487 [main] DEBUG nextflow.Session - Adding to the classpath library: /scratch/pawsey0012/sbeecroft/pangenome/assets/nf-core/pangenome/lib
Mar-24 12:20:17.488 [main] DEBUG nextflow.Session - Adding to the classpath library: /scratch/pawsey0012/sbeecroft/pangenome/assets/nf-core/pangenome/lib/nfcore_external_java_deps.jar
Mar-24 12:20:18.709 [main] DEBUG nextflow.script.ScriptRunner - > Launching execution
Mar-24 12:20:18.785 [main] INFO  nextflow.Nextflow - 

------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/pangenome v1.0dev-g2a17b1c
------------------------------------------------------
Core Nextflow options
  revision             : a_brave_new_world
  runName              : cheesy_stonebraker
  containerEngine      : singularity
  launchDir            : /scratch/pawsey0012/sbeecroft/pangenome
  workDir              : /scratch/pawsey0012/sbeecroft/pangenome/work
  projectDir           : /scratch/pawsey0012/sbeecroft/pangenome/assets/nf-core/pangenome
  userName             : sbeecroft
  profile              : singularity
  configFiles          : /scratch/pawsey0012/sbeecroft/pangenome/assets/nf-core/pangenome/nextflow.config, /scratch/pawsey0012/sbeecroft/pangenome/nextflow.config

Input/output options
  input                : samplesheet.csv
  n_haplotypes         : 12
  outdir               : outdir

Wfmash Options
  wfmash_map_pct_id    : 70
  wfmash_segment_length: 2000
  wfmash_block_length  : null
  wfmash_sparse_map    : null
  wfmash_exclude_delim : null
  wfmash_temp_dir      : null

Seqwish Options
  seqwish_temp_dir     : null

Smoothxg options
  smoothxg_block_id_min: null
  smoothxg_temp_dir    : null

!! Only displaying parameters that differ from the pipeline defaults !!
------------------------------------------------------
If you use nf-core/pangenome for your analysis please cite:

* The nf-core framework
  https://doi.org/10.1038/s41587-020-0439-x

* Software dependencies
  https://github.com/nf-core/pangenome/blob/master/CITATIONS.md
------------------------------------------------------
Mar-24 12:20:19.040 [main] ERROR nextflow.Nextflow - ERROR: Validation of pipeline parameters failed!
Mar-24 12:20:19.052 [main] ERROR nextflow.Nextflow - * --input: string [samplesheet.csv] does not match pattern ^\S+\.fn?a(sta)?(\.gz)?$ (samplesheet.csv)

System

  • Hardware: HPC Cray Shasta
  • Executor: slurm

Nextflow Installation

  • Version: 22.10.6 (installed with Conda)

Container engine

  • Engine: Singularity version 3.8.6

Thanks!!

Hi, sorry for the late reply.
I must have missed this in my email.

The pipeline does not accept a sample sheet; --input takes a single FASTA file, which is why the value is checked against a FASTA file-name pattern.
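
For reference, the validation failure can be reproduced outside the pipeline. The sketch below applies the pattern quoted in the error message to a few file names. Apart from samplesheet.csv, the names are made up for illustration, and the helper is_valid_input is hypothetical: the pipeline performs this check through its own parameter validation, not through this code.

```python
import re

# Pattern copied verbatim from the validation error message.
INPUT_PATTERN = re.compile(r"^\S+\.fn?a(sta)?(\.gz)?$")


def is_valid_input(name: str) -> bool:
    """Return True if `name` matches the --input pattern from the error."""
    return INPUT_PATTERN.match(name) is not None


# Hypothetical file names, used only to illustrate what the pattern accepts.
candidates = [
    "samplesheet.csv",
    "genomes.fa",
    "genomes.fna",
    "genomes.fasta",
    "genomes.fa.gz",
]
for name in candidates:
    print(f"{name}: {is_valid_input(name)}")
```

Only names ending in .fa, .fna or .fasta, optionally followed by .gz, pass the check, so any .csv file is rejected regardless of its contents.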

Please try

nextflow run nf-core/pangenome -r a_brave_new_world -profile singularity --input FASTA_INPUT.fa.gz --n_haplotypes 12 --wfmash_map_pct_id 70 --wfmash_segment_length 2000 --smoothxg_poa_length '700,900,1100' --outdir outdir

The FASTA file has to be compressed with bgzip beforehand.
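
A file compressed with plain gzip also ends in .gz, so the file name alone does not show whether bgzip was used. The sketch below is a heuristic check, not something the pipeline itself is known to run: it reads the first gzip header and looks for the 'BC' extra subfield that the BGZF format defines. The function name looks_like_bgzf is hypothetical, and only the first block of the file is inspected.

```python
import struct


def looks_like_bgzf(path: str) -> bool:
    """Heuristically check whether a file starts with a BGZF (bgzip) block."""
    with open(path, "rb") as handle:
        header = handle.read(12)
        if len(header) < 12:
            return False
        magic1, magic2, method, flags = header[0], header[1], header[2], header[3]
        # Require the gzip magic bytes, the DEFLATE method, and the FEXTRA flag bit.
        if (magic1, magic2, method) != (0x1F, 0x8B, 8) or not flags & 0x04:
            return False
        # XLEN (little-endian, bytes 10-11) gives the length of the extra field.
        (xlen,) = struct.unpack("<H", header[10:12])
        extra = handle.read(xlen)

    # Walk the extra subfields looking for the 'BC' subfield with a 2-byte payload.
    pos = 0
    while pos + 4 <= len(extra):
        si1, si2, slen = struct.unpack_from("<BBH", extra, pos)
        if (si1, si2, slen) == (ord("B"), ord("C"), 2):
            return True
        pos += 4 + slen
    return False
```

Calling looks_like_bgzf("FASTA_INPUT.fa.gz") on the file from the command above would be expected to return True for bgzip output and False for a file written by ordinary gzip.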

No worries! That's working for me now, thanks. I think I got confused because there's mention of the sample sheet in the readme currently. Nice when it's a simple fix! Thanks :)

I have five genomes. Do I merge them and compress them with bgzip?

Put all sequences from all five genomes into one FASTA file. Ideally, rename the sequences to follow the PanSN-spec naming convention: https://github.com/pangenome/PanSN-spec.
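
The merge-and-rename step could be sketched as follows. This is a minimal illustration, not the pipeline's own tooling. It assumes one uncompressed FASTA file per genome, a single haplotype per genome (numbered 1), and the PanSN layout sample#haplotype#contig with '#' as the delimiter. The function names, sample names and file paths are all hypothetical.

```python
from typing import Iterable, Iterator, Mapping


def pansn_rename(
    lines: Iterable[str], sample: str, haplotype: int = 1, delim: str = "#"
) -> Iterator[str]:
    """Yield FASTA lines with headers rewritten as sample#haplotype#contig."""
    for line in lines:
        if line.startswith(">"):
            # Keep only the first word of the original header as the contig name.
            fields = line[1:].split()
            contig = fields[0] if fields else ""
            yield f">{sample}{delim}{haplotype}{delim}{contig}\n"
        else:
            # Make sure every sequence line ends with a newline before merging.
            yield line if line.endswith("\n") else line + "\n"


def merge_genomes(genomes: Mapping[str, str], out_path: str) -> None:
    """Concatenate FASTA files into one, prefixing each header with its sample.

    `genomes` maps a sample name to the path of its uncompressed FASTA file.
    """
    with open(out_path, "w") as out:
        for sample, path in genomes.items():
            with open(path) as handle:
                out.writelines(pansn_rename(handle, sample))


# Hypothetical usage with made-up sample names and paths:
# merge_genomes({"sampleA": "sampleA.fa", "sampleB": "sampleB.fa"}, "combined.fa")
```

The combined file would then be compressed with bgzip (for example, bgzip combined.fa, which writes combined.fa.gz) and passed to --input.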