So far our main focus has been converting the CWL definitions to WDL. Next, we plan to optimize the workflows and clean up the repository.
In the future we may restructure this repository into a layout that Dockstore supports, so that we can use that tool.
These errors indicate that the disk has run out of space. That part is straightforward. The pitfall is that which disk size you need to increase depends on where the failed write happened.
If the failed write happened under the `/cromwell_root` path, increase the `disks: "local-disk ..."` value. If it instead happened while writing to stdin, stdout, or any of the other standard Linux locations, increase `bootDiskSizeGb`. On GCP, Cromwell mounts at least two disks: the boot disk and a local disk. The boot disk holds the operating system files, while the local disk is where almost all of your work happens, apart from piping between commands.
This applies more to newly converted files than to hardened ones, but many runs failed because a file was not localized to the instance. Usually this happens because the CWL did not declare a secondaryFile that it assumed would sit next to the passed-in file. That works on the cluster because the tool simply looks for the file, which is already where it is expected. It does not work in the cloud because that file is never copied to the instance. The fix is to add the file as an explicit input to the WDL and pass it through from the top-level workflow down.
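Before submitting, it can help to check that every primary file in an input set has its companion files listed alongside it. This is a hypothetical sketch: `missing_secondary_files` is an illustrative name, and the `COMPANIONS` suffix table is an assumption, since naming conventions vary between tools (for example `sample.bai` versus `sample.bam.bai`). A real check would need the conventions each tool actually expects.

```python
from typing import Iterable

# Hypothetical suffix conventions: primary extension -> companion suffixes
# appended to the full primary path. Adjust to what each tool really expects.
COMPANIONS = {
    ".bam": [".bai"],
    ".cram": [".crai"],
    ".vcf.gz": [".tbi"],
    ".fa": [".fai"],
}


def _string_values(value) -> Iterable[str]:
    """Yield every string nested anywhere inside an inputs structure."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, list):
        for item in value:
            yield from _string_values(item)
    elif isinstance(value, dict):
        for item in value.values():
            yield from _string_values(item)


def missing_secondary_files(inputs: dict) -> list[str]:
    """Return expected companion paths that are not passed as inputs."""
    provided = set(_string_values(inputs))
    missing = []
    for path in sorted(provided):
        for ext, suffixes in COMPANIONS.items():
            if path.endswith(ext):
                for suffix in suffixes:
                    expected = path + suffix
                    if expected not in provided:
                        missing.append(expected)
    return missing
```

An empty result only means the companions are listed in the input file; each one still has to be declared in the WDL and passed down to the task that needs it.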
This error has one of two causes: (A) the input is malformed or otherwise incorrect, or (B) the specified file was not uploaded to the bucket. Both are instances of the more general error, "No file has been uploaded to the specified URL".
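The two causes can be told apart by sorting the file-like inputs. This is a hypothetical sketch: `triage_missing_inputs` is an illustrative name, the "contains a `/`" test for what counts as a path is a rough heuristic, and the `exists` callable is injected so the logic runs offline. In practice it might wrap the `google-cloud-storage` client, for example `Blob.from_string(url, client=client).exists()`.

```python
from typing import Callable, Iterable


def _strings(value) -> Iterable[str]:
    """Yield every string nested anywhere inside an inputs structure."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, list):
        for item in value:
            yield from _strings(item)
    elif isinstance(value, dict):
        for item in value.values():
            yield from _strings(item)


def triage_missing_inputs(
    inputs: dict, exists: Callable[[str], bool]
) -> dict[str, list[str]]:
    """Split suspicious input paths into the two causes (A) and (B)."""
    malformed, not_uploaded = [], []
    for value in _strings(inputs):
        if "/" not in value:
            continue  # heuristic: not a path at all (e.g. a sample name)
        if not value.startswith("gs://"):
            malformed.append(value)  # cause (A): local path or wrong scheme
        elif not exists(value):
            not_uploaded.append(value)  # cause (B): never uploaded
    return {"malformed": sorted(malformed), "not_uploaded": sorted(not_uploaded)}
```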
The last confirmed sync with the analysis-workflows CWL repository was commit 788bdc99c1d5b6ee7c431c3c011eb30d385c1370, PR #1063, April 6, 2022. Later CWL commits may have diverged unless they have been compared; if you compare them, update these values.
There is not yet a supported Directory type in WDL. Inputs like `Directory vep_cache_dir`, which involve a nested directory structure, are replaced with `File vep_cache_dir_zip`. Inputs like `Directory hla_call_files`, which are just a flat collection of files, are replaced with `Array[File] hla_call_files`.
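Preparing inputs for these two replacements can be sketched as below. This is a hypothetical sketch, not a script from this repository: `zip_directory` and `flat_file_list` are illustrative names, and storing the top-level directory name inside the archive (so it unpacks to `vep_cache/...`) is an assumption; check how the consuming task unzips the file.

```python
import zipfile
from pathlib import Path


def zip_directory(src_dir: str, zip_path: str) -> None:
    """Pack a nested directory into one zip, for a `File ..._zip` input."""
    src = Path(src_dir)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*")):
            if path.is_file():
                # Archive names keep the top-level directory (assumption, see above).
                zf.write(path, path.relative_to(src.parent).as_posix())


def flat_file_list(src_dir: str) -> list[str]:
    """List the files of a flat directory, for an `Array[File]` input."""
    src = Path(src_dir)
    if any(p.is_dir() for p in src.iterdir()):
        raise ValueError(f"{src_dir} has subdirectories; zip it instead")
    return sorted(str(p) for p in src.iterdir() if p.is_file())
```

The list from `flat_file_list` can be dropped straight into an input file, for example under a key like `somaticExome.hla_call_files`, after the files themselves are uploaded.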
Input files must prefix each key with the name of the workflow being run, because a WDL file can contain multiple workflows, and inputs to a nested call can be supplied directly, skipping a layer, when they are not propagated through the workflow definition. For example, to call workflow `somaticExome` with input `foo`, the YAML key must be `somaticExome.foo`.
If you run the WDLs with the `cloud-workflows/scripts/cloudize-workflow.py` helper script, the input file it generates already handles this.
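When writing an input file by hand instead of using that script, the prefixing can be scripted. This is a minimal sketch, assuming the inputs are held in a plain dict before being dumped to YAML or JSON; `prefix_inputs` is a hypothetical helper. Keys that already start with the workflow name are left alone, and a key like `callName.x` becomes `somaticExome.callName.x`, the form used to reach an input of a call named `callName` directly.

```python
def prefix_inputs(workflow_name: str, inputs: dict) -> dict:
    """Prefix every input key with the workflow name, e.g. foo -> somaticExome.foo."""
    prefix = workflow_name + "."
    prefixed = {}
    for key, value in inputs.items():
        # Leave keys that are already fully qualified untouched.
        new_key = key if key.startswith(prefix) else prefix + key
        prefixed[new_key] = value
    return prefixed
```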
- alignment_exome
- alignment_exome_nonhuman
- alignment_umi_duplex # this depends on a step with non-trivial embedded JavaScript
- alignment_umi_molecular # this depends on a step with non-trivial embedded JavaScript
- alignment_wgs
- alignment_wgs_nonhuman
- aml_trio_cle
- aml_trio_cle_gathered # This doesn't make sense in the cloud
- bisulfite
- chipseq # This depends on homer-tag-directory, which doesn't make sense in the cloud
- chipseq_alignment_nonhuman # This depends on homer-tag-directory, which doesn't make sense in the cloud
- detect_variants
- detect_variants_nonhuman
- detect_variants_wgs
- downsample_and_recall
- gathered_downsample_and_recall # This doesn't make sense in the cloud
- germline_exome
- germline_exome_gvcf
- germline_exome_hla_typing
- germline_wgs
- germline_wgs_gvcf
- immuno
- rnaseq
- rnaseq_star_fusion
- rnaseq_star_fusion_with_xenosplit
- somatic_exome
- somatic_exome_cle
- somatic_exome_cle_gathered # This doesn't make sense in the cloud
- somatic_exome_gathered # This doesn't make sense in the cloud
- somatic_exome_nonhuman
- somatic_wgs
- tumor_only_detect_variants
- tumor_only_exome
- tumor_only_wgs
- align
- align_sort_markdup
- bam_readcount
- bam_to_trimmed_fastq_and_hisat_alignments
- bgzip_and_index
- bisulfite_qc
- cellranger_mkfastq_and_count
- cnvkit_single_sample
- cram_to_bam_and_index
- cram_to_cnvkit
- docm_cle
- docm_germline
- duplex_alignment
- filter_vcf
- filter_vcf_nonhuman
- fp_filter
- gatk_haplotypecaller_iterator
- germline_detect_variants
- germline_filter_vcf
- hs_metrics
- joint_genotype
- merge_svs
- molecular_alignment
- molecular_qc
- mutect
- phase_vcf
- pindel
- pindel_cat
- pindel_region
- pvacseq
- qc_exome
- qc_exome_no_verify_bam
- qc_wgs
- qc_wgs_nonhuman
- sequence_align_and_tag_adapter
- sequence_to_bqsr
- sequence_to_bqsr_nonhuman
- sequence_to_trimmed_fastq
- sequence_to_trimmed_fastq_and_biscuit_alignments
- single_cell_rnaseq
- single_sample_sv_callers
- strelka_and_post_processing
- strelka_process_vcf
- sv_depth_caller_filter
- sv_paired_read_caller_filter
- umi_alignment
- varscan
- varscan_germline
- varscan_pre_and_post_processing
- vcf_eval_cle_gold
- vcf_eval_concordance
- vcf_readcount_annotator
- add_strelka_gt
- add_string_at_line
- add_string_at_line_bgzipped
- add_vep_fields_to_table
- agfusion
- align_and_tag
- annotsv
- annotsv_filter
- apply_bqsr
- bam_readcount
- bam_to_bigwig
- bam_to_cram
- bam_to_fastq
- bam_to_sam
- bcftools_merge
- bedgraph_to_bigwig
- bedtools_intersect
- bgzip
- biscuit_align
- biscuit_markdup
- biscuit_pileup
- bisulfite_qc_conversion
- bisulfite_qc_coverage_stats
- bisulfite_qc_cpg_retention_distribution
- bisulfite_qc_mapping_summary
- bisulfite_vcf2bed
- bqsr
- call_duplex_consensus
- call_molecular_consensus
- cat_all
- cat_out
- cellmatch_lineage
- cellranger_atac_count
- cellranger_count
- cellranger_feature_barcoding
- cellranger_mkfastq
- cellranger_vdj
- cle_aml_trio_report_alignment_stat
- cle_aml_trio_report_coverage_stat
- cle_aml_trio_report_full_variants
- clip_overlap
- cnvkit_batch
- cnvkit_vcf_export
- cnvnator
- collect_alignment_summary_metrics
- collect_gc_bias_metrics
- collect_hs_metrics
- collect_insert_size_metrics
- collect_wgs_metrics
- combine_gvcfs
- combine_variants
- combine_variants_concordance
- combine_variants_wgs
- concordance
- cram_to_bam
- docm_add_variants
- docm_gatk_haplotype_caller
- downsample
- duphold
- duplex_seq_metrics
- eval_cle_gold
- eval_vaf_report
- extract_hla_alleles
- extract_umis
- fastq_to_bam
- filter_consensus
- filter_known_variants
- filter_sv_vcf_blocklist_bedpe
- filter_sv_vcf_depth
- filter_sv_vcf_read_support
- filter_sv_vcf_size
- filter_vcf_cle
- filter_vcf_coding_variant
- filter_vcf_custom_allele_freq
- filter_vcf_depth
- filter_vcf_docm
- filter_vcf_mapq0
- filter_vcf_somatic_llr
- fix_vcf_header
- fp_filter
- gather_to_sub_directory
- gatherer
- gatk_genotypegvcfs
- gatk_haplotype_caller
- generate_qc_metrics
- germline_combine_variants
- grolar
- group_reads
- hisat2_align
- hla_consensus
- homer_tag_directory # This doesn't make sense in the cloud
- index_bam
- index_cram
- index_vcf
- intersect_known_variants
- interval_list_expand
- intervals_to_bed
- kallisto
- kmer_size_from_index
- manta_somatic
- mark_duplicates_and_sort
- mark_illumina_adapters
- merge_bams
- merge_bams_samtools
- merge_vcf
- mutect
- name_sort
- normalize_variants
- optitype_dna
- picard_merge_vcfs
- pindel
- pindel2vcf
- pindel_somatic_filter
- pizzly
- pvacbind
- pvacfuse
- pvacseq
- pvacseq_combine_variants
- pvacvector
- read_backed_phasing
- realign
- remove_end_tags
- rename
- replace_vcf_sample_name
- samtools_flagstat
- samtools_sort
- select_variants
- sequence_align_and_tag
- sequence_to_bam # this uses non-trivial embedded JavaScript
- sequence_to_fastq
- set_filter_status
- single_sample_docm_filter
- smoove
- somatic_concordance_graph
- sompy
- sort_vcf
- split_interval_list
- split_interval_list_to_bed
- staged_rename
- star_align_fusion
- star_fusion_detect
- strandedness_check
- strelka
- stringtie
- survivor
- transcript_to_gene
- trim_fastq
- umi_align
- variants_to_table
- varscan_germline
- varscan_process_somatic
- varscan_somatic
- vcf_expression_annotator
- vcf_readcount_annotator
- vcf_sanitize
- vep
- verify_bam_id
- vt_decompose
- xenosplit