marbl/merqury

Error in read.table(dat, header = F) : no lines available in input

scthree opened this issue · 4 comments

Dear Developers,

Thank you for this wonderful tool and your contributions in leading the community!

Forgive me if I'm asking a stupid question. I recently used merqury to eval several assemblies generated with hifiasm (HiFi + ONT, then either phased with Trio-binning OR HiC). Was able to run merqury and generate all the asm stats/plots.

Then just made a Verkko assembly (phased with Trio-binning) and ran merqury on the diploid assembly (assembly.fasta) and hap1+hap2 (assembly.haplotype1.fasta, assembly.haplotype2.fasta). No issues with qv, completeness, and generating those plots/stats...but get an Rscript error when it's trying to generate the phased_block.stats - please see error log below.

Could you please let me know what I'm doing wrong? I know you're busy and appreciate your time!

Sincerely,
Steve


Haplotype dbs provided.
Running Merqury in trio mode...

hap1: mat_uncompress.k21.hapmer.meryl
hap2: pat_uncompress.k21.hapmer.meryl
asm1: assembly.fasta
out : dip

Get spectra-cn plots and QV stats
Get blob plots
Get haplotype specfic spectra-cn plots
Get phase blocks
Get block N plots

Generate assembly.fasta.fai

*** # Found assembly.gaps.bed ***

Found 246. Generating stats for both scaffolds and contigs.

Convert dip.assembly.100_20000.phased_block.bed to sizes

Result saved as dip.assembly.100_20000.phased_block.sizes

Plot dip.assembly.100_20000.phased_block.bed

Rscript /Programs/merqury-1.3/plot/plot_block_N.R -b dip.assembly.100_20000.phased_block.sizes -c dip.assembly.contig.sizes -s dip.assembly.scaff.sizes -o dip.assembly
Loading required package: argparse
Loading required package: ggplot2
Loading required package: scales
Error in read.table(dat, header = F) : no lines available in input
Calls: block_n -> attach_n -> read.table
Execution halted

No block2 found. Done!

read: dtr_uncompress.k21.meryl

Haplotype dbs provided.
Running Merqury in trio mode...

hap1: mat_uncompress.k21.hapmer.meryl
hap2: pat_uncompress.k21.hapmer.meryl
asm1: assembly.haplotype1.fasta
asm2: assembly.haplotype2.fasta
out : hap

Get spectra-cn plots and QV stats
Get blob plots
Get haplotype specfic spectra-cn plots
Get phase blocks
Get block N plots

Generate assembly.haplotype1.fasta.fai

*** # Found assembly.haplotype1.gaps.bed ***

Found 85. Generating stats for both scaffolds and contigs.

Generate assembly.haplotype2.fasta.fai

*** # Found assembly.haplotype2.gaps.bed ***

Found 161. Generating stats for both scaffolds and contigs.

Convert hap.assembly.haplotype1.100_20000.phased_block.bed to sizes

Result saved as hap.assembly.haplotype1.100_20000.phased_block.sizes

Plot hap.assembly.haplotype1.100_20000.phased_block.bed

Rscript /Programs/merqury-1.3/plot/plot_block_N.R -b hap.assembly.haplotype1.100_20000.phased_block.sizes -c hap.assembly.haplotype1.contig.sizes -s hap.assembly.haplotype1.scaff.sizes -o hap.assembly.haplotype1
Loading required package: argparse
Loading required package: ggplot2
Loading required package: scales
Error in read.table(dat, header = F) : no lines available in input
Calls: block_n -> attach_n -> read.table
Execution halted

Convert hap.assembly.haplotype2.100_20000.phased_block.bed to sizes

Result saved as hap.assembly.haplotype2.100_20000.phased_block.sizes

Plot hap.assembly.haplotype2.100_20000.phased_block.bed

Rscript /Programs/merqury-1.3/plot/plot_block_N.R -b hap.assembly.haplotype2.100_20000.phased_block.sizes -c hap.assembly.haplotype2.contig.sizes -s hap.assembly.haplotype2.scaff.sizes -o hap.assembly.haplotype2
Loading required package: argparse
Loading required package: ggplot2
Loading required package: scales
Error in read.table(dat, header = F) : no lines available in input
Calls: block_n -> attach_n -> read.table
Execution halted

Hello,

There was a serious issue in meryl v1.4 release, making meryl-lookup crash. I wonder this was also affected by this issue.
If possible, could you obtain meryl v1.4.1 release, and re-run merqury?

If you'd like to re-run, you could you remove *.sort.bed and *.hap.wig, then re-run $MERQURY/trio/phase_block.sh. This has to be done for each assembly (so twice). Next, run $MERQURY/trio/block_n_stats.sh afterwards (also, twice). Since you are re-running it manually, you could also provide a genome size that could go with block_n_stats.sh to generate NG plots instead of N plots.

Best,
Arang

Hello, I think we have similar errors, but the difference is that I am running on version 1.4.1. I encountered similar errors during installation using both git clone and meryl-1.4.1.Linux-amd64.tar.xz. In the detailed documentation below, I executed the same commands by providing the reference sequence and parent-offspring databases, along with the two assembled haploid fa files.

Both the quality values (qv) and the generation of plots seem to be working fine, but I am unable to obtain the 100_20000.phased_block.stats and switch error. I cannot observe the completeness of the assembly. Based on the error messages, it seems to suggest that bedtools, samtools, and R are not present. However, I have confirmed that these tools exist in the current conda environment through the executed commands:

samtools
Program: samtools (Tools for alignments in the SAM format)
Version: 1.18 (using htslib 1.17)
R --version
R version 4.1.1 (2021-08-10) -- "Kick Things"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-conda-linux-gnu (64-bit)
bedtools
bedtools is a powerful toolset for genome arithmetic.
Version: v2.31.1

Haplotype dbs provided.
Running Merqury in trio mode...

hap1: chr21-db_M.meryl
hap2: chr21-db_F.meryl
asm1: hic_pred_hap1.fa
asm2: hic_pred_hap2.fa
out : hic

Get spectra-cn plots and QV stats

Get blob plots

Get haplotype specfic spectra-cn plots

Get phase blocks

Get block N plots
ModuleCmd_Load.c(213):ERROR:105: Unable to locate a modulefile for 'bedtools'
ModuleCmd_Load.c(213):ERROR:105: Unable to locate a modulefile for 'samtools'
ModuleCmd_Load.c(213):ERROR:105: Unable to locate a modulefile for 'R'

Generate hic_pred_hap1.fa.fasta.fai

[E::fai_build3_core] Failed to open the file hic_pred_hap1.fa.fasta
[faidx] Could not build fai index hic_pred_hap1.fa.fasta.fai

*** # Found hic_pred_hap1.fa.gaps.bed ***

No gaps found. This is a contig set.

awk: fatal: cannot open file `hic_pred_hap1.fa.fasta.fai' for reading (No such file or directory)

Generate hic_pred_hap2.fa.fasta.fai

[E::fai_build3_core] Failed to open the file hic_pred_hap2.fa.fasta
[faidx] Could not build fai index hic_pred_hap2.fa.fasta.fai

*** # Found hic_pred_hap2.fa.gaps.bed ***

No gaps found. This is a contig set.

awk: fatal: cannot open file `hic_pred_hap2.fa.fasta.fai' for reading (No such file or directory)

Convert hic.hic_pred_hap1.fa.100_20000.phased_block.bed to sizes

Result saved as hic.hic_pred_hap1.fa.100_20000.phased_block.sizes

Plot hic.hic_pred_hap1.fa.100_20000.phased_block.bed

Rscript /home/lyc/.conda/envs/porec/share/merqury/plot/plot_block_N.R -b hic.hic_pred_hap1.fa.100_20000.phased_block.sizes -c hic.hic_pred_hap1.fa.contig.sizes -o hic.hic_pred_hap1.fa
Loading required package: argparse
Warning message:
package ‘argparse’ was built under R version 4.1.3
Loading required package: ggplot2
Warning message:
package ‘ggplot2’ was built under R version 4.1.3
Loading required package: scales
Warning message:
package ‘scales’ was built under R version 4.1.3
Error in read.table(dat, header = F) : no lines available in input
Calls: block_n -> attach_n -> read.table
Execution halted

Convert hic.hic_pred_hap2.fa.100_20000.phased_block.bed to sizes

Result saved as hic.hic_pred_hap2.fa.100_20000.phased_block.sizes

Plot hic.hic_pred_hap2.fa.100_20000.phased_block.bed

Rscript /home/lyc/.conda/envs/porec/share/merqury/plot/plot_block_N.R -b hic.hic_pred_hap2.fa.100_20000.phased_block.sizes -c hic.hic_pred_hap2.fa.contig.sizes -o hic.hic_pred_hap2.fa
Loading required package: argparse
Warning message:
package ‘argparse’ was built under R version 4.1.3
Loading required package: ggplot2
Warning message:
package ‘ggplot2’ was built under R version 4.1.3
Loading required package: scales
Warning message:
package ‘scales’ was built under R version 4.1.3
Error in read.table(dat, header = F) : no lines available in input
Calls: block_n -> attach_n -> read.table
Execution halted

output

100_20000.phased_block:
nothing
100_20000.switch.bed:
nothing

I sincerely hope that you are familiar with this issue and can assist me. Thank you very much for your help!

Hello @qq923781272 ,

dafc7e3 will fix your crash.
Also I see you don't need to load the modules. You could uncomment this line to avoid the ModuleCmd_Load ERRORs.

I'd suggest to clone the latest Merqury git from the master branch (nothing to install), and re run Merqury.
Hope this works for you!

Best,
Arang

I apologize for bothering you again. I have checked whether my Merqury version matches the branch correction you sent me. I am certain it is the latest version, but I am still encountering similar issues. Just to be safe, I cloned a fresh copy of the code, but the same error persists.

What's even more puzzling is that I successfully generated the phased_block and 100_20000.switches.txt in a previous run yesterday. However, today, I have been unable to reproduce the success under any circumstances.

I followed your instructions in the previous response to remove and execute $MERQURY/trio/phase_block.sh and $MERQURY/trio/block_n_stats.sh twice each. However, during this process, errors occurred, preventing me from obtaining results for switch and phased_block.

Do you have any insights into why this might be happening? The command I used is $MERQURY/merqury.sh read-db.meryl mat.meryl pat.meryl asm1.fasta out_prefix.

Here is the error message I encountered:
Haplotype dbs provided.
Running Merqury in trio mode...

hap1: chr21-db_M.meryl
hap2: chr21-db_F.meryl
asm1: hic_pred_hap1.fa
asm2: hic_pred_hap2.fa
out : hic

Get spectra-cn plots and QV stats

Get blob plots

Get haplotype specfic spectra-cn plots

Get phase blocks

Get block N plots
ModuleCmd_Load.c(213):ERROR:105: Unable to locate a modulefile for 'bedtools'
ModuleCmd_Load.c(213):ERROR:105: Unable to locate a modulefile for 'samtools'
ModuleCmd_Load.c(213):ERROR:105: Unable to locate a modulefile for 'R'

Generate hic_pred_hap1.fa.fasta.fai

[E::fai_build3_core] Failed to open the file hic_pred_hap1.fa.fasta
[faidx] Could not build fai index hic_pred_hap1.fa.fasta.fai

*** # Found hic_pred_hap1.fa.gaps.bed ***

No gaps found. This is a contig set.

awk: fatal: cannot open file `hic_pred_hap1.fa.fasta.fai' for reading (No such file or directory)

Generate hic_pred_hap2.fa.fasta.fai

[E::fai_build3_core] Failed to open the file hic_pred_hap2.fa.fasta
[faidx] Could not build fai index hic_pred_hap2.fa.fasta.fai

*** # Found hic_pred_hap2.fa.gaps.bed ***

No gaps found. This is a contig set.

awk: fatal: cannot open file `hic_pred_hap2.fa.fasta.fai' for reading (No such file or directory)

Convert hic.hic_pred_hap1.fa.100_20000.phased_block.bed to sizes

Result saved as hic.hic_pred_hap1.fa.100_20000.phased_block.sizes

Plot hic.hic_pred_hap1.fa.100_20000.phased_block.bed

Rscript /home/lyc/.conda/envs/porec/share/merqury/plot/plot_block_N.R -b hic.hic_pred_hap1.fa.100_20000.phased_block.sizes -c hic.hic_pred_hap1.fa.contig.sizes -o hic.hic_pred_hap1.fa
Loading required package: argparse
Loading required package: ggplot2
Loading required package: scales
Error in read.table(dat, header = F) : no lines available in input
Calls: block_n -> attach_n -> read.table
Execution halted

Convert hic.hic_pred_hap2.fa.100_20000.phased_block.bed to sizes

Result saved as hic.hic_pred_hap2.fa.100_20000.phased_block.sizes

Plot hic.hic_pred_hap2.fa.100_20000.phased_block.bed

Rscript /home/lyc/.conda/envs/porec/share/merqury/plot/plot_block_N.R -b hic.hic_pred_hap2.fa.100_20000.phased_block.sizes -c hic.hic_pred_hap2.fa.contig.sizes -o hic.hic_pred_hap2.fa
Loading required package: argparse
Loading required package: ggplot2
Loading required package: scales
Error in read.table(dat, header = F) : no lines available in input
Calls: block_n -> attach_n -> read.table
Execution halted

$MERQURY/trio/phase_block.sh

Found 1 command tree.

PROCESSING TREE #1 using 128 threads.
opLessThan
chr21-db_M.meryl
print to (stdout)
Detected k-mer size: 21
[E::fai_build3_core] Failed to open the file hic_pred_hap1.fa.fasta
[faidx] Could not build fai index hic_pred_hap1.fa.fasta.fai

    Generate haplotype marker sites bed

            -- chr21-db_M

usage: meryl-lookup
-sequence <input1.fasta> [<input2.fasta>]
-output []
-mers <input1.meryl> [<input2.meryl>] [...] [-estimate]
-labels [] [...]

Compare kmers in input sequences against kmers in input meryl databases.

Input sequences (-sequence) can be FASTA or FASTQ, uncompressed, or
compressed with gzip, xz, or bzip2.

Report types:

-bed:
Generate a BED format file showing the location of kmers in
any input database on each sequence in 'input1.fasta'.

-wig-count:
Generate a WIGGLE format file showing the multiplicity of the
kmer starting at each position in the sequence, if it exists in
an input kmer database.

-wig-depth:
Generate a WIGGLE format file showing the number of kmers in
any input database that cover each position in the sequence.

-existence:
Generate a tab-delimited line for each input sequence with the
number of kmers in the sequence, in the database and common to both.

-include:
-exclude:
Copy sequences from 'input1.fasta' (and 'input2.fasta') to the
corresponding output file if the sequence has at least one kmer
present (include) or no kmers present (exclude) in 'input1.meryl'.

Run meryl-lookup <report-type> -help for details on each method.

Unknown option '-dump'.
No report-type (-bed, -wig-count, -wig-depth, -existence, -include, -exclude) supplied.
Using system JDK.
INFO [Nov 29,2023 18:47] [Globals] Development mode is enabled
SEVERE [Nov 29,2023 18:47] [IgvTools] Genome definition file not found for: hic_pred_hap1.fa.fasta.fai
SEVERE [Nov 29,2023 18:47] [IgvTools] org.broad.igv.tools.PreprocessingException: Genome definition file not found for: hic_pred_hap1.fa.fasta.fai
at org.igv/org.broad.igv.tools.IgvTools.loadGenome(IgvTools.java:1156)
at org.igv/org.broad.igv.tools.IgvTools.doCount(IgvTools.java:826)
at org.igv/org.broad.igv.tools.IgvTools.run(IgvTools.java:371)
at org.igv/org.broad.igv.tools.IgvTools.main(IgvTools.java:254)

            -- chr21-db_F

usage: meryl-lookup
-sequence <input1.fasta> [<input2.fasta>]
-output []
-mers <input1.meryl> [<input2.meryl>] [...] [-estimate]
-labels [] [...]

Compare kmers in input sequences against kmers in input meryl databases.

Input sequences (-sequence) can be FASTA or FASTQ, uncompressed, or
compressed with gzip, xz, or bzip2.

Report types:

-bed:
Generate a BED format file showing the location of kmers in
any input database on each sequence in 'input1.fasta'.

-wig-count:
Generate a WIGGLE format file showing the multiplicity of the
kmer starting at each position in the sequence, if it exists in
an input kmer database.

-wig-depth:
Generate a WIGGLE format file showing the number of kmers in
any input database that cover each position in the sequence.

-existence:
Generate a tab-delimited line for each input sequence with the
number of kmers in the sequence, in the database and common to both.

-include:
-exclude:
Copy sequences from 'input1.fasta' (and 'input2.fasta') to the
corresponding output file if the sequence has at least one kmer
present (include) or no kmers present (exclude) in 'input1.meryl'.

Run meryl-lookup <report-type> -help for details on each method.

Unknown option '-dump'.
No report-type (-bed, -wig-count, -wig-depth, -existence, -include, -exclude) supplied.
Using system JDK.
INFO [Nov 29,2023 18:47] [Globals] Development mode is enabled
SEVERE [Nov 29,2023 18:47] [IgvTools] Genome definition file not found for: hic_pred_hap1.fa.fasta.fai
SEVERE [Nov 29,2023 18:47] [IgvTools] org.broad.igv.tools.PreprocessingException: Genome definition file not found for: hic_pred_hap1.fa.fasta.fai
at org.igv/org.broad.igv.tools.IgvTools.loadGenome(IgvTools.java:1156)
at org.igv/org.broad.igv.tools.IgvTools.doCount(IgvTools.java:826)
at org.igv/org.broad.igv.tools.IgvTools.run(IgvTools.java:371)
at org.igv/org.broad.igv.tools.IgvTools.main(IgvTools.java:254)

    Sort hic.bed

/home/lyc/.conda/envs/porec/share/merqury/trio/switch_error.sh hic.sort.bed hic 100 20000

    java -jar -Xmx1g /home/lyc/.conda/envs/porec/share/merqury/trio/bedMerToPhaseBlock.jar hic.sort.bed hic.100_20000 100 20000 

Processing file hic.sort.bed
Found and as haplotypes.
Running time : 0 h 0 m 0 sec

awk: cmd. line:1: fatal: division by zero attempted

java -jar -Xmx1g /home/lyc/.conda/envs/porec/share/merqury/eval/bedCalcN50.jar hic.100_20000.phased_block.bed | tail -n1 | awk -v out=hic.100_20000 -v swi="" '{print out"\t"$0"\tswi}' - >> hic.100_20000.phased_block.stats
Processing file hic.100_20000.phased_block.bed
Num. blocks: 0
Genome covered bases: 0
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index 0 out of bounds for length 0
at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:64)
at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:70)
at java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:266)
at java.base/java.util.Objects.checkIndex(Objects.java:359)
at java.base/java.util.ArrayList.get(ArrayList.java:427)
at javax.arang.bed.CalcN50.hooker(CalcN50.java:41)
at javax.arang.IO.Rwrapper.go(Rwrapper.java:14)
at javax.arang.bed.CalcN50.main(CalcN50.java:66)

Count and hap-mers per block to hic.100_20000.phased_block.counts
ModuleCmd_Load.c(213):ERROR:105: Unable to locate a modulefile for 'R'

Rscript /home/lyc/.conda/envs/porec/share/merqury/plot/plot_blob.R -f hic.100_20000.phased_block.counts -o hic.100_20000.phased_block.blob
Loading required package: argparse
Loading required package: ggplot2
Loading required package: scales
Error in [.data.frame(dat, , 4) : undefined columns selected
Calls: blob_plot -> [ -> [.data.frame
In addition: Warning message:
In max(dat[, 3]) : no non-missing arguments to max; returning -Inf
Execution halted

Thank you once again for your response, and I apologize for any inconvenience caused!