neherlab/pangraph

Error while running pipeline

UpalabdhaD opened this issue · 5 comments

Dear authors, I hope you can help me.
My script (run_pangraph.sh):

#!/bin/bash

FASTA=$(realpath "$1")
OUTPUT=$(realpath "$2")

export JULIA_NUM_THREADS=20

julia --project=. src/PanGraph.jl \
    build \
    --random-seed 01021995 \
    --circular \
    "$FASTA" \
    > "$OUTPUT"

I ran it as:

bash run_pangraph.sh ../../Results/Pangenome/PanGraph/genome.circular.chr.fa ../../Results/Pangenome/PanGraph/output.json

2024-09-02T20:51:55.884 --> ordering
2024-09-02T20:52:12.740 --> tree:

Error: In-thread error during graph building:
│ exception =
│ TypeError: in ccall: first argument not a pointer or valid constant expression, expected Ptr, got a value of type Tuple{Symbol, typeof(minimap2_jll.minimap2)}
│ Stacktrace:
│ [1] Main.PanGraph.Minimap.MapOptions(idx::Base.RefValue{Main.PanGraph.Minimap.IndexOptions}, minblock::Int64, preset::String)
│ @ Main.PanGraph.Minimap /SynologyNAS/PROJECTS/path/to/mydir/tools/pangraph/src/minimap.jl:258
│ [2] align(ref::Main.PanGraph.PanContigs{Vector{String}}, qry::Main.PanGraph.PanContigs{Vector{String}}, minblock::Int64, preset::String)
│ @ Main.PanGraph.Minimap /SynologyNAS/PROJECTS/path/to/mydir/tools/pangraph/src/minimap.jl:343
│ [3] macro expansion
│ @ ~/.julia/packages/Rematch/tDZmb/src/Rematch.jl:220 [inlined]
│ [4] (::Main.PanGraph.var"#aligner#10"{String, Int64})(contigs₁::Main.PanGraph.PanContigs{Vector{String}}, contigs₂::Main.PanGraph.PanContigs{Vector{String}})
│ @ Main.PanGraph /SynologyNAS/PROJECTS/path/to/mydir/tools/pangraph/src/build.jl:193
│ [5] do_align(G₁::Main.PanGraph.Graphs.Graph, G₂::Main.PanGraph.Graphs.Graph, energy::Function, aligner::Main.PanGraph.var"#aligner#10"{String, Int64})
│ @ Main.PanGraph.Graphs.Align /SynologyNAS/PROJECTS/path/to/mydir/tools/pangraph/src/align.jl:389
│ [6] align_pair(G₁::Main.PanGraph.Graphs.Graph, G₂::Main.PanGraph.Graphs.Graph, energy::Function, minblock::Int64, aligner::Function, verify::Main.PanGraph.Graphs.Align.var"#verify#64"{Main.PanGraph.Graphs.Align.var"#verify#55#65"{Nothing}}, verbose::Bool)
│ @ Main.PanGraph.Graphs.Align /SynologyNAS/PROJECTS/path/to/mydir/tools/pangraph/src/align.jl:573
│ [7] (::Main.PanGraph.Graphs.Align.var"#59#69"{Int64, Int64, Main.PanGraph.var"#2#6"{Int64, Int64, Int64}, Int64, Int64, Bool, Main.PanGraph.var"#aligner#10"{String, Int64}, Int64})()
│ @ Main.PanGraph.Graphs.Align /SynologyNAS/PROJECTS/path/to/mydir/tools/pangraph/src/align.jl:724
│ [8] lock_semaphore(f::Main.PanGraph.Graphs.Align.var"#59#69"{Int64, Int64, Main.PanGraph.var"#2#6"{Int64, Int64, Int64}, Int64, Int64, Bool, Main.PanGraph.var"#aligner#10"{String, Int64}, Int64}, s::Base.Semaphore)
│ @ Main.PanGraph.Graphs.Utility /SynologyNAS/PROJECTS/path/to/mydir/tools/pangraph/src/util.jl:33
│ [9] (::Main.PanGraph.Graphs.Align.var"#58#68"{Main.PanGraph.var"#2#6"{Int64, Int64, Int64}, Int64, Int64, Bool, Int64, Nothing, Main.PanGraph.var"#aligner#10"{String, Int64}, ReentrantLock, Base.Semaphore, Channel{Any}, Dict{String, Main.PanGraph.Graphs.Graph}, ProgressMeter.Progress, Main.PanGraph.Graphs.Align.Clade, Int64})()
│ @ Main.PanGraph.Graphs.Align /SynologyNAS/PROJECTS/path/to/mydir/tools/pangraph/src/align.jl:722
└ @ Main.PanGraph.Graphs.Align /SynologyNAS/PROJECTS/path/to/mydir/tools/pangraph/src/align.jl:764
ERROR: LoadError: graph construction failed, see above for stacktrace
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:35
[2] align(::Main.PanGraph.var"#aligner#10"{String, Int64}, ::Main.PanGraph.Graphs.Graph, ::Vararg{Main.PanGraph.Graphs.Graph}; compare::Function, energy::Function, minblock::Int64, reference::Nothing, maxiter::Int64, verbose::Bool, rand_seed::Int64, debugdir::Nothing)
@ Main.PanGraph.Graphs.Align /SynologyNAS/PROJECTS/path/to/mydir/tools/pangraph/src/align.jl:765
[3] (::Main.PanGraph.var"#1#5")(args::Vector{String})
@ Main.PanGraph /SynologyNAS/PROJECTS/path/to/mydir/tools/pangraph/src/build.jl:216
[4] run(cmd::Main.PanGraph.Commands.Command, args::Vector{String})
@ Main.PanGraph.Commands /SynologyNAS/PROJECTS/path/to/mydir/tools/pangraph/src/args.jl:182
[5] main(args::Vector{String})
@ Main.PanGraph /SynologyNAS/PROJECTS/path/to/mydir/tools/pangraph/src/PanGraph.jl:162
[6] top-level scope
@ /SynologyNAS/PROJECTS/path/to/mydir/tools/pangraph/src/PanGraph.jl:179
in expression starting at /SynologyNAS/PROJECTS/path/to/mydir/tools/pangraph/src/PanGraph.jl:1

Hi @UpalabdhaD,
your script looks correct to me. From the stack trace, this looks like an error in the ported version of minimap2. What architecture are you running pangraph on? I'm afraid we currently don't support macOS (the ported version of minimap2 does not work on it, but we're working on a newer version that does).
If that is not the problem, could you give more details on the input files? If possible, please share a small example input FASTA file that makes pangraph fail, so that I can try to reproduce the error on my end.

Hey,
Thanks for replying. I am on Debian, on x86_64.

Architecture:             x86_64
  CPU op-mode(s):         32-bit, 64-bit
  Address sizes:          46 bits physical, 48 bits virtual
  Byte Order:             Little Endian
CPU(s):                   48
  On-line CPU(s) list:    0-47
Vendor ID:                GenuineIntel
  Model name:             Intel(R) Xeon(R) Silver 4214R CPU @ 2.40GHz
    CPU family:           6
    Model:                85
    Thread(s) per core:   2
    Core(s) per socket:   12
    Socket(s):            2
    Stepping:             7
    CPU(s) scaling MHz:   47%


I also pulled the Docker container with the latest tag and tried it on the example E. coli dataset.

tree  workdir/
workdir/
└── example_datasets
    └── ecoli.fa.gz

I ran this command:
sudo docker run --rm -it --name "pangraph-$(date +%s)" --volume="$(pwd):/workdir" --user="$(id -u):$(id -g)" --workdir=/workdir neherlab/pangraph:latest bash -c "pangraph build --circular --alpha 0 --beta 0 /workdir/example_datasets/ecoli.fa.gz > graph.json"

I got this error:

ERROR: GZip.GZError(-1, "gzopen failed")
Stacktrace:
  [1] gzopen(fname::String, gzmode::String, gz_buf_size::Int64)
    @ GZip ~/root/.julia/packages/GZip/JNmGn/src/GZip.jl:251
  [2] gzopen
    @ ~/root/.julia/packages/GZip/JNmGn/src/GZip.jl:264 [inlined]
  [3] gzopen(f::PanGraph.var"#graph#8"{Bool, Bool}, args::String)
    @ GZip ~/root/.julia/packages/GZip/JNmGn/src/GZip.jl:268
  [4] open(::Function, ::Vararg{Any})
    @ GZip ~/root/.julia/packages/GZip/JNmGn/src/GZip.jl:265
  [5] open(::Function, ::String)
    @ PanGraph ~/build_dir/src/PanGraph.jl:106
  [6] (::PanGraph.var"#3#9")(file::String)
    @ PanGraph ./none:0
  [7] iterate
    @ ./generator.jl:47 [inlined]
  [8] iterate
    @ ./iterators.jl:1118 [inlined]
  [9] iterate(f::Base.Iterators.Flatten{Base.Generator{Vector{String}, PanGraph.var"#3#9"}})
    @ Base.Iterators ./iterators.jl:1114
 [10] (::PanGraph.var"#1#5")(args::Vector{String})
    @ PanGraph ~/build_dir/src/build.jl:216
 [11] run(cmd::PanGraph.Commands.Command, args::Vector{String})
    @ PanGraph.Commands ~/build_dir/src/args.jl:182
 [12] main(args::Vector{String})
    @ PanGraph ~/build_dir/src/PanGraph.jl:162
 [13] julia_main()
    @ PanGraph ~/build_dir/src/PanGraph.jl:169
 [14] top-level scope
    @ none:1

Here are the sample sequences:
sample_seq.fa.gz

Concerning the Docker error: I re-ran all of the steps and everything works on my machine. The error you see looks like the one you would get when a wrong input path is provided. In particular, the command you wrote should work when it is run from inside workdir.
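
One way to rule out a path problem before launching the container is to check the input file from the host first. The following is only a minimal sketch, not part of pangraph: the helper name check_input is hypothetical, and it assumes gzip is installed on the host.

```shell
#!/bin/bash
# Hypothetical helper (not part of pangraph): verify that an input FASTA
# exists, is readable and, if its name ends in .gz, is a valid gzip archive.
check_input() {
    local path="$1"
    if [ ! -f "$path" ]; then
        echo "missing: $path" >&2
        return 1
    fi
    if [ ! -r "$path" ]; then
        echo "not readable: $path" >&2
        return 1
    fi
    case "$path" in
        *.gz)
            # gzip -t tests archive integrity without writing any output
            if ! gzip -t "$path" 2>/dev/null; then
                echo "not a valid gzip file: $path" >&2
                return 1
            fi
            ;;
    esac
    return 0
}
```

For example, running check_input example_datasets/ecoli.fa.gz from inside workdir should succeed if the relative path matches what the container will see under /workdir.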

As for sample_seq.fa.gz, I tried your command and it works on my machine (Linux Ubuntu 64-bit), so I cannot reproduce the error. Another guess is that you might not have the correct Julia version (v1.7.2). As mentioned, we're working on a version that will hopefully be easier to distribute. In the meantime, if you manage to run it with Docker, that should work for building the graph.
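
To check the Julia version guess quickly, something along these lines could be used. It is a rough sketch under assumptions: the helper names parse_julia_version and julia_version_ok are hypothetical, and it relies on julia --version printing a line of the form "julia version X.Y.Z".

```shell
#!/bin/bash
# Expected version, taken from the comment above (v1.7.2).
EXPECTED_JULIA="1.7.2"

# Hypothetical helper: extract the version number from the text printed by
# `julia --version`, assumed to look like "julia version 1.7.2".
parse_julia_version() {
    echo "$1" | awk '{print $3}'
}

# Hypothetical helper: succeed only if the reported version equals the
# expected one exactly.
julia_version_ok() {
    [ "$(parse_julia_version "$1")" = "$EXPECTED_JULIA" ]
}

if command -v julia >/dev/null 2>&1; then
    reported="$(julia --version)"
    if julia_version_ok "$reported"; then
        echo "Julia version matches $EXPECTED_JULIA"
    else
        echo "Julia version mismatch: got '$reported', expected $EXPECTED_JULIA" >&2
    fi
else
    echo "julia not found on PATH" >&2
fi
```

The comparison is an exact string match on purpose, since the guess here is about one specific version rather than a minimum version.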

Hi @mmolari, thanks for pointing out the error. I have successfully run the Docker container with the dataset.
Closing this thread.