JuliaML/MLDatasets.jl

New test failing ("ml-10m")

Closed this issue · 3 comments

I'm getting a really strange error in a local test, and I can't figure out why.

I've attached a screenshot. The error says that the "begin" at line 115 of the graphs_no_ci.jl test file is missing its "end", but the "end" seems to be there.

I'd like to know if I am the only one experiencing this.

It happens on Julia 1.7.3 running Ubuntu 22.04.
I also tested it on Windows 11 and the same error appeared.

[Image: Screenshot from 2022-07-01 19-47-14]

Let me look into this.

Hi @Dsantra92, the patch fixed the original error, but now I'm getting a new error.

     Testing Running tests...
[ Info: Testing larger datasets
graphs_no_ci.jl: Error During Test at /home/christian/.julia/packages/MLDatasets/fdW2b/test/runtests.jl:48
  Got exception outside of a @test
  LoadError: syntax: incomplete: "begin" at /home/christian/.julia/packages/MLDatasets/fdW2b/test/datasets/graphs_no_ci.jl:115 requires end
  Stacktrace:
    [1] top-level scope
      @ ~/.julia/packages/MLDatasets/fdW2b/test/datasets/graphs_no_ci.jl:115
    [2] include(fname::String)
      @ Base.MainInclude ./client.jl:451
    [3] macro expansion
      @ ~/.julia/packages/MLDatasets/fdW2b/test/runtests.jl:49 [inlined]
    [4] macro expansion
      @ ~/.julia/juliaup/julia-1.7.3+0~x64/share/julia/stdlib/v1.7/Test/src/Test.jl:1359 [inlined]
    [5] macro expansion
      @ ~/.julia/packages/MLDatasets/fdW2b/test/runtests.jl:48 [inlined]
    [6] macro expansion
      @ ~/.julia/juliaup/julia-1.7.3+0~x64/share/julia/stdlib/v1.7/Test/src/Test.jl:1283 [inlined]
    [7] top-level scope
      @ ~/.julia/packages/MLDatasets/fdW2b/test/runtests.jl:42
    [8] include(fname::String)
      @ Base.MainInclude ./client.jl:451
    [9] top-level scope
      @ none:6
   [10] eval
      @ ./boot.jl:373 [inlined]
   [11] exec_options(opts::Base.JLOptions)
      @ Base ./client.jl:268
   [12] _start()
      @ Base ./client.jl:495
  in expression starting at /home/christian/.julia/packages/MLDatasets/fdW2b/test/datasets/graphs_no_ci.jl:115

I managed to get rid of the error by changing the '!' in line 365 of "src/datasets/graphs/movielens.jl" to a ':', but this feels like a hack.

I did a bit of digging and it does not happen on Windows. The output of typeof(tag_df[:,3]) on both Windows and Linux is Vector{String} (alias for Array{String, 1}). The output of typeof(tag_df[!,3]) on Windows is the same, but on Linux it's SentinelArrays.ChainedVector{String, Vector{String}}, which is what seems to be causing the error.
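
For anyone following along, here is a minimal sketch of the difference between the two indexing forms in DataFrames.jl. The small table below is a made-up stand-in for tag_df (the real one is parsed from tags.dat), so it only illustrates the copy vs. no-copy behavior, not the ChainedVector case itself.

```julia
using DataFrames

# Hypothetical stand-in for tag_df; the real table is parsed from ml-10m's tags.dat.
tag_df = DataFrame(user = [1, 2], movie = [10, 20], tag = ["funny", "dark"])

no_copy = tag_df[!, 3]   # the stored column vector itself, no copy made
copied  = tag_df[:, 3]   # a freshly allocated copy of that column

same_object = no_copy === tag_df[!, 3]   # `!` hands back the same object each time
is_copy     = copied !== no_copy         # `:` does not alias the stored column

println(typeof(no_copy), " ", same_object, " ", is_copy)
```

Since `:` builds a new vector while `!` exposes whatever container the parser stored, this would be consistent with the two forms reporting different types on Linux.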

Upon further examination, it seems that the text is not being read in properly for 5 identical lines: lines 50208, 50572, 50593, 52325, and 52891 of the ml-10m "tags.dat". I believe the presence of "= in the string is causing CSV to assign undef.

This happens with a ! or a :, so fixing with a colon would have only hidden the issue.
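
A rough sketch of how the undefined entries mentioned above can be located, using only Base Julia. The vector here is a hand-built stand-in with one slot deliberately left unassigned; it is not the actual parsed column, and the values are made up.

```julia
# Hypothetical stand-in for the parsed tag column: slot 2 is left unassigned (#undef).
col = Vector{String}(undef, 3)
col[1] = "funny"
col[3] = "dark"

# isassigned checks a slot without reading it, so it does not throw UndefRefError.
bad_rows = [i for i in eachindex(col) if !isassigned(col, i)]

println(bad_rows)
```

Running the same comprehension over the real column (with either `!` or `:`) should list the row numbers that were left undefined, assuming the column type supports `isassigned`.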