JuliaCollections/OrderedCollections.jl

Remerging into DataStructures

oxinabox opened this issue · 8 comments

I have been thinking that at some point we should merge OrderedCollections back into DataStructures.jl
The reason for separating it out was to reduce the size of the dependencies that Revise.jl has, so it loads faster.
But now Revise has a JuliaInterpreter dependency, which is much, much bigger than DataStructures will ever be.

We should work out what the requirements to do that are.
The only one I can think of is that DataStructures.jl needs to hit 1.0,
which is basically constrained by checking that the API has gotten rid of stuff that has been hanging on since Julia 0.3.

Having them in separate packages causes problems like in #39.

Let's compare load times, but your plan sounds sensible.

On Julia 1.6 with the revise3 branch:

tim@diva:~/.julia/dev/ImageFiltering/test$ julia-master --startup-file=no -q
julia> @time using OrderedCollections
  0.047111 seconds (49.33 k allocations: 3.746 MiB)

julia> ^D
tim@diva:~/.julia/dev/ImageFiltering/test$ julia-master --startup-file=no -q
julia> @time using DataStructures
  0.211489 seconds (245.43 k allocations: 14.640 MiB)

julia> ^D
tim@diva:~/.julia/dev/ImageFiltering/test$ julia-master --startup-file=no -q
julia> @time using Revise
  0.555985 seconds (775.29 k allocations: 50.752 MiB, 2.27% gc time)

It's not a trivial part of the startup time. But perhaps the times would be different if it were integrated into Revise?
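
One rough way to check that (just a sketch of the measurement, no numbers here): load Revise first and then time DataStructures on top of it. Since the shared OrderedCollections code is already loaded at that point, the second @time roughly approximates the marginal cost Revise would pay after a merge:

julia> @time using Revise

julia> @time using DataStructures  # marginal cost once Revise's dependencies are loaded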

I could also consider building a "private copy" of OrderedSet. I've been meaning to try to understand whether there's a workaround that would make setindex! precompilable.
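
For concreteness, a minimal sketch of the kind of thing I mean (hypothetical signatures, and no claim that these currently stick, which is exactly what I want to understand): concrete-type precompile directives placed inside the package, so common setindex! instances get compiled at precompile time rather than on first use.

# hypothetical directives, e.g. at the end of OrderedCollections' source;
# they only cover the concrete key/value types listed here
precompile(setindex!, (OrderedDict{Symbol,Int}, Int, Symbol))
precompile(setindex!, (OrderedDict{String,Any}, Any, String))
precompile(push!, (OrderedSet{Symbol}, Symbol))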

On Julia 1.6 I get similar timings to yours for how long using OrderedCollections and using DataStructures take.

The more interesting thing is that between Julia 1.3 and Julia 1.6-DEV
the time taken for using JuliaInterpreter has halved, but the time taken for using DataStructures has stayed the same.
So whereas before the time taken to load JuliaInterpreter made the time taken to load DataStructures pale in comparison, it no longer does.
It's pretty impressive that JuliaInterpreter loads so fast. I wonder if we are doing something bad in DataStructures that makes it so slow, and why it doesn't get the load-time speedups that JuliaInterpreter sees.

It's likely to be because of my invalidation-fixing work. I'm starting to see pretty striking improvements for packages that were formerly causing trouble. JuliaInterpreter was slowed down because it (and CodeTracking.jl) invalidated some of Base's & Pkg's package-loading code. That's not happening anymore on recent Julia 1.6-DEV and recent releases of JuliaInterpreter.

Unfortunately for DataStructures, I don't expect to be able to deliver the same benefits, because it's already in pretty good shape with respect to invalidations:

julia> using SnoopCompile

julia> trees = invalidation_trees(@snoopr using DataStructures)
2-element Array{SnoopCompile.MethodInvalidations,1}:
 inserting eltype(::Type{CircularBuffer{T}}) where T in DataStructures at /home/tim/.julia/packages/DataStructures/GvsTk/src/circular_buffer.jl:159 invalidated:
   backedges: 1: superseding eltype(::Type{var"#s91"} where var"#s91"<:(AbstractArray{E,N} where N)) where E in Base at abstractarray.jl:152 with MethodInstance for eltype(::Type{A} where A<:AbstractArray{Bool,1}) (5 children) more specific
   32 mt_cache

 inserting iterate(v::Base.ValueIterator{T}, i::Int64) where T<:DataStructures.DefaultDictBase in DataStructures at /home/tim/.julia/packages/DataStructures/GvsTk/src/default_dict.jl:64 invalidated:
   backedges: 1: superseding iterate(v::Union{Base.KeySet, Base.ValueIterator}, state...) in Base at abstractdict.jl:59 with MethodInstance for iterate(::Base.ValueIterator, ::Any) (2 children) more specific
              2: superseding iterate(v::T, i::Int64) where T<:Union{Base.KeySet{var"#s88",var"#s87"} where var"#s87"<:Dict where var"#s88", Base.ValueIterator{var"#s86"} where var"#s86"<:Dict} in Base at dict.jl:678 with MethodInstance for iterate(::Base.ValueIterator{var"#s89"} where var"#s89"<:Dict, ::Int64) (8 children) ambiguous
   14 mt_cache

That's actually a bit of a lie due to timholy/SnoopCompile.jl#95, but the reality is not much worse (you can see for yourself by doing the "staged" @mysnoopr as illustrated in that issue). See SnoopCompile's docs to understand what all this means.

That doesn't mean that DataStructures' loading time can't be made better, but if there are gains to be had they will have to come from some other source.
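
If anyone wants to hunt for those other sources, one place to look is inference time during loading; here's a sketch (run in a fresh session, and assuming SnoopCompile's @snoopi interface; the tmin cutoff is an arbitrary choice):

julia> using SnoopCompile

julia> inf_timing = @snoopi tmin=0.01 using DataStructures

That returns (inference time, MethodInstance) pairs, which show where inference is spending time while the package loads.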

@KristofferC will be happy to hear that JuliaInterpreter is starting to not be a giant anchor anymore.

Julia 1.6 has gotten even better, so it's time for an update:

julia> @time using OrderedCollections
  0.027468 seconds (25.76 k allocations: 1.843 MiB)

julia> 
tim@diva:~$ julia-master -q --startup-file=no
julia> @time using DataStructures 
  0.078993 seconds (100.80 k allocations: 7.651 MiB)

julia> 
tim@diva:~$ julia-master -q --startup-file=no
julia> @time using Revise
  0.418124 seconds (712.14 k allocations: 48.093 MiB)

While it would be slightly painful given how much I've worked to reduce latency, I'd be willing to put up with another 50ms. So I don't think Revise needs to be a blocker for this reunification.

(Thanks for the interesting blog post, @eulerkochy!)

What blog post is that? I was aware of his V language, but not his involvement with Julia. And V seems very interesting, though it's the opposite of Julia... it seems similar to Zig; I'm not sure which is the better replacement for C.