red-data-tools/red-datasets

Postponing to load individual datasets

mrkn opened this issue · 3 comments

mrkn commented

I'm concerned that many datasets are loaded up front regardless of whether they are needed.
https://github.com/red-data-tools/red-datasets/blob/master/lib/datasets.rb#L3-L34

What do you think about deferring the loading of individual datasets?

kou commented

Are you thinking about start-up time?

I agree that a shorter start-up time is better. But I don't want to force users to require each dataset explicitly, such as require "datasets/iris".
We could use autoload for this case, but I'm not fond of autoload... For example, autoload doesn't work in Ractor.
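
For reference, Ruby's built-in `Module#autoload` registers a constant together with a file path, and the file is required the first time the constant is referenced. A minimal sketch of that behavior follows; `DemoDatasets` and the temporary `demo_iris.rb` file are hypothetical stand-ins, not part of red-datasets:

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical stand-in for a per-dataset file such as lib/datasets/iris.rb.
dir = Dir.mktmpdir
path = File.join(dir, "demo_iris.rb")
File.write(path, <<~RUBY)
  module DemoDatasets
    class Iris
    end
  end
RUBY

module DemoDatasets
end

# Register the constant; the file is not read yet.
DemoDatasets.autoload(:Iris, path)

# autoload? returns the registered path while the load is still pending.
pending_before = DemoDatasets.autoload?(:Iris)

# The first reference to the constant triggers the require.
iris_class = DemoDatasets::Iris

# Once the constant is defined, autoload? returns nil.
pending_after = DemoDatasets.autoload?(:Iris)

FileUtils.remove_entry(dir)
```

This keeps start-up cheap without asking users for explicit requires, but it relies on autoload itself, which is the part that is problematic in Ractor.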

Do you have any ideas on how to implement this proposal?

mrkn commented

I don't know how to overcome the problem with non-main Ractors. I guess we need a mechanism by which non-main Ractors can ask the main Ractor to load libraries.

kou commented

I've implemented this.

We can defer loading by using datasets/lazy:

require "datasets/lazy"
# Datasets::Iris isn't loaded yet
Datasets::Iris # Datasets::Iris is loaded now
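
One way to get load-on-first-reference behavior like this without autoload is a `const_missing` hook. The following is only a sketch under that assumption, not necessarily how datasets/lazy is implemented; `DemoLazy` and `LOADERS` are hypothetical names, and each loader lambda stands in for what would likely be a `require` of one dataset file:

```ruby
# Hypothetical sketch; DemoLazy and LOADERS are illustrative names,
# not part of red-datasets.
module DemoLazy
  # Maps a constant name to a callable that defines it.
  # A real library would probably `require` a file here instead.
  LOADERS = {
    Iris: -> { DemoLazy.const_set(:Iris, Class.new) },
  }

  # Called by Ruby when a constant under DemoLazy is referenced
  # but not yet defined.
  def self.const_missing(name)
    loader = LOADERS[name]
    # Unknown names fall through to the default NameError.
    return super unless loader

    loader.call
    const_get(name)
  end
end

# Nothing is defined until the constant is first referenced.
loaded_before = DemoLazy.const_defined?(:Iris, false)
iris_class = DemoLazy::Iris
loaded_after = DemoLazy.const_defined?(:Iris, false)
```

After the first reference the constant is defined normally, so later lookups never reach `const_missing` again.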