google/yggdrasil-decision-forests

Loading big models is slow

Closed this issue · 9 comments

Hi,

I am trying to deploy a trained YDF model (tfdf.keras.RandomForestModel) to a serverless, FaaS-like inference service in order to save resources (only 2-3 invocations per day).
For this, the model needs to be loaded essentially every time a prediction is made. However, loading the model takes more than 30 minutes, which is too long for this use case. I am not a C++ developer, so my debugging efforts have been fairly limited so far. Initially I used only the tfdf library; since then I have switched to the ydf Python library, with no performance improvement (which is probably to be expected, since they use the same C++ backend). The only difference is that the ydf library does not seem to fully load the model when calling

loaded_model = ydf.from_tensorflow_decision_forests(model_path)

but rather only when the first predict call is made.

The code

logger.info("predicting...")
live_predictions = loaded_model.predict(live_data)

is producing

2024-03-06 09:03:03.992 | INFO     | __main__:<module>:45 - predicting...
[INFO 24-03-06 09:33:24.2661 EST decision_forest.cc:700] Model loaded with 200 root(s), 2681168 node(s), and 1191 input feature(s).
[INFO 24-03-06 09:33:24.3790 EST abstract_model.cc:1344] Engine "RandomForestGeneric" built

As you can see, it takes ~30 minutes until the model is fully loaded. This happens on my local machine (M2 Air), but also on all other machines I tried (including x86 ones).
If you think that a model with these parameters is supposed to take this long to load, feel free to close this issue, but this seems a bit too slow to me, given that it is possible to load gigantic models like Llama in under 10 seconds.

Thanks in advance!

rstz commented

Hi, thank you for reporting - 30 minutes sounds way too long, so I want to investigate this further. While I look into this more closely, can you please give me:

  • More information about the training parameters you're using (it looks like 200 trees; any restrictions on the depth? Other hyperparameters?)
  • What types of features do you have in your dataset?
  • The size of the model in bytes
  • Do you have the same issue if the model is trained by YDF?
  • The output if you set ydf.verbose(2) before loading the model (see the snippet below)
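
For that last point, a minimal sketch (assuming model_path points at your TF-DF model, as in your snippet above):

import ydf

ydf.verbose(2)  # level 2: print detailed logs while the model is loaded
loaded_model = ydf.from_tensorflow_decision_forests(model_path)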

Hi @rstz,
thanks for the fast response. The hyperparameters used for training were:

{
  "categorical_algorithm": "CART",
  "max_depth": "16",
  "min_examples": "30",
  "num_trees": "200",
  "sparse_oblique_normalization": "MIN_MAX",
  "sparse_oblique_projection_density_factor": "8.0",
  "split_axis": "SPARSE_OBLIQUE",
}

However, I trained this model as part of a grid search across a large number of hyperparameters, and from what I can see, all of the models take this long to load.
The features are floats in the range [0, 1], effectively only taking on the values {0, 0.25, 0.5, 0.75, 1}, though I suppose that doesn't make a difference.

The model directory looks like this, size-wise:

306M	./assets
4,0K	./fingerprint.pb
496K	./keras_metadata.pb
6,6M	./saved_model.pb
8,0K	./variables

I haven't tried to train the model with YDF directly; I can try that if you think it might improve performance. Training the model with tfdf took ~72 hours (596,400 examples and 1,191 features).
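
For reference, if retraining with YDF directly turns out to be useful, I would expect it to look roughly like this (an untested sketch; train.csv and the "label" column are placeholders, and I am assuming the learner accepts the hyperparameters above as keyword arguments):

import pandas as pd
import ydf

train_df = pd.read_csv("train.csv")  # placeholder: training data with a "label" column

learner = ydf.RandomForestLearner(
    label="label",
    num_trees=200,
    max_depth=16,
    min_examples=30,
    categorical_algorithm="CART",
    split_axis="SPARSE_OBLIQUE",
    sparse_oblique_normalization="MIN_MAX",
    sparse_oblique_projection_density_factor=8.0,
)
model = learner.train(train_df)
model.save("/tmp/my_ydf_model")  # placeholder output directory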

rstz commented

Thank you! Instead of re-training the model, you can also load it with YDF, then save it with YDF to a new directory and measure the time to re-load it:

import time

import ydf

# Import the TF-DF model and re-save it in YDF's native format.
loaded_model = ydf.from_tensorflow_decision_forests(model_path)
loaded_model.save("/tmp/my_model")

# Measure how long re-loading the natively-saved model takes.
start_time = time.time()
re_loaded_model = ydf.load_model("/tmp/my_model")
end_time = time.time()

elapsed_time = end_time - start_time
print("Elapsed time:", elapsed_time, "seconds")

That might help me narrow down whether the issue is with the importer (which is mostly written in Python and is, presumably, not very fast) or somewhere in the C++ code.

rstz commented

Of course, if it's possible to share the model, feel free to do so - but I know that this is often not possible.

Okay, thanks.
As I said, from_tensorflow_decision_forests doesn't seem to load the model fully, and neither does ydf.load_model.
The script you provided executes in ~2 seconds. However, as soon as I try to use the loaded model (i.e. add a prediction to the script), it seems to actually load it and then make the prediction, which takes ~30 minutes. Subsequent predictions take less than 10 seconds.
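
To make that concrete, here is roughly what I am measuring (a sketch; live_data is the same DataFrame as in my first post):

import time

import ydf

loaded_model = ydf.load_model("/tmp/my_model")  # returns after ~2 seconds

start_time = time.time()
live_predictions = loaded_model.predict(live_data)  # first call: ~30 minutes
print("First predict:", time.time() - start_time, "seconds")

start_time = time.time()
live_predictions = loaded_model.predict(live_data)  # subsequent calls: < 10 seconds
print("Second predict:", time.time() - start_time, "seconds")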

I am unable to share this exact model, but I can provide a different one with similar properties that suffers from the same issue:
https://www.comet.com/api/registry/model/item/download?modelItemId=bV0izwCSlrbc8ZpJJpiCfIrbp

Here is a CSV with some sample data that you can use to make a prediction:
sample_data.csv

This should behave exactly the same; let me know if something doesn't work.

Okay, I just noticed that the provided model does not, in fact, suffer from the same issue. So here is the actual model I'm having the issue with instead:
removed

rstz commented

Thanks, I'll take a closer look and report back

rstz commented

Hi, quick update. I'm fairly sure I found the issue - the combination of hyperparameters you're using makes creation of the prediction engine (which, in Python, happens before the first call to predict) very slow.

The good news is that we can probably make this ~10 times faster fairly easily (on your model, a quick prototype got the loading time down from 15 minutes to 1.6 minutes). Now all I need to do is the usual software engineering: validate it, make sure the fix has a reasonable design, testing, releasing, ...
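
In the meantime, a possible workaround sketch (an assumption about your setup, not something verified in this thread): if your serving process stays warm between invocations, you can trigger the engine build once at startup so the cost does not land on a request:

import ydf

# At service startup, not per request: load the model and force engine
# construction with one throwaway prediction. warmup_batch is a hypothetical
# small batch with the same columns as the training data.
model = ydf.load_model("/tmp/my_model")
_ = model.predict(warmup_batch)

# Later predict calls reuse the already-built engine and are fast.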

That sounds awesome, thank you so much!