sonos/tract

Loading ONNX model became significantly slower since 0.21.4


  • OS: Ubuntu 22.04
  • arch: X86_64
  • rustc: 1.77.2

After upgrading from 0.21.3 to 0.21.4, loading the attached model became significantly slower.

  • For a debug build, loading slowed from 1 sec to 70 sec.
  • For a release build, from 0.16 sec to 2.5 sec.

Here is a test program to reproduce the issue.

use std::env;

use tract_onnx::prelude::*;

fn main() {
    // Take the model path from the first command-line argument.
    let args = env::args().collect::<Vec<_>>();
    let model_path = &args[1];
    // Time how long loading and planning the model takes.
    let now = std::time::Instant::now();
    let _model = tract_onnx::onnx()
        .model_for_path(model_path)
        .unwrap()
        .into_runnable()
        .unwrap();
    let duration = now.elapsed();
    println!("Model loaded in {:?}", duration);
}

The ONNX model file is gzipped and must be decompressed before loading.

model.onnx.gz

kali commented

Hello, thanks for bringing this to my attention. 0.21.4 introduced an optimized computation order that is more memory/cache-friendly for very big models with lazy weight loading/decompression. But the algorithm that computes this order is more expensive, and was not optimized as well as it could have been.

PR #1398 optimizes... this new optimizer. Loading is not as fast as it was in 0.21.3, but it's much more acceptable than what you measured in 0.21.4.

Feel free to give it a shot and tell me what you think.

@kali

Hi. Thank you for the response. I tried the latest main branch and observed the loading-speed improvement you described.

  • debug -> 7 sec
  • release -> 0.2 sec

It is indeed acceptable. Still, I prefer the speed of 0.21.3, especially when I am developing and running unit tests repeatedly.
Waiting 1 sec is far better than 7 sec (and of course neither is as bad as 70 sec).
Is it possible to include an option to switch the optimization introduced in 0.21.4 on or off?

Something like:

let model = tract_onnx::onnx()
        .model_for_path(model_path)
        .unwrap()
        .set_optimization(true/false)
        .unwrap()
        .into_runnable()
        .unwrap();

I want to set it to false during development but enable it in actual deployment.

I also want to ask how this new optimization differs from into_optimized().

kali commented

This "new-order" optimisation happens when we compute the execution order for the model. This is more or less what into_runnable() does. Before I did this optimisation, it was already computing an execution order, but with a simpler algorithm, resulting in an execution order that was less optimized.

Regardless of whether or not we add a way to opt out of the improved execution order, you should always call into_optimized() before you use into_runnable(): into_optimized() substitutes the generic operators, on which tract can reason and perform model transformations (or serialization, etc.), with platform-specific ones that are faster to run but sometimes harder to make sense of. These optimisations are independent of the execution order.
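
Concretely, the recommended pipeline might look like this (a sketch adapted from the repro above, with TractResult error handling in place of unwrap):

use tract_onnx::prelude::*;

fn main() -> TractResult<()> {
    let model_path = std::env::args().nth(1).expect("usage: prog <model.onnx>");
    let now = std::time::Instant::now();
    let _model = tract_onnx::onnx()
        .model_for_path(&model_path)?
        .into_optimized()?   // operator-level optimisations, independent of execution order
        .into_runnable()?;   // computes the execution order and builds the plan
    println!("Model loaded in {:?}", now.elapsed());
    Ok(())
}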

I will give some thought to opting out of the "new order". The old algorithm is still present, so it should not be too hard.

Thank you for the explanation. I'm looking forward to seeing the opt-out feature.

kali commented

You can have a look at #1400.

.into_runnable_with_options(&PlanOptions {
    skip_order_opt_ram: true,
    ..PlanOptions::default()
})
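
Slotted into the original repro, that could look something like the sketch below (against the #1400 branch; the PlanOptions import path is an assumption here, and cfg!(debug_assertions) is just one way to disable the ordering pass only in debug builds, as the earlier comment wanted):

use tract_core::plan::PlanOptions; // assumed import path, per PR #1400
use tract_onnx::prelude::*;

fn main() -> TractResult<()> {
    let model_path = std::env::args().nth(1).expect("usage: prog <model.onnx>");
    let now = std::time::Instant::now();
    // Skip the RAM-friendly order optimisation in debug builds for fast
    // iteration, but keep it in release builds for deployment.
    let options = PlanOptions {
        skip_order_opt_ram: cfg!(debug_assertions),
        ..PlanOptions::default()
    };
    let _model = tract_onnx::onnx()
        .model_for_path(&model_path)?
        .into_runnable_with_options(&options)?;
    println!("Model loaded in {:?}", now.elapsed());
    Ok(())
}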

And please consider going through into_optimized() for the non-order-related optimisations. :)

@kali

I tried the #1400 branch with the option above and confirmed that loading is as fast as in 0.21.3 again.
Thank you so much.

kali commented

Closing as 0.21.5 contains these fixes.