sonos/tract

Loading ONNX model became significantly slower since 0.21.4


  • OS: Ubuntu 22.04
  • arch: X86_64
  • rustc: 1.77.2

After upgrading from 0.21.3 to 0.21.4, loading the attached model became significantly slower.

  • For a debug build, loading slowed from 1 sec to 70 sec.
  • For a release build, from 0.16 sec to 2.5 sec.

Here is a test program to reproduce the issue.

use std::env;

use tract_onnx::prelude::*;

fn main() {
    // Take the model path from the first command-line argument.
    let args = env::args().collect::<Vec<_>>();
    let model_path = &args[1];
    // Time how long loading and planning the model takes.
    let now = std::time::Instant::now();
    let _model = tract_onnx::onnx()
        .model_for_path(model_path)
        .unwrap()
        .into_runnable()
        .unwrap();
    let duration = now.elapsed();
    println!("Model loaded in {:?}", duration);
}

The ONNX model file is gzipped and must be decompressed before loading.

model.onnx.gz

kali commented

Hello, thanks for bringing this to my attention. 0.21.4 introduced an optimized computation order that is more memory/cache-friendly for very big models with lazy weight loading/decompression. But the algorithm that computes this order is more expensive, and was not optimized as well as it could have been.

PR #1398 optimizes... this new optimizer. Loading is not as fast as it was in 0.21.3, but it's much more acceptable than what you measured in 0.21.4.

Feel free to give it a shot and tell me what you think.

@kali

Hi. Thank you for the response. I tried the latest main branch and observed the loading-speed improvement you described.

  • debug -> 7 sec
  • release -> 0.2 sec

It is indeed acceptable. Still, I prefer the speed of 0.21.3, especially when I am developing and running unit tests repeatedly.
Waiting 1 sec is far better than 7 sec (and of course neither is as bad as 70 sec).
Is it possible to include an option to switch the optimization introduced in 0.21.4 on or off?

Something like:

let model = tract_onnx::onnx()
        .model_for_path(model_path)
        .unwrap()
        .set_optimization(true/false)
        .unwrap()
        .into_runnable()
        .unwrap();

I want to set it to false during development but enable it in actual deployment.

I also want to ask how this new optimization differs from into_optimized().

kali commented

This "new-order" optimisation happens when we compute the execution order for the model. This is more or less what into_runnable() does. Before I did this optimisation, it was already computing an execution order, but with a simpler algorithm, resulting in an execution order that was less optimized.

Regardless of whether or not we add a way to opt out of the improved execution order, you should always call into_optimized() before you use into_runnable(): into_optimized() substitutes the generic operators, on which tract can reason and perform model transformations (or serialization, etc.), with platform-specific ones that are faster to run but sometimes harder to make sense of. These optimisations are independent of the execution order.
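
Concretely, the recommended pipeline might look like this (a sketch adapted from the repro above, with TractResult error handling in place of unwrap):

use tract_onnx::prelude::*;

fn main() -> TractResult<()> {
    let model_path = std::env::args().nth(1).expect("usage: prog <model.onnx>");
    let now = std::time::Instant::now();
    let _model = tract_onnx::onnx()
        .model_for_path(&model_path)?
        .into_optimized()?   // operator-level optimisations, independent of execution order
        .into_runnable()?;   // computes the execution order and builds the plan
    println!("Model loaded in {:?}", now.elapsed());
    Ok(())
}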

I will give some thought to opting out of the "new order". The old algorithm is still present, so it should not be too hard.

Thank you for the explanation. I'm looking forward to seeing the opt-out feature.

kali commented

You can have a look at #1400.

.into_runnable_with_options(&PlanOptions {
    skip_order_opt_ram: true,
    ..PlanOptions::default()
})
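
Slotted into the original repro, that could look something like the sketch below (against the #1400 branch; the PlanOptions import path is an assumption here, and cfg!(debug_assertions) is just one way to disable the ordering pass only in debug builds, as the earlier comment wanted):

use tract_core::plan::PlanOptions; // assumed import path, per PR #1400
use tract_onnx::prelude::*;

fn main() -> TractResult<()> {
    let model_path = std::env::args().nth(1).expect("usage: prog <model.onnx>");
    let now = std::time::Instant::now();
    // Skip the RAM-friendly order optimisation in debug builds for fast
    // iteration, but keep it in release builds for deployment.
    let options = PlanOptions {
        skip_order_opt_ram: cfg!(debug_assertions),
        ..PlanOptions::default()
    };
    let _model = tract_onnx::onnx()
        .model_for_path(&model_path)?
        .into_runnable_with_options(&options)?;
    println!("Model loaded in {:?}", now.elapsed());
    Ok(())
}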

And please consider going through into_optimized() for the non-order-related optimisations. :)

@kali

I tried the #1400 branch with the option above and confirmed that loading is as fast as in 0.21.3 again.
Thank you so much.

kali commented

Closing as 0.21.5 contains these fixes.