pacman82/odbc-api

Upgrade from 0.34.1 to 0.45 makes all queries very slow

Closed this issue · 7 comments

Hello. Today I tried to switch to 0.45 and found out that the speed of ODBC operations became very, very slow.
The same test code run with 0.34.1:
cargo run 0,50s user 1,64s system 61% cpu 3,485 total
With 0.45:
cargo run 72,97s user 2,39s system 96% cpu 1:18,00 total

The test code:

use odbc_api::IntoParameter;
use odbc_api::{buffers::TextRowSet, Cursor, Environment};
use std::error::Error;
use std::io::{self, stdin, Read};

const BATCH_SIZE: usize = 6000;

fn main() -> Result<(), Box<dyn Error>> {
    let odbc_env = Environment::new().unwrap();
    let mut conn = odbc_env.connect("database", "user", "pass");
    match conn {
        Ok(c) => {
            println!("Connect success");
            let mut stmt = c.preallocate().unwrap();
            for i in 1..2000 {
                match stmt.execute(
                    "select * from dba.test_table where id = ?",
                    &i.into_parameter(),
                ) {
                    Err(e) => eprintln!("Error executing query: {}", e),
                    Ok(None) => eprintln!("No result set"),
                    Ok(Some(mut cursor)) => {
                        let mut buf = TextRowSet::for_cursor(BATCH_SIZE, &mut cursor, Some(4096)).unwrap();
                        let mut rowset = cursor.bind_buffer(&mut buf).unwrap();
                        while let Some(batch) = rowset.fetch().unwrap() {
                            for row_index in 0..batch.num_rows() {
                                let id = batch.at_as_str(0, row_index).unwrap();
                                let name = batch.at_as_str(1, row_index).unwrap();
                                println!("{} -> {}", id.unwrap_or(""), name.unwrap_or(""));
                            }
                        }
                    }
                };
            }
        }
        Err(e) => eprintln!("cant connect to ODBC, {}", e),
    }
    Ok(())
}

Hello @swi2012,

thanks for reporting this regression. I looked at the code changes and took an educated guess that most likely the fallible buffer allocation introduced in for_cursor is to blame. odbc-api 0.45.1 has been released, which only performs a fallible buffer allocation if no size limit is set (and you do set one). Tl;dr: if my guess is correct, it should perform as before. I would like to hear back, and also have a question:

  • What platform are you using to measure? I was never able to reproduce the performance regression.

Two performance hints:

  • In case your measurement is similar to what you do in production, I would highly recommend allocating the TextRowSet outside of the loop and reusing it. This would also (again, if my guess is correct) have prevented the performance degradation.
  • If you want to call the same query with the same arguments repeatedly, I would recommend using a prepared query instead of a preallocated one (a rough sketch follows below).

Both of these changes are likely to speed up your snippet significantly. It always depends on what you want to measure, of course.
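For the prepared query, here is a rough, untested sketch of what the loop could look like. The connection details, table, and the 4096 byte upper bound are just the placeholders from your snippet; the buffer is still allocated inside the loop here, hoisting it out would be the other half of the advice.

use odbc_api::{buffers::TextRowSet, Cursor, Environment, IntoParameter};

const BATCH_SIZE: usize = 6000;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let env = Environment::new()?;
    let conn = env.connect("database", "user", "pass")?;
    // The statement is parsed and prepared once, instead of once per iteration.
    let mut stmt = conn.prepare("select * from dba.test_table where id = ?")?;
    for i in 1..2000 {
        let param = i.into_parameter();
        if let Some(mut cursor) = stmt.execute(&param)? {
            let mut buf = TextRowSet::for_cursor(BATCH_SIZE, &mut cursor, Some(4096))?;
            let mut row_set = cursor.bind_buffer(&mut buf)?;
            while let Some(batch) = row_set.fetch()? {
                for row_index in 0..batch.num_rows() {
                    let id = batch.at_as_str(0, row_index)?.unwrap_or("");
                    let name = batch.at_as_str(1, row_index)?.unwrap_or("");
                    println!("{} -> {}", id, name);
                }
            }
        }
    }
    Ok(())
}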

Cheers, Markus

Thanks for the response.

thanks for reporting this regression. I looked at the code changes and took an educated guess that most likely the fallible buffer allocation introduced in for_cursor is to blame. odbc-api 0.45.1 has been released, which only performs a fallible buffer allocation if no size limit is set (and you do set one). Tl;dr: if my guess is correct, it should perform as before. I would like to hear back, and also have a question:

I tried this code with 0.45.1; nothing changed, it is as slow as on 0.45.

  • What platform are you using to measure? I was never able to reproduce the performance regression.

rustc 1.60.0-nightly (09cb29c64 2022-02-15)
Linux 5.18.0-2-amd64 #1 SMP PREEMPT_DYNAMIC Debian 5.18.5-1 (2022-06-16) x86_64 GNU/Linux

Two performance hints:

  • In case your measurement is similar to what you do in production, I would highly recommend allocating the TextRowSet outside of the loop and reusing it. This would also (again, if my guess is correct) have prevented the performance degradation.

This code is just a simple simulation of many queries against the DB to measure execution time. My real code parses thousands of XML files and makes DB calls for some of them. For example, it gets 31 XML files to handle, and for each of them I run one select without a TextRowSet (I just check whether the result is empty or not), two selects with a TextRowSet to get the needed data from the DB, and two inserts into the DB (so no TextRowSet there either). So for 31 files there will be 62 calls to TextRowSet creation.

About reusing the same buffer: I can't figure out how to create it without ::for_cursor.

Hello @swi2012,

thanks for the feedback.

Performance with ODBC is mostly about three things:

  1. Minimizing the need for roundtrips and IO
  2. Minimizing calls into the ODBC driver manager and driver
  3. Avoiding allocations

usually in that order.

Since my first guess seems to have been wrong, we need to measure. I am baffled, though, since the code you use barely changed between versions. The most suspicious part seemed to be the fallible buffer allocation, but I am currently not able to reproduce the regression (yet). Therefore I would kindly ask you to split the measurement of the entire thing into single steps:

  • How much time is spent on execute
  • How much time is spent on for_cursor
  • How much time is spent on fetch

And even more important: which of these three changed between versions? Otherwise I would be completely in the dark about what might cause your regression.
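If it helps, here is a rough, untested sketch of how the three steps above could be timed separately with std::time::Instant, based on the snippet from your first post (connection details and table are your placeholders; the printing is left out so it does not dominate the numbers):

use std::time::{Duration, Instant};

use odbc_api::{buffers::TextRowSet, Cursor, Environment, IntoParameter};

const BATCH_SIZE: usize = 6000;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let env = Environment::new()?;
    let conn = env.connect("database", "user", "pass")?;
    let mut stmt = conn.preallocate()?;

    // Accumulated wall-clock time per step over all iterations.
    let mut t_execute = Duration::ZERO;
    let mut t_for_cursor = Duration::ZERO;
    let mut t_fetch = Duration::ZERO;

    for i in 1..2000 {
        let param = i.into_parameter();
        let begin = Instant::now();
        let cursor = stmt.execute("select * from dba.test_table where id = ?", &param)?;
        t_execute += begin.elapsed();

        if let Some(mut cursor) = cursor {
            let begin = Instant::now();
            let mut buf = TextRowSet::for_cursor(BATCH_SIZE, &mut cursor, Some(4096))?;
            t_for_cursor += begin.elapsed();

            let mut row_set = cursor.bind_buffer(&mut buf)?;
            let begin = Instant::now();
            while let Some(batch) = row_set.fetch()? {
                // Rows are fetched but not printed, to keep terminal IO out of the measurement.
                let _ = batch.num_rows();
            }
            t_fetch += begin.elapsed();
        }
    }

    println!(
        "execute: {:?}, for_cursor: {:?}, fetch: {:?}",
        t_execute, t_for_cursor, t_fetch
    );
    Ok(())
}

Which of the three totals grows between 0.34.1 and 0.45 would tell us where to look.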

For private reasons I might not have internet for about two weeks. We'll see, but I may not be able to react as promptly as I would like to in the near future.

Best, Markus

Hello. Some news about this. I updated my Rust to the current nightly (1.64) and this test code now runs pretty fast. My real prod code stayed as slow as it was. I've rewritten my code to use one TextRowSet buffer, as you advised, created like this:

    let mut res_buf =
        TextRowSet::from_max_str_lens(BATCH_SIZE, [1000, 1000, 1000].iter().cloned()).unwrap();

and then use it in the functions that do DB calls. Now it works a little bit slower than the 0.34.1 version did (0.64s vs 0.5s), but such a difference is insignificant.

Hello, thanks for getting back to me. Everything you tell me (apart from the fact that it didn't get faster with 0.45.1) seems to indicate that it is the fallible allocation of the buffers which causes the slowdown. Please note that the faster infallible allocation is currently only used if you use for_cursor and set Some(limit).

The original use case seems less relevant, so I could get rid of fallible allocations altogether if they continue to cause trouble. I am annoyed by the fact that so far I could not reproduce their detrimental effect on performance myself (admittedly I have little time to do so at the moment).

Maybe in the next nightly fallible allocations perform better? I don't know, just guessing / hoping.

I am contemplating a breaking change in the interface to let users explicitly opt into fallible allocation.

Could you do me a favour? Just check out this repository, run cargo bench, and post the results.

For me it does not look like much of a difference, but I wonder if the same is true for you.

fallibale buffer allocation
                        time:   [8.2934 ms 8.8126 ms 9.4165 ms]
                        change: [-6.9786% +0.6991% +8.8201%] (p = 0.87 > 0.05)
                        No change in performance detected.
Found 17 outliers among 100 measurements (17.00%)
  17 (17.00%) high severe

infallibale buffer allocation
                        time:   [8.4869 ms 8.9907 ms 9.5726 ms]
                        change: [-4.3625% +3.7445% +12.466%] (p = 0.41 > 0.05)
                        No change in performance detected.
Found 22 outliers among 100 measurements (22.00%)
  1 (1.00%) high mild
  21 (21.00%) high severe

     Running unittests src\main.rs (target\release\deps\odbcsv-716c44914f8920c4.exe)

running 0 tests

Sorry for the inconvenience; I hope you otherwise enjoy using this crate.

Best, Markus

Hello. I've tried with for_cursor(BATCH_SIZE, &mut cursor, None) and don't see any difference in speed compared to the Some() variant.
cargo bench output:

test result: ok. 0 passed; 0 failed; 7 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running benches/benches.rs (target/release/deps/benches-89da8f8acedc472a)
Gnuplot not found, using plotters backend
fallibale buffer allocation                                                                            
                        time:   [5.7259 ms 5.7274 ms 5.7290 ms]
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

infallibale buffer allocation                                                                            
                        time:   [5.8743 ms 5.8760 ms 5.8782 ms]
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe

     Running unittests src/main.rs (target/release/deps/odbcsv-d33cd1fdb9eebabc)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

Thanks

Sorry, I just cannot reproduce this. Both versions perform equally well on my system. Of course, I do not have precisely your table and your setup.

FYI: The slowest thing in the code above is println! by far (again on my system).
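If you want to take the printing mostly out of the measurement, the usual trick (independent of odbc-api) is to lock and buffer stdout once instead of calling println! per row. A minimal sketch:

use std::io::{BufWriter, Write};

fn main() -> std::io::Result<()> {
    let stdout = std::io::stdout();
    // One lock and one large buffer for all rows, instead of locking and
    // flushing line by line for every println! call.
    let mut out = BufWriter::new(stdout.lock());
    for row in 0..1_000_000 {
        writeln!(out, "{} -> {}", row, "name")?;
    }
    out.flush()?;
    Ok(())
}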

I'll close this issue, since I cannot fix it. Also, the only meaningful code change between these versions in terms of performance did not seem to make a difference for you.

We can reopen it, but I would need either:

  • A reproducing setup which works on GitHub CI, complete with sample data.
  • An analysis of which of the ODBC functions in question has supposedly gotten slower.

If you still have your setup up and running, you could try producing a flamegraph. Use cargo flamegraph.

Cheers, Markus