tikv/client-rust

Transaction::lock_keys should wait instead of return a `Failed to resolve lock` Error.

blacktear23 opened this issue · 11 comments

Transaction::lock_keys should wait instead of return a Failed to resolve lock Error. As other language's standard library lock call will block current thread or coroutine. So Transaction::lock_keys should block until got the lock. And we can provide a try_lock_keys function that return immediately.

I think it's by design. By default the resolve lock process will retry for several times, but not retry forever.
Your requested behavior can be achieved by providing a custom RetryOptions via TransactionOptions::retry_options

I have wrote a test code like below:

async fn tikv_transaction1(txn: Transaction) {
    let mut txn = txn;
    println!("txn1 start Lock");
    match txn.lock_keys(vec!["tk2".to_owned()]).await {
        Ok(_) => {}
        Err(err) => {
            println!("txn2 lock key error: {}", err.to_string());
            txn.rollback().await.unwrap();
            return;
        }
    };
    println!("txn1 start put");
    match txn.put("tk2".to_owned(), "tv1".to_owned()).await {
        Ok(_) => {}
        Err(err) => {
            println!("txn1 {:}", err.to_string());
            txn.rollback().await.unwrap();
            return;
        }
    }
    sleep(5000).await;
    txn.commit().await.unwrap();
    println!("txn1 Commit");
}

async fn tikv_transaction2(txn: Transaction) {
    let mut txn = txn;
    println!("txn2 start Lock");
    match txn.lock_keys(vec!["tk2".to_owned()]).await{
        Ok(_) => {}
        Err(err) => {
            println!("txn2 lock key error: {}", err.to_string());
            txn.rollback().await.unwrap();
            return;
        }
    }
    println!("txn2 start put");
    match txn.put("tk2".to_owned(), "tv2".to_owned()).await {
        Ok(_) => {}
        Err(err) => {
            println!("txn2 {:}", err.to_string());
            txn.rollback().await.unwrap();
            return;
        }
    }
    sleep(1000).await;
    txn.commit().await.unwrap();
    println!("txn2 Commit");
}

pub fn get_transaction_option() -> TransactionOptions {
    let opts = TransactionOptions::new_pessimistic();
    let mut retry_opts = RetryOptions::default_pessimistic();
    retry_opts.lock_backoff = Backoff::full_jitter_backoff(2, 500, 1000);
    retry_opts.region_backoff = Backoff::full_jitter_backoff(2, 500, 1000);
    opts.drop_check(CheckLevel::Warn)
        .use_async_commit()
        .try_one_pc()
        .retry_options(retry_opts)
}

#[tokio::main]
async fn main() {
    let runtime = Runtime::new().unwrap();
    let pd_addrs = vec![String::from("127.0.0.1:2379")];
    let conn = TransactionClient::new(pd_addrs.clone()).await.unwrap();
    let conn2 = TransactionClient::new(pd_addrs.clone()).await.unwrap();
    let txn1 = conn.begin_with_options(get_transaction_option()).await.unwrap();
    runtime.spawn(async move {
        // tikv_test().await;
        tikv_transaction1(txn1).await;
    });
    sleep(1000).await;
    let txn2 = conn2.begin_with_options(get_transaction_option()).await.unwrap();
    runtime.spawn(async move {
        tikv_transaction2(txn2).await;
    });
    sleep(10000).await;
    println!("Finished");
    runtime.shutdown_background();
}

As expected txn2's lock_keys should be wait until txn1 commit. But the execute result is:

txn1 start Lock
txn1 start put
txn2 start Lock
txn2 start put
txn2 Commit
thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()` on an `Err` value: KeyError(KeyError { locked: None, retryable: "", abort: "Error(Txn(Error(Mvcc(Error(PessimisticLockNotFound { start_ts: TimeStamp(430570303956975617), key: [116, 107, 50] })))))", conflict: None, already_exist: None, deadlock: None, commit_ts_expired: None, txn_not_found: None, commit_ts_too_large: None })', src/main.rs:37:24
stack backtrace:
   0: _rust_begin_unwind
   1: core::panicking::panic_fmt
   2: core::result::unwrap_failed
   3: core::result::Result<T,E>::unwrap
             at /private/tmp/rust-20211101-73771-1w3cv5q/rustc-1.56.1-src/library/core/src/result.rs:1299:23
   4: tikvclienttest::tikv_transaction1::{{closure}}
             at /Users/rain/work/tikvclienttest/src/main.rs:37:5
   5: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /private/tmp/rust-20211101-73771-1w3cv5q/rustc-1.56.1-src/library/core/src/future/mod.rs:80:19
   6: tikvclienttest::main::{{closure}}::{{closure}}
             at /Users/rain/work/tikvclienttest/src/main.rs:86:9
   7: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /private/tmp/rust-20211101-73771-1w3cv5q/rustc-1.56.1-src/library/core/src/future/mod.rs:80:19
   8: tokio::runtime::task::core::CoreStage<T>::poll::{{closure}}
             at /Users/rain/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.15.0/src/runtime/task/core.rs:161:17
   9: tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
             at /Users/rain/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.15.0/src/loom/std/unsafe_cell.rs:14:9
  10: tokio::runtime::task::core::CoreStage<T>::poll
             at /Users/rain/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.15.0/src/runtime/task/core.rs:151:13
  11: tokio::runtime::task::harness::poll_future::{{closure}}
             at /Users/rain/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.15.0/src/runtime/task/harness.rs:461:19
  12: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
             at /private/tmp/rust-20211101-73771-1w3cv5q/rustc-1.56.1-src/library/core/src/panic/unwind_safe.rs:271:9
  13: std::panicking::try::do_call
             at /private/tmp/rust-20211101-73771-1w3cv5q/rustc-1.56.1-src/library/std/src/panicking.rs:403:40
  14: <unknown>
             at /private/tmp/rust-20211101-73771-1w3cv5q/rustc-1.56.1-src/library/std/src/panicking.rs:431:6
  15: std::panicking::try
             at /private/tmp/rust-20211101-73771-1w3cv5q/rustc-1.56.1-src/library/std/src/panicking.rs:367:19
  16: std::panic::catch_unwind
             at /private/tmp/rust-20211101-73771-1w3cv5q/rustc-1.56.1-src/library/std/src/panic.rs:129:14
  17: tokio::runtime::task::harness::poll_future
             at /Users/rain/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.15.0/src/runtime/task/harness.rs:449:18
  18: tokio::runtime::task::harness::Harness<T,S>::poll_inner
             at /Users/rain/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.15.0/src/runtime/task/harness.rs:98:27
  19: tokio::runtime::task::harness::Harness<T,S>::poll
             at /Users/rain/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.15.0/src/runtime/task/harness.rs:53:15
  20: tokio::runtime::task::raw::poll
             at /Users/rain/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.15.0/src/runtime/task/raw.rs:113:5
  21: tokio::runtime::task::raw::RawTask::poll
             at /Users/rain/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.15.0/src/runtime/task/raw.rs:70:18
  22: tokio::runtime::task::LocalNotified<S>::run
             at /Users/rain/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.15.0/src/runtime/task/mod.rs:343:9
  23: tokio::runtime::thread_pool::worker::Context::run_task::{{closure}}
             at /Users/rain/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.15.0/src/runtime/thread_pool/worker.rs:420:13
  24: tokio::coop::with_budget::{{closure}}
             at /Users/rain/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.15.0/src/coop.rs:102:9
  25: std::thread::local::LocalKey<T>::try_with
             at /private/tmp/rust-20211101-73771-1w3cv5q/rustc-1.56.1-src/library/std/src/thread/local.rs:399:16
  26: std::thread::local::LocalKey<T>::with
             at /private/tmp/rust-20211101-73771-1w3cv5q/rustc-1.56.1-src/library/std/src/thread/local.rs:375:9
  27: tokio::coop::with_budget
             at /Users/rain/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.15.0/src/coop.rs:95:5
  28: tokio::coop::budget
             at /Users/rain/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.15.0/src/coop.rs:72:5
  29: tokio::runtime::thread_pool::worker::Context::run_task
             at /Users/rain/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.15.0/src/runtime/thread_pool/worker.rs:419:9
  30: tokio::runtime::thread_pool::worker::Context::run
             at /Users/rain/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.15.0/src/runtime/thread_pool/worker.rs:386:24
  31: tokio::runtime::thread_pool::worker::run::{{closure}}
             at /Users/rain/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.15.0/src/runtime/thread_pool/worker.rs:371:17
  32: tokio::macros::scoped_tls::ScopedKey<T>::set
             at /Users/rain/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.15.0/src/macros/scoped_tls.rs:61:9
  33: tokio::runtime::thread_pool::worker::run
             at /Users/rain/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.15.0/src/runtime/thread_pool/worker.rs:368:5
  34: tokio::runtime::thread_pool::worker::Launch::launch::{{closure}}
             at /Users/rain/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.15.0/src/runtime/thread_pool/worker.rs:347:45
  35: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll
             at /Users/rain/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.15.0/src/runtime/blocking/task.rs:42:21
  36: tokio::runtime::task::core::CoreStage<T>::poll::{{closure}}
             at /Users/rain/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.15.0/src/runtime/task/core.rs:161:17
  37: tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
             at /Users/rain/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.15.0/src/loom/std/unsafe_cell.rs:14:9
  38: tokio::runtime::task::core::CoreStage<T>::poll
             at /Users/rain/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.15.0/src/runtime/task/core.rs:151:13
  39: tokio::runtime::task::harness::poll_future::{{closure}}
             at /Users/rain/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.15.0/src/runtime/task/harness.rs:461:19
  40: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
             at /private/tmp/rust-20211101-73771-1w3cv5q/rustc-1.56.1-src/library/core/src/panic/unwind_safe.rs:271:9
  41: std::panicking::try::do_call
             at /private/tmp/rust-20211101-73771-1w3cv5q/rustc-1.56.1-src/library/std/src/panicking.rs:403:40
  42: <unknown>
             at /private/tmp/rust-20211101-73771-1w3cv5q/rustc-1.56.1-src/library/std/src/panicking.rs:438:2
  43: std::panicking::try
             at /private/tmp/rust-20211101-73771-1w3cv5q/rustc-1.56.1-src/library/std/src/panicking.rs:367:19
  44: std::panic::catch_unwind
             at /private/tmp/rust-20211101-73771-1w3cv5q/rustc-1.56.1-src/library/std/src/panic.rs:129:14
  45: tokio::runtime::task::harness::poll_future
             at /Users/rain/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.15.0/src/runtime/task/harness.rs:449:18
  46: tokio::runtime::task::harness::Harness<T,S>::poll_inner
             at /Users/rain/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.15.0/src/runtime/task/harness.rs:98:27
  47: tokio::runtime::task::harness::Harness<T,S>::poll
             at /Users/rain/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.15.0/src/runtime/task/harness.rs:53:15
  48: tokio::runtime::task::raw::poll
             at /Users/rain/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.15.0/src/runtime/task/raw.rs:113:5
  49: tokio::runtime::task::raw::RawTask::poll
             at /Users/rain/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.15.0/src/runtime/task/raw.rs:70:18
  50: tokio::runtime::task::UnownedTask<S>::run
             at /Users/rain/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.15.0/src/runtime/task/mod.rs:379:9
  51: tokio::runtime::blocking::pool::Inner::run
             at /Users/rain/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.15.0/src/runtime/blocking/pool.rs:264:17
  52: tokio::runtime::blocking::pool::Spawner::spawn_thread::{{closure}}
             at /Users/rain/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.15.0/src/runtime/blocking/pool.rs:244:17
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Finished

As we can see txn1's commit is failed.

This is unexpected, thanks for finding the issue!
The problem is that the pessimistic lock requests use the default TTL (3s, which is for prewrites) instead of the MAX_TTL(20s) so that the heartbeat mechanism isn't able to keep the locks alive.

@ekexium I test your PR in my test project, I found if I change the txn1's sleep time to 30 seconds, txn2 still can get the lock.
As we expected, txn2 should not get the lock before txn1 commit. txn2 lock_keys return an Error is acceptable.

@blacktear23 Did you use the 0.1 version or master? 0.1 doesn't include the heartbeat feature (its support for pessimistic transactions is incomplete).

your pr branch #329

your pr branch #329

emmm I can't reproduce it. Did you just change the sleep time from 5 secs to 30?
Btw what sleep are you using?

yes just change 5sec to 30sec. The sleep function is using tokio::time::sleep

@ekexium https://github.com/blacktear23/tikv-client-test/blob/master/src/main.rs this is my test code repo.

If you build it with debug and execute the binary, you can get the result below:

txn1 start Lock
Jan 19 20:26:55.217 INFO connect to tikv endpoint: "127.0.0.1:20160"
txn1 start put
txn2 start Lock
Jan 19 20:26:56.223 INFO connect to tikv endpoint: "127.0.0.1:20160"
txn2 start put
txn2 Commit
thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()` on an `Err` value: MultipleKeyErrors([KeyError(KeyError { locked: None, retryable: "", abort: "Error(Txn(Error(Mvcc(Error(PessimisticLockNotFound { start_ts: TimeStamp(430596480093650945), key: [116, 107, 50] })))))", conflict: None, already_exist: None, deadlock: None, commit_ts_expired: None, txn_not_found: None, commit_ts_too_large: None })])', src/main.rs:37:24
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Finished

@blacktear23 I see. The new TTL is not updated correctly. I've fixed that in the same PR. Please check if it works.

@ekexium after update your branch it works!