Panic from `dipstick` (seemingly during flushing of metrics)

Question

Panic from `dipstick` (seemingly during flushing of metrics)

idubrov opened this issue 6 years ago · 3 comments

I got a panic coming from the dipstick code. I wasn't able to reproduce it again, though. Happened once so far.

thread '<unnamed>' panicked at 'attempt to subtract with overflow', /Users/idubrov/.cargo/registry/src/github.com-1ecc6299db9ec823/dipstick-0.6.5/src/scores.rs:100:36
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
stack backtrace:
   0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
             at libstd/sys/unix/backtrace/tracing/gcc_s.rs:49
   1: std::sys_common::backtrace::_print
             at libstd/sys_common/backtrace.rs:71
   2: std::panicking::default_hook::{{closure}}
             at libstd/sys_common/backtrace.rs:59
             at libstd/panicking.rs:380
   3: std::panicking::default_hook
             at libstd/panicking.rs:396
   4: std::panicking::begin_panic
             at libstd/panicking.rs:576
   5: std::panicking::begin_panic
             at libstd/panicking.rs:537
   6: std::panicking::try::do_call
             at libstd/panicking.rs:521
   7: std::panicking::try::do_call
             at libstd/panicking.rs:497
   8: <core::ops::range::Range<Idx> as core::fmt::Debug>::fmt
             at libcore/panicking.rs:71
   9: <core::ops::range::Range<Idx> as core::fmt::Debug>::fmt
             at libcore/panicking.rs:51
  10: dipstick::scores::Scoreboard::snapshot
             at /Users/idubrov/.cargo/registry/src/github.com-1ecc6299db9ec823/dipstick-0.6.5/src/scores.rs:100
  11: dipstick::aggregate::aggregate::{{closure}}::{{closure}}::{{closure}}
             at /Users/idubrov/.cargo/registry/src/github.com-1ecc6299db9ec823/dipstick-0.6.5/src/aggregate.rs:50
  12: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &'a mut F>::call_once
             at /Users/travis/build/rust-lang/rust/src/libcore/ops/function.rs:271
  13: <core::option::Option<T>>::map
             at /Users/travis/build/rust-lang/rust/src/libcore/option.rs:404
  14: <core::iter::FlatMap<I, U, F> as core::iter::iterator::Iterator>::next
             at /Users/travis/build/rust-lang/rust/src/libcore/iter/mod.rs:2446
  15: <alloc::vec::Vec<T> as alloc::vec::SpecExtend<T, I>>::from_iter
             at /Users/travis/build/rust-lang/rust/src/liballoc/vec.rs:1801
  16: <alloc::vec::Vec<T> as core::iter::traits::FromIterator<T>>::from_iter
             at /Users/travis/build/rust-lang/rust/src/liballoc/vec.rs:1713
  17: core::iter::iterator::Iterator::collect
             at /Users/travis/build/rust-lang/rust/src/libcore/iter/iterator.rs:1303
  18: dipstick::aggregate::aggregate::{{closure}}::{{closure}}
             at /Users/idubrov/.cargo/registry/src/github.com-1ecc6299db9ec823/dipstick-0.6.5/src/aggregate.rs:50
  19: <dipstick::core::ControlScopeFn<M>>::flush
             at /Users/idubrov/.cargo/registry/src/github.com-1ecc6299db9ec823/dipstick-0.6.5/src/core.rs:136
  20: <dipstick::core::ControlScopeFn<M> as core::ops::drop::Drop>::drop
             at /Users/idubrov/.cargo/registry/src/github.com-1ecc6299db9ec823/dipstick-0.6.5/src/core.rs:155
  21: core::ptr::drop_in_place
             at /Users/travis/build/rust-lang/rust/src/libcore/ptr.rs:59
  22: core::ptr::drop_in_place
             at /Users/travis/build/rust-lang/rust/src/libcore/ptr.rs:59

It looks to me that the cause is that now captured from self.scores[0] around scores.rs:100 somehow got bigger than now captured by calling time::precise_time_ns().

Answer 1 · 2018-05-05T05:59:54.000Z

Oh gosh. Sorry for not getting back to you quicker on this, I try to be proactive but your issue somehow slipped through the cracks. Did it happen again?

The time functions have been updated in version 0.6.6. The chrono crate is now used to get more accurate publication timings. There were some situations were time wouldn't update, even between distant calls, resulting in strange situations, such as extremely high rates being computed.

Thanks for reporting it, let me know how it goes.

Answer 2 · 2018-05-21T18:17:31.000Z

I was still getting this issue on 0.6.8. Upgraded to 0.6.10, seems like the issue went away.

Thanks!

Answer 3 · 2018-05-21T21:50:25.000Z

Yes, 0.6.10 has significant improvements on time measurement. Rather than using time or chrono methods, timers now simply use std::time::Instant, which are just as fast as before but are more reliable, and reduces the crate dependencies.

Also, the measurement of the metric collection period (used to compute the rates) is now done only once per aggregate publication operation, instead of being done for every metric as before, which was kind of useless.