janestreet/magic-trace

Trace state tracking is broken by hardware interrupts

cgaebel opened this issue · 1 comments

Perf outputs two events magic-trace uses to construct its [untraced] spans: tr strt and tr end.

As we discovered after a deep dive today, magic-trace isn't handling these events properly and that is the cause of some staircase traces:

x

Subtleties include:

  • tr end doesn't need to be accompanied by a tr strt. For example, there's an implicit tr end during a hardware interrupt (hw int). But when tracing userspace only, you don't even see hardware interrupts.
  • Due to an Intel PT bug (?), there are sometimes two tr strts instead of one.

and they make it challenging to figure out what the correct trace state should be.

We propose the following algorithm for tracking trace state, instead of what magic-trace does today:

  • Explicitly track trace state per-thread, one of Tracing | Not_tracing.
  • Initial state is Tracing.
  • Tracing -> Not_tracing on tr end
  • Not_tracing -> Tracing on tr strt
  • On tr end while Not_tracing, print a warning and disbelieve it.
  • On tr strt while Tracing, print a warning and disbelieve it.
  • There's one exception: A tr strt is permitted as the first event of a thread, even though it's prohibited by the other rules.

This usually, but not always, happens around a call to memmove. I've left some example perf script output below:

 1139/1139  428146.916343395:   jcc                            40af03 itch_bbo::book::Book::add_order+0x3b3 =>           40b012 itch_bbo::book::Book::add_order+0x4c2
 1139/1139  428146.916343397:   call                           40b06c itch_bbo::book::Book::add_order+0x51c =>     7ffff7329220 __memmove_ssse3_back+0x0
 1139/1139  428146.916343397:   jmp                      7ffff732924a __memmove_ssse3_back+0x2a =>     7ffff732ba10 __memmove_ssse3_back+0x27f0
 1139/1139  428146.916343398:   return                   7ffff732ba16 __memmove_ssse3_back+0x27f6 =>           40b072 itch_bbo::book::Book::add_order+0x522
 1139/1139  428146.916343398:   call                           40b093 itch_bbo::book::Book::add_order+0x543 =>     7ffff7329220 __memmove_ssse3_back+0x0
 1139/1139  428146.916343445:   tr strt                             0 [unknown] =>     7ffff732bbd0 __memmove_ssse3_back+0x29b0
 1139/1139  428146.916343561:   return                   7ffff732bbd4 __memmove_ssse3_back+0x29b4 =>           40b099 itch_bbo::book::Book::add_order+0x549
 1139/1139  428146.916343592:   jmp                            40b0a8 itch_bbo::book::Book::add_order+0x558 =>           40b1ba itch_bbo::book::Book::add_order+0x66a
 1139/1139  428146.916343592:   jmp                            40b1ce itch_bbo::book::Book::add_order+0x67e =>           40b62d itch_bbo::book::Book::add_order+0xadd
 1139/1139  428146.916323767:   jmp                            40d53e itch_bbo::main+0x9ee =>           40d540 itch_bbo::main+0x9f0
 1139/1139  428146.916323767:   jcc                            40d54e itch_bbo::main+0x9fe =>           40d63f itch_bbo::main+0xaef
 1139/1139  428146.916324004:   hw int                         40d64f itch_bbo::main+0xaff => ffffffff8ad90750 [unknown]
 1139/1139  428146.916324247:   tr strt                             0 [unknown] => ffffffff8ad90762 [unknown]
 1139/1139  428146.916324607:   tr strt                             0 [unknown] =>           40d64f itch_bbo::main+0xaff
 1139/1139  428146.916324732:   jmp                            40d679 itch_bbo::main+0xb29 =>           40d730 itch_bbo::main+0xbe0
 1139/1139  428146.916324732:   call                           40d78b itch_bbo::main+0xc3b =>           40c650 itch_bbo::maybe_sanity_check_execution+0x0
 1139/1139  428146.916294568:   call                           40a5b2 alloc::collections::btree::remove::<impl alloc::collections::btree::node::Handle<alloc::collections::btree::node::NodeRef<alloc::collections::btree::node::marker::Mut,K,V,alloc::collections::btree::node::marker::Leaf>,alloc::collections::btree::node::marker::KV>>::remove_leaf_kv+0x62 =>     7ffff7329220 __memmove_ssse3_back+0x0
 1139/1139  428146.916294568:   jcc                      7ffff7329226 __memmove_ssse3_back+0x6 =>     7ffff732924e __memmove_ssse3_back+0x2e
 1139/1139  428146.916294569:   jmp                      7ffff732926c __memmove_ssse3_back+0x4c =>     7ffff732b4b0 __memmove_ssse3_back+0x2290
 1139/1139  428146.916294569:   return                   7ffff732b4b6 __memmove_ssse3_back+0x2296 =>           40a5b5 alloc::collections::btree::remove::<impl alloc::collections::btree::node::Handle<alloc::collections::btree::node::NodeRef<alloc::collections::btree::node::marker::Mut,K,V,alloc::collections::btree::node::marker::Leaf>,alloc::collections::btree::node::marker::KV>>::remove_leaf_kv+0x65
 1139/1139  428146.916294570:   call                           40a5d6 alloc::collections::btree::remove::<impl alloc::collections::btree::node::Handle<alloc::collections::btree::node::NodeRef<alloc::collections::btree::node::marker::Mut,K,V,alloc::collections::btree::node::marker::Leaf>,alloc::collections::btree::node::marker::KV>>::remove_leaf_kv+0x86 =>     7ffff7329220 __memmove_ssse3_back+0x0
 1139/1139  428146.916294615:   tr strt                             0 [unknown] =>     7ffff732924e __memmove_ssse3_back+0x2e
 1139/1139  428146.916294690:   jmp                      7ffff732926c __memmove_ssse3_back+0x4c =>     7ffff732b5c0 __memmove_ssse3_back+0x23a0
 1139/1139  428146.916294690:   return                   7ffff732b5c8 __memmove_ssse3_back+0x23a8 =>           40a5d9 alloc::collections::btree::remove::<impl alloc::collections::btree::node::Handle<alloc::collections::btree::node::NodeRef<alloc::collections::btree::node::marker::Mut,K,V,alloc::collections::btree::node::marker::Leaf>,alloc::collections::btree::node::marker::KV>>::remove_leaf_kv+0x89
 1139/1139  428146.916294690:   jmp                            40a65c alloc::collections::btree::remove::<impl alloc::collections::btree::node::Handle<alloc::collections::btree::node::NodeRef<alloc::collections::btree::node::marker::Mut,K,V,alloc::collections::btree::node::marker::Leaf>,alloc::collections::btree::node::marker::KV>>::remove_leaf_kv+0x10c =>           40a6c0 alloc::collections::btree::remove::<impl alloc::collections::btree::node::Handle<alloc::collections::btree::node::NodeRef<alloc::collections::btree::node::marker::Mut,K,V,alloc::collections::btree::node::marker::Leaf>,alloc::collections::btree::node::marker::KV>>::remove_leaf_kv+0x170
 1139/1139  428146.916294690:   call                           40a6c3 alloc::collections::btree::remove::<impl alloc::collections::btree::node::Handle<alloc::collections::btree::node::NodeRef<alloc::collections::btree::node::marker::Mut,K,V,alloc::collections::btree::node::marker::Leaf>,alloc::collections::btree::node::marker::KV>>::remove_leaf_kv+0x173 =>           40a1d0 alloc::collections::btree::node::BalancingContext<K,V>::merge_tracking_child_edge+0x0
 1139/1139  428146.916294707:   call                           40a297 alloc::collections::btree::node::BalancingContext<K,V>::merge_tracking_child_edge+0xc7 =>     7ffff7329220 __memmove_ssse3_back+0x0

This example was fixed by #148