opentracing/opentracing-python

We need to define consistent semantics of passing collections to tracer methods

yurishkuro opened this issue · 4 comments

We can either emphasize safety, by requiring the tracer to always take copies of passed collections if it wants to store them, or emphasize performance, by telling the caller that it must always give up ownership of any collection it passes.

My vote is to emphasize performance. Safety can then be achieved with decorators.
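For concreteness, here is a minimal sketch of what the two options could look like; the Tracer, Span, SafeTracer names and the defensive_copies decorator are hypothetical illustrations, not the actual opentracing-python API:

import functools

class Span:
    def __init__(self, operation_name, tags):
        self.operation_name = operation_name
        self.tags = tags

class Tracer:
    """Performance-first semantics: the span stores the caller's dict directly,
    so the caller is expected to give up ownership of `tags`."""
    def start_span(self, operation_name, tags=None):
        return Span(operation_name, tags if tags is not None else {})

def defensive_copies(start_span):
    """Opt-in safety decorator: copies `tags` before handing it to the tracer,
    so a caller that keeps mutating its own dict cannot corrupt the span."""
    @functools.wraps(start_span)
    def wrapper(self, operation_name, tags=None):
        return start_span(self, operation_name,
                          dict(tags) if tags is not None else None)
    return wrapper

class SafeTracer(Tracer):
    start_span = defensive_copies(Tracer.start_span)

tags = {"component": "http"}
span = SafeTracer().start_span("request", tags)
tags["component"] = "db"   # does not affect span.tags, thanks to the copy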

TL;DR: my 2p is to (benchmark,) make a call, and document it.

Documenting how to mitigate untrusted input (e.g. via decorators) is a choice we can make, regardless of how unpopular that would be with safety/security minds.

I'd encourage benchmarks to show the impact, since if we are successful, we will have someone ask about a vulnerability sooner or later. We may not be able to "afford" this rigor quite yet, and I also have no idea what tools are available in python.

-- non-python asides which may or may not help

Immutable collections are often used precisely so that they don't need to be copied. You see this a lot in Google code. Librarians working at Google ensure common libraries can be used safely and performantly, often using Caliper (and more recently JMH) to prove they perform.
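A rough Python analogue of that idea, purely illustrative (the Span name is hypothetical, and MappingProxyType requires Python 3), is to copy once at construction and then expose a read-only view instead of copying on every access:

import types

class Span:
    def __init__(self, operation_name, tags):
        self._tags = dict(tags)                         # one copy at construction
        self.tags = types.MappingProxyType(self._tags)  # read-only view, never copied again

span = Span("request", {"component": "http"})
print(span.tags["component"])    # reads are fine
# span.tags["component"] = "db"  # would raise TypeError: the view is read-only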

At Twitter, even equals() had to be written in a way that wouldn't leak timing information.
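For the curious, Python's standard library ships a constant-time comparison for exactly that reason; a small illustration (the function names here are just for the example):

import hmac

secret = b"expected-token"

def naive_equals(candidate):
    # == short-circuits at the first differing byte, so response time can leak
    # how much of the secret a caller has guessed correctly.
    return candidate == secret

def constant_time_equals(candidate):
    # hmac.compare_digest takes time independent of where the inputs differ.
    return hmac.compare_digest(candidate, secret)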

At square/android, hefty disclaimers were added when untrusted code was allowed access to internal data structures. I've personally had to spend days on something similar due to a vulnerability report on a seemingly reasonable library design decision.

bhs commented

Re the actual subject of this issue, I am fine with either option as long as we document the choice. My assumption as a library user is that the param would be passed by reference (i.e., ownership hands off to the library), FWIW.

Re the thing I commented about in that PR: I'm not aiming for parity with golang, I'm aiming for supporting the minimum reasonable number of ways to do the same thing. I.e., if we support add_tags, then we don't need that optional param.

I agree that this is less of a concern in python due to optional params and method overloading, though.

@adriancole on the benchmarks, this is something I am asked about often internally as well, and I don't have a good story to tell, because the raw overhead of a tracer that does all the work but never reports the collected spans is orders of magnitude less than that of one that reports every span. What's the best way to measure overhead, then?

------------------------------------------------------------------------------------------------- benchmark: 7 tests ------------------------------------------------------------------------------------------------
Name (time in us)                        Min                       Max                      Mean                  StdDev                    Median                     IQR            Outliers(*)  Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_noop_tracer                    294.9238 (1.0)            766.0389 (1.0)            332.3823 (1.0)           60.6353 (1.0)            304.9374 (1.0)           43.8690 (1.0)          243;219    2899           1
test_no_sampling                  6,337.1658 (21.49)       10,120.8687 (13.21)        6,880.9415 (20.70)        826.9345 (13.64)        6,532.9075 (21.42)        329.9713 (7.52)           17;24     142           1
test_100pct_sampling             29,278.9936 (99.28)       37,117.9581 (48.45)       32,018.7026 (96.33)      2,163.7009 (35.68)       32,636.8809 (107.03)     3,891.2892 (88.70)           11;0      29           1
test_100pct_sampling_250mcs     285,115.0036 (966.74)     288,971.9009 (377.23)     286,231.7085 (861.15)     1,587.1203 (26.17)      285,848.8560 (937.40)     1,638.3529 (37.35)            1;0       5           1
test_all_batched_size10         965,931.1771 (>1000.0)  1,476,147.8901 (>1000.0)  1,227,247.3812 (>1000.0)  186,482.0533 (>1000.0)  1,251,310.8253 (>1000.0)  223,704.7553 (>1000.0)          2;0       5           1
test_all_batched_size5          978,775.9781 (>1000.0)  1,469,427.1088 (>1000.0)  1,235,030.6034 (>1000.0)  178,309.5332 (>1000.0)  1,251,420.0211 (>1000.0)  206,752.7175 (>1000.0)          2;0       5           1
test_all_not_batched            983,170.9862 (>1000.0)  1,912,110.0903 (>1000.0)  1,413,696.1937 (>1000.0)  431,268.8649 (>1000.0)  1,236,485.0044 (>1000.0)  786,244.1540 (>1000.0)          1;0       5           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
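The table above looks like pytest-benchmark output; a minimal, self-contained sketch of such a harness is below. The NoopTracer/NoopSpan stubs are stand-ins so it runs as-is, they are not the code that produced these numbers; in a real run you would construct the tracer configuration under test (no-op, sampled, batched reporting, etc.).

# pip install pytest-benchmark; run with `pytest bench_tracer.py`
class NoopSpan:
    def set_tag(self, key, value):
        return self
    def finish(self):
        pass

class NoopTracer:
    def start_span(self, operation_name, tags=None):
        return NoopSpan()

def traced_work(tracer, n_spans=100):
    # The unit of work being measured: start, tag, and finish a batch of spans.
    for i in range(n_spans):
        span = tracer.start_span("operation")
        span.set_tag("iteration", i)
        span.finish()

def test_noop_tracer(benchmark):
    # The `benchmark` fixture calls traced_work repeatedly and reports
    # min/max/mean/stddev, as in the table above.
    benchmark(traced_work, NoopTracer())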

Good q. I get that, from an end-to-end pov, traces sent per time unit is for many the key metric and a benchmark in itself. It looks like you have that above.

Wrt things like API design, it would be a microbench that can measure micros or nanos w/o side effects, so it would clearly show the difference between choices like this one.
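For example, a throwaway timeit microbench of the specific choice in this issue (handing the tags dict over by reference vs. taking a defensive copy) could look like the following; the numbers it prints say nothing about end-to-end overhead, only about the cost of the copy itself:

import timeit

tags = {"component": "http", "span.kind": "client", "peer.service": "backend"}

def store_by_reference(d=tags):
    return d          # ownership transfer: no copy

def store_with_copy(d=tags):
    return dict(d)    # defensive copy on every call

for fn in (store_by_reference, store_with_copy):
    # total seconds for 1,000,000 calls is numerically the cost in us per call
    print(fn.__name__, timeit.timeit(fn, number=1000000))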

The first kind of bench (like you have above) is more important than the latter, imho: if a design decision made for performance doesn't move the needle there, it sort of proves itself a micro-optimization (which would be testable in a microbenchmark).