Sanitizing Span Tags

Question

Sanitizing Span Tags

vprithvi opened this issue 7 years ago · 7 comments

Opentracing has some well defined span tags with well defined semantics, which can be set by instrumentation automatically.

An example of this is the HTTP_URL tag, which is usually set to the entire http(s) url a call is being made to.

A service making a https call to an external API which uses query parameter based authentication might leak secrets into the tracing system by virtue of helpful instrumentation.

There are multiple points where sensitive data can be sanitized

At the instrumentation
At the tracer
At some infrastructure level (like collectors, etc)

As a service owner, I have a clear idea of what data is sensitive, and I might have a strong inclination for the data to not leave the service, which rules out option 3.

Between 1 and 2, what would be a reasonable approach?
Further, where should such a SpanSanitizer interface be defined?

Answer 1 · 2017-09-01T01:59:12.000Z

@vprithvi This is a great discussion. As you are an Uber member, have you dicussed with @yurishkuro and Jaeger team?

First, I agree with you about Data should not lease the service, if they are sensitive.

Now, this is what I known, in commercial products, they usually do auto instrument in VM-style(like JVM) platform, the intrumentation and tracer are same for them, so they have some strategies in their agents, and clear the data.

And, when you are using OpenTracing API, you definitely do manual instrumentations. So I prefer Option-1, At the instrumentation. Why? Most OT compatible tracers are open source, and don't concern about data sensitive. Also the tag value is based on your code, you can do anything you want to make sure that there is no leak. No need to trust the tracer implementation or OpenTracing API to do so.

Even we provide a SpanSanitizer, as you proposal, the implementation of this interface belongs to the tracer implementation, rather than you. So, you also can't make sure what is going on, when you set an sensitive tag value, right?

You own the data, you make them good enough to trace. :)

Answer 2 · 2017-09-01T08:13:33.000Z

Filtering out sensitive data highly depends on the monitored app/deployment. Different apps might be concerned with different sensitive data so it makes it hard to configure it "globally".

All instrumentations what we have written support span decorators, so if you would like to filter-out some sensitive data there is a way to do so, however, if your app uses several instrumentations this might be inconvenient to do it on the instrumentation layer.

Answer 3 · 2017-09-01T08:58:57.000Z

Another option may be something like the Filter discussed (but not yet implemented) within API extensions contrib opentracing-contrib/java-api-extensions#4. This would be independent of the specific way in which the app is instrumented - but allow a consistent approach to sanitizing the data being provided to the tracer. Also this is for Java only currently, but would be good to have equivalent API extension contribs for the other supported languages. Regards Gary

…

On Fri, Sep 1, 2017 at 9:13 AM, Pavol Loffay ***@***.***> wrote: Filtering out sensitive data highly depends on the monitored app/deployment. Different apps might be concerned with different sensitive data so it makes it hard to configure it "globally". All instrumentations what we have written support span decorators, so if you would like to filter-out some sensitive data there is a way to do so, however, if your app uses several instrumentations this might be inconvenient to do it on the instrumentation layer. — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#85 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAKC0snnf9EiKkUSoIyxISfQePLRjNL_ks5sd7yugaJpZM4PJdFv> .

Answer 4 · 2017-09-01T09:02:40.000Z

Another option may be something like the Filter discussed (but not yet
implemented) within API extensions contrib

@objectiser I think the Filter may be not a good way, even we have implemented. As a filter implementation, need to compare tag.key every time to set the value, in order to decide it is a sensitive key or not, even for irrelative span.

Answer 5 · 2017-09-01T16:07:34.000Z

@wu-sheng Indeed, it was @yurishkuro who suggested discussing this here!

I agree with @wu-sheng and @pavolloffay in that the service owner is the best judge of what sensitive information. As such, they should be the ones implementing a SpanSanitizer or some such.

@objectiser I like having an interceptor or filter API, and it being applied by the Tracer because it is a common place where this logic resides.

@wu-sheng There are inefficiencies when implementing this on instrumentations as well. For e.g., Jaeger's tracer samples spans, for unsampled spans, the store operations simply short circuit as a noop. For high traffic applications with low sampling rates, applying this filter at the instrumentation level could be pretty expensive.

Answer 6 · 2017-09-04T03:13:43.000Z

For high traffic applications with low sampling rates

@vprithvi What do you want to do about keeping the secret info safe? I think the secret info is Tag-value related, or do you have others? If so,I prefer don't tag the secret value. In order to do so, I prefer to provide an API about the span is sampled or not, and users can decide to tag or not. This is more helpful about improving the performance.

If provide SpanSanitizer, the implementation also have to match and callback. The match is cost for sampled data. So with high samping rates, there is too much cost. ;P I think these are two sides of coin.

Answer 7 · 2017-09-13T21:51:19.000Z

I too would like to see a standardized way of post-processing, but I think it should extend do logs as well, not just tags.