Spotlight's 100% tracer sampler interferes with health checks
Closed this issue · 6 comments
How do you use Sentry?
Self-hosted/on-premise
Version
2.35.0
Steps to Reproduce
I'm writing this in full awareness of XKCD 1172...
In my environment, we are developing our services using a local Kubernetes cluster. The deployment configuration is as close to production deployment as it can be, so we employ Kubernetes health checks on our services. We are using logging filters and Sentry samplers to filter out requests going to /health from logs and traces.
Now we want to expand our Sentry experience with Spotlight. Apparently, #4207 introduced a change to sample at 100% for everything when Spotlight is used in development. Maybe our configuration needs adjusting, but in our case this leads to /health traces overwhelming the Spotlight overlay, and also the logs if the log level is set to DEBUG.
- Add a primitive endpoint to your application
- Add a tracing sampler to your Sentry SDK configuration filtering that endpoint
- Enable `debug` on your Sentry SDK
- Add Spotlight to the configuration
- Call the endpoint multiple times and observe both logs and overlay getting ever fuller
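For reference, the sampler from the second step might look roughly like the sketch below. It assumes the health endpoint is served at /health and that the request path can be read from the `asgi_scope` or `wsgi_environ` entry the SDK's web integrations put into the sampling context; which of these is present depends on the framework in use, so treat the lookup as illustrative rather than as the exact configuration from this report.

```python
HEALTH_PATH = "/health"  # assumption: path of the health check endpoint


def traces_sampler(sampling_context):
    """ON/OFF sampler: drop health checks, keep everything else."""
    # The request data is only present if a WSGI/ASGI integration added it.
    asgi_scope = sampling_context.get("asgi_scope") or {}
    wsgi_environ = sampling_context.get("wsgi_environ") or {}
    path = asgi_scope.get("path") or wsgi_environ.get("PATH_INFO") or ""
    if path == HEALTH_PATH:
        return 0.0
    return 1.0


# Hooked into the SDK roughly like this (not executed here):
#
#   import sentry_sdk
#   sentry_sdk.init(
#       debug=True,
#       spotlight=True,
#       traces_sampler=traces_sampler,
#   )
```

With Spotlight enabled and no DSN set, the report is that this sampler's decision is not respected and /health transactions still show up.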
Expected Result
I expect Spotlight to work exactly like Sentry would, i.e. respecting sampler configuration. It's perfectly possible my expectation is wrong, and I'm ready to add code to my services to make it work like I expect it to. From what I can see, that's currently not possible. For example, setting a DSN results in the SDK trying to parse it or trying to connect to it, so a placeholder DSN won't work, and I don't want to use a real DSN in development. Leaving the DSN out results in the described behaviour.
Actual Result
Logs and overlay overflow with useless data.
Hi @rassie, thanks a lot for the detailed issue and also using Spotlight :)
I think Spotlight getting overwhelmed should be a separate issue in its own repo: getsentry/spotlight#912 -- feel free to add more details and follow up there.
Regarding the sampling rate override, your assumption about Spotlight behaving exactly like Sentry unfortunately does not hold :) A sampling rate only makes sense when you deploy your app in a distributed fashion (in terms of users, not necessarily multiple nodes). Locally, if you set a 1% sample rate you'd only get a random 1% of your local transactions, which is very unlikely to be helpful. I can offer you 2 workarounds in the meantime:
- Set your DSN to `http://spotlight@localhost:8969/0` and don't use `spotlight=true`. This is an undocumented hack, ref getsentry/spotlight#475
- Add a `before_send_transaction` hook and filter out the healthcheck: https://docs.sentry.io/platforms/python/configuration/filtering/#using-before-send-transaction (the example there is also about excluding health checks)
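A minimal sketch of the second workaround, along the lines of the linked docs example. It assumes the health check lives at /health and that the transaction event carries the request URL under `event["request"]["url"]`, which is only the case when a web framework integration populates it. The function name is made up for illustration.

```python
from urllib.parse import urlparse

HEALTH_PATH = "/health"  # assumption: path of the health check endpoint


def drop_health_transactions(event, hint):
    """Return None to discard health check transactions, else the event."""
    request = event.get("request") or {}
    path = urlparse(request.get("url") or "").path
    if path == HEALTH_PATH:
        return None
    return event


# Hooked into the SDK roughly like this (not executed here):
#
#   import sentry_sdk
#   sentry_sdk.init(
#       spotlight=True,
#       before_send_transaction=drop_health_transactions,
#   )
```

Events without request data pass through unchanged, so background-task transactions are not affected by the filter.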
Not closing the issue yet as I'm still open to hearing about arguments against turning up the traces sampling rate automatically when Spotlight is turned on and no DSN exists.
One argument we can make is to not override traces_sampler if it is already set as this can be prod/debug aware and remove the need of before_send_transaction. I can look into that but it would "leak" some logic in our own Sentry setup for instance.
> Sampling rate only makes sense when you deploy your app in a distributed fashion (in terms of users not necessarily multiple nodes). Locally, if you set to 1% sample rate you'd only get a random 1% of your local transactions which is very unlikely to be helpful.
I think we can mostly agree on this -- I'm not really using sample rates between 0.0 and 1.0, it's more of an ON/OFF switch for me, OFF for healthchecks, ON for everything else. I'll look into using before_send_transaction, could be a good solution, I don't really care where I filter. In general there might be a middle ground solution like making traces_sampler behave identically in every environment and just clamp the returned values in DEV to 0.0 or 1.0.
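To make the clamping idea concrete, here is a rough sketch of it as a user-side wrapper. This is not something the SDK does today: `clamp_in_dev`, `production_sampler`, and the `is_dev` flag are hypothetical names, and per the discussion above a wrapper like this would not by itself stop Spotlight from overriding sampling when no DSN is set.

```python
def production_sampler(sampling_context):
    """Example sampler: OFF for health checks, 25% for everything else."""
    name = (sampling_context.get("transaction_context") or {}).get("name", "")
    if name.endswith("/health"):
        return 0.0
    return 0.25


def clamp_in_dev(sampler, is_dev):
    """Wrap a sampler so that in development it only returns 0.0 or 1.0."""

    def wrapped(sampling_context):
        rate = sampler(sampling_context)
        if not is_dev:
            return rate
        # Any non-zero rate (or True) becomes "always sample" locally.
        return 1.0 if float(rate) > 0.0 else 0.0

    return wrapped


dev_sampler = clamp_in_dev(production_sampler, is_dev=True)
```

The same sampler logic then runs in every environment, and only the fractional rates are rounded up locally.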
@rassie just following up on this to make sure you are not waiting on us to do anything (or if you do, clarify the next step 🙂 )
> I'll look into using before_send_transaction, could be a good solution, I don't really care where I filter. In general there might be a middle ground solution like making traces_sampler behave identically in every environment and just clamp the returned values in DEV to 0.0 or 1.0.
Were you able to use before_send_transaction and if yes, was that a good experience?
I'll close this since there's been no response for some time and there's a dedicated Spotlight issue now -- please follow up there.