getsentry/sentry-python

'User' object has no attribute 'is_verified'

Opened this issue · 17 comments

Environment

SaaS (https://sentry.io/)

Steps to Reproduce

Versions:

  • Django: 4.2.15
  • uwsgi: 2.0.24
  • sentry-sdk: 2.13.0
  • django-otp: 1.5.1
  1. Set up django-otp: https://django-otp-official.readthedocs.io/en/stable/overview.html#installation

  2. Use django_otp.admin.OTPAdminSite as custom admin site.

    # urls.py
    from django.urls import path
    from django_otp.admin import OTPAdminSite

    admin_site = OTPAdminSite()
    urlpatterns = [path("admin/", admin_site.urls)]
  3. Enable profiling by setting profiles_sample_rate in sentry_sdk.init (a minimal example follows these steps).

  4. When the uwsgi server starts, the first request to the admin fails with the error:

    'User' object has no attribute 'is_verified'

  5. If you remove the profiles_sample_rate from sentry_sdk.init, the error goes away.
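For reference, a minimal version of the sentry_sdk.init from step 3 looks roughly like this (the DSN is a placeholder; profiles are sampled relative to traces, hence the traces rate):

    import sentry_sdk
    from sentry_sdk.integrations.django import DjangoIntegration

    sentry_sdk.init(
        dsn="https://public@example.ingest.sentry.io/0",  # placeholder DSN
        integrations=[DjangoIntegration()],
        traces_sample_rate=1.0,
        profiles_sample_rate=1.0,  # enabling this is what triggers the error
    )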

Expected Result

Profiling should not change the behaviour of middleware, so I would expect django_otp.middleware.OTPMiddleware to successfully set the is_verified attribute on the User object.

Actual Result

django_otp.middleware.OTPMiddleware somehow cannot set User.is_verified for the first request to the admin. Subsequent requests work fine; only the initial request to the admin fails.

Product Area

Profiling

Link

No response

DSN

No response

Version

No response

Assigning to @getsentry/support for routing ⏲️

@omerfarukabaci, I am having trouble reproducing your issue from the information you have provided. Could you please provide your source code (if it is public), or create a minimal reproduction that you can share with me?

Hey @szokeasaurusrex, thank you for your quick answer! Unfortunately our source code is not public. I tried to create a minimal reproduction but couldn't manage to, since the issue is flaky and does not happen every time. But I have some events on Sentry, would those work? In the traces of those events, you can see that OTPMiddleware has been called but the is_verified attribute is not set.

Hey @omerfarukabaci, any additional info would help, even just the full stack trace. Sentry events too -- feel free to put the links to them here or if you prefer, send them to me on Discord (sentrivana) or to my email (ivana.kellyer @ sentry.io). Thanks!

Hey @sentrivana, I have sent the Sentry event links to your email. There is one other issue that started appearing after enabling profiling. I am not 100% sure that it is directly related to profiling, but it seems like it, so I also added a link to it in the email just in case.

If you need anything further from my side, please let me know. I can override the OTPAdminSite and retrieve extra information when this exception occurs. I can also provide the uwsgi configuration file if that helps.

Thanks a lot for your prompt responses! 🚀


Sure thing, that's why we're here :) Thanks for the quick follow-ups!

I have sent Sentry event links to your email.

So I've taken a look at the Sentry events and while the stacktraces provide some extra context, I'm still not sure what's going on, especially why profiling specifically would be causing this.

The facts we have so far:

  • django-otp patches the User object here, adding is_verified on it (sketched below)
  • Django tries to read is_verified here with _wrapped being a User instance and fails, indicating the middleware has not run or the user object was somehow overwritten after the fact
  • The error only happens on the first request
  • Turning off profiling gets rid of the issue
    • The profiler itself doesn't do any additional patching so it shouldn't be the case that it's overwriting the django-otp patch
    • The only reason I can think of that the profiler being on would matter is that it takes a snapshot of the current frame stack periodically -- maybe this leads to something being prematurely evaluated that shouldn't be? (This is kinda far-fetched)
  • Setting the profiler aside, our Django integration does an extra call to User.is_authenticated here. I'm wondering if this has anything to do with the issue.
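For context, here is a paraphrased sketch of what django-otp's OTPMiddleware does; this is simplified from its source, not a verbatim copy:

# Paraphrased, simplified sketch of django_otp.middleware.OTPMiddleware.
import functools

from django.utils.functional import SimpleLazyObject


def is_verified(user):
    # Verified means an OTP device has confirmed this user.
    return user.otp_device is not None


class OTPMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        user = getattr(request, "user", None)
        if user is not None:
            # request.user is replaced with a lazy wrapper; is_verified is
            # only attached when that wrapper is first evaluated.
            request.user = SimpleLazyObject(lambda: self._verify_user(request, user))
        return self.get_response(request)

    def _verify_user(self, request, user):
        user.otp_device = None
        user.is_verified = functools.partial(is_verified, user)
        # ... device verification omitted ...
        return user

If anything resolves request.user before this middleware runs, or replaces it afterwards, the resolved User never gets is_verified.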

Some follow-ups:

  • Just to cover the bases, django-otp is properly in your INSTALLED_APPS and, more importantly, MIDDLEWARE, right?
  • Could you show me your sentry_sdk.init?

There is one other issue that started appearing after enabling profiling. I am not 100% sure that it is directly related to profiling, but it seems like it, so I also added a link to it in the email just in case.

I had a quick look but don't have many insights -- if you keep seeing this and it indeed looks like a profiler/SDK issue, can you please open a new issue just with that? Makes it easier for us to keep track. :)

@sentrivana Thanks for the detailed response!

Some comments about the facts:

The error only happens on the first request

Now I have realized that this is not the case, sorry for the initial misinformation. To test this, I opened our QA and Sandbox admin pages and refreshed them tens of times, and eventually I saw the same error again. So it does not happen only on the first request; it happens occasionally.

Turning off profiling gets rid of the issue

I have checked the deployments from just before we first encountered this issue, and it was introduced by a deployment that included:

  • Upgrading sentry-sdk from 1.40.6 to 2.7.1
  • Setting profiles_sample_rate=0.3

The issue then disappeared after I removed profiles_sample_rate along with some other changes:

  • Setting py-call-uwsgi-fork-hooks=true in uwsgi conf (enable-threads was already true).
  • Upgrading sentry-sdk from 2.7.1 to 2.13.0

Then, after a week of not seeing the issue, I enabled profiling again, and we immediately started seeing it again, which is why I strongly suspect it is related to profiling.

The only reason I can think of that the profiler being on would matter is that it takes a snapshot of the current frame stack periodically -- maybe this leads to something being prematurely evaluated that shouldn't be? (This is kinda far-fetched)

Is this sampling executed in another thread? If you look at the traces of the issues happening on the same page, with IDs:

  • 9c4203a666d24a848f1980f236a48671
  • 4fd9a500f7a749109e9fbaea88e42232

You can see that the error happens at different times. Also, the span that raises the error is event.django - sentry_sdk.integrations.django._got_request_exception, which is something I don't see in other issues; normally the span that raises the error is related to our code. Could there be a race condition between Sentry and django-otp during the evaluation of the lazy request.user object? What I mean is: could this line somehow be overriding the request.user that django-otp has patched, maybe after profiling forces an evaluation of request.user?
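To illustrate the kind of race I mean (a toy demo, not the SDK's actual code): SimpleLazyObject does not lock around its one-time setup, so two threads evaluating the same lazy object can both run the factory:

# Toy demo: concurrent evaluation of one SimpleLazyObject (requires Django).
import threading
import time
import types

from django.utils.functional import SimpleLazyObject

evaluations = []


def factory():
    evaluations.append(threading.current_thread().name)
    time.sleep(0.1)  # widen the race window so the interleaving is visible
    return types.SimpleNamespace(is_verified=True)


lazy = SimpleLazyObject(factory)

# Both threads see the wrapper as not yet set up, so each runs the factory.
threads = [threading.Thread(target=lambda: lazy.is_verified) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(evaluations)  # typically two entries: the factory ran twice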

Follow-ups

Just to cover the bases, django-otp is properly in your INSTALLED_APPS and, more importantly, MIDDLEWARE, right?

Yes, copy-pasting from the settings source code:

INSTALLED_APPS = [
    ...,
    "django_otp",
    "django_otp.plugins.otp_static",
    "django_otp.plugins.otp_totp",
]

MIDDLEWARE = [
    m
    for m in [
        ...
        ("django_otp.middleware.OTPMiddleware" if ROLE == "admin" else None),
    ]
    if m
]

I thought that maybe we somehow fail to set or change ROLE, but you can see that OTPMiddleware is being called in the trace, so that shouldn't be the case.

Could you show me your sentry_sdk.init?

Copy-pasting the whole Sentry config from the settings source code:

import os
import re

SENTRY_DSN = os.environ.get("SENTRY_DSN")
SENTRY_ENV = os.environ.get("SENTRY_ENV")

if SENTRY_DSN:
    if not SENTRY_ENV:
        raise RuntimeError(
            "SENTRY_ENV env var needs to be defined (use role name, such as 'admin' or 'buyer_console')"
        )

    import sentry_sdk
    from sentry_sdk.integrations.django import DjangoIntegration

    def traces_sampler(sampling_context):
        path = sampling_context.get("wsgi_environ", {}).get("PATH_INFO")
        method = sampling_context.get("wsgi_environ", {}).get("REQUEST_METHOD")
        if path == "/favicon.ico":
            return 0
        if path.startswith("/static/"):
            return 0

        if path == "/health/":
            return 0.001

        if re.match(r"/v1\/api\/some-path\/\S+\/$", path) and method == "GET":
            return 0.001
        if path == "/v1/api/token/":
            return 0.001

        return 1

    sentry_sdk.init(
        dsn=SENTRY_DSN,
        integrations=[DjangoIntegration()],
        send_default_pii=True,
        traces_sampler=traces_sampler,
        profiles_sample_rate=1.0,
        environment=SENTRY_ENV,
    )

Thanks @omerfarukabaci, just to rule out another possibility first: you are setting the authentication middleware before the OTP one, like this, right?
https://stackoverflow.com/a/77109153

# "settings.py"

MIDDLEWARE = [
    ...
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django_otp.middleware.OTPMiddleware' # Here
]

Hey @sl0thentr0py, you're welcome! Yes, it is set as follows:

MIDDLEWARE = [
    ...
    "django.contrib.auth.middleware.AuthenticationMiddleware",
    ("django_otp.middleware.OTPMiddleware" if ROLE == "admin" else None),
]

Thanks for helping us investigate, this is very much appreciated! ❤

Is this sampling executed in another thread? If you look at the traces of the issues happening on the same page, with IDs:

9c4203a666d24a848f1980f236a48671
4fd9a500f7a749109e9fbaea88e42232

You can see that the error happens at different times. Also, the span that raises the error is event.django - sentry_sdk.integrations.django._got_request_exception, which is something I don't see in other issues; normally the span that raises the error is related to our code. Could there be a race condition between Sentry and django-otp during the evaluation of the lazy request.user object? What I mean is: could this line somehow be overriding the request.user that django-otp has patched, maybe after profiling forces an evaluation of request.user?

This is sort of the direction I was thinking. One thing to test whether this has anything to do with the SDK accessing the user is to set send_default_pii=False for a bit while keeping the profiler enabled. Setting send_default_pii to False should disable the code path that leads to us capturing request.user. Could you try that out and let us know if the error still happens?


You're welcome, it is my pleasure, and thank you for your concern! ❤️ Unfortunately I am on vacation for 3 weeks; I will try that and let you know when I am back. 🚀

Hey @sentrivana, @sl0thentr0py

I am finally back. 🙂 My colleagues changed send_default_pii to False, but unfortunately the issue still persists. Is there anything else I can do to investigate this further?

Hey @omerfarukabaci, welcome back 👋🏻

So much for my hypothesis 😢 It seems like it doesn't have anything to do with the SDK accessing the user data (at least not the _set_user_info part), since that code path shouldn't be triggered at all if PII sending is False.

Based on how inconsistently it happens, it indeed looks like a possible race condition. I understand that it's super fickle to repro, but I don't have any ideas for debugging this further without an example app that runs into the issue.

Hey @sentrivana, thank you! 👋🏼

Since this issue happens every day on our QA environment, we are actually able to reproduce it. If you could tell me what else you need to know, I can update the code on QA to gather more information. I can also fork the SDK and modify its source to learn more about the problem. Trying to reproduce it in a separate app is also possible, but that requires more work since we don't have much idea about the actual cause.

So the problem, as far as I can tell, is either that request.user is modified or overwritten after the OTP middleware has patched it, or that the middleware doesn't run at all (not that likely).

What we could look at, during a request when the error happens:

  • Can we verify that the django-otp middleware has actually run and patched the user? (One way to check this is sketched below.)
  • Can we pinpoint when the user object has changed, or at least narrow down points in time when it looks correct vs. when it's already wrong? (E.g. check what it looks like in the profiler before and after it's taken a sample since this is our most likely culprit at the moment.)

What would also be interesting would be comparing this to a working request.
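One way to check the first point would be a small middleware placed right after OTPMiddleware; this is just a hypothetical sketch, and the class name and log message are illustrative:

# Hypothetical middleware to check whether OTPMiddleware's patch is visible.
import logging

logger = logging.getLogger(__name__)


class OTPPatchCheckMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        user = getattr(request, "user", None)
        logger.info(
            "is_verified present: %s (user class: %s)",
            hasattr(user, "is_verified"),  # forces the lazy user to evaluate
            user.__class__.__name__,
        )
        return self.get_response(request)

One caveat: the hasattr call itself forces the lazy user to evaluate, which could change the timing, so where this sits in the middleware chain matters.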

Hey @sentrivana,

What do you think about the following middleware? It monkey-patches the request class's attribute getter and setter and logs a traceback whenever the user attribute is set or accessed:

import logging
import traceback

LOGGER = logging.getLogger(__name__)


class CustomRequestMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def log_traceback_for_user_action(self, action, attr):
        # Only log accesses to request.user; everything else is noise.
        if attr == "user":
            LOGGER.info(f"[NDV-1622] request.user is being {action} by:")
            for frame in traceback.extract_stack():
                LOGGER.info(f"\t[NDV-1622] {frame.filename}:{frame.lineno} {frame.name}")

    def __call__(self, request):
        def custom_setattr(obj, attr, value):
            self.log_traceback_for_user_action("set", attr)
            return object.__setattr__(obj, attr, value)

        def custom_getattr(obj, attr):
            self.log_traceback_for_user_action("get", attr)
            return object.__getattribute__(obj, attr)

        # Patch at the class level so the hooks also fire for accesses made
        # from other threads (e.g. the profiler); note this affects every
        # request instance, not just this one.
        request.__class__.__setattr__ = custom_setattr
        request.__class__.__getattribute__ = custom_getattr

        return self.get_response(request)

I am planning to add this as the first middleware, deploy it to QA, and try to reproduce the issue. Then I will look for any sign of sentry-sdk in the logged tracebacks. I know it is a bit tricky, but I think it could work, and I am kind of out of options anyway.

@omerfarukabaci Nice! Yeah I think that could give us a lot of insight, let's try it.