getsentry/sentry-python

'User' object has no attribute 'is_verified'

Opened this issue · 17 comments

Environment

SaaS (https://sentry.io/)

Steps to Reproduce

Versions:

  • Django: 4.2.15
  • uwsgi: 2.0.24
  • sentry-sdk: 2.13.0
  • django-otp: 1.5.1
  1. Set up django-otp: https://django-otp-official.readthedocs.io/en/stable/overview.html#installation

  2. Use django_otp.admin.OTPAdminSite as custom admin site.

    # urls.py
    from django.urls import path
    from django_otp.admin import OTPAdminSite

    admin_site = OTPAdminSite()
    urlpatterns = [path("admin/", admin_site.urls)]
  3. Enable profiling by setting profiles_sample_rate in sentry_sdk.init (a minimal example follows these steps).

  4. When the uwsgi server starts, the first request to the admin fails with the error:

    'User' object has no attribute 'is_verified'

  5. If you remove the profiles_sample_rate from sentry_sdk.init, the error goes away.
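For reference, a minimal version of the sentry_sdk.init from step 3 looks roughly like this (the DSN is a placeholder; profiles are sampled relative to traces, hence the traces rate):

    import sentry_sdk
    from sentry_sdk.integrations.django import DjangoIntegration

    sentry_sdk.init(
        dsn="https://public@example.ingest.sentry.io/0",  # placeholder DSN
        integrations=[DjangoIntegration()],
        traces_sample_rate=1.0,
        profiles_sample_rate=1.0,  # enabling this is what triggers the error
    )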

Expected Result

Profiling should not change the behaviour of middleware, so I would expect django_otp.middleware.OTPMiddleware to successfully set the is_verified attribute on the User object.

Actual Result

django_otp.middleware.OTPMiddleware somehow cannot set User.is_verified for the first request to the admin. Subsequent requests work fine; only the initial request to the admin fails.

Product Area

Profiling

Link

No response

DSN

No response

Version

No response

Assigning to @getsentry/support for routing ⏲️

@omerfarukabaci, I am having trouble reproducing your issue from the information you have provided. Could you please provide your source code (if it is public), or create a minimal reproduction that you can share with me?

Hey @szokeasaurusrex, thank you for your quick answer! Unfortunately our source code is not public. I tried to create a minimal reproduction but couldn't manage to, since the issue is flaky and does not happen every time. But I have some events on Sentry, would those work? In the traces of those events, you can see that OTPMiddleware has been called but the is_verified attribute is not set.

Hey @omerfarukabaci, any additional info would help, even just the full stack trace. Sentry events too -- feel free to put the links to them here or if you prefer, send them to me on Discord (sentrivana) or to my email (ivana.kellyer @ sentry.io). Thanks!

Hey @sentrivana, I have sent the Sentry event links to your email. There is one other issue that started appearing after enabling profiling. I am not 100% sure that it is directly related to profiling, but it seems like it, so I also added a link to it in the email just in case.

If you need anything further from my side, please let me know. I can override the OTPAdminSite and retrieve extra information when this exception occurs. I can also provide the uwsgi configuration file if that helps.

Thanks a lot for your prompt responses! 🚀


Sure thing, that's why we're here :) Thanks for the quick follow-ups!

I have sent Sentry event links to your email.

So I've taken a look at the Sentry events and while the stacktraces provide some extra context, I'm still not sure what's going on, especially why profiling specifically would be causing this.

The facts we have so far:

  • django-otp patches the User object here, adding is_verified on it (sketched below)
  • Django tries to read is_verified here with _wrapped being a User instance and fails, indicating the middleware has not run or the user object was somehow overwritten after the fact
  • The error only happens on the first request
  • Turning off profiling gets rid of the issue
    • The profiler itself doesn't do any additional patching so it shouldn't be the case that it's overwriting the django-otp patch
    • The only reason I can think of that the profiler being on would matter is that it takes a snapshot of the current frame stack periodically -- maybe this leads to something being prematurely evaluated that shouldn't be? (This is kinda far-fetched)
  • Setting the profiler aside, our Django integration does an extra call to User.is_authenticated here. I'm wondering if this has anything to do with the issue.
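For context, here is a paraphrased sketch of what django-otp's OTPMiddleware does; this is simplified from its source, not a verbatim copy:

# Paraphrased, simplified sketch of django_otp.middleware.OTPMiddleware.
import functools

from django.utils.functional import SimpleLazyObject


def is_verified(user):
    # Verified means an OTP device has confirmed this user.
    return user.otp_device is not None


class OTPMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        user = getattr(request, "user", None)
        if user is not None:
            # request.user is replaced with a lazy wrapper; is_verified is
            # only attached when that wrapper is first evaluated.
            request.user = SimpleLazyObject(lambda: self._verify_user(request, user))
        return self.get_response(request)

    def _verify_user(self, request, user):
        user.otp_device = None
        user.is_verified = functools.partial(is_verified, user)
        # ... device verification omitted ...
        return user

If anything resolves request.user before this middleware runs, or replaces it afterwards, the resolved User never gets is_verified.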

Some follow-ups:

  • Just to cover the bases, django-otp is properly in your INSTALLED_APPS and, more importantly, MIDDLEWARE, right?
  • Could you show me your sentry_sdk.init?

There is one other issue that started appearing after enabling profiling. I am not 100% sure that it is directly related to profiling, but it seems like it, so I also added a link to it in the email just in case.

I had a quick look but don't have many insights -- if you keep seeing this and it indeed looks like a profiler/SDK issue, can you please open a new issue just with that? Makes it easier for us to keep track. :)

@sentrivana Thanks for the detailed response!

Some comments about the facts:

The error only happens on the first request

Now I have realized that this is not the case, sorry for the initial misinformation. To test this, I opened our QA and Sandbox admin pages and refreshed them tens of times, and eventually I saw the same error again. So it does not happen only on the first request; it happens occasionally.

Turning off profiling gets rid of the issue

I have checked the deployments from just before we first encountered this issue, and it was introduced by a deployment that included:

  • Upgrading sentry-sdk from 1.40.6 to 2.7.1
  • Setting profiles_sample_rate=0.3

The issue then disappeared after I removed profiles_sample_rate along with some other changes:

  • Setting py-call-uwsgi-fork-hooks=true in uwsgi conf (enable-threads was already true).
  • Upgrading sentry-sdk from 2.7.1 to 2.13.0

Then, after a week of not seeing the issue, I enabled profiling again, and we immediately started seeing it again, which is why I strongly suspect it is related to profiling.

The only reason I can think of that the profiler being on would matter is that it takes a snapshot of the current frame stack periodically -- maybe this leads to something being prematurely evaluated that shouldn't be? (This is kinda far-fetched)

Is this sampling executed in another thread? If you look at the traces of the issues happening on the same page, with IDs:

  • 9c4203a666d24a848f1980f236a48671
  • 4fd9a500f7a749109e9fbaea88e42232

You can see that the error happens at different times. Also, the span that raises the error is event.django - sentry_sdk.integrations.django._got_request_exception, which is something I don't see in other issues; normally the span that raises the error is related to our code. Could there be a race condition between Sentry and django-otp during the evaluation of the lazy request.user object? What I mean is: could this line somehow be overriding the request.user that django-otp has patched, maybe after profiling forces an evaluation of request.user?
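To illustrate the kind of race I mean (a toy demo, not the SDK's actual code): SimpleLazyObject does not lock around its one-time setup, so two threads evaluating the same lazy object can both run the factory:

# Toy demo: concurrent evaluation of one SimpleLazyObject (requires Django).
import threading
import time
import types

from django.utils.functional import SimpleLazyObject

evaluations = []


def factory():
    evaluations.append(threading.current_thread().name)
    time.sleep(0.1)  # widen the race window so the interleaving is visible
    return types.SimpleNamespace(is_verified=True)


lazy = SimpleLazyObject(factory)

# Both threads see the wrapper as not yet set up, so each runs the factory.
threads = [threading.Thread(target=lambda: lazy.is_verified) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(evaluations)  # typically two entries: the factory ran twice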

Follow-ups

Just to cover the bases, django-otp is properly in your INSTALLED_APPS and, more importantly, MIDDLEWARE, right?

Yes, copy-pasting from the settings source code:

INSTALLED_APPS = [
    ...,
    "django_otp",
    "django_otp.plugins.otp_static",
    "django_otp.plugins.otp_totp",
]

MIDDLEWARE = [
    m
    for m in [
        ...
        ("django_otp.middleware.OTPMiddleware" if ROLE == "admin" else None),
    ]
    if m
]

I thought that maybe we somehow fail to set or change ROLE, but you can see that OTPMiddleware is being called in the trace, so that shouldn't be the case.

Could you show me your sentry_sdk.init?

Copy-pasting the whole Sentry config from the settings source code:

import os
import re

SENTRY_DSN = os.environ.get("SENTRY_DSN")
SENTRY_ENV = os.environ.get("SENTRY_ENV")

if SENTRY_DSN:
    if not SENTRY_ENV:
        raise RuntimeError(
            "SENTRY_ENV env var needs to be defined (use role name, such as 'admin' or 'buyer_console')"
        )

    import sentry_sdk
    from sentry_sdk.integrations.django import DjangoIntegration

    def traces_sampler(sampling_context):
        path = sampling_context.get("wsgi_environ", {}).get("PATH_INFO")
        method = sampling_context.get("wsgi_environ", {}).get("REQUEST_METHOD")
        if path == "/favicon.ico":
            return 0
        if path.startswith("/static/"):
            return 0

        if path == "/health/":
            return 0.001

        if re.match(r"/v1\/api\/some-path\/\S+\/$", path) and method == "GET":
            return 0.001
        if path == "/v1/api/token/":
            return 0.001

        return 1

    sentry_sdk.init(
        dsn=SENTRY_DSN,
        integrations=[DjangoIntegration()],
        send_default_pii=True,
        traces_sampler=traces_sampler,
        profiles_sample_rate=1.0,
        environment=SENTRY_ENV,
    )

Thanks @omerfarukabaci, just to rule out another possibility first: you are setting the authentication middleware before the OTP one, like this, right?
https://stackoverflow.com/a/77109153

# "settings.py"

MIDDLEWARE = [
    ...
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django_otp.middleware.OTPMiddleware' # Here
]

Hey @sl0thentr0py, you're welcome! Yes, it is set as follows:

MIDDLEWARE = [
    ...
    "django.contrib.auth.middleware.AuthenticationMiddleware",
    ("django_otp.middleware.OTPMiddleware" if ROLE == "admin" else None),
]

Thanks for helping us investigate, this is very much appreciated! ❤

Is this sampling executed in another thread? If you look at the traces of the issues happening on the same page, with IDs:

9c4203a666d24a848f1980f236a48671
4fd9a500f7a749109e9fbaea88e42232

You can see that the error happens at different times. Also, the span that raises the error is event.django - sentry_sdk.integrations.django._got_request_exception, which is something I don't see in other issues; normally the span that raises the error is related to our code. Could there be a race condition between Sentry and django-otp during the evaluation of the lazy request.user object? What I mean is: could this line somehow be overriding the request.user that django-otp has patched, maybe after profiling forces an evaluation of request.user?

This is sort of the direction I was thinking. One thing to test whether this has anything to do with the SDK accessing the user is to set send_default_pii=False for a bit while keeping the profiler enabled. Setting send_default_pii to False should disable the code path that leads to us capturing request.user. Could you try that out and let us know if the error still happens?


You're welcome, it is my pleasure, and thank you for your concern! ❤️ Unfortunately I am on vacation for 3 weeks; I will try that and let you know when I am back. 🚀

Hey @sentrivana, @sl0thentr0py

I am finally back. 🙂 My colleagues changed send_default_pii to False, but unfortunately the issue still persists. Is there anything else I can do to investigate this further?

Hey @omerfarukabaci, welcome back 👋🏻

So much for my hypothesis 😢 It seems like it doesn't have anything to do with the SDK accessing the user data (at least not the _set_user_info part), since that code path shouldn't be triggered at all if PII sending is False.

Based on how inconsistently it happens, it indeed looks like a possible race condition. I understand that it's super fickle to repro, but I don't have any ideas for debugging this further without an example app that runs into the issue.

Hey @sentrivana, thank you! 👋🏼

Since this issue happens every day on our QA environment, we are actually able to reproduce it. If you could tell me what else you need to know, I can update the code on QA to gather more information. I can also fork the SDK and modify its source to learn more about the problem. Trying to reproduce it in a separate app is also possible, but that requires more work since we don't have much idea about the actual cause.

So the problem, as far as I can tell, is either that request.user is modified or overwritten after the OTP middleware has patched it, or that the middleware doesn't run at all (not that likely).

What we could look at, during a request when the error happens:

  • Can we verify that the django-otp middleware has actually run and patched the user? (One way to check this is sketched below.)
  • Can we pinpoint when the user object has changed, or at least narrow down points in time when it looks correct vs. when it's already wrong? (E.g. check what it looks like in the profiler before and after it's taken a sample since this is our most likely culprit at the moment.)

What would also be interesting would be comparing this to a working request.
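One way to check the first point would be a small middleware placed right after OTPMiddleware; this is just a hypothetical sketch, and the class name and log message are illustrative:

# Hypothetical middleware to check whether OTPMiddleware's patch is visible.
import logging

logger = logging.getLogger(__name__)


class OTPPatchCheckMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        user = getattr(request, "user", None)
        logger.info(
            "is_verified present: %s (user class: %s)",
            hasattr(user, "is_verified"),  # forces the lazy user to evaluate
            user.__class__.__name__,
        )
        return self.get_response(request)

One caveat: the hasattr call itself forces the lazy user to evaluate, which could change the timing, so where this sits in the middleware chain matters.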

Hey @sentrivana,

What do you think about the following middleware? It monkey-patches the request class's attribute getter and setter and logs a traceback whenever the user attribute is set or accessed:

import logging
import traceback

LOGGER = logging.getLogger(__name__)


class CustomRequestMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def log_traceback_for_user_action(self, action, attr):
        # Only log accesses to request.user; everything else is noise.
        if attr == "user":
            LOGGER.info(f"[NDV-1622] request.user is being {action} by:")
            for frame in traceback.extract_stack():
                LOGGER.info(f"\t[NDV-1622] {frame.filename}:{frame.lineno} {frame.name}")

    def __call__(self, request):
        def custom_setattr(obj, attr, value):
            self.log_traceback_for_user_action("set", attr)
            return object.__setattr__(obj, attr, value)

        def custom_getattr(obj, attr):
            self.log_traceback_for_user_action("get", attr)
            return object.__getattribute__(obj, attr)

        # Patch at the class level so the hooks also fire for accesses made
        # from other threads (e.g. the profiler); note this affects every
        # request instance, not just this one.
        request.__class__.__setattr__ = custom_setattr
        request.__class__.__getattribute__ = custom_getattr

        return self.get_response(request)

I am planning to add this as the first middleware, deploy it to QA, and try to reproduce the issue. Then I will look for any sign of sentry-sdk in the logged tracebacks. I know it is a bit tricky, but I think it could work, and I am kind of out of options anyway.

@omerfarukabaci Nice! Yeah I think that could give us a lot of insight, let's try it.