'User' object has no attribute 'is_verified'
Opened this issue · 17 comments
Environment
SaaS (https://sentry.io/)
Steps to Reproduce
Versions:
- Django: 4.2.15
- uwsgi: 2.0.24
- sentry-sdk: 2.13.0
- django-otp: 1.5.1
- Set up django-otp: https://django-otp-official.readthedocs.io/en/stable/overview.html#installation
- Use `django_otp.admin.OTPAdminSite` as the custom admin site:

```python
# urls.py
from django.urls import path
from django_otp.admin import OTPAdminSite

admin_site = OTPAdminSite()

urlpatterns = [
    path("admin/", admin_site.urls),
]
```
- Enable profiling by setting `profiles_sample_rate` in `sentry_sdk.init`.
- When the uwsgi server starts, the first request to the admin fails with the error: `'User' object has no attribute 'is_verified'`
- If you remove `profiles_sample_rate` from `sentry_sdk.init`, the error goes away.
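For reference, a minimal `sentry_sdk.init` that enables profiling might look like the sketch below (the DSN is a placeholder; everything except `profiles_sample_rate` is standard setup):

```python
# settings.py -- minimal sketch, placeholder DSN
import sentry_sdk
from sentry_sdk.integrations.django import DjangoIntegration

sentry_sdk.init(
    dsn="https://examplePublicKey@o0.ingest.sentry.io/0",  # placeholder
    integrations=[DjangoIntegration()],
    profiles_sample_rate=1.0,  # removing this line makes the error go away
)
```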
Expected Result
Profiling should not change the behaviour of middleware, so I would expect `django_otp.middleware.OTPMiddleware` to set the `is_verified` attribute on the `User` object successfully.
Actual Result
`django_otp.middleware.OTPMiddleware` somehow cannot set `User.is_verified` for the first request to the admin. Subsequent requests work fine; only the initial request to the admin fails.
Product Area
Profiling
Link
No response
DSN
No response
Version
No response
@omerfarukabaci, I am having trouble reproducing your issue from the information you have provided. Could you please provide your source code (if it is public), or create a minimal reproduction that you can share with me?
Hey @szokeasaurusrex, thank you for your quick answer! Unfortunately our source code is not public. I tried to create a minimal reproduction but couldn't manage to, since the issue is flaky and does not happen all the time. But I have some events on Sentry, would that work? In the traces of those events, it can be seen that `OTPMiddleware` has been called but the `is_verified` attribute is not set.
Hey @omerfarukabaci, any additional info would help, even just the full stack trace. Sentry events too -- feel free to put the links to them here or, if you prefer, send them to me on Discord (sentrivana) or to my email (ivana.kellyer @ sentry.io). Thanks!
Hey @sentrivana, I have sent the Sentry event links to your email. There is one other issue that started appearing after enabling profiling. I am not 100% sure that it is directly related to profiling, but it seems like it, so I also added a link to that in the email just in case.
If you need further assistance on my side, please let me know. I can override the `OTPAdminSite` and retrieve extra information in case of this exception. I can also provide the uwsgi configuration file if that helps.
Thanks a lot for your prompt responses!
Sure thing, that's why we're here :) Thanks for the quick follow ups!
I have sent Sentry event links to your email.
So I've taken a look at the Sentry events and while the stacktraces provide some extra context, I'm still not sure what's going on, especially why profiling specifically would be causing this.
The facts we have so far:
- django-otp patches the `User` object here, adding `is_verified` on it.
- Django tries to read `is_verified` here, with `_wrapped` being a `User` instance, and fails, indicating the middleware has not run or the user object was somehow overwritten after the fact.
- The error only happens on the first request.
- Turning off profiling gets rid of the issue.
- The profiler itself doesn't do any additional patching, so it shouldn't be the case that it's overwriting the django-otp patch.
- The only thing I can think of for why the profiler being on would matter is that it takes a snapshot of the current frame stack periodically -- maybe this leads to something being prematurely evaluated that shouldn't be? (This is kinda far-fetched.)
- Setting the profiler aside, our Django integration does an extra call to `User.is_authenticated` here. I'm wondering if this has anything to do with the issue.
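To make the first fact concrete, here is a dependency-free sketch of the patching pattern (this is not django-otp's actual code; `FakeUser`, `FakeRequest`, and `otp_process_request` are stand-ins), including how any later overwrite of `request.user` would silently lose the patch:

```python
# Simplified, dependency-free sketch of the django-otp pattern, NOT the real
# implementation: the middleware attaches `is_verified` to the user object.
import functools


class FakeUser:
    pass


def is_verified(user):
    # True only if the user has confirmed an OTP device
    return getattr(user, "otp_device", None) is not None


class FakeRequest:
    def __init__(self, user):
        self.user = user


def otp_process_request(request):
    # In django-otp the user is wrapped in a SimpleLazyObject; a direct
    # patch is enough to illustrate the mechanism.
    user = request.user
    user.otp_device = None
    user.is_verified = functools.partial(is_verified, user)


request = FakeRequest(FakeUser())
otp_process_request(request)
print(request.user.is_verified())  # False: patched, but no device verified

# If anything later replaces request.user with a fresh User instance,
# the patch is lost and `is_verified` raises AttributeError again:
request.user = FakeUser()
print(hasattr(request.user, "is_verified"))  # False
```

This is why "user object overwritten after the fact" and "middleware never ran" produce the exact same `AttributeError`.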
Some follow-ups:
- Just to cover the bases, django-otp is properly in your `INSTALLED_APPS` and, more importantly, `MIDDLEWARES`, right?
- Could you show me your `sentry_sdk.init`?
There is one other issue that started appearing after enabling profiling. I am not 100% sure that it is directly related to profiling, but it seems like it, so I also added a link to that in the email just in case.
I had a quick look but don't have many insights -- if you keep seeing this and it indeed looks like a profiler/SDK issue, can you please open a new issue just with that? Makes it easier for us to keep track. :)
@sentrivana Thanks for the detailed response!
Some comments about the facts:
The error only happens on the first request
Now I have realized that this is not the case, sorry for the initial misinformation. To test this I opened up our QA and Sandbox admin pages and refreshed them tens of times, and eventually I saw the same error again. So it doesn't only happen on the first request, but happens occasionally.
Turning off profiling gets rid of the issue
I have checked the deployments just before we encountered this issue for the first time, and the issue was introduced by a deployment that includes:
- Upgrading sentry-sdk from 1.40.6 to 2.7.1
- Setting `profiles_sample_rate=0.3`

Then the issue disappeared after I removed `profiles_sample_rate` along with some other changes:
- Setting `py-call-uwsgi-fork-hooks=true` in the uwsgi conf (`enable-threads` was already `true`).
- Upgrading sentry-sdk from 2.7.1 to 2.13.0

Then, one week after not seeing the issue, I enabled profiling again, and we immediately started seeing this issue again, which is why I strongly suspect that it is related to profiling.
The only thing I can think of for why the profiler being on would matter is that it takes a snapshot of the current frame stack periodically -- maybe this leads to something being prematurely evaluated that shouldn't be? (This is kinda far-fetched.)
Is this sampling executed in another thread? If you look at the traces of the issues happening on the same page, with IDs:
- 9c4203a666d24a848f1980f236a48671
- 4fd9a500f7a749109e9fbaea88e42232

you can see that the error happens at different times. Also, the span that raises the error looks like `event.django - sentry_sdk.integrations.django._got_request_exception`, which is something I don't see in other issues; normally it is a span related to our code that raises the error. Can there be a race condition during the evaluation of the lazy `request.user` object between Sentry and django-otp? What I mean is: could this line somehow be overriding the `request.user` that has been patched by django-otp, maybe after the evaluation of `request.user` required by profiling?
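To illustrate the kind of race being hypothesized, here is a deterministic, simplified sketch (this is not Django's or the SDK's actual code; `LazyUser` is a stand-in for `SimpleLazyObject`) of why an unsynchronized second initialization of a lazy user object would silently drop the django-otp patch:

```python
# Hypothetical, simplified model of a lazy user wrapper. The point: if
# setup() runs twice (e.g. two threads both observe the wrapper as empty),
# the second run replaces the wrapped user and discards anything that was
# patched onto the first one.
_EMPTY = object()


class LazyUser:
    def __init__(self, factory):
        self.factory = factory
        self.wrapped = _EMPTY

    def setup(self):
        # Not guarded by a lock: two concurrent callers can each build
        # their own User instance.
        self.wrapped = self.factory()

    def get(self):
        if self.wrapped is _EMPTY:
            self.setup()
        return self.wrapped


class User:
    pass


lazy = LazyUser(User)
user_a = lazy.get()
user_a.is_verified = lambda: False   # the django-otp-style patch

lazy.setup()                         # a concurrent second initialization
user_b = lazy.get()
print(hasattr(user_b, "is_verified"))  # False: the patch is gone
```

In the real code the interleaving would be between threads rather than two explicit calls, which would explain why the error only shows up occasionally.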
Follow-ups
Just to cover the bases, django-otp is properly in your INSTALLED_APPS and more importantly MIDDLEWARES, right?
Yes, copy-pasting from the settings source code:

```python
INSTALLED_APPS = [
    ...,
    "django_otp",
    "django_otp.plugins.otp_static",
    "django_otp.plugins.otp_totp",
]

MIDDLEWARE = [
    m
    for m in [
        ...,
        ("django_otp.middleware.OTPMiddleware" if ROLE == "admin" else None),
    ]
    if m
]
```
I thought that maybe we somehow fail to set or change `ROLE`, but you can see that `OTPMiddleware` is being called in the trace, so that shouldn't be the case.
Could you show me your sentry_sdk.init?
Copy-pasting the whole Sentry config from the settings source code:
```python
import os
import re

SENTRY_DSN = os.environ.get("SENTRY_DSN")
SENTRY_ENV = os.environ.get("SENTRY_ENV")

if SENTRY_DSN:
    if not SENTRY_ENV:
        raise RuntimeError(
            "SENTRY_ENV env var needs to be defined (use role name, such as 'admin' or 'buyer_console')"
        )

    import sentry_sdk
    from sentry_sdk.integrations.django import DjangoIntegration

    def traces_sampler(sampling_context):
        path = sampling_context.get("wsgi_environ", {}).get("PATH_INFO")
        method = sampling_context.get("wsgi_environ", {}).get("REQUEST_METHOD")
        if path == "/favicon.ico":
            return 0
        if path.startswith("/static/"):
            return 0
        if path == "/health/":
            return 0.001
        if re.match(r"/v1\/api\/some-path\/\S+\/$", path) and method == "GET":
            return 0.001
        if path == "/v1/api/token/":
            return 0.001
        return 1

    sentry_sdk.init(
        dsn=SENTRY_DSN,
        integrations=[DjangoIntegration()],
        send_default_pii=True,
        traces_sampler=traces_sampler,
        profiles_sample_rate=1.0,
        environment=SENTRY_ENV,
    )
```
Thanks @omerfarukabaci. Just to rule another possibility out first: you are setting the authentication middleware before the OTP one, like this, right?
https://stackoverflow.com/a/77109153
```python
# settings.py
MIDDLEWARE = [
    ...,
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django_otp.middleware.OTPMiddleware',  # Here
]
```
Hey @sl0thentr0py, you're welcome! Yes, it is set as follows:

```python
MIDDLEWARE = [
    ...,
    "django.contrib.auth.middleware.AuthenticationMiddleware",
    ("django_otp.middleware.OTPMiddleware" if ROLE == "admin" else None),
]
```
Thanks for helping us investigate, this is very much appreciated! ❤️
Is this sampling executed in another thread? If you look at the traces of the issues happening on the same page, with IDs 9c4203a666d24a848f1980f236a48671 and 4fd9a500f7a749109e9fbaea88e42232,
you can see that the error happens at different times. Also, the span that raises the error looks like `event.django - sentry_sdk.integrations.django._got_request_exception`, which is something I don't see in other issues; normally it is a span related to our code that raises the error. Can there be a race condition during the evaluation of the lazy request.user object between Sentry and django-otp? What I mean is: could this line somehow be overriding the request.user that has been patched by django-otp, maybe after the evaluation of request.user required by profiling?
This is sort of the direction I was thinking in. One way to test whether this has anything to do with the SDK accessing the user is to set `send_default_pii=False` for a bit while keeping the profiler enabled. Setting `send_default_pii` to `False` should disable the code path that leads to us capturing `request.user`. Could you try that out and let us know if the error still happens?
You're welcome, it is my pleasure, and thank you for your concern! ❤️ Unfortunately I am on vacation for 3 weeks; I will try that and let you know when I am back.
Hey @sentrivana, @sl0thentr0py,
I am finally back. My colleagues changed `send_default_pii` to `False`, but unfortunately the issue still persists. Anything else I can do to investigate this further?
Hey @omerfarukabaci, welcome back!
So much for my hypothesis. So it seems like it doesn't have anything to do with the SDK accessing the user data (at least not the `_set_user_info` part), since that code path shouldn't be triggered at all if PII sending is `False`.
Based on how inconsistently it happens, it indeed looks like a possible race condition. I understand that it's super fickle to repro, but I don't have any ideas for how to debug this further without an example app that runs into the issue.
Hey @sentrivana, thank you!
Since this issue happens every day in our QA environment, we are actually able to reproduce it. If you could tell me what else you need to know, I can update the code on QA to gather more information. I can also fork the SDK and manipulate its source code to learn more about the problem. Trying to reproduce it in a separate app is also possible, but that requires more work since we don't have much idea about the actual cause.
So the problem, as far as I can tell, is either that `request.user` is modified/overwritten after the OTP middleware has patched it, or the middleware doesn't even run (not that likely).
What we could look at, during a request when the error happens:
- Can we verify that the django-otp middleware has actually run and patched the user?
- Can we pinpoint when the user object changed, or at least narrow down points in time when it looks correct vs. when it's already wrong? (E.g. check what it looks like in the profiler before and after it takes a sample, since this is our most likely culprit at the moment.)
What would also be interesting would be comparing this to a working request.
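For the first bullet, one low-effort check would be a tiny diagnostic middleware placed directly after `OTPMiddleware` that logs whenever the patch is missing (a sketch only; the class name and log message are made up):

```python
# Hypothetical diagnostic middleware: register it right after OTPMiddleware
# to separate "middleware never ran" from "patch was lost later".
import logging

LOGGER = logging.getLogger(__name__)


class OTPPatchCheckMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        user = getattr(request, "user", None)
        if user is None or not hasattr(user, "is_verified"):
            # The patch is absent: either OTPMiddleware never ran for this
            # request, or request.user was replaced after it ran.
            LOGGER.warning("is_verified missing on %r before the view ran", user)
        return self.get_response(request)
```

One caveat: `hasattr` forces evaluation of the lazy user object, which could itself perturb the timing, so the results should be read with that in mind.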
Hey @sentrivana,
What do you think about the following middleware? It monkey-patches the `request` object's getter and setter and logs the traceback whenever the `user` attribute is changed or accessed:
```python
import logging
import traceback

LOGGER = logging.getLogger(__name__)


class CustomRequestMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def log_traceback_for_user_action(self, action, attr):
        if attr == "user":
            LOGGER.info(f"[NDV-1622] request.user is being {action} by:")
            for frame in traceback.extract_stack():
                LOGGER.info(f"\t[NDV-1622] {frame.filename}:{frame.lineno} {frame.name}")

    def __call__(self, request):
        def custom_setattr(obj, attr, value):
            self.log_traceback_for_user_action("set", attr)
            return object.__setattr__(obj, attr, value)

        def custom_getattr(obj, attr):
            self.log_traceback_for_user_action("get", attr)
            return object.__getattribute__(obj, attr)

        request.__class__.__setattr__ = custom_setattr
        request.__class__.__getattribute__ = custom_getattr
        return self.get_response(request)
```
I am planning to add this as the first middleware, deploy it to QA, and try to reproduce the issue. Then I will look for any sign of sentry-sdk in the logged tracebacks. I know it is a bit tricky, but I think it could work; also, I am kind of out of options.
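For completeness, the registration would presumably look something like this (the dotted path is a placeholder for wherever the class actually lives):

```python
# settings.py -- placement sketch; "myproject.middleware" is a placeholder
MIDDLEWARE = [
    "myproject.middleware.CustomRequestMiddleware",  # must come first
    ...,
]
```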
@omerfarukabaci Nice! Yeah I think that could give us a lot of insight, let's try it.