user_id always None for DRF requests when using RequestMiddleware
immortaleeb opened this issue · 17 comments
In my Django application I'm using the django_structlog.middlewares.RequestMiddleware
to log all of the requests to my application.
I noticed that user_id
is always None
for all the calls made to my Django Rest Framework (DRF) urls.
Now, I know this is because RequestMiddleware
tries to access request.user
, which is only filled in when you're going through the Django authentication mechanism, which DRF bypasses completely. Instead, it performs authentication in the view layer, so after all middleware code has run.
I guess other people have encountered this too so I was wondering if there is any workaround so that I can track which user performed which request from "the start" until "the end" of the request?
I tried several things, but all of them feel like I'm fighting with DRF to get it to do something it doesn't support.
Can this be because of the order of the middlewares in settings.py
?
In the demo of django-structlog it is last, after the authentication middleware.
I'm afraid it's not that simple, the middleware is actually already one of the last ones (only followed by the celery middleware):
MIDDLEWARE = [
'django.middleware.security.SecurityMiddleware',
'django.contrib.sessions.middleware.SessionMiddleware',
'corsheaders.middleware.CorsMiddleware',
'django.middleware.common.CommonMiddleware',
'django.middleware.csrf.CsrfViewMiddleware',
'django.contrib.auth.middleware.AuthenticationMiddleware',
'django.contrib.messages.middleware.MessageMiddleware',
'django.middleware.clickjacking.XFrameOptionsMiddleware',
'django_structlog.middlewares.RequestMiddleware',
'django_structlog.middlewares.CeleryMiddleware',
]
Maybe something else that I didn't make clear is that I'm using DRF's rest_framework.authentication.TokenAuthentication
to authenticate with my REST apis.
As mentioned in my original post, this means that authentication is performed after all of the middlewares.
I found people encountering the same DRF issue but in a different context here: jpadilla/django-rest-framework-jwt#45 (comment)
As far as I can find request.user
is only going to point to an actual user after it has been processed by DRF. So it's only available after processing the response (not at request time). Also see this PR: encode/django-rest-framework#2155
DRF token authentication is not tested in django-structlog and now that you mention this issue I just remembered DRF handle the authentication in its view not in middleware.
It does work with DRF and django session authentication but it is not how drf is used normally. 🤦
I will investigate this and try to figure out a workaround.
But a quick workaround would be to implement a mixin or an intermediary view wrapping DRF authentication then bind/log the user. It is not ideal but it would work. 🤷
Ok, thanks for clarifying.
The problem with an intermediary view is that the code would be executed after request_started
has been logged. Although user_id
would be filled in at request_finished
, it would still be None at request_started
, but I guess I can live with that.
Alternatively, I could implement a middleware that executes after django.contrib.auth.middleware.AuthenticationMiddleware
which calls DRF's authentication explicitly if request.user
is not set, similar to what some people did in jpadilla/django-rest-framework-jwt#45 (comment) . The downside of that would be going to the database twice for authentication (although you could probably fix that with caching).
There are no optimal solution until DRF fix the issue you mentioned. You have to choose your poison until then.
I will update the documentation to reflect this issue.
I would appreciate any pull requests or just code snippets to help working around this.
I wouldn't really know how to integrate this properly with the existing code base, but I can share the solution I've implemented.
I went for the mixin approach, wrapping each of DRF's APIView and subclasses. I added this to my app's views.py file:
import structlog
from structlog.threadlocal import bind_threadlocal
from rest_framework.views import APIView as DRFAPIView
from rest_framework.viewsets import ModelViewSet as DRFModelViewSet
class StructlogMixin:
def initial(self, request, *args, **kwargs):
super().initial(request, *args, **kwargs)
if hasattr(request, 'user'):
user_id = request.user.pk
# Force user_id to be visible in response log
structlog.get_logger('django_structlog.middlewares.request').bind(user_id=user_id)
bind_threadlocal(user_id=user_id)
class APIView(StructlogMixin, DRFAPIView):
pass
class ModelViewSet(StructlogMixin, DRFModelViewSet):
pass
Then, instead of using DRF's APIView
and ModelViewSet
I instead use the overridden ones. You can easily do this by replacing imports:
from rest_framework.views import APIView
becomes
from myapp.views import APIView
Then you also need to make sure that the threadlocal context is cleared after every request because threads might be reused for different HTTP request. I did this by adding a middleware:
from structlog.threadlocal import clear_threadlocal
class StructlogThreadLocalMiddleware:
def __init__(self, get_response):
self.get_response = get_response
def __call__(self, request):
response = self.get_response(request)
# Clear threadlocal after every response
clear_threadlocal()
return response
I add this right before the first middleware that logs information (in my case this is django-structlog's request middleware):
MIDDLEWARE = [
# ...
'myapp.middlewares.StructlogThreadLocalMiddleware',
'django_structlog.middlewares.RequestMiddleware',
]
Finally, I also added merge_threadlocal
as the first processor in my structlog config because we're using structlog's threadlocal functionality:
structlog.configure(
processors=[
structlog.threadlocal.merge_threadlocal,
# ...
]
# ...
Note that this still has the downside of logging user_id=None
for each request_started
, but all the logs after that should include the correct user_id and you can still track the user_id by searching for the request_finished
event with the same request_id
.
In django-structlog we could just log the user_id in the request finish as well (or with a setting). You would not have do anything fancy. What do you think?
Hmm, I don't know. That would mean that user_id is always None except for the request_finished
log line.
With my current (although hacky) solution, user_id is only None for request_started
. All the log lines in between actually have a user_id.
On the other hand you could argue that this isn't django-structlog's responsibility because DRF doesn't support hooking into the view layer to support cases like these (afaik). In that case, binding a user_id a second time is probably the only thing you can do.
@jrobichaud
In django-structlog we could just log the user_id in the request finish as well (or with a setting). You would not have do anything fancy. What do you think?
How do we log any variable (eg: user_id) in the request_finished
log line only and not to the log lines between request_started
and request_finished
?
@jrobichaud
In django-structlog we could just log the user_id in the request finish as well (or with a setting). You would not have do anything fancy. What do you think?
How do we log any variable (eg: user_id) in the
request_finished
log line only and not to the log lines betweenrequest_started
andrequest_finished
?
It is not supported at the moment, you can create an issue for this.
I made the workaround we discussed, a setting to add the user_id in the request finished.
In 1.5.3 I added a setting to add user_id
in request_finished
and request_failed
.
In 1.5.4 I just made sure user_id
is in request_finished
and request_failed
all the time instead.
Have you considered evaluating DRF authentication classes in the middleware and then setting the user based on that?
That way all log events will contain the user_id
. I have implemented this using signal receivers to make sure user_id
and some other attributes are present in all events.
Here is a simplified version:
@receiver([bind_extra_request_metadata, bind_extra_request_finished_metadata, bind_extra_request_failed_metadata])
def bind_user(request, logger, **kwargs):
"""
Bind Django/REST Framework user to the logger.
"""
# Retrieve DRF user from auth headers if it is not already cached.
token_user: Optional[TokenUser] = getattr(request, "_drf_token_user", None)
if not hasattr(request, "_drf_token_user"):
try:
auth = JWTTokenUserAuthentication().authenticate(request)
if auth:
token_user = auth[0]
except (TokenError, AuthenticationFailed):
pass
# Cache token user to prevent unnecessary JWT decode
request._drf_token_user = token_user
# Bind DRF token user if available.
if token_user is not None:
logger.bind(user_id=token_user.pk)
# Fallback to django auth
elif request.user:
logger.bind(user_id=request.user.pk)
To make it more generic we can read from REST_FRAMEWORK['DEFAULT_AUTHENTICATION_CLASSES']
.
A downside of this is that it is somewhat opinionated and adds overhead of processing authentication twice.
It is an interesting solution, however views may redefine the authentication class.
We would need to get view associated with the url, then get its authentication classes.
@v1k45 could you try to the class from the view?
I would add this workaround in the documentation or in a helper module.