unioslo/mreg

Deadlocks when the same user operates concurrently

Closed this issue · 3 comments

When multiple clients try to authenticate and work against mreg at the same time with the same user, mreg may reach an Internal Server Error due to a database deadlock when one client updates mreg_expiringtoken.last_used at the same time another login of the same user gets issued a new token.

"deadlock detected","Process 2180545 waits for ShareLock on transaction 1481202488; blocked by process 2180546.
Process 2180546 waits for ShareLock on transaction 1481202487; blocked by process 2180545.
Process 2180545: DELETE FROM ""authtoken_token"" WHERE ""authtoken_token"".""key"" IN ('XXXX')
Process 2180546: UPDATE ""mreg_expiringtoken"" SET ""last_used"" = '2021-09-24T09:04:01.419587+00:00'::timestamptz WHERE ""mreg_expiringtoken"".""token_ptr_id"" = 'XXXX'"

This is a regression from #442.

About time for django-rest-knox?

Probably also possible to wrap

def authenticate_credentials(self, key):

with django.db.transaction.atomic():

and change

mreg/mreg/api/views.py

Lines 30 to 34 in 87a9066

ExpiringToken.objects.filter(user=user).delete()
token, created = ExpiringToken.objects.get_or_create(user=user)
return Response({'token': token.key})

to first lookup the ExpiringToken to see if it is expired before deleting it. It would also be need to be wrapped with in a transaction.

Maybe refactor the test from

if token.last_used < timezone.now() - timedelta(hours=EXPIRE_HOURS):
raise exceptions.AuthenticationFailed('Token has expired')

to the model?

Thanks for the feedback. I'm not able to reproduce the problem with this patch:

diff --git a/mreg/api/v1/tests/tests.py b/mreg/api/v1/tests/tests.py
index 9790561..1859855 100644
--- a/mreg/api/v1/tests/tests.py
+++ b/mreg/api/v1/tests/tests.py
@@ -204,13 +204,28 @@ class APITokenAuthenticationTestCase(MregAPITestCase):
             mock_future.return_value = minute_after_expiry
             self.assert_get_and_401("/hosts/")
 
-    def test_token_rotation(self):
+    def test_token_reissue(self):
+        """ Authenticating before expiry should issue the same token. """
         old = ExpiringToken.objects.get(user=self.user)
         self.assert_post_and_200("/api/token-auth/",
                                  {"username": self.user,
                                   "password":"test"})
         new = ExpiringToken.objects.get(user=self.user)
-        assert old.key != new.key
+        assert old.key == new.key
+        self.assertLess(old.last_used, new.last_used)
+
+    def test_token_rotation(self):
+        old = ExpiringToken.objects.get(user=self.user)
+        EXPIRE_HOURS = getattr(settings, 'REST_FRAMEWORK_TOKEN_EXPIRE_HOURS', 8)
+        minute_after_expiry = old.last_used + timedelta(hours=EXPIRE_HOURS,
+                                                        minutes=1)
+        with mock.patch('django.utils.timezone.now') as mock_future:
+            mock_future.return_value = minute_after_expiry
+            self.assert_post_and_200("/api/token-auth/",
+                                     {"username": self.user,
+                                      "password":"test"})
+            new = ExpiringToken.objects.get(user=self.user)
+            assert old.key != new.key
 
     def test_token_usage(self):
         old = ExpiringToken.objects.get(user=self.user)
diff --git a/mreg/api/views.py b/mreg/api/views.py
index adacb03..85736ed 100644
--- a/mreg/api/views.py
+++ b/mreg/api/views.py
@@ -1,4 +1,7 @@
 from django.utils import timezone
+from django.conf import settings
+
+from datetime import timedelta
 
 from rest_framework import status
 from rest_framework.authtoken.views import ObtainAuthToken
@@ -25,12 +28,15 @@ class ObtainExpiringAuthToken(ObtainAuthToken):
                 raise err
 
         user = serializer.validated_data['user']
-
-        # Force token rotation.
-        ExpiringToken.objects.filter(user=user).delete()
-
         token, created = ExpiringToken.objects.get_or_create(user=user)
 
+        EXPIRE_HOURS = getattr(settings, 'REST_FRAMEWORK_TOKEN_EXPIRE_HOURS', 8)
+        time_since_last_use = (timezone.now() - token.last_used)
+        if time_since_last_use > timedelta(hours=EXPIRE_HOURS):
+            # Force rotation of expired tokens.
+            ExpiringToken.objects.filter(user=user).delete()
+            token, created = ExpiringToken.objects.get_or_create(user=user)
+
         return Response({'token': token.key})
 
 
diff --git a/mreg/authentication.py b/mreg/authentication.py
index fcc86b4..0c99938 100644
--- a/mreg/authentication.py
+++ b/mreg/authentication.py
@@ -27,9 +27,6 @@ class ExpiringTokenAuthentication(TokenAuthentication):
         if not token.user.is_active:
             raise exceptions.AuthenticationFailed('User inactive or deleted')
 
-        if token.last_used < timezone.now() - timedelta(hours=EXPIRE_HOURS):
-            raise exceptions.AuthenticationFailed('Token has expired')
-
         # Save the token to update the last_used field.
         token.save()
 

But the mocked test fails because time_since_last_use = (timezone.now() - token.last_used) is always 0:00:00 during the test. Will investigate further after the weekend; a workaround for those affected is to "just" run the jobs at different times...

Any shortcomings with this approach that catches peoples eyes meanwhile?

Edit: the reproducer I'm using is to run:

for i in $(seq 1 100); do python3 mreg-tools/get-zonefiles/get-zonefiles.py --config mreg-tools/get-zonefiles/get-zonefiles.conf; done

...from five concurrent clients. I would like to turn that too into a proper test eventually.

diff --git a/mreg/authentication.py b/mreg/authentication.py
index fcc86b4..0c99938 100644
--- a/mreg/authentication.py
+++ b/mreg/authentication.py
@@ -27,9 +27,6 @@ class ExpiringTokenAuthentication(TokenAuthentication):
         if not token.user.is_active:
             raise exceptions.AuthenticationFailed('User inactive or deleted')
 
-        if token.last_used < timezone.now() - timedelta(hours=EXPIRE_HOURS):
-            raise exceptions.AuthenticationFailed('Token has expired')
-

This can not be removed, or the tokens will be valid forever. The rest of the patch is OK, though hackish.