Question: Disable storing of task arguments
davocarli opened this issue · 16 comments
Hi, apologies if this is documented somewhere (I can't find it), but is there a quick way not to store the arguments that were passed to the celery task? Some of my tasks take in potentially sensitive information, and I would prefer not to have a permanent record of them inside of my database (I'm also looking at encrypting this information, but would prefer not to store it regardless).
It appears that this issue is associated with the CVE-2020-17495 CVE id.
Furthermore this is now being flagged by Pyup Safety (https://github.com/pyupio/safety-db):
╞══════════════════════════════════════════════════════════════════════════════╡
│ REPORT │
│ checked 178 packages, using pyup.io's DB │
╞════════════════════════════╤═══════════╤══════════════════════════╤══════════╡
│ package │ installed │ affected │ ID │
╞════════════════════════════╧═══════════╧══════════════════════════╧══════════╡
│ django-celery-results │ 1.2.1 │ <=1.2.1 │ 38678 │
╞══════════════════════════════════════════════════════════════════════════════╡
│ Django-celery-results through 1.2.1 stores task results in the database. │
│ Among the data it stores are the variables passed into the tasks. The │
│ variables may contain sensitive cleartext information that does not belong │
│ unencrypted in the database. See CVE-2020-17495. │
╘══════════════════════════════════════════════════════════════════════════════╛
can you try the latest versions?
Just to close the loop on this for those coming to this ticket, use the argsrepr
and kwargsrepr
support in Celery:
Django Celery Results uses these values when store results in the DB.
@auvipy I understand where people are coming from in terms of this ultimately being a thing for consumers to decide, but I think it's a reasonable ask for the default behaviour to be "more secure" meaning the "fix" for this CVE would be what's outlined in this comment.
Ultimately what I'm seeking is to be able to stick a patched version on the CVE as otherwise we have to handle this anytime we want to use this package in our applications.
I'm happy to help out with this where I can, though I'm not a Python developer so might require some help myself 😅
current version is not affected by CVE, version belo 1.2 were effected :)
and #154 (comment)
@auvipy the comment we've both linked to says that the setting currently isn't respected - are you saying that's now been changed?
Would you be able help me find the commit(s) that fixed this CVE?
@auvipy based on this comment, this CVE isn't fixed, it's just not accurately reflected in the safetydb.
The reason this is an issue for us is that it is present in the databases osv-detector
pulls from, and it is a genuine security issue to have this enabled by default.
I am OK with any help
Ill see what I can do, but as I said I'm very much not a Python Dev by trade so will likely need help.
It could be good to reopen this issue incase other Python developers are interested in helping.
@G-Rath You can set a repr on what is stored by celery results by setting the values in kwargsrepr
. What you see in the celery results for the password is ****
not the hashed_password.
app.send_task(
"user.tasks.update_password_in_external_service",
kwargs={"user_id": self.id, "password": hashed_password},
kwargsrepr=repr({"user_id": self.id, "password": "****"}),
)
@peterfarrell that's good to know, but not really relevant here since it has to be applied manually to known arguments so doesn't resolve the general security issue.
Having a brief look over code, it seems that celery handles this by using a TaskExtended
class when result_extended
is true, which captures name
, rgs
, kwargs
, worker
, retries
, and queue
(source)
At this point I assume these are the only "sensitive" fields being stored (well, really args
and kwargs
) and that they're not needed by this library in anyway (i.e. they're being stored purely for logging/debugging purposes), as otherwise kwargsrepr
wouldn't be usable.
Looking over the code in the DatabaseBackend
, it seems args
and kwargs
are set here:
django-celery-results/django_celery_results/backends/database.py
Lines 77 to 97 in 6af51d9
So hopefully that should mean fixing this should be a matter of having those assignments only occur if result_extended
is True
- otherwise just set them to None
.
The next thing to figure out is how to know what result_extended
is set to within that method - I'm hoping it'll be exposed somewhere on self
, or maybe in __init__
🤔
If the keys with sensitive data are not known, it wouldn't be hard to mask all values:
kwargs = {
"random_a": "my_password",
"random_b": "000-00-0000", # social security number
}
app.send_task(
"user.tasks.update_password_in_external_service",
kwargs=kwargs,
# Dynamically build masked dictionary
kwargsrepr=repr({ k: "****" for k, v in kwargs.items() }),
)