un1t/django-cleanup

Django-cleanup seems not working anymore on Django 1.8

Closed this issue · 24 comments

Django 1.8.5
Django-cleanup 0.4.0

File "/home/khamaileon/workspace/khamaileon/project/api/feed/models.py", line 73, in fetch
    self.save()
  File "/home/khamaileon/.virtualenvs/project-api/lib/python3.4/site-packages/django/db/models/base.py", line 734, in save
    force_update=force_update, update_fields=update_fields)
  File "/home/khamaileon/.virtualenvs/project-api/lib/python3.4/site-packages/django/db/models/base.py", line 771, in save_base
    update_fields=update_fields, raw=raw, using=using)
  File "/home/khamaileon/.virtualenvs/project-api/lib/python3.4/site-packages/django/dispatch/dispatcher.py", line 201, in send
    response = receiver(signal=self, sender=sender, **named)
  File "/home/khamaileon/.virtualenvs/project-api/lib/python3.4/site-packages/django_cleanup/models.py", line 65, in delete_old_post_save
    delete_file(old_file, using)
  File "/home/khamaileon/.virtualenvs/project-api/lib/python3.4/site-packages/django_cleanup/models.py", line 87, in delete_file
    on_commit(run_on_commit, using)
  File "/home/khamaileon/.virtualenvs/project-api/lib/python3.4/site-packages/django_cleanup/models.py", line 17, in on_commit
    func()
  File "/home/khamaileon/.virtualenvs/project-api/lib/python3.4/site-packages/django_cleanup/models.py", line 84, in run_on_commit
    file_.delete(save=False)
  File "/home/khamaileon/.virtualenvs/project-api/lib/python3.4/site-packages/django/db/models/fields/files.py", line 124, in delete
    self.storage.delete(self.name)
AttributeError: 'FieldFile' object has no attribute 'storage'
un1t commented

Does 0.3.x work it this case?

Please provide information about your model.

Yes perfectly.

class Feed(models.Model):

    source = models.CharField(max_length=30)
    url = models.URLField()
    file = models.FileField(upload_to='feeds', blank=True)
    last_modified = models.DateTimeField(blank=True, null=True)
    interval = models.IntegerField(
        null=True,
        validators=[MinValueValidator(RQ_SCHEDULER_INTERVAL)]
    )

    def __str__(self):
        if self.file:
            return '{0}:{1}'.format(self.source, str(self.file))
        return super(Feed, self).__str__()

    def fetch(self):
        request = requests.get(self.url, stream=True)
        if request.status_code != requests.codes.ok:
            logger.critical('{0} - Wrong status code: {1}'.format(
                str(self),
                request.status_code
            ))
            return False

        last_modified = email_utils.parsedate_to_datetime(
            request.headers['last-modified'])

        self.refresh_from_db()
        if last_modified == self.last_modified:
            return

        self.last_modified = last_modified

        parsed_url = urlsplit(self.url)
        filename = parsed_url.path.split('/')[-1]

        with tempfile.NamedTemporaryFile() as temp_file:

            query_strings = parse_qs(parsed_url.query)
            if 'gZipCompress' in query_strings \
                    and query_strings['gZipCompress'] == ['yes']:

                compressed_stream = BytesIO(request.content)
                gzipper = gzip.GzipFile(fileobj=compressed_stream)
                temp_file.write(gzipper.read())

            else:
                temp_file.write(request.content)

            temp_file.flush()
            self.file.save(filename, File(temp_file))

        self.save()
        return True

    def parse(self, parser):
        parser(self).parse()

Haven't tested anything but here is my theory:

self.file.save(filename, File(temp_file)) saves the model, unless you pass in save=False (https://github.com/django/django/blob/1.8.5/django/db/models/fields/files.py#L112). So when you get to self.save() you are trying to save the instance again, but the caches on the model have not been updated by post_init (since it's the same instance), so the model believes that you are trying to save a new file and tries to delete the old one (i.e. the one you want). And because you are using File and not FieldFile (https://github.com/django/django/blob/1.8.5/django/db/models/fields/files.py#L19) you are losing the storage class and the instance is not reseting it since that happens at model init time with contribute_to_class on the FileField. If that's the case then perhaps we need to update the field caches on post_save as well?

So to test for your case: remove self.save(), or pass in save=False on the self.file.save and use a FieldFile instead of File (e.g. something like FieldFile(self, Feed.file, temp_file) (not tested))

Passing save=False and keeping self.save() gives me the same error.
Removing self.save() and keeping self=True works but the last_modified value is not updated.
self.file is already a FieldFile instance and I think I'm using it correctly (https://docs.djangoproject.com/en/1.8/ref/models/fields/#django.db.models.fields.files.FieldFile.save)

Note that the content argument should be an instance of django.core.files.File, not Python’s built-in file object"

Can you try:

self.file.instance = self
self.file.save(filename, File(temp_file), save=False)

Keep crashing with self.save(). Without self.save(), last_modified value is still not updated.

From my testing that should have worked. I don't think you are running into a bug in django-cleanup, I think you are running into a bug in refresh_from_db. I don't know why we are getting different results...

Ok... I made a few tests this morning. I forgot to tell you that the call to the fetch method was inside a RQ scheduled job.

def fetch_and_parse(feed):
    if feed.fetch():
        feed.parse(parser=MyFeedParser)

job = scheduler.enqueue_in(
    timedelta(seconds=10),
    fetch_and_parse,
    feed
)
job.timeout = 3600
job.save()

From the Django shell, It works perfectly. I just fixed the save method being called twice by adding
self.file.instance = self and removing self.save()

The complete traceback:

AttributeError: 'FieldFile' object has no attribute 'storage'
Traceback (most recent call last):
  File "/home/khamaileon/.virtualenvs/project-api/lib/python3.4/site-packages/rq/worker.py", line 568, in perform_job
    rv = job.perform()
  File "/home/khamaileon/.virtualenvs/project-api/lib/python3.4/site-packages/rq/job.py", line 495, in perform
    self._result = self.func(*self.args, **self.kwargs)
  File "/home/khamaileon/workspace/khamaileon/project/api/project_api/startup.py", line 14, in fetch_and_parse
    if feed.fetch():
  File "/home/khamaileon/workspace/khamaileon/project/api/feed/models.py", line 75, in fetch
    self.file.save(filename, File(temp_file))
  File "/home/khamaileon/.virtualenvs/project-api/lib/python3.4/site-packages/django/db/models/fields/files.py", line 112, in save
    self.instance.save()
  File "/home/khamaileon/.virtualenvs/project-api/lib/python3.4/site-packages/django/db/models/base.py", line 734, in save
    force_update=force_update, update_fields=update_fields)
  File "/home/khamaileon/.virtualenvs/project-api/lib/python3.4/site-packages/django/db/models/base.py", line 771, in save_base
    update_fields=update_fields, raw=raw, using=using)
  File "/home/khamaileon/.virtualenvs/project-api/lib/python3.4/site-packages/django/dispatch/dispatcher.py", line 201, in send
    response = receiver(signal=self, sender=sender, **named)
  File "/home/khamaileon/.virtualenvs/project-api/lib/python3.4/site-packages/django_cleanup/models.py", line 65, in delete_old_post_save
    delete_file(old_file, using)
  File "/home/khamaileon/.virtualenvs/project-api/lib/python3.4/site-packages/django_cleanup/models.py", line 87, in delete_file
    on_commit(run_on_commit, using)
  File "/home/khamaileon/.virtualenvs/project-api/lib/python3.4/site-packages/django_cleanup/models.py", line 17, in on_commit
    func()
  File "/home/khamaileon/.virtualenvs/project-api/lib/python3.4/site-packages/django_cleanup/models.py", line 84, in run_on_commit
    file_.delete(save=False)
  File "/home/khamaileon/.virtualenvs/project-api/lib/python3.4/site-packages/django/db/models/fields/files.py", line 124, in delete
    self.storage.delete(self.name)
AttributeError: 'FieldFile' object has no attribute 'storage'

RQ version:
rq==0.5.6
rq-scheduler==0.5.1

Likely this really annoying bug: https://code.djangoproject.com/ticket/21238 that I always have to deal with since I also use redis regularly (which uses pickle).

I have some ideas on how to fix this in this library as a workaround.

Ok I switch back to 0.3.1 in the meanwhile. Let me know if you want me to do some testing.

I was able to fix the loss of storage classes on the field after a pickle/unpickle. The refresh_from_db bug is more difficult (https://code.djangoproject.com/ticket/25547). I was thinking of:

from django_cleanup.utils import refresh_cleanup_cache

instance.refresh_from_db()
refresh_cleanup_cache(instance)

Plus related warnings in the Docs. Would that work?

I'm trying to test your PR but I can't find the right command to install it with pip. How do I do it ?

un1t commented

I'm trying to test your PR but I can't find the right command to install it with pip. How do I do it ?

pip install git+https://github.com/vinnyrose/django-cleanup.git@drop_django16

I've already try it. It installs me the 0.4.0 version.

Collecting git+https://github.com/vinnyrose/django-cleanup.git@drop_django16
  Cloning https://github.com/vinnyrose/django-cleanup.git (to drop_django16) to /tmp/pip-odftm949-build
Installing collected packages: django-cleanup
  Running setup.py install for django-cleanup
Successfully installed django-cleanup-0.4.0

I've got pip 7.1.2 on python 3.4.

In my virtualenv I have nothing in the src directory. I've checked django_cleanup in site-packages, it is the latest release, not the PR.

My bad I forgot -e

pip install -e git+https://github.com/vinnyrose/django-cleanup.git@drop_django16#egg=django_cleanup

It works very well. refresh_from_db() seems to work too.
But I had to keep self.file.instance = self

cleanup.refresh(self) should have made the self.file.instance = self unnecessary.

e.g.

self.refresh_from_db()
cleanup.refresh(self)
assert id(self) == id(self.file.instance)

Oh I miss it. I was waiting for something in "django_cleanup.utils" :)
Works like a charm! Thank you!

un1t commented

I merged PR, so it should work now.

Great! Thank you!

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.