jonaswinkler/paperless-ng

[BUG] After migration from paperless, "Checksum mismatch" prevents adding new files

danieldietsch opened this issue · 0 comments

Describe the bug
After a bare-metal install of paperless-ng, I copied media/ and data/ from the old paperless installation, then ran
python manage.py migrate followed by python manage.py document_index reindex. Both completed without error.

I then ran python manage.py document_sanity_checker and received many errors of the form:

...
[2022-02-26 23:56:46,981] [ERROR] [paperless.sanity_checker] Checksum mismatch of document 141. Stored: 8c84a88c095f6c4491d588666fef36b2, actual: af8807e9262235b101a4c42dc4e6c1e8.
...

The actual md5sum of media/documents/originals/0000141.pdf is indeed af8807e9262235b101a4c42dc4e6c1e8, so the checksum stored in the database is the stale one.
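For anyone who wants to reproduce this check without the md5sum tool, here is a minimal Python sketch that should give the same hex digest. It assumes the stored checksum is a plain MD5 of the original file, which matches what I observed above. The helper name md5_of_file is my own and not part of paperless-ng.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`.

    The file is read in chunks so that large scans do not have to fit
    in memory. `md5_of_file` is a hypothetical helper for illustration.
    """
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling md5_of_file("media/documents/originals/0000141.pdf") from the installation directory should return the "actual" value reported by the sanity checker.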
Viewing and modifying metadata works, but when I try to add a new file, the task scheduler reports:

[2022-02-27 00:11:24,269] [INFO] [paperless.consumer] Document 2022-01-30 ... consumption finished
00:11:24 [Q] INFO Process-1:1 stopped doing work
00:11:24 [Q] INFO Processed [scan_flachbett.pdf]
[2022-02-27 00:11:24,297] [ERROR] [paperless.consumer] The following error occured while consuming scan_flachbett.pdf: UNIQUE constraint failed: documents_document.checksum
Traceback (most recent call last):
  File "<dir>/venv/lib/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "<dir>/venv/lib/python3.9/site-packages/django/db/backends/sqlite3/base.py", line 423, in execute
    return Database.Cursor.execute(self, query, params)
sqlite3.IntegrityError: UNIQUE constraint failed: documents_document.checksum

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<dir>/paperless-ng/src/documents/consumer.py", line 287, in try_consume_file
    document = self._store(
  File "<dir>/paperless-ng/src/documents/consumer.py", line 382, in _store
    document = Document.objects.create(
  File "<dir>/venv/lib/python3.9/site-packages/django/db/models/manager.py", line 85, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "<dir>/venv/lib/python3.9/site-packages/django/db/models/query.py", line 453, in create
    obj.save(force_insert=True, using=self.db)
  File "<dir>/venv/lib/python3.9/site-packages/django/db/models/base.py", line 726, in save
    self.save_base(using=using, force_insert=force_insert,
  File "<dir>/venv/lib/python3.9/site-packages/django/db/models/base.py", line 763, in save_base
    updated = self._save_table(
  File "<dir>/venv/lib/python3.9/site-packages/django/db/models/base.py", line 868, in _save_table
    results = self._do_insert(cls._base_manager, using, fields, returning_fields, raw)
  File "<dir>/venv/lib/python3.9/site-packages/django/db/models/base.py", line 906, in _do_insert
    return manager._insert(
  File "<dir>/venv/lib/python3.9/site-packages/django/db/models/manager.py", line 85, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "<dir>/venv/lib/python3.9/site-packages/django/db/models/query.py", line 1270, in _insert
    return query.get_compiler(using=using).execute_sql(returning_fields)
  File "<dir>/venv/lib/python3.9/site-packages/django/db/models/sql/compiler.py", line 1416, in execute_sql
    cursor.execute(sql, params)
  File "<dir>/venv/lib/python3.9/site-packages/django/db/backends/utils.py", line 98, in execute
    return super().execute(sql, params)
  File "<dir>/venv/lib/python3.9/site-packages/django/db/backends/utils.py", line 66, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
  File "<dir>/venv/lib/python3.9/site-packages/django/db/backends/utils.py", line 75, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "<dir>/venv/lib/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "<dir>/venv/lib/python3.9/site-packages/django/db/utils.py", line 90, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "<dir>/venv/lib/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "<dir>/venv/lib/python3.9/site-packages/django/db/backends/sqlite3/base.py", line 423, in execute
    return Database.Cursor.execute(self, query, params)
django.db.utils.IntegrityError: UNIQUE constraint failed: documents_document.checksum
[2022-02-27 00:11:24,303] [DEBUG] [paperless.parsing.tesseract] Deleting directory /tmp/paperless/paperless-1ljxtjms
00:11:24 [Q] INFO Process-1:2 stopped doing work
00:11:24 [Q] ERROR Failed [scan_flachbett.pdf] - scan_flachbett.pdf: The following error occured while consuming scan_flachbett.pdf: UNIQUE constraint failed: documents_document.checksum : Traceback (most recent call last):
  File "<dir>/venv/lib/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "<dir>/venv/lib/python3.9/site-packages/django/db/backends/sqlite3/base.py", line 423, in execute
    return Database.Cursor.execute(self, query, params)
sqlite3.IntegrityError: UNIQUE constraint failed: documents_document.checksum

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<dir>/venv/lib/python3.9/site-packages/asgiref/sync.py", line 288, in main_wrap
    raise exc_info[1]
  File "<dir>/paperless-ng/src/documents/consumer.py", line 287, in try_consume_file
    document = self._store(
  File "<dir>/paperless-ng/src/documents/consumer.py", line 382, in _store
    document = Document.objects.create(
  File "<dir>/venv/lib/python3.9/site-packages/django/db/models/manager.py", line 85, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "<dir>/venv/lib/python3.9/site-packages/django/db/models/query.py", line 453, in create
    obj.save(force_insert=True, using=self.db)
  File "<dir>/venv/lib/python3.9/site-packages/django/db/models/base.py", line 726, in save
    self.save_base(using=using, force_insert=force_insert,
  File "<dir>/venv/lib/python3.9/site-packages/django/db/models/base.py", line 763, in save_base
    updated = self._save_table(
  File "<dir>/venv/lib/python3.9/site-packages/django/db/models/base.py", line 868, in _save_table
    results = self._do_insert(cls._base_manager, using, fields, returning_fields, raw)
  File "<dir>/venv/lib/python3.9/site-packages/django/db/models/base.py", line 906, in _do_insert
    return manager._insert(
  File "<dir>/venv/lib/python3.9/site-packages/django/db/models/manager.py", line 85, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "<dir>/venv/lib/python3.9/site-packages/django/db/models/query.py", line 1270, in _insert
    return query.get_compiler(using=using).execute_sql(returning_fields)
  File "<dir>/venv/lib/python3.9/site-packages/django/db/models/sql/compiler.py", line 1416, in execute_sql
    cursor.execute(sql, params)
  File "<dir>/venv/lib/python3.9/site-packages/django/db/backends/utils.py", line 98, in execute
    return super().execute(sql, params)
  File "<dir>/venv/lib/python3.9/site-packages/django/db/backends/utils.py", line 66, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
  File "<dir>/venv/lib/python3.9/site-packages/django/db/backends/utils.py", line 75, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "<dir>/venv/lib/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "<dir>/venv/lib/python3.9/site-packages/django/db/utils.py", line 90, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "<dir>/venv/lib/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "<dir>/venv/lib/python3.9/site-packages/django/db/backends/sqlite3/base.py", line 423, in execute
    return Database.Cursor.execute(self, query, params)
django.db.utils.IntegrityError: UNIQUE constraint failed: documents_document.checksum

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<dir>/venv/lib/python3.9/site-packages/django_q/cluster.py", line 432, in worker
    res = f(*task["args"], **task["kwargs"])
  File "<dir>/paperless-ng/src/documents/tasks.py", line 74, in consume_file
    document = Consumer().try_consume_file(
  File "<dir>/paperless-ng/src/documents/consumer.py", line 346, in try_consume_file
    self._fail(
  File "<dir>/paperless-ng/src/documents/consumer.py", line 70, in _fail
    raise ConsumerError(f"{self.filename}: {log_message or message}")
documents.consumer.ConsumerError: scan_flachbett.pdf: The following error occured while consuming scan_flachbett.pdf: UNIQUE constraint failed: documents_document.checksum

The file is not added.

Expected behavior
Either no checksum errors after migration, or a way to recompute the checksums during migration (I guess the old paperless did not have the UNIQUE constraint on checksums?).
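To make the request more concrete, here is a rough sketch of the kind of check I have in mind. It is plain Python and does not touch the paperless-ng database. The function name plan_checksum_repair and the (doc_id, stored_checksum, path) record format are my own assumptions. It reports which stored checksums are stale, and which files share the same actual checksum and would therefore still collide with a UNIQUE constraint after recomputing.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_md5(path):
    """Hex MD5 digest of a file (whole file read at once, for brevity)."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()


def plan_checksum_repair(records):
    """Compare stored checksums against the files on disk.

    `records` is an iterable of (doc_id, stored_checksum, path) tuples.
    This layout is a hypothetical one chosen for the sketch, not the
    paperless-ng schema.

    Returns (updates, duplicates):
      updates    maps doc_id -> recomputed checksum for stale entries
      duplicates maps checksum -> list of doc_ids whose files are
                 byte-identical, which a UNIQUE constraint would reject
    """
    updates = {}
    ids_by_actual = defaultdict(list)
    for doc_id, stored, path in records:
        actual = file_md5(path)
        ids_by_actual[actual].append(doc_id)
        if actual != stored:
            updates[doc_id] = actual
    duplicates = {
        checksum: ids for checksum, ids in ids_by_actual.items() if len(ids) > 1
    }
    return updates, duplicates
```

If duplicates comes back empty, writing the values in updates back to the database should in principle be safe with respect to the UNIQUE constraint. I have not tried this against a real installation.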

I am not sure whether this is actually a bug, but since the old paperless installation still works without issue, it seems like something the migration could avoid.