LibCrowds/libcrowds

Adding new tasks for a volume seems to succeed but the public interface throws confetti

mialondon opened this issue · 7 comments

Cribbed in part from #843 where we were originally dealing with the issue:

Summary: volumes added to the site via the project admin interface appear to save successfully but volunteers can't access tasks on them.

When adding new volumes via the admin interface, all seems to go well, and you'll get an email saying '[number] new tasks were imported successfully to your project [task type e.g. Transcribe Dates]: [volume name e.g. Miscellaneous theatres: Stroud - Tullamore 1788-1848 (Vol. 2)]!'.

Christian additionally noted that trends seem to be:

  • Pre-existing volumes not generating new tasks, even when some vols had had successful tasks completed in the past
  • Several new vols loaded successfully and tasks generated
  • Several new vols loaded but generating the confetti issue

Steps to reproduce: [we might need to update this - @christianalgar does it match with your more recent experience?]

  1. find a sample manifest to add (e.g. https://api.bl.uk/metadata/iiif/ark:/81055/vdc_100022588967.0x000002/manifest.json ) by searching for 'A collection of playbills' in the Catalogue box in the top right-hand corner of https://bl.uk
  2. add a new volume via https://www.libcrowds.com/admin/collection/playbills/volumes/new following the steps in #850
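Before adding a volume, it may help to sanity-check the manifest. This is a hypothetical sketch. The helper name `canvas_image_tasks` is ours and is not part of PYBOSSA or pybossa-lc. It assumes a IIIF Presentation API 2.x manifest (sequences / canvases / images / resource.service) and builds one dict per canvas image. The dicts reuse the `tileSource`, `url_m` and `manifest` field shapes visible in the failed INSERT parameters later in this thread. A manifest that yields an empty list is a likely candidate for a project with 0 tasks, but the real importer may apply other rules.

```python
from typing import Any, Dict, List


def canvas_image_tasks(manifest: Dict[str, Any]) -> List[Dict[str, str]]:
    """Build one task-info dict per canvas image in a IIIF Presentation 2.x manifest.

    Hypothetical helper: the field names mirror those seen in the failed
    INSERT parameters (tileSource, url_m, manifest), not the importer's code.
    """
    manifest_id = manifest.get("@id", "")
    tasks = []
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            for image in canvas.get("images", []):
                service = image.get("resource", {}).get("service", {})
                if isinstance(service, list):  # some manifests list several services
                    service = service[0] if service else {}
                service_id = service.get("@id") if isinstance(service, dict) else None
                if not service_id:
                    continue  # no image service, so nothing to build a task from
                tasks.append({
                    "tileSource": f"{service_id}/info.json",
                    "url_m": f"{service_id}/full/1024,/0/default.jpg",
                    "manifest": manifest_id,
                })
    return tasks
```

To try it on the sample manifest, load the JSON first (e.g. `json.load(urllib.request.urlopen(url))`) and then check `len(canvas_image_tasks(manifest))`.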

Expected results: a new volume should be available on the projects page https://www.libcrowds.com/collection/playbills/projects. When a volunteer follows the link, they should have access to the task.

Actual result: When a volunteer follows the link, they get the 'confetti' message, 'Hooray! You have completed all available tasks for this project. As we have more than one person to complete each task to ensure high quality results, we still need more contributions before the project is marked as complete, so please spread the word!'

Related:

  • original early diagnosis of the issue #843 (comment) 'What I can see from my analysis of the database is that some projects didn't have any tasks and it seems to correlate with a missing manifest in the parent volumes. I'm not sure how this actually happened, whether it is an omission during creation or the admin interface being slightly buggy and losing some inputs when things are not done in a proper sequence.'

  • Update 'how to add a new volume' documentation #850
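One way to test the early diagnosis from #843 quoted above is to tally projects by two questions: does the project have tasks, and does its parent volume have a manifest? Empty projects are said to correlate with parent volumes that lack a manifest. The sketch below does that tally. The input shapes (`projects`, `manifests`, `task_counts`) are assumptions about how the data might be exported from the database. They are not the actual LibCrowds schema.

```python
from collections import Counter
from typing import Dict, List, Optional


def manifest_vs_empty(
    projects: List[Dict[str, int]],
    manifests: Dict[int, Optional[str]],
    task_counts: Dict[int, int],
) -> Counter:
    """Tally projects by (has_tasks, parent_volume_has_manifest).

    Hypothetical inputs:
      projects    -- [{"id": project_id, "volume_id": volume_id}, ...]
      manifests   -- volume_id -> manifest URL, or None / "" if missing
      task_counts -- project_id -> number of tasks (absent means 0)
    """
    tally: Counter = Counter()
    for project in projects:
        has_tasks = task_counts.get(project["id"], 0) > 0
        has_manifest = bool(manifests.get(project["volume_id"]))
        tally[(has_tasks, has_manifest)] += 1
    return tally
```

If most `(False, ...)` entries land in `(False, False)`, that would support the missing-manifest explanation. If they land in `(False, True)`, something else is going on.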

Some thoughts on uncovering the error: @christianalgar, can you note the times and dates when you've recently tried to add volumes that led to the error? We might be able to match them to the error emails, whose tracebacks give more detail.

For example, the inbox has errors from Tuesday around 1:30, 2:30 and 5:30pm.
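Attempts can be matched to error emails by hand, but a small sketch makes the matching rule explicit. It pairs each attempt with the first unused error email that arrives within a fixed window afterwards. The 15-minute window is a guess: the job is retried several times before the "failed more than 3 times" email goes out, so the delay varies. `match_errors` is a hypothetical helper, and all times should be in the same time zone.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional


def match_errors(
    attempts: List[datetime],
    errors: List[datetime],
    window: timedelta = timedelta(minutes=15),
) -> Dict[datetime, Optional[datetime]]:
    """Pair each import attempt with the first error email arriving within `window` after it."""
    remaining = sorted(errors)
    matches: Dict[datetime, Optional[datetime]] = {}
    for attempt in sorted(attempts):
        match = next((e for e in remaining if attempt <= e <= attempt + window), None)
        if match is not None:
            remaining.remove(match)  # each error email is used at most once
        matches[attempt] = match
    return matches
```

Attempts left paired with `None` are the ones with no error email inside the window.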

This document summarises the dates on which we attempted to add particular volumes / tasks:
ITS review.xlsx

Some emails received from attempts to add tasks / projects are copied below (with times). No email was received for most of the failed additions. This might be because I was deleting the volumes almost immediately to prevent any users from experiencing the confetti bug.

We would need to add tasks that show the confetti bug and leave them there to receive a notice, I expect?

LibCrowds Support support@libcrowds.com

Mon, Jun 15 at 8:25 AM

Hello,

126 new tasks were imported successfully to your project Mark Titles: A collection of playbills from Theatre, Scarborough 1784-1846.!

All the best,
The LibCrowds team.

Mon, Jun 15 at 2:26 PM

Hello,

42 new tasks were imported successfully to your project Transcribe Titles: Windsor Castle 1849-1861!

All the best,
The LibCrowds team.

Mon, Jun 15 at 2:45 PM

Hello,

554 new tasks were imported successfully to your project Transcribe Dates: Miscellaneous Birmingham theatres 1774-1800!

All the best,
The LibCrowds team.

Mon, Jun 15 at 3:03 PM

Hello,

It looks like there were no new records to import to your project Transcribe Genres: Theatre Royal, Bristol 1819-1823 (Vol. 2)!

All the best,
The LibCrowds team.

Mon, Jun 15 at 4:52 PM

Hello,

368 new tasks were imported successfully to your project Mark Titles: A collection of playbills from Theatre Royal, Liverpool 1820-1822 (Vol. 1)!

All the best,
The LibCrowds team.

Mon, Jun 15 at 5:03 PM

Hello,

281 new tasks were imported successfully to your project Mark Titles: Covent Garden Theatre 1753-1779!

All the best,
The LibCrowds team.

Mon, Jun 15 at 5:32 PM

Hello,

300 new tasks were imported successfully to your project Transcribe Dates: A collection of playbills from Theatre, Drayton 1795-1844 (Vol. 1)!

All the best,
The LibCrowds team.

Tue, Jun 16 at 2:33 PM

Hello,

316 new tasks were imported successfully to your project Transcribe Dates: A collection of playbills from miscellaneous theatres: Huddersfield - Ledbury 1783-1864 (Vol. 2)!

All the best,
The LibCrowds team.

I've looked up the traceback errors for the first three attempts. They look pretty useful, so I can do the rest if it'd help, @harryjmoss.

Mon, Jun 15 at 8:25 AM
126 new tasks were imported successfully to your project Mark Titles: A collection of playbills from Theatre, Scarborough 1784-1846.!

af17cb36-be7d-4e16-8446-18724062782d has failed more than 3 times [arrived 08:34]
Please, review the background jobs of your server.
This is the trace error


Traceback (most recent call last):
  File "/var/www/pybossa/env/local/lib/python2.7/site-packages/rq/worker.py", line 479, in perform_job
    rv = job.perform()
  File "/var/www/pybossa/env/local/lib/python2.7/site-packages/rq/job.py", line 466, in perform
    self._result = self.func(*self.args, **self.kwargs)
  File "/var/www/pybossa/pybossa/plugins/pybossa_lc/jobs.py", line 65, in import_tasks_with_redundancy
    import_tasks(project_id, **import_data)
  File "/var/www/pybossa/pybossa/jobs.py", line 519, in import_tasks
    report = importer.create_tasks(task_repo, project_id, **form_data)
  File "/var/www/pybossa/pybossa/importers/importer.py", line 68, in create_tasks
    for task_data in importer.tasks():
  File "/var/www/pybossa/pybossa/importers/iiif.py", line 38, in tasks
    return self._generate_tasks()
  File "/var/www/pybossa/pybossa/plugins/pybossa_lc/importers/iiif_enhanced.py", line 28, in _generate_tasks
    child_task_data = self._get_child_task_data(task_data, self.parent_id)
  File "/var/www/pybossa/pybossa/plugins/pybossa_lc/importers/iiif_enhanced.py", line 48, in _get_child_task_data
    raise BulkImportException(err_msg)
BulkImportException: A parent annotation has an invalid target

Mon, Jun 15 at 2:26 PM
42 new tasks were imported successfully to your project Transcribe Titles: Windsor Castle 1849-1861!

33653ad0-8729-40f7-8946-68e7bcda79e5 has failed more than 3 times [arrived 14:33]

Please, review the background jobs of your server.
This is the trace error


Traceback (most recent call last):
  File "/var/www/pybossa/env/local/lib/python2.7/site-packages/rq/worker.py", line 479, in perform_job
    rv = job.perform()
  File "/var/www/pybossa/env/local/lib/python2.7/site-packages/rq/job.py", line 466, in perform
    self._result = self.func(*self.args, **self.kwargs)
  File "/var/www/pybossa/pybossa/plugins/pybossa_lc/jobs.py", line 65, in import_tasks_with_redundancy
    import_tasks(project_id, **import_data)
  File "/var/www/pybossa/pybossa/jobs.py", line 519, in import_tasks
    report = importer.create_tasks(task_repo, project_id, **form_data)
  File "/var/www/pybossa/pybossa/importers/importer.py", line 68, in create_tasks
    for task_data in importer.tasks():
  File "/var/www/pybossa/pybossa/importers/iiif.py", line 38, in tasks
    return self._generate_tasks()
  File "/var/www/pybossa/pybossa/plugins/pybossa_lc/importers/iiif_enhanced.py", line 28, in _generate_tasks
    child_task_data = self._get_child_task_data(task_data, self.parent_id)
  File "/var/www/pybossa/pybossa/plugins/pybossa_lc/importers/iiif_enhanced.py", line 48, in _get_child_task_data
    raise BulkImportException(err_msg)
BulkImportException: A parent annotation has an invalid target
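The first two tracebacks both end in `BulkImportException: A parent annotation has an invalid target`. That exception is raised from `_get_child_task_data` in the pybossa_lc `iiif_enhanced` importer. We haven't checked exactly what that function treats as invalid, so the diagnostic below is only a hypothesis. It assumes parent annotations follow the W3C Web Annotation model. In that model, `target` is either an IRI string (possibly with a `#xywh=` fragment) or an object with a `source`. The diagnostic flags any annotation whose target is missing or malformed, or points at a canvas that is not in the volume's manifest. The helper names are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Optional, Set


def target_source(target: Any) -> Optional[str]:
    """Return the canvas IRI an annotation target points at, or None if unusable."""
    if isinstance(target, str):
        return target.split("#", 1)[0]  # drop any media fragment, e.g. #xywh=...
    if isinstance(target, dict) and isinstance(target.get("source"), str):
        return target["source"]
    return None


def invalid_targets(annotations: Iterable[Dict[str, Any]], canvas_ids: Set[str]) -> List[Any]:
    """List ids of annotations whose target is missing, malformed or not a known canvas."""
    bad = []
    for anno in annotations:
        source = target_source(anno.get("target"))
        if source is None or source not in canvas_ids:
            bad.append(anno.get("id"))
    return bad
```

The canvas ids could come from the manifest's canvases (their `@id` values). A non-empty result for a volume would be worth comparing against the failing imports.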

Mon, Jun 15 at 2:45 PM
554 new tasks were imported successfully to your project Transcribe Dates: Miscellaneous Birmingham theatres 1774-1800!

de42bf2f-e47e-4f4a-955f-83d746149490 has failed more than 3 times [arrived 14:34]

Traceback (most recent call last):
  File "/var/www/pybossa/env/local/lib/python2.7/site-packages/rq/worker.py", line 479, in perform_job
    rv = job.perform()
  File "/var/www/pybossa/env/local/lib/python2.7/site-packages/rq/job.py", line 466, in perform
    self._result = self.func(*self.args, **self.kwargs)
  File "/var/www/pybossa/pybossa/plugins/pybossa_lc/jobs.py", line 65, in import_tasks_with_redundancy
    import_tasks(project_id, **import_data)
  File "/var/www/pybossa/pybossa/jobs.py", line 519, in import_tasks
    report = importer.create_tasks(task_repo, project_id, **form_data)
  File "/var/www/pybossa/pybossa/importers/importer.py", line 73, in create_tasks
    task_repo.save(task)
  File "/var/www/pybossa/pybossa/repositories/task_repository.py", line 107, in save
    raise DBIntegrityError(e)
DBIntegrityError: (psycopg2.errors.ForeignKeyViolation) insert or update on table "task" violates foreign key constraint "task_project_id_fkey"
DETAIL: Key (project_id)=(242) is not present in table "project".

[SQL: INSERT INTO task (created, project_id, state, quorum, calibration, priority_0, info, n_answers, fav_user_ids) VALUES (%(created)s, %(project_id)s, %(state)s, %(quorum)s, %(calibration)s, %(priority_0)s, %(info)s, %(n_answers)s, %(fav_user_ids)s) RETURNING task.id]
[parameters: {'info': '{"tileSource": "https://api.bl.uk/image/iiif/ark:/81055/vdc_100022589089.0x000107/info.json", "url_m": "https://api.bl.uk/image/iiif/ark:/81055/vdc_1 ... (533 characters truncated) ... dc_100022589089.0x000107/full/1024,/0/default.jpg", "manifest": "https://api.bl.uk/metadata/iiif/ark:/81055/vdc_100022589090.0x000002/manifest.json"}', 'fav_user_ids': None, 'n_answers': 30, 'quorum': 0, 'calibration': 0, 'created': '2020-06-15T13:33:31.681528', 'state': u'ongoing', 'project_id': 242, 'priority_0': 0}]
(Background on this error at: http://sqlalche.me/e/gkpj)
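The third error is different. The task INSERT failed because project 242 was no longer in the `project` table. One possible explanation, which is an assumption and not confirmed, is that the project was deleted while its import job was still queued. That would fit the earlier note about deleting volumes almost immediately. The sketch below reproduces that sequence using SQLite in place of PostgreSQL, so the error text differs from psycopg2's `ForeignKeyViolation`. The two-column tables are a simplification of the real schema.

```python
import sqlite3


def reproduce_fk_violation() -> str:
    """Insert a task for a project that was deleted after the import was queued."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
    conn.execute("CREATE TABLE project (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE task (id INTEGER PRIMARY KEY, "
        "project_id INTEGER NOT NULL REFERENCES project(id))"
    )
    conn.execute("INSERT INTO project (id) VALUES (242)")
    # The import job is queued at this point; meanwhile the project is deleted.
    conn.execute("DELETE FROM project WHERE id = 242")
    try:
        # The background job then tries to save a task for the missing project.
        conn.execute("INSERT INTO task (project_id) VALUES (242)")
    except sqlite3.IntegrityError as exc:
        return str(exc)
    finally:
        conn.close()
    return "no error"
```

If this is what happened, such errors would only appear for projects that were removed before their queued import finished. They would then be a side effect of the clean-up rather than the cause of the confetti.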

More notes from Christian, taken from an email about his spreadsheet:

The attached spreadsheet shows volumes on ITS with status for each project (complete; failed [plus date]; added [plus date]).

I have colour-coded vols in green to indicate a successful action / new project and red to indicate an unsuccessful action.

I have tried using the “Analyse empty results” function on Pybossa, which seems to have had no effect. BUT part-completed volumes have reappeared (maybe because “Analyse empty results” was triggered?). The two projects that reappeared are:

  • Tran titles Bristol 1819-23 vol 2 – 99%
  • Tran dates haymarket 1781-83 99%

I tried to update (reload) a manifest for a volume from which no projects had been done: Birmingham theatres 1801-1805 (Vol. 1) but it still failed.

I have added two new volumes, but the projects still bombed:

  • Bath 1819-1823 (Vol. 1) – dates
  • Bath 1819-1823 (Vol. 2) – mark titles

Curiously, I have been offered a task for a volume even though that volume's task has already been completed:

  • Bristol 1819-1823 (Vol. 2) – Genres

Weirdly, I got one project to load successfully after I tinkered with the Task Scheduler: I changed it from ‘Default’ to ‘Depth First All’. This (might) have re-jigged a project that at first threw up confetti but then worked after I altered the Task Scheduler. Unfortunately, I could not replicate this success with another bombed task. I got momentarily excited that I had chanced upon a fix or workaround. The volume that loaded this way was, I think, either:

  • Windsor Castle transcribe titles
  • Birmingham 1774-1800 – dates

  1. Have just added a fresh volume: A collection of playbills from Theatre Royal, Hull 1827-1830.
    Used manifest below
    https://api.bl.uk/metadata/iiif/ark:/81055/vdc_100022589160.0x000002/manifest.json?manifest
    Prepared a Mark Titles project - confetti bug occurs.

  2. Email says it was successful:
    Tue, Jun 23 at 5:15 PM

Hello,

368 new tasks were imported successfully to your project Mark Titles: A collection of playbills from Theatre Royal, Hull 1827-1830!

All the best,
The LibCrowds team.

  3. Checked the project task and it works.

  4. Added Transcribe Dates: A collection of playbills from Theatre Royal, Manchester 1793-1808 (Vol. 1)

  5. No email to say it was successful; the confetti bug occurs.

  6. Added Mark Titles: A collection of playbills from Theatre Royal, Manchester 1793-1808 (Vol. 2)

  7. Email says it was successful.

@christianalgar, the Manchester Transcribe Dates task is also confetti-ing. I noticed that the project listing page says it has 0 tasks. I guess that's useful for checking them quickly, and it might also be diagnostic for @harryjmoss?

Screenshot_2020-06-23 In the Spotlight
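A project listing of 0 tasks seems to go hand in hand with the confetti message, so checking task counts could be scripted. This minimal sketch assumes you already have the project ids and the task records, each with a `project_id`. These might be exported from the database or fetched from PYBOSSA's `/api/project` and `/api/task` endpoints. `zero_task_projects` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Iterable, List


def zero_task_projects(project_ids: Iterable[int], tasks: Iterable[Dict]) -> List[int]:
    """Return the ids of projects that have no task records at all."""
    counts = Counter(task["project_id"] for task in tasks)
    # Counter returns 0 for ids it has never seen, i.e. projects with no tasks.
    return [pid for pid in project_ids if counts[pid] == 0]
```

Running this right after adding a volume would flag confetti-prone projects before volunteers reach them.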

The errors are also available in the 'Background tasks' screen on the site's backend menu.

@harryjmoss, a sudden thought: could the SQL errors be related to the recent(ish) database changes made during the other work?