plugin does not handle queued, deleted files correctly
thepurpleblob opened this issue · 4 comments
If a file is deleted before it is sent to turnitin it will be in plagiarism_turnitin_files but no matching entry in the 'files' table. The file will have a 'queued' status. These instances build up.
Once these start to exceed the value of PLAGIARISM_TURNITIN_CRON_SUBMISSIONS_LIMIT it starts to have a detrimental effect on Turnitin processing. A 'missing file' error is generated but no error is flagged in the table, it stays at 'queued'. When there are a lot of these missing queued files it can effectively block turnitin processing.
These should be handled better - I think they should probably result in an actual error condition. There's no point re-queuing a file that cannot be found.
For reference, this is the SQL query we used to find the files...
select *, from_unixtime(mdl_plagiarism_turnitin_files.lastmodified) from mdl_plagiarism_turnitin_files
left join mdl_files on identifier = pathnamehash
where statuscode = 'queued'
and mdl_files.id is null
order by mdl_plagiarism_turnitin_files.lastmodified desc
The problem code is around line 2570 in lib.php
$fs = get_file_storage();
$file = $fs->get_file_by_hash($pathnamehash);
if (!$file) {
plagiarism_turnitin_activitylog('File not found: '.$pathnamehash, 'PP_NO_FILE');
$result = true;
continue;
} else {
try {
$file->get_content();
} catch (Exception $e) {
plagiarism_turnitin_activitylog('File content not found: '.$pathnamehash, 'PP_NO_FILE');
mtrace($e);
mtrace('File content not found. pathnamehash: '.$pathnamehash);
$result = true;
continue;
}
}
The error condition is not updated in the plagiarism_turnitin_files
We have same issue.
The schedule task picks up same records and new submissions are not processed because of this issue.
In our case, I can replicate the issue from following process.
- A student submit assignment with file.
- The student remove the submission.
- The schedule task says 'no file to submit' because there is a record in mdl_plagiarism_turnitin_files but not in mdl_files.
If the student doesn't submit file again, the record is in the queue forever, because the status remains 'queued'.
I think we need to update the codes for this issue:
- Properly handle no file case and mark the record in mdl_plagiarism_turnitin_files as error. We need this as no file case potentially can happen.
- When a submission is removed the plugin should remove corresponding record from mdl_plagiarism_turnitin_files table. Same way as it inserts when a submission is created.
- Add sorting so older records go first.
The patch #582 fixes for 1.
I'll add 2 and 3 later.
We've actually got a script we run every so often to update the database directly. It's not ideal.
Thank you for reporting this. Because the latest version of the plagiarism plugin is supported for versions of Moodle 4.1 and higher, I am closing this ticket. However, if you find this issue is occurring with the latest version of the plugin in any of the supported Moodle versions, please create a new ticket and we will address it.