Performance issues with scheduled tasks
danmarsden opened this issue · 12 comments
The scheuled tasks appear to be extremely in-efficient.
In particular the "update_reports" task which is set to run every 5 min by default.
for one of our sites - it doesn't usually do less than 5000 db reads and 800 writes every time, which means that's it's doing 720,000 reads and 115,200 writes over a 24hr period.
We had a much bigger situation yesterday (guessing a bunch of assignments were submitted) where it was averaging around 25,000 reads, 7000 writes every 5min - so it hit almost 4 Million db reads and 1 million writes over the same 24 hr period.
Looking at this code here:
moodle-plagiarism_turnitin/lib.php
Line 1921 in 74f979a
I can see several places where db queries are called inside loops when you should really be able to get a lot more information in the initial db query.
thanks!
After looking into performance issues with our scheduled tasks in cron we are also seeing ~7000 DB queries each time plagiarism_turnitin\task\update_reports
is run.
Not as bad as customcert which was almost doing 27k queries every minutes, but still needs to be handled better.
it's fun when it runs "on the hour" too... because that just happens to be when people like to schedule exams/timed quizzes - right when you don't really want the db server doing heaps of other work...
Here's the last 24 hours of DB queries for our Moodle's sorted by query count.
This is the top 7 queries. 2/3/5/7 all look similar on the graph so are most likely caused by this plugin (but some are used in Moodle elsewhere), or exacerbated by this plugin.
Every 5 minutes, 6681 minimum queries logged in the cron output for this task.
We only return 124 rows for the get_records_select
too, so something funny going on with this task.
Most of these queries also have a 40/60 split for lock time to query time. Reducing these will remove lots of unnecessary load on the DB but also remove a lot of locking queries.
Seeing the same here. These are our last 3 runs:
42498 reads
9061 writes
42554 reads
9073 writes
39580 reads
8413 writes
I am changing our task runtime to be every 10 minutes instead of every 5 minutes. That function does run a lot of things in loops... See https://github.com/turnitin/moodle-plagiarism_turnitin/blob/master/lib.php#L1954
I'd suggest running it every 13 minutes - or a setting that doesn't end up with it running on the hour every hour. You really don't want this task kicking in at the exact moment 500 users (or more!) all enter a timed quiz.
Taking a quick look into this.
The SQL here could be changed to this so we fetch more data in the initial query
moodle-plagiarism_turnitin/lib.php
Lines 1961 to 1968 in 1859714
SELECT ptf.*, m.name AS 'modulename', cm.instance AS 'moduleinstance'
FROM mdl_plagiarism_turnitin_files ptf
JOIN mdl_course_modules cm ON cm.id = ptf.cm
JOIN mdl_modules m ON m.id = cm.module
WHERE statuscode = 'success'
AND ( similarityscore IS NULL OR duedate_report_refresh = 1 )
AND ( orcapable = 1 OR orcapable IS NULL )
This means we then don't need to call this function get_coursemodule_from_id
moodle-plagiarism_turnitin/lib.php
Line 1974 in 1859714
Removing that function will then remove two of the top queries, one for getting the module name, and one to get the course module record
Doing a call to get the activity module in each part of the query also seems too much, especially if there are a lot of files/submissions that are part of the same assignment. This could be cached in an array or similar and checked as part of the loop.
moodle-plagiarism_turnitin/lib.php
Line 1977 in 1859714
Also should check if $tiisubmission->duedate_report_refresh <> 1
before reassigning it 1 here
moodle-plagiarism_turnitin/lib.php
Lines 1980 to 1982 in 1859714
Settings per module should also be cached. Seeing 120k odd queries for the plagiarism config lookups too.
moodle-plagiarism_turnitin/lib.php
Line 1985 in 1859714
Is this required here if we already have the config settings per module?
moodle-plagiarism_turnitin/lib.php
Line 2007 in 1859714
Calling get_coursemodule_from_id
again here in a loop which is 2 more of the same queries per submission again
moodle-plagiarism_turnitin/lib.php
Line 2101 in 1859714
I'd suggest running it every 13 minutes - or a setting that doesn't end up with it running on the hour every hour. You really don't want this task kicking in at the exact moment 500 users (or more!) all enter a timed quiz.
Running at */10 or */13 will still start it at 0 minutes.
To skip beginning and end of the hour, something like "7/11 * * * *" would work. That runs at ":07, :18, :29, :40, and :51", every 11 minutes starting at 7 past.
Edit: Moodle won't allow that cron format, so instead just comma separate the minutes you want it to run ("7,18,29,40,51") and use that instead
@agrowe Unfortunately Moodle's task system flags 7/11 for the minute field as "Data submitted is invalid".
@rlorenzo see my edit above, you can comma separate a list of minutes you want it to run. I used 7,18,29,40,51 as it's odd minutes and I don't want it near a the end or middle of an hour in it's current form.
Thanks for the insight everyone, I'll get this looked at.
@jmcgettrick, Is this being targeted to be fixed/looked at in a specific upcoming release?