Upkeep error
sondrelg opened this issue · 13 comments
Hi @tobymao!
Out of nowhere, our saq process started emitting "Upkeep task failed unexpectedly" errors, and the logged exception contains the following traceback:
Traceback (most recent call last):
File \"/opt/venv/lib/python3.11/site-packages/saq/worker.py\", line 180, in poll
await func(arg or sleep)
File \"/opt/venv/lib/python3.11/site-packages/saq/worker.py\", line 167, in schedule
scheduled = await self.queue.schedule(lock)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File \"/opt/venv/lib/python3.11/site-packages/saq/queue.py\", line 248, in schedule
return await self._schedule_script(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File \"/opt/venv/lib/python3.11/site-packages/redis/commands/core.py\", line 4983, in __call__
return await client.evalsha(self.sha, len(keys), *args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File \"/opt/venv/lib/python3.11/site-packages/redis/asyncio/client.py\", line 487, in execute_command
return await conn.retry.call_with_retry(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File \"/opt/venv/lib/python3.11/site-packages/redis/asyncio/retry.py\", line 59, in call_with_retry
return await do()
^^^^^^^^^^
File \"/opt/venv/lib/python3.11/site-packages/redis/asyncio/client.py\", line 463, in _send_command_parse_response
return await self.parse_response(conn, command_name, **options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File \"/opt/venv/lib/python3.11/site-packages/redis/asyncio/client.py\", line 505, in parse_response
response = await connection.read_response()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File \"/opt/venv/lib/python3.11/site-packages/redis/asyncio/connection.py\", line 963, in read_response
raise response from None
redis.exceptions.ResponseError: user_script:12: too many results to unpack script: 15c892840ce49156b99eaf77c367913a8c069ed5, on @user_script:12.
Not sure if this is really a saq issue, but any ideas as to what could be wrong? I've flushed scripts in the redis instance, but the problem persists.
how many jobs are you trying to schedule? perhaps you could batch them up?
this could be a limitation in lua? https://stackoverflow.com/questions/35582647/whats-the-maximum-values-accepted-by-redis-lpush-list-operation
it seems like you can do at least 100k, can you confirm how many values you're trying to schedule?
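for context, redis embeds Lua 5.1, where unpack has to push every element onto the Lua stack at once; the default stack ceiling (LUAI_MAXCSTACK) is 8000, and exceeding it is what raises "too many results to unpack". here's a small repro sketch, assuming a local redis and a throwaway repro:list key; run it with redis-cli --eval repro.lua repro:list , 7000 (works) and then with 9000 (fails):

-- repro.lua: RPUSH ARGV[1] integers onto KEYS[1] in a single call.
-- unpack places all elements on the Lua stack at once, so counts
-- near or above ~8000 trip the stack ceiling and raise
-- "too many results to unpack".
local n = tonumber(ARGV[1])
local values = {}
for i = 1, n do
  values[i] = i
end
return redis.call('RPUSH', KEYS[1], unpack(values))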
It's possible we've exceeded 100k. Seems to be non-saq related. Thanks @tobymao 👍
I'm hitting this issue once I accumulate around 22k scheduled tasks. Is there a workaround, or is scheduling that many tasks simply not viable?
it seems like maybe we need to update the schedule script to batch this up. feel free to create a PR for this, it shouldn't be too difficult but i don't have the bandwidth for it.
playing with ChatGPT I got this:
if redis.call('EXISTS', KEYS[1]) == 0 then
  redis.call('SETEX', KEYS[1], ARGV[1], 1)
  local jobs = redis.call('ZRANGEBYSCORE', KEYS[2], 1, ARGV[2])
  local limit = 8000 -- Adjust limit based on your environment's capabilities
  local start_index = 1
  while start_index <= #jobs do
    local end_index = math.min(start_index + limit - 1, #jobs)
    local scores = {}
    for i = start_index, end_index do
      table.insert(scores, 0)
      table.insert(scores, jobs[i])
    end
    redis.call('ZADD', KEYS[2], unpack(scores))
    redis.call('RPUSH', KEYS[3], unpack(jobs, start_index, end_index))
    start_index = end_index + 1
  end
  return jobs
end
But it failed, maybe because the 8k limit is still too high: ZADD takes a score plus a member per job, so unpack(scores) pushes 16k values, double the limit. I'll keep testing. Meanwhile, the version below worked, but it probably adds too much overhead by calling ZADD and RPUSH once per item.
if redis.call('EXISTS', KEYS[1]) == 0 then
  redis.call('SETEX', KEYS[1], ARGV[1], 1)
  local jobs = redis.call('ZRANGEBYSCORE', KEYS[2], 1, ARGV[2])
  if next(jobs) then
    for _, v in ipairs(jobs) do
      redis.call('ZADD', KEYS[2], 0, v)
      redis.call('RPUSH', KEYS[3], v)
    end
  end
  return jobs
end
why don't you make the limit 1000?
please don't submit any chat gpt code that you don't fully understand. you can use it as a starting point, but you should deeply understand the reasoning behind every single line of code that you submit in the PR.
that's why I didn't submit the PR :). I started using the library a week ago, so until today I had no idea what this code did or how the scheduler works, etc.
why don't you make the limit 1000?
still, if I understood correctly, to make as few calls as possible the idea would be to batch them as large as the limit allows, no?
yea, i guess find a reasonably large number.
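something like this, as a rough sketch (same logic as the batched script above, not saq's actual schedule script; the 1000 is my pick for a "reasonably large" batch, since ZADD unpacks a score/member pair per job and 2 * 1000 stays well under the ~8000-value ceiling):

if redis.call('EXISTS', KEYS[1]) == 0 then
  redis.call('SETEX', KEYS[1], ARGV[1], 1)
  local jobs = redis.call('ZRANGEBYSCORE', KEYS[2], 1, ARGV[2])
  -- ZADD consumes two unpacked values per job (score, member), so the
  -- per-batch stack cost is 2 * limit; keep that well under ~8000
  local limit = 1000
  local start_index = 1
  while start_index <= #jobs do
    local end_index = math.min(start_index + limit - 1, #jobs)
    local scores = {}
    for i = start_index, end_index do
      scores[#scores + 1] = 0       -- zero the score, as in the scripts above
      scores[#scores + 1] = jobs[i]
    end
    redis.call('ZADD', KEYS[2], unpack(scores))
    redis.call('RPUSH', KEYS[3], unpack(jobs, start_index, end_index))
    start_index = end_index + 1
  end
  return jobs
end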