Redis ingest is growing faster than pulse:work can keep up with
sts-ryan-holton opened this issue · 5 comments
Pulse Version
1.3.2
Laravel Version
11.36.1
PHP Version
8.3.13
Livewire Version
latest
Database Driver & Version
MySQL 8.0.33
Description
Hi, I'm using the Redis ingest stream for my project. I'm observing that the ingest stream in Redis appears to be growing slightly faster than the worker can drain it.
What can be done to prevent the stream from growing excessively and exhausting the Redis instance's memory, while still retaining data accuracy?
Steps To Reproduce
What can be done?
Is this a very high-traffic application?
There are a few options:
- Determine if there's a particularly "hot" metric and enable sampling on it, or add an ignore pattern if there's a particularly hot key that isn't worth recording. For most applications, the most active recorder would probably be `UserRequests`. The more records you have, the more aggressively you can sample without sacrificing too much accuracy.
- Determine if the server running the `pulse:work` command or the Redis server is consistently under high load, and consider increasing their resources. You may want to run the `pulse:work` command on a dedicated server.
- Tweak the `pulse.ingest.redis.chunk` configuration. By default, this is set to 1,000. The `pulse:work` command will continually ingest this many records from the stream until it receives fewer than this amount, and then it waits for 1 second before starting again. If you increase the value, the `pulse:work` command will use more memory but will be able to ingest more records between each read from the stream, which should increase throughput, provided you have enough memory to handle it. (The sampling, ignore, and chunk settings are all sketched in the config excerpt below.)
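For illustration, the sampling, ignore, and chunk settings mentioned above live in `config/pulse.php`. A minimal sketch, assuming the published Pulse config layout; the `sample_rate` value, the ignore pattern, and the exact defaults shown here are examples, not recommendations, so check the config shipped with your installed version:

```php
<?php

// config/pulse.php (excerpt) — illustrative values only.

use Laravel\Pulse\Recorders;

return [
    'ingest' => [
        'driver' => env('PULSE_INGEST_DRIVER', 'storage'), // 'redis' in this issue's setup

        'redis' => [
            'connection' => env('PULSE_INGEST_REDIS_CONNECTION'),

            // Entries pulled per read by pulse:work. Larger chunks mean
            // fewer round trips but more worker memory per cycle.
            'chunk' => 1000,
        ],
    ],

    'recorders' => [
        Recorders\UserRequests::class => [
            // Record only a fraction of requests; sampled aggregates are
            // scaled back up, so trends stay representative at high volume.
            'sample_rate' => 0.1,

            // Skip hot paths that aren't worth recording (regex patterns).
            'ignore' => [
                '#^/health-check#',
            ],
        ],
    ],
];
```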
Can there be multiple `pulse:work` instances running? I wonder how this would affect row locking in the database.
It's not designed to have multiple `pulse:work` commands running simultaneously. This would likely insert duplicates and, as you mentioned, potentially lock rows. Redis stream consumer groups would need to be used to solve the duplication issue.
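Pulse does not support this today, but purely as an illustration of how consumer groups would address the duplication problem, here is a minimal sketch assuming the phpredis extension, with placeholder stream, group, and consumer names (Pulse's real stream key and entry format are not shown here):

```php
<?php

// Illustrative only: Pulse does not use consumer groups.
$redis = new \Redis();
$redis->connect('127.0.0.1', 6379);

// Create the group once; MKSTREAM creates the stream if it is missing.
$redis->xGroup('CREATE', 'pulse:ingest', 'pulse-workers', '$', true);

// Each worker reads with its own consumer name. Redis delivers every
// new entry ('>') to exactly one consumer in the group, so two workers
// never ingest the same entry.
$entries = $redis->xReadGroup('pulse-workers', 'worker-1', ['pulse:ingest' => '>'], 1000);

if ($entries !== false) {
    foreach ($entries['pulse:ingest'] ?? [] as $id => $fields) {
        // ... persist $fields to the Pulse database here ...

        // Acknowledge so the entry is not redelivered to this group.
        $redis->xAck('pulse:ingest', 'pulse-workers', [$id]);
    }
}
```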
Is that something Pulse might add in the future? In my application I've got a tonne of requests, events, and jobs. In my use case I cannot enable sampling because it doesn't accurately represent the true numbers, and without another Pulse worker my Pulse entries are delayed, which from a business point of view makes it hard to make decisions quickly.
I just tried various chunk sizes. With the worker running throughout, I observed the number of entries remaining in the stream after 1 minute, deleting the key between tests so each chunk size started from an empty stream:
| Chunk size | Duration | Entries left in stream |
|-----------:|----------|-----------------------:|
| 100        | 1 min    | 8256                   |
| 250        | 1 min    | 8223                   |
| 500        | 1 min    | 8399                   |
| 1000       | 1 min    | 9303                   |
| 2000       | 1 min    | 11039                  |
It seems like smaller chunk sizes might keep the backlog lower.
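For anyone reproducing these measurements, the backlog can be read directly with Redis's `XLEN` command. A minimal sketch assuming the phpredis extension and a placeholder key name (the actual Pulse stream key depends on your Redis prefix and connection):

```php
<?php

// Hypothetical monitoring snippet: report how many entries are waiting
// in the ingest stream. 'pulse:ingest' is a placeholder key; inspect
// your Redis instance to find the real stream key for your setup.
$redis = new \Redis();
$redis->connect('127.0.0.1', 6379);

echo 'Entries waiting in the ingest stream: ' . $redis->xLen('pulse:ingest') . PHP_EOL;
```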
I can't imagine why smaller chunks would be faster, as this effectively adds more work and network overhead. It's a very unexpected result!
There are no plans to add consumer groups to support multiple `pulse:work` commands. For Pulse's primary use case, sampled data should adequately surface performance issues.