try to enable `parallel_attempts > 1` by default
leondz opened this issue · 2 comments
It's sad (and slow) when parallelisation is off but a generator can support it.
Proposal:
- benchmarks show
parallel_attempts
has more impact thanparallel_requests
(over a whole default run) - try to set this to be used by default - not all generators support it; we should have a flag in each generators that sets whether it should be run in parallel by default
- let's repurpose
parallel_capable
- it's
True
by default - needs to be set to False in ggml
- needs to be sensibly set to False or True for each Hugging Face class
- it should be possible to override
parallel_capable
somehow. options:- ignore
parallel_(requests|attempts)
if generator'sparallel_capable == False
- use
parallel_capable
to determine default parallel configuration defaults, which can be overridden in CLI/config options (print/log a warning when this happens) - setting
parallel_(requests|attempts)
to1
should (effectively? actually?) override parallel_capable; print/log a notice when this happens
- ignore
- what's a sensible default for the rest class?
- what's a sensible default for
parallel_attempts
? what's polite? I like 200 with NIMs, the SREs have comments about why this isn't good; 2 is too low. powers of two are popular - how about8
?
- let's repurpose
This will expose more people to #308
@jmartin-tech opinions welcome!
I like the idea of expanding parallel_capable
to limiting both parallel_requests
& parallel_attempts
for all plugins. I think there could be an argument for wanting to allow concurrent requests
while not allowing concurrent attempts
, the simplicity of a single control though may outweigh the value.
it should be possible to override
parallel_capable
somehow. options
In theory, parallel_capable
can already be overridden via config
on a per generator
class basis. The impact would be to all instance of the generator
class, and could
have impacts for generators within other plugins like probes
and detectors
.
what's a sensible default for the rest class?
what's a sensible default ...
Using a single default for all types seems viable, I don't see a strong reason to limit rest
differently.
8 sounds reasonable, powers of 2 are nice from a cores/threads available perspective.
We could also consider an auto-detect
value based on a CPU core count check and scale to 2x the CPU cores possibly with a max of 16 or something to keep it from over saturating on a high core count CPU.