leondz/garak

try to enable `parallel_attempts > 1` by default

leondz opened this issue · 2 comments

It's sad (and slow) when parallelisation is off but a generator can support it.

Proposal:

  • benchmarks show parallel_attempts has more impact than parallel_requests (over a whole default run) - try to set this to be used by default
  • not all generators support it; we should have a flag in each generators that sets whether it should be run in parallel by default
    • let's repurpose parallel_capable
    • it's True by default
    • needs to be set to False in ggml
    • needs to be sensibly set to False or True for each Hugging Face class
    • it should be possible to override parallel_capable somehow. options:
      1. ignore parallel_(requests|attempts) if generator's parallel_capable == False
      2. use parallel_capable to determine default parallel configuration defaults, which can be overridden in CLI/config options (print/log a warning when this happens)
      3. setting parallel_(requests|attempts) to 1 should (effectively? actually?) override parallel_capable; print/log a notice when this happens
    • what's a sensible default for the rest class?
    • what's a sensible default for parallel_attempts? what's polite? I like 200 with NIMs, the SREs have comments about why this isn't good; 2 is too low. powers of two are popular - how about 8?

This will expose more people to #308

@jmartin-tech opinions welcome!

I like the idea of expanding parallel_capable to limiting both parallel_requests & parallel_attempts for all plugins. I think there could be an argument for wanting to allow concurrent requests while not allowing concurrent attempts, the simplicity of a single control though may outweigh the value.

it should be possible to override parallel_capable somehow. options

In theory, parallel_capable can already be overridden via config on a per generator class basis. The impact would be to all instance of the generator class, and could have impacts for generators within other plugins like probes and detectors.

what's a sensible default for the rest class?
what's a sensible default ...

Using a single default for all types seems viable, I don't see a strong reason to limit rest differently.
8 sounds reasonable, powers of 2 are nice from a cores/threads available perspective.
We could also consider an auto-detect value based on a CPU core count check and scale to 2x the CPU cores possibly with a max of 16 or something to keep it from over saturating on a high core count CPU.