OpenPipe/ART

enforce_eager=True not being respected in test scripts


Problem

When test scripts run with enforce_eager=True specified, the logs still show enforce_eager=False and CUDA graphs are still captured. This slows startup and lengthens the feedback cycle during testing.

Reproduction

In the test script src/art/test/test_step_skipping.py, we're passing enforce_eager=True:

# Register the model
await model.register(
    backend,
    _openai_client_config={"engine_args": {"enforce_eager": True}},
)

However, when the script runs, the logs show enforce_eager=False and CUDA graphs are still captured.
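To check this without reading the logs by hand, a small helper along these lines could scan captured log text for the reported value. It is only a sketch: the function name reported_enforce_eager and the sample line are made up for illustration, and the only assumption about the log format is the key=value form quoted above (enforce_eager=False).

```python
import re
from typing import Optional

# Matches the key=value form quoted in this issue, e.g. "enforce_eager=False".
_ENFORCE_EAGER = re.compile(r"enforce_eager=(True|False)")


def reported_enforce_eager(log_text: str) -> Optional[bool]:
    """Return the last enforce_eager value found in log_text, or None if absent."""
    matches = _ENFORCE_EAGER.findall(log_text)
    if not matches:
        return None
    return matches[-1] == "True"


# Made-up sample line for illustration only; this is not real log output.
sample = "engine config: model=example, enforce_eager=False"
print(reported_enforce_eager(sample))  # False
```

A test could then fail fast when the returned value is not True, rather than waiting for CUDA graph capture to finish.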

Expected Behavior

When enforce_eager=True is passed in the configuration, it should:

  1. Skip CUDA graph compilation
  2. Start up faster
  3. Provide quicker feedback during testing

Impact

This issue slows development: tests take longer than necessary to start and report results.

Environment

  • The issue can be reproduced by running: ./src/art/test/test_step_skipping.py
  • The script is configured to run on GPU with sky launch

@corbt _openai_client_config.engine_args does not initialize the engine, so this behavior is expected. Engine args have to be specified through TrainableModel._internal_config instead. engine_args also exists here because the OpenAI-compatible API server reads some of these arguments. This API is really sub-optimal, and the best fix is probably to unify all args under register, or potentially to add a new API, something like deploy.
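As a rough sketch of the workaround, the same engine args can be built once and passed to both places. The helper build_configs is hypothetical, and the {"engine_args": ...} shape for _internal_config is an assumption that mirrors the shape used for _openai_client_config in the test script; check it against the current type definitions before relying on it.

```python
from typing import Any


def build_configs(engine_args: dict[str, Any]) -> tuple[dict[str, Any], dict[str, Any]]:
    """Return (internal_config, openai_client_config) carrying the same engine args.

    Hypothetical helper: the "engine_args" key for the internal config is an
    assumption, not a confirmed part of the API.
    """
    # Copy so that later edits to one config do not leak into the other.
    internal_config = {"engine_args": dict(engine_args)}
    openai_client_config = {"engine_args": dict(engine_args)}
    return internal_config, openai_client_config


internal_config, openai_client_config = build_configs({"enforce_eager": True})

# Assumed wiring, left as comments because it needs a GPU backend:
#   the internal config goes on the model (TrainableModel._internal_config),
#   which is what initializes the engine, and the client config goes to
#   model.register(backend, _openai_client_config=openai_client_config).
```

Keeping a single source for the engine args avoids the two configs drifting apart until the API is unified.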

Thanks, that's helpful. Yes, definitely in favor of simplifying the API here!