[Bug] Evaluation can't stop running on self-host agenta
shhhhhh2001 opened this issue · 4 comments
Describe the bug
Evaluations of any kind never finish running on self-hosted Agenta.
To Reproduce
Steps to reproduce the behavior:
- In the CLI, run "docker compose -f docker-compose.gh.yml up -d --pull always" to start Agenta in production mode.
- Create any variant and any evaluator.
- Start a new evaluation with any evaluator (e.g. the built-in Exact Match).
- See error
Expected behavior
The evaluation should complete very quickly when there are only 3 data entries (e.g. about 3 s on the cloud).
Screenshots
If applicable, add screenshots to help explain your problem.
Desktop (please complete the following information):
- OS: Ubuntu 22.04.1 LTS (GNU/Linux 5.15.0-46-generic x86_64)
- Browser: Firefox on Windows 10 (Ubuntu is the OS of the server where we deploy Agenta; I access it through ssh -L and use the UI on Windows 10)
- Version: 126.0.1
Important Context
It worked well before you revised the code evaluator yesterday. All the variants and datasets in the screenshots are the same on the cloud and on our host.
Network Logs in the Browser:
- If applicable, take a screenshot of the network logs or copy and paste any relevant log entries.
Docker Containers Information:
- If applicable, provide a screenshot showing the list of running and stopped containers.
- Please provide us with a screenshot of the logs for the agenta-backend docker container.
Additional Information:
- Include any additional details, error messages, or observations that may be helpful.
Sometimes a 502 error shows up in the UI, but nothing appears in the agenta-backend Docker container's log.
After deleting the evaluation job in the UI, it returns 204 and 500.
The issue with evaluations not stopping on self-hosted Agenta could be related to the configuration of the Celery worker. The celery_worker service is responsible for handling background tasks, including evaluations. If the Celery worker is not properly configured or not running, evaluations may not be processed correctly.
To resolve this issue, ensure that the celery_worker service is correctly configured and running. Here is the relevant part of the docker-compose.gh.yml file for the celery_worker service:
celery_worker:
    container_name: celery-worker-1
    image: ghcr.io/agenta-ai/agenta-backend
    command: >
        celery -A agenta_backend.main.celery_app worker --concurrency=1 --loglevel=INFO
    environment:
        - MONGODB_URI=mongodb://username:password@mongo:27017
        - REDIS_URL=redis://redis:6379/0
        - CELERY_BROKER_URL=amqp://guest@rabbitmq//
        - CELERY_RESULT_BACKEND=redis://redis:6379/0
        - FEATURE_FLAG=oss
    volumes:
        - ./agenta-backend/agenta_backend:/app/agenta_backend
        - /var/run/docker.sock:/var/run/docker.sock
    depends_on:
        - mongo
        - rabbitmq
        - redis
    extra_hosts:
        - "host.docker.internal:host-gateway"
    networks:
        - agenta-network
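One thing worth ruling out with a configuration like this is whether the broker and backends named in the environment section are reachable from where the worker runs. Below is a minimal sketch, assuming the URLs are copied from the compose file and the script runs somewhere on agenta-network (the service names mongo, rabbitmq, and redis only resolve inside the Docker network). The helper names and the default ports in DEFAULT_PORTS are my own assumptions, not part of Agenta.

```python
import socket
from typing import Dict, Tuple
from urllib.parse import urlsplit

# Assumed default ports, used only when the URL does not specify one.
DEFAULT_PORTS = {"amqp": 5672, "redis": 6379, "mongodb": 27017}


def endpoint(url: str) -> Tuple[str, int]:
    """Return the (host, port) that a connection URL points at."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"no host in URL: {url!r}")
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return parts.hostname, port


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Attempt a plain TCP connection; True if something accepts it."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failures, refusals, and timeouts
        return False


def check_all(urls: Dict[str, str]) -> Dict[str, bool]:
    """Map each variable name to whether its endpoint accepts connections."""
    return {name: is_reachable(*endpoint(url)) for name, url in urls.items()}
```

Calling check_all({"CELERY_BROKER_URL": "amqp://guest@rabbitmq//", "REDIS_URL": "redis://redis:6379/0"}) from inside the network would show which endpoints accept connections. A successful TCP connection only shows that a port is open, not that credentials or queues are set up correctly.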
Ensure that the celery_worker service is up and running by checking the status of the Docker containers:
docker-compose -f docker-compose.gh.yml ps
If the celery_worker service is not running, you can start it with:
docker-compose -f docker-compose.gh.yml up -d celery_worker
Additionally, check the logs for the celery_worker service to identify any issues:
docker-compose -f docker-compose.gh.yml logs celery_worker
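The status check can also be scripted. The sketch below parses the output of the ps command when it is run with --format json and reports the state of one service. It assumes the Compose v2 JSON output, where each entry carries "Service" and "State" fields and where, depending on the Compose version, the output is either a single JSON array or one JSON object per line. The function names are hypothetical.

```python
import json
from typing import Dict, List, Optional


def parse_ps_output(text: str) -> List[Dict]:
    """Parse `ps --format json` output in either array or per-line form."""
    text = text.strip()
    if not text:
        return []
    if text.startswith("["):
        return json.loads(text)
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def service_state(entries: List[Dict], service: str) -> Optional[str]:
    """Return the State of the named service, or None if it is not listed."""
    for entry in entries:
        if entry.get("Service") == service:
            return entry.get("State")
    return None
```

For example, save the output of the ps command with --format json appended, pass the text to parse_ps_output, and check that service_state(entries, "celery_worker") is "running". A result of None means the service is not listed at all, which by default also happens when its container has stopped.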
If the celery_worker service is running correctly and the issue persists, it might be related to the evaluation status management in the code. The runningStatuses constant in the agenta-web/src/services/evaluations/index.ts file defines the statuses that indicate an evaluation is still running. Ensure that the evaluation status is being updated correctly in the backend and that the frontend is correctly interpreting these statuses.
Moreover, you should also check the backend code, specifically in the agenta-backend/agenta_backend/routers/evaluation_router.py file, to ensure that the evaluation status is being fetched and updated correctly. The fetch_evaluation_status endpoint is responsible for returning the status of an evaluation. Make sure that this endpoint is functioning correctly and that the status is being updated in the database.
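To illustrate why a status that is never updated looks like an evaluation that never stops, here is a rough sketch of a polling loop with a timeout. It is not Agenta's code: the status names in RUNNING_STATUSES are hypothetical placeholders for whatever runningStatuses actually contains, and fetch_status stands in for a call to the status endpoint.

```python
import time
from typing import Callable, FrozenSet

# Hypothetical status names, for illustration only.
RUNNING_STATUSES = frozenset({"INITIALIZED", "STARTED"})


def wait_for_evaluation(
    fetch_status: Callable[[], str],
    running_statuses: FrozenSet[str] = RUNNING_STATUSES,
    timeout: float = 60.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the status leaves the running set, or raise on timeout."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status not in running_statuses:
            return status  # terminal status, e.g. finished or failed
        if clock() >= deadline:
            raise TimeoutError(f"evaluation still {status!r} after {timeout}s")
        sleep(interval)
```

If the worker never picks up the task, the stored status stays in the running set and a loop without the deadline check would spin forever, which matches the symptom reported here. Surfacing a timeout makes the stuck state visible instead.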
Additionally, there might be issues with the CLI command for running evaluations. The run_evaluation_cli function in the agenta-cli/agenta/cli/evaluation_commands.py file uses hardcoded values for the app_name and host parameters. Ensure that these values are correctly set and that the CLI command is functioning as expected:
@click.group()
def evaluation():
    """Commands for evaluations."""
    pass


@evaluation.command(name="run")
def run_evaluation_cli():
    """Run an evaluation."""
    try:
        client.run_evaluation(
            app_name="sss",
            host="http://localhost",
        )
    except Exception as ex:
        click.echo(click.style(f"Error while running evaluation: {ex}", fg="red"))
By ensuring that the celery_worker service is properly configured and running, verifying the evaluation status management in the code, checking the backend endpoints, and ensuring the CLI command is correctly configured, you should be able to resolve the issue with evaluations not stopping on self-hosted Agenta [1][2][3][4].
Hi @shhhhhh2001, we might have an issue in the gh docker compose file. We are looking into it. In the meantime, can you please run Agenta using the development mode flow? This should fix the issue:
git clone https://github.com/Agenta-AI/agenta.git
cd agenta
docker compose -f "docker-compose.yml" up -d --build