Dependency Monkey tests are taking too long to complete
Closed this issue · 8 comments
Describe the bug
Integration tests for the dependency_monkey feature are failing because the Adviser analysis took more than 15 minutes to complete:
@seizes_inspection_namespace
Scenario Outline: Run Dependency Monkey -- @1.1 Dependency Monkey # features/dependency_monkey.feature:13
Given deployment is accessible using HTTPS # features/steps/common.py:26 0.839s
When I schedule Dependency Monkey 1 times for simple_tensorflow example with dry run set to True with predictor AUTO and configuration {} # features/steps/dependency_monkey.py:33 0.773s
Then wait for Dependency Monkey to finish successfully # features/steps/dependency_monkey.py:83 1283.094s
Traceback (most recent call last):
  File "/home/mcostant/.local/lib/python3.8/site-packages/behave/model.py", line 1329, in run
    match.run(runner.context)
  File "/home/mcostant/.local/lib/python3.8/site-packages/behave/matchers.py", line 98, in run
    self.func(context, *args, **kwargs)
  File "features/steps/dependency_monkey.py", line 93, in wait_for_dependency_monkey_to_finish
    raise RuntimeError("Adviser analysis took too much time to finish")
RuntimeError: Adviser analysis took too much time to finish
To Reproduce
Run the dependency_monkey tests on stage and see the error.
Expected behavior
Adviser analysis is completed on time.
/priority critical-urgent
@fridex it looks like the tests are still failing with the increased timeout. I think we need to either try a higher timeout or investigate whether there is another cause for this failure.
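For reference, the wait step that raises this error can be thought of as a polling loop with a deadline. The following is only a minimal sketch, not the actual code in features/steps/dependency_monkey.py: the function name `wait_until_finished` and its parameters (`is_finished`, `timeout`, `poll_interval`) are hypothetical, and only the error message is taken from the traceback above.

```python
import time
from typing import Callable


def wait_until_finished(
    is_finished: Callable[[], bool],
    timeout: float,
    poll_interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Poll is_finished() until it returns True or timeout seconds elapse.

    Returns the elapsed time in seconds. All names here are hypothetical;
    the real test step may be structured differently.
    """
    start = clock()
    while True:
        if is_finished():
            return clock() - start
        if clock() - start >= timeout:
            # Same message as the RuntimeError reported in this issue.
            raise RuntimeError("Adviser analysis took too much time to finish")
        sleep(poll_interval)
```

With a loop like this, raising `timeout` only helps if the workflow is actually making progress. If the workflow is never scheduled, the step fails no matter how high the limit is.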
/kind bug
/assign
The first problem here was that we had scaled down Argo's workflow controller because we run data aggregation using selinon. I've scaled up the Argo deployment (and scaled down the selinon deployment), so workflows are now scheduled. However, there is another issue:
ImagePullBackOff: Back-off pulling image "image-registry.openshift-image-registry.svc:5000/thoth-backend-stage/adviser:latest"
It looks like inspections finish, but there is another deployment issue:
Unable to attach or mount volumes: unmounted volumes=[kafka-secrets], unattached volumes=[argo-artifact-repository-secrets input-artifacts kafka-secrets kube-api-access-vdvjm podmetadata]: timed out waiting for the condition
MountVolume.SetUp failed for volume "kafka-secrets" : references non-existent secret key: kafka_user.crt
Looks like you are working on it, @mayaCostantini?
/lifecycle active
@goern I think @fridex and @harshad16 are working on this, as it is a deployment issue.
PR #287 also fixes this issue:
thoth.adviser.exceptions.PipelineConfigurationError: Filed to initialize pipeline unit configuration for 'PlatformBoot' with configuration {'default_platform': 'linux-x86_64'}: extra keys not allowed @ data['default_platform']
dependency-monkey-220308141455-d2c59fcd85f210-3260034838-main.log
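The PipelineConfigurationError above reads like a strict schema check that rejects configuration keys the unit does not declare. As a rough illustration only, here is a plain-Python sketch of that kind of check. It is not thoth-adviser's validation code: `check_unit_configuration` and `allowed_keys` are hypothetical, and the empty allowed set is an assumption made so that the example reproduces the message format.

```python
from typing import Any, Dict, Set


def check_unit_configuration(configuration: Dict[str, Any], allowed_keys: Set[str]) -> None:
    """Raise ValueError if configuration contains a key outside allowed_keys.

    Hypothetical helper; it only mimics the wording of the reported error.
    """
    extra = sorted(set(configuration) - set(allowed_keys))
    if extra:
        # Report the first offending key, in the style seen in the log.
        raise ValueError(f"extra keys not allowed @ data[{extra[0]!r}]")


try:
    # Assumption: the unit declares no 'default_platform' option.
    check_unit_configuration({"default_platform": "linux-x86_64"}, allowed_keys=set())
except ValueError as exc:
    print(exc)
```

Under that assumption, a configuration passed to a unit version that does not know the key fails at initialization rather than at run time, which would be consistent with a mismatch between the deployed adviser image and the configuration.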