solver feature tests are failing because of solver documents not present on stage
Closed this issue · 13 comments
Describe the bug
When running tests for the solver feature on stage, scheduling of solver analysis jobs fails with the following error:
Scenario: Schedule solver jobs for all available solvers in Thoth # features/solvers.feature:7
Given deployment is accessible using HTTPS # features/steps/common.py:26 0.898s
When we ask for the available solvers # features/steps/solvers.py:30 0.431s
Then schedule solver analyses for package selinon with version 1.0.0 # features/steps/solvers.py:47 0.564s
Then wait for analyses to finish successfully # features/steps/solvers.py:65 45.758s
Assertion Failed: Solver solver-rhel-8-py38-220221135642-63d012bf2e7f41b9 run on management.stage.thoth-station.ninja was not successful
The solver document solver-rhel-8-py38-220221135642-63d012bf2e7f41b9 is present in ocp4-stage. This is also the case with other packages with available solver documents (e.g. tensorflow 2.7.0).
To Reproduce
Run integration tests for the solver feature on stage and see the error.
Expected behavior
The analyses finish successfully.
/priority critical-urgent
/assign @KPostOffice
/lifecycle active
I started looking into this. I ran the solver for selinon by sending a request to the management API. I expected it to fail; however, it succeeded. I ran the request again to determine whether the failure might be flaky, and ended up getting the following error:
DETAIL: Key (package_name, package_version, python_package_index_id, os_name, os_version, python_version)=(selinon, 1.0.0, 1, rhel, 8, 3.8) already exists.
My current naive solution is to add force_sync=true to the request.
Below is the full traceback for the exception:
DETAIL: Key (package_name, package_version, python_package_index_id, os_name, os_version, python_version)=(selinon, 1.0.0, 1, rhel, 8, 3.8) already exists.
[SQL: INSERT INTO python_package_version (package_name, package_version, os_name, os_version, python_version, entity_id, python_package_index_id, python_package_metadata_id, is_missing, provides_source_distro) VALUES (%(package_name)s, %(package_version)s, %(os_name)s, %(os_version)s, %(python_version)s, %(entity_id)s, %(python_package_index_id)s, %(python_package_metadata_id)s, %(is_missing)s, %(provides_source_distro)s) RETURNING python_package_version.id]
[parameters: {'package_name': 'selinon', 'package_version': '1.0.0', 'os_name': 'rhel', 'os_version': '8', 'python_version': '3.8', 'entity_id': 1, 'python_package_index_id': 1, 'python_package_metadata_id': 151600, 'is_missing': False, 'provides_source_distro': True}]
(Background on this error at: http://sqlalche.me/e/13/gkpj)
2022-02-23 23:36:15,856 1 CRITICAL root:105: Traceback (most recent call last):
File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
self.dialect.do_execute(
File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "python_package_version_package_name_package_version_python__key"
DETAIL: Key (package_name, package_version, python_package_index_id, os_name, os_version, python_version)=(selinon, 1.0.0, 1, rhel, 8, 3.8) already exists.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/app-root/lib64/python3.8/site-packages/thoth/storages/graph/models_base.py", line 52, in get_or_create
session.commit()
File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/orm/session.py", line 1046, in commit
self.transaction.commit()
File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/orm/session.py", line 504, in commit
self._prepare_impl()
File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/orm/session.py", line 483, in _prepare_impl
self.session.flush()
File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/orm/session.py", line 2540, in flush
self._flush(objects)
File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/orm/session.py", line 2682, in _flush
transaction.rollback(_capture_exception=True)
File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
compat.raise_(
File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
raise exception
File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/orm/session.py", line 2642, in _flush
flush_context.execute()
File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/orm/unitofwork.py", line 422, in execute
rec.execute(self)
File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/orm/unitofwork.py", line 586, in execute
persistence.save_obj(
File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/orm/persistence.py", line 239, in save_obj
_emit_insert_statements(
File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/orm/persistence.py", line 1135, in _emit_insert_statements
result = cached_connections[connection].execute(
File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/engine/base.py", line 1011, in execute
return meth(self, multiparams, params)
File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
ret = self._execute_context(
File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
self._handle_dbapi_exception(
File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
util.raise_(
File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
raise exception
File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
self.dialect.do_execute(
File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.IntegrityError: (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "python_package_version_package_name_package_version_python__key"
DETAIL: Key (package_name, package_version, python_package_index_id, os_name, os_version, python_version)=(selinon, 1.0.0, 1, rhel, 8, 3.8) already exists.
[SQL: INSERT INTO python_package_version (package_name, package_version, os_name, os_version, python_version, entity_id, python_package_index_id, python_package_metadata_id, is_missing, provides_source_distro) VALUES (%(package_name)s, %(package_version)s, %(os_name)s, %(os_version)s, %(python_version)s, %(entity_id)s, %(python_package_index_id)s, %(python_package_metadata_id)s, %(is_missing)s, %(provides_source_distro)s) RETURNING python_package_version.id]
[parameters: {'package_name': 'selinon', 'package_version': '1.0.0', 'os_name': 'rhel', 'os_version': '8', 'python_version': '3.8', 'entity_id': 1, 'python_package_index_id': 1, 'python_package_metadata_id': 151600, 'is_missing': False, 'provides_source_distro': True}]
(Background on this error at: http://sqlalche.me/e/13/gkpj)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "app.py", line 252, in <module>
cli()
File "/opt/app-root/lib64/python3.8/site-packages/click/core.py", line 1128, in __call__
return self.main(*args, **kwargs)
File "/opt/app-root/lib64/python3.8/site-packages/click/core.py", line 1053, in main
rv = self.invoke(ctx)
File "/opt/app-root/lib64/python3.8/site-packages/click/core.py", line 1395, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/app-root/lib64/python3.8/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "app.py", line 226, in cli
_do_sync(
File "app.py", line 125, in _do_sync
stats = sync_documents(
File "/opt/app-root/lib64/python3.8/site-packages/thoth/storages/sync.py", line 574, in sync_documents
stats_change = handler(
File "/opt/app-root/lib64/python3.8/site-packages/thoth/storages/sync.py", line 131, in sync_solver_documents
graph.sync_solver_result(document, force=force)
File "/opt/app-root/lib64/python3.8/site-packages/thoth/storages/graph/postgres.py", line 5982, in sync_solver_result
python_package_version = self._create_python_package_version(
File "/opt/app-root/lib64/python3.8/site-packages/thoth/storages/graph/postgres.py", line 4711, in _create_python_package_version
python_package_version, _ = PythonPackageVersion.get_or_create(
File "/opt/app-root/lib64/python3.8/site-packages/thoth/storages/graph/models_base.py", line 62, in get_or_create
return session.query(cls).filter_by(**kwargs).one(), True
File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/orm/query.py", line 3500, in one
raise orm_exc.NoResultFound("No row was found for one()")
sqlalchemy.orm.exc.NoResultFound: No row was found for one()
This most probably indicates that some solver data have changed, so the solver result fails to sync on an integrity error - see thoth-station/storages#2308
@fridex From what you said here and looking at the source code, it looks like this behavior is intentional. It should be fine to just submit with the force param set, correct? This will delete the entry so that we can create a new entry with the updated metadata.
Sadly, we do not have that force sync implemented; the sync will always fail. When force is supplied, we should update entries that already exist in the database rather than trying to get_or_create them (which causes the mentioned integrity error).
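The traceback shows why the current pattern fails: the insert violates the unique constraint, and the fallback query then filters by all kwargs (including the changed metadata id), so it finds no row and raises NoResultFound. Below is a minimal sketch of the update-on-conflict behavior described above. This is not the thoth-storages implementation - the model, column names, and helper are simplified stand-ins; the point is that on IntegrityError we re-query by the unique-constraint columns only and update the remaining attributes in place.

```python
from sqlalchemy import Column, Integer, String, UniqueConstraint, create_engine
from sqlalchemy.exc import IntegrityError
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class PackageVersion(Base):
    """Simplified stand-in for PythonPackageVersion."""

    __tablename__ = "package_version"
    __table_args__ = (UniqueConstraint("name", "version"),)

    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    version = Column(String, nullable=False)
    metadata_id = Column(Integer)  # NOT part of the unique constraint


def get_or_create(session, *, name, version, metadata_id):
    """Return (row, created); on conflict, update the existing row instead of failing."""
    row = PackageVersion(name=name, version=version, metadata_id=metadata_id)
    session.add(row)
    try:
        session.commit()
        return row, True
    except IntegrityError:
        session.rollback()
        # Re-query by the unique-constraint columns ONLY. Filtering by all
        # kwargs (as the failing code does) misses rows whose non-key
        # columns changed, which is what produced the NoResultFound above.
        existing = session.query(PackageVersion).filter_by(name=name, version=version).one()
        existing.metadata_id = metadata_id  # the "force" update semantics
        session.commit()
        return existing, False


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
with Session(engine) as s:
    first, created = get_or_create(s, name="selinon", version="1.0.0", metadata_id=1)
    again, created2 = get_or_create(s, name="selinon", version="1.0.0", metadata_id=2)
    same_row = first.id == again.id
    final_metadata = again.metadata_id
print(created, created2, same_row, final_metadata)  # → True False True 2
```

The second call hits the unique violation, falls back to the existing row, and updates its metadata rather than raising.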
@fridex shall we turn your last comment into a feature request? It seems to trace back to thoth-station/storages#2308?!
Yes, Kevin's PR could solve this.
Actually, Kevin's PR will fix another issue that we have - syncing results multiple times and updating package metadata.
There can be two reasons behind the failure reported in this issue:
- the middletier stage namespace is used for data ingestion, which means the solver scenario can time out when we run solvers (the triggered solver is queued and waits until the rest of the workflows finish)
- when a solver finishes, it is deleted from the namespace pretty quickly - I observed 404 errors when integration tests were not able to obtain the status of deleted solvers
A proposed fix:
- keep this solver-specific scenario enabled only in the prod environment - we want the ability to (re)solve urgent packages per user requests
- disable this solver scenario in the stage environment
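One way such an environment-specific skip could be wired up is a behave environment hook. This is only a sketch, not the repository's actual setup: the @prod-only tag and the THOTH_DEPLOYMENT_NAME environment variable are assumed names.

```python
# features/environment.py - a sketch of a behave hook that skips
# scenarios tagged @prod-only unless the tests run against prod.
# The tag name and the THOTH_DEPLOYMENT_NAME variable are assumptions.
import os


def before_scenario(context, scenario):
    # behave invokes this hook before every scenario; effective_tags
    # also includes tags inherited from the enclosing feature.
    if "prod-only" in scenario.effective_tags and os.getenv("THOTH_DEPLOYMENT_NAME") != "prod":
        scenario.skip("solver scenarios are enabled only in the prod environment")
```

Alternatively, the tagged scenario could be excluded on stage purely from the command line, e.g. `behave --tags=-prod-only`.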
The fix proposed by @fridex seems to have worked - can we close this issue?
Yes, I think we can close it 👍🏻 Thanks!
/close
@fridex: Closing this issue.
In response to this:
Yes, I think we can close it 👍🏻 Thanks!
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.