thoth-station/integration-tests

solver feature tests are failing because of solver documents not present on stage

Closed this issue ยท 13 comments

Describe the bug
When running tests for the solver feature on stage, the scheduling of solver analyses jobs fails with the following error:

Scenario: Schedule solver jobs for all available solvers in Thoth      # features/solvers.feature:7
    Given deployment is accessible using HTTPS                           # features/steps/common.py:26 0.898s
    When we ask for the available solvers                                # features/steps/solvers.py:30 0.431s
    Then schedule solver analyses for package selinon with version 1.0.0 # features/steps/solvers.py:47 0.564s
    Then wait for analyses to finish successfully                        # features/steps/solvers.py:65 45.758s
      Assertion Failed: Solver solver-rhel-8-py38-220221135642-63d012bf2e7f41b9 run on management.stage.thoth-station.ninja was not successful

The solver document solver-rhel-8-py38-220221135642-63d012bf2e7f41b9 is present in ocp4-stage.
This is also the case with other packages with available solver documents (ex: tensorflow 2.7.0) .

To Reproduce
Run integration tests for the solver feature on stage and see the error.

Expected behavior
The analyses finish successfully.

/priority critical-urgent

/assign @KPostOffice
/lifecycle active

I started looking into this. I ran the solver for selinon by sending a request to the management api. I expected it to fail. However, it succeeded. I ran the request again to determine whether the failure might be slightly flakey.

I ended up getting the following error:
DETAIL: Key (package_name, package_version, python_package_index_id, os_name, os_version, python_version)=(selinon, 1.0.0, 1, rhel, 8, 3.8) already exists.

My current naive solution is that this issue can be fixed by adding force_sync=true to the request.

Below is the full trace for the exception

DETAIL:  Key (package_name, package_version, python_package_index_id, os_name, os_version, python_version)=(selinon, 1.0.0, 1, rhel, 8, 3.8) already exists.

[SQL: INSERT INTO python_package_version (package_name, package_version, os_name, os_version, python_version, entity_id, python_package_index_id, python_package_metadata_id, is_missing, provides_source_distro) VALUES (%(package_name)s, %(package_version)s, %(os_name)s, %(os_version)s, %(python_version)s, %(entity_id)s, %(python_package_index_id)s, %(python_package_metadata_id)s, %(is_missing)s, %(provides_source_distro)s) RETURNING python_package_version.id]
[parameters: {'package_name': 'selinon', 'package_version': '1.0.0', 'os_name': 'rhel', 'os_version': '8', 'python_version': '3.8', 'entity_id': 1, 'python_package_index_id': 1, 'python_package_metadata_id': 151600, 'is_missing': False, 'provides_source_distro': True}]
(Background on this error at: http://sqlalche.me/e/13/gkpj)
2022-02-23 23:36:15,856   1 CRITICAL root:105: Traceback (most recent call last):
  File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
    self.dialect.do_execute(
  File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/engine/default.py", line 608, in do_execute
    cursor.execute(statement, parameters)
psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "python_package_version_package_name_package_version_python__key"
DETAIL:  Key (package_name, package_version, python_package_index_id, os_name, os_version, python_version)=(selinon, 1.0.0, 1, rhel, 8, 3.8) already exists.


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/app-root/lib64/python3.8/site-packages/thoth/storages/graph/models_base.py", line 52, in get_or_create
    session.commit()
  File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/orm/session.py", line 1046, in commit
    self.transaction.commit()
  File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/orm/session.py", line 504, in commit
    self._prepare_impl()
  File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/orm/session.py", line 483, in _prepare_impl
    self.session.flush()
  File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/orm/session.py", line 2540, in flush
    self._flush(objects)
  File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/orm/session.py", line 2682, in _flush
    transaction.rollback(_capture_exception=True)
  File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
    compat.raise_(
  File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/orm/session.py", line 2642, in _flush
    flush_context.execute()
  File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/orm/unitofwork.py", line 422, in execute
    rec.execute(self)
  File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/orm/unitofwork.py", line 586, in execute
    persistence.save_obj(
  File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/orm/persistence.py", line 239, in save_obj
    _emit_insert_statements(
  File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/orm/persistence.py", line 1135, in _emit_insert_statements
    result = cached_connections[connection].execute(
  File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/engine/base.py", line 1011, in execute
    return meth(self, multiparams, params)
  File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
    ret = self._execute_context(
  File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
    self._handle_dbapi_exception(
  File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
    util.raise_(
  File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
    self.dialect.do_execute(
  File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/engine/default.py", line 608, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.IntegrityError: (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "python_package_version_package_name_package_version_python__key"
DETAIL:  Key (package_name, package_version, python_package_index_id, os_name, os_version, python_version)=(selinon, 1.0.0, 1, rhel, 8, 3.8) already exists.

[SQL: INSERT INTO python_package_version (package_name, package_version, os_name, os_version, python_version, entity_id, python_package_index_id, python_package_metadata_id, is_missing, provides_source_distro) VALUES (%(package_name)s, %(package_version)s, %(os_name)s, %(os_version)s, %(python_version)s, %(entity_id)s, %(python_package_index_id)s, %(python_package_metadata_id)s, %(is_missing)s, %(provides_source_distro)s) RETURNING python_package_version.id]
[parameters: {'package_name': 'selinon', 'package_version': '1.0.0', 'os_name': 'rhel', 'os_version': '8', 'python_version': '3.8', 'entity_id': 1, 'python_package_index_id': 1, 'python_package_metadata_id': 151600, 'is_missing': False, 'provides_source_distro': True}]
(Background on this error at: http://sqlalche.me/e/13/gkpj)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "app.py", line 252, in <module>
    cli()
  File "/opt/app-root/lib64/python3.8/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/opt/app-root/lib64/python3.8/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/opt/app-root/lib64/python3.8/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/app-root/lib64/python3.8/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "app.py", line 226, in cli
    _do_sync(
  File "app.py", line 125, in _do_sync
    stats = sync_documents(
  File "/opt/app-root/lib64/python3.8/site-packages/thoth/storages/sync.py", line 574, in sync_documents
    stats_change = handler(
  File "/opt/app-root/lib64/python3.8/site-packages/thoth/storages/sync.py", line 131, in sync_solver_documents
    graph.sync_solver_result(document, force=force)
  File "/opt/app-root/lib64/python3.8/site-packages/thoth/storages/graph/postgres.py", line 5982, in sync_solver_result
    python_package_version = self._create_python_package_version(
  File "/opt/app-root/lib64/python3.8/site-packages/thoth/storages/graph/postgres.py", line 4711, in _create_python_package_version
    python_package_version, _ = PythonPackageVersion.get_or_create(
  File "/opt/app-root/lib64/python3.8/site-packages/thoth/storages/graph/models_base.py", line 62, in get_or_create
    return session.query(cls).filter_by(**kwargs).one(), True
  File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/orm/query.py", line 3500, in one
    raise orm_exc.NoResultFound("No row was found for one()")
sqlalchemy.orm.exc.NoResultFound: No row was found for one()

This might most probably indicate that some solver data have changed and thus solver result fails to sync on integrity error - thoth-station/storages#2308

@fridex From what you said here and looking at the source code, it looks like this behavior is intentional. It should be fine to just submit with the force param set, correct? This will delete the entry so that we can create a new entry with the updated metadata

Sadly, we do not have that force sync implemented. The sync will always fail. When force is supplied, we should perform updates if entries exist in the database rather than trying to get_or_create entries (which causes the mentioned integrity error).

goern commented

@fridex shall we turn your last comment into a feature request? It seems to trace back to thoth-station/storages#2308 ?!

Yes, Kevin's PR could solve this.

Actually, Kevin's PR will fix another issue that we have - syncing results multiple times and updating package metadata.

There can be two reasons behind the failure reported in this issue:

  1. middletier stage is used for data ingestion which means the solver scenario can timeout when we run solvers (solver triggered is queued and waits till the rest of workflows finish)
  2. when a solver finishes, it is deleted from the namespace pretty quickly - I observed there were 404 issues as integration-tests when integration-tests were not able to obtain status for deleted solvers

A proposed fix:

  1. keep this solver specific scenario enabled only for the prod environment - we want to have an ability to (re)solve urgent packages as per user requests
  2. disable this solver scenario in the stage environment

The fix proposed by @fridex seems to have worked, can we close this issue?

Yes, I think we can close it ๐Ÿ‘๐Ÿป Thanks!

/close

@fridex: Closing this issue.

In response to this:

Yes, I think we can close it ๐Ÿ‘๐Ÿป Thanks!

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.