thoth-station/storages

Unable to re-solve already solved package with an error

fridex opened this issue · 11 comments

Describe the bug

One of the reasons thoth-station/adviser#1850 is failing is missing dependency information for google-resumable-media==1.2.0, which causes the resolver to look for another resolution path (unsuccessfully). The reason behind this is a failed solver run recorded in the database. If I try to solve the mentioned package locally or in the cluster using solver-rhel-8-py38, the solver succeeds. It looks like we have wrong data/results synced in the database.

Traceback (most recent call last):
  File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
    self.dialect.do_execute(
  File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/engine/default.py", line 608, in do_execute
    cursor.execute(statement, parameters)
psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "python_package_version_package_name_package_version_python__key"
DETAIL:  Key (package_name, package_version, python_package_index_id, os_name, os_version, python_version)=(google-resumable-media, 1.2.0, 1, rhel, 8, 3.8) already exists.


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/app-root/lib64/python3.8/site-packages/thoth/storages/graph/models_base.py", line 52, in get_or_create
    session.commit()
  File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/orm/session.py", line 1046, in commit
    self.transaction.commit()
  File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/orm/session.py", line 504, in commit
    self._prepare_impl()
  File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/orm/session.py", line 483, in _prepare_impl
    self.session.flush()
  File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/orm/session.py", line 2540, in flush
    self._flush(objects)
  File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/orm/session.py", line 2682, in _flush
    transaction.rollback(_capture_exception=True)
  File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
    compat.raise_(
  File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/orm/session.py", line 2642, in _flush
    flush_context.execute()
  File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/orm/unitofwork.py", line 422, in execute
    rec.execute(self)
  File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/orm/unitofwork.py", line 586, in execute
    persistence.save_obj(
  File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/orm/persistence.py", line 239, in save_obj
    _emit_insert_statements(
  File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/orm/persistence.py", line 1135, in _emit_insert_statements
    result = cached_connections[connection].execute(
  File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/engine/base.py", line 1011, in execute
    return meth(self, multiparams, params)
  File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
    ret = self._execute_context(
  File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
    self._handle_dbapi_exception(
  File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
    util.raise_(
  File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
    self.dialect.do_execute(
  File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/engine/default.py", line 608, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.IntegrityError: (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "python_package_version_package_name_package_version_python__key"
DETAIL:  Key (package_name, package_version, python_package_index_id, os_name, os_version, python_version)=(google-resumable-media, 1.2.0, 1, rhel, 8, 3.8) already exists.

[SQL: INSERT INTO python_package_version (package_name, package_version, os_name, os_version, python_version, entity_id, python_package_index_id, python_package_metadata_id, is_missing, provides_source_distro) VALUES (%(package_name)s, %(package_version)s, %(os_name)s, %(os_version)s, %(python_version)s, %(entity_id)s, %(python_package_index_id)s, %(python_package_metadata_id)s, %(is_missing)s, %(provides_source_distro)s) RETURNING python_package_version.id]
[parameters: {'package_name': 'google-resumable-media', 'package_version': '1.2.0', 'os_name': 'rhel', 'os_version': '8', 'python_version': '3.8', 'entity_id': 2438371, 'python_package_index_id': 1, 'python_package_metadata_id': 313979, 'is_missing': False, 'provides_source_distro': True}]
(Background on this error at: http://sqlalche.me/e/13/gkpj)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "app.py", line 252, in <module>
    cli()
  File "/opt/app-root/lib64/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/opt/app-root/lib64/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/opt/app-root/lib64/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/app-root/lib64/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "app.py", line 226, in cli
    _do_sync(
  File "app.py", line 125, in _do_sync
    stats = sync_documents(
  File "/opt/app-root/lib64/python3.8/site-packages/thoth/storages/sync.py", line 574, in sync_documents
    stats_change = handler(
  File "/opt/app-root/lib64/python3.8/site-packages/thoth/storages/sync.py", line 131, in sync_solver_documents
    graph.sync_solver_result(document)
  File "/opt/app-root/lib64/python3.8/site-packages/thoth/storages/graph/postgres.py", line 5078, in sync_solver_result
    python_package_version = self._create_python_package_version(
  File "/opt/app-root/lib64/python3.8/site-packages/thoth/storages/graph/postgres.py", line 3869, in _create_python_package_version
    python_package_version, _ = PythonPackageVersion.get_or_create(
  File "/opt/app-root/lib64/python3.8/site-packages/thoth/storages/graph/models_base.py", line 62, in get_or_create
    return session.query(cls).filter_by(**kwargs).one(), True
  File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/orm/query.py", line 3500, in one
    raise orm_exc.NoResultFound("No row was found for one()")
sqlalchemy.orm.exc.NoResultFound: No row was found for one()

To Reproduce
Steps to reproduce the behavior:

  1. Go to prod and schedule solver solver-rhel-8-py38 for google-resumable-media==1.2.0
  2. See that the solver finishes successfully
  3. See that the graph sync fails with the exception reported above

Expected behavior

Graph sync should sync the solver result.

Interestingly, the solver document synced previously has no package information:

{
  "metadata": {
    "analyzer": "thoth-solver",
    "analyzer_version": "1.6.3",
    "arguments": {
      "python": {
        "exclude_packages": null,
        "index": "https://pypi.org/simple",
        "no_pretty": false,
        "no_transitive": true,
        "output": "/mnt/workdir/solver-rhel-8-py38-4f4eb1b6",
        "requirements": "google-resumable-media===1.2.0",
        "virtualenv": "/home/solver/venv"
      },
      "thoth-solver": {
        "verbose": false
      }
    },
    "datetime": "2020-12-15T21:12:22.560044",
    "distribution": {
      "codename": "Ootpa",
      "id": "rhel",
      "like": "fedora",
      "version": "8.3",
      "version_parts": {
        "build_number": "",
        "major": "8",
        "minor": "3"
      }
    },
    "document_id": "solver-rhel-8-py38-4f4eb1b6",
    "duration": 10,
    "hostname": "solver-rhel-8-py38-4f4eb1b6-3956294335",
    "os_release": {
      "id": "rhel",
      "name": "Red Hat Enterprise Linux",
      "platform_id": "platform:el8",
      "redhat_bugzilla_product": "Red Hat Enterprise Linux 8",
      "redhat_bugzilla_product_version": "8.3",
      "redhat_support_product": "Red Hat Enterprise Linux",
      "redhat_support_product_version": "8.3",
      "version": "8.3 (Ootpa)",
      "version_id": "8.3"
    },
    "python": {
      "api_version": 1013,
      "implementation_name": "cpython",
      "major": 3,
      "micro": 3,
      "minor": 8,
      "releaselevel": "final",
      "serial": 0
    },
    "thoth_deployment_name": "ocp4-stage",
    "timestamp": 1608066742
  },
  "result": {
    "environment": {
      "implementation_name": "cpython",
      "implementation_version": "3.8.3",
      "os_name": "posix",
      "platform_machine": "x86_64",
      "platform_python_implementation": "CPython",
      "platform_release": "4.18.0-193.14.3.el8_2.x86_64",
      "platform_system": "Linux",
      "platform_version": "#1 SMP Mon Jul 20 15:02:29 UTC 2020",
      "python_full_version": "3.8.3",
      "python_version": "3.8",
      "sys_platform": "linux"
    },
    "environment_packages": [
      {
        "package_name": "pipdeptree",
        "package_version": "1.0.0"
      }
    ],
    "errors": [],
    "platform": "linux-x86_64",
    "tree": [],
    "unparsed": [],
    "unresolved": [
      {
        "index_url": "https://pypi.org/simple",
        "is_provided_package": true,
        "is_provided_package_version": false,
        "package_name": "google-resumable-media",
        "version_spec": "===1.2.0"
      }
    ]
  }
}

Should we modify the solver workflow to be:

add a new condition resync to the workflow scheduling, False by default, gating task number 2 of the workflow.

  • 1 solver workflow-task
  • 2 allow-resync workflow-task: check the solver version of the previous run, check whether the package name, version, index + solver combination is already solved, and in that case delete the existing data (runs only with the resync=True condition, otherwise this task does not run)
  • 3 graph-sync workflow-task

Who is going to schedule solver workflows with resync=True? A new CronWorkflow component that checks whether a new solver version is available and schedules solvers for packages solved with a solver version below a certain threshold? That way Thoth keeps updating its own knowledge, and we don't put all the data in the database.

graph-refresh will keep scheduling unsolved packages, always using the latest solver available, so there is no conflict with the check introduced.

wdyt @fridex @harshad16 @goern ?

Should we modify the solver workflow to be:

add a new condition resync to the workflow scheduling, False by default, gating task number 2 of the workflow.

  • 1 solver workflow-task
  • 2 allow-resync workflow-task: check the solver version of the previous run, check whether the package name, version, index + solver combination is already solved, and in that case delete the existing data (runs only with the resync=True condition, otherwise this task does not run)
  • 3 graph-sync workflow-task

I think we can reuse the THOTH_FORCE_SYNC parameter used in the graph-sync task. This way, the user/component responsible for scheduling the graph-sync task will be aware that a forced sync of the computed results is being done.

Who is going to schedule solver workflows with resync=True? A new CronWorkflow component that checks whether a new solver version is available and schedules solvers for packages solved with a solver version below a certain threshold? That way Thoth keeps updating its own knowledge, and we don't put all the data in the database.

It might be good if we start scheduling such workflows on our own (there is no code/component for this yet). If we spot a bug in solver data, we then have the logic to decide which data should be recomputed (and trigger the corresponding solver workflows).

graph-refresh will keep scheduling unsolved packages, always using the latest solver available, so there is no conflict with the check introduced.

This will be expensive when it comes to resources. If we determine which solver results are affected (i.e. which have wrong data), we can trigger an update just for the affected packages instead of recomputing all the data we have once again (which takes weeks).
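The allow-resync gating proposed above can be sketched in a few lines. This is only an illustration: the dict stands in for the real database, and the function name allow_resync and the key layout are hypothetical, not existing Thoth APIs.

```python
def allow_resync(solved_records, key, resync=False):
    """Decide whether existing solver data should be dropped before a re-sync.

    solved_records: dict keyed by (package_name, package_version, index_url,
    solver_name), standing in for the real database (illustration only).
    Returns True when stale data was deleted so graph-sync can re-insert it.
    """
    if not resync:
        # Default path: resync is False, so the allow-resync task does not run.
        return False
    if key in solved_records:
        # Already solved by this solver: delete the old data first.
        del solved_records[key]
        return True
    # Nothing recorded yet; graph-sync proceeds as for a newly solved package.
    return False


records = {
    ("google-resumable-media", "1.2.0", "https://pypi.org/simple",
     "solver-rhel-8-py38"): {"errors": [], "tree": []},
}
key = next(iter(records))

allow_resync(records, key, resync=True)   # deletes the stale record
allow_resync(records, key, resync=True)   # nothing left to delete
```

A caller scheduling the workflow with resync=True would then let graph-sync re-insert the fresh solver document without hitting the already-existing rows.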

/reopen

@fridex: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle rotten

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

/close

@sesheta: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

/close


/remove-lifecycle rotten
/lifecycle frozen
/sig stack-guidance

This issue was already resolved in a past release of the thoth-station/storages module. The details follow.

The cause of the issue was the introduction of the python_package_metadata_id field in the PythonPackageVersion table via the function _create_python_package_version, while the unique key constraint covered only the set

"package_name", "package_version", "python_package_index_id", "os_name", "os_version", "python_version"

So a transaction inserting a new entry into the PythonPackageVersion table would throw a sqlalchemy.exc.IntegrityError, because an entry matching on
"package_name", "package_version", "python_package_index_id", "os_name", "os_version", "python_version"
already existed, differing only in python_package_metadata_id.

For example, consider the entry: [parameters: {'package_name': 'google-resumable-media', 'package_version': '1.2.0', 'os_name': 'rhel', 'os_version': '8', 'python_version': '3.8', 'entity_id': 2438371, 'python_package_index_id': 1, 'python_package_metadata_id': 313979, 'is_missing': False, 'provides_source_distro': True}]

When SQLAlchemy then tried to fetch the existing entry, it could not find it, because

return session.query(cls).filter_by(**kwargs).one(), True

tries to find an entry matching all the fields (including python_package_metadata_id), which does not exist.
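The two-step failure can be reproduced with a toy in-memory model, where a list of dicts stands in for the table and LookupError for the SQLAlchemy exceptions. This is an illustrative sketch only; get_or_create_fixed shows one way to avoid the NoResultFound (matching only on the unique-constraint columns), which may differ in detail from the actual fix.

```python
# Columns covered by the unique constraint; note that
# python_package_metadata_id is NOT among them.
UNIQUE_COLS = (
    "package_name", "package_version", "python_package_index_id",
    "os_name", "os_version", "python_version",
)


def get_or_create(table, **kwargs):
    """Toy version of the failing get_or_create; `table` is a list of dicts."""
    key = tuple(kwargs[c] for c in UNIQUE_COLS)
    if not any(tuple(row[c] for c in UNIQUE_COLS) == key for row in table):
        table.append(dict(kwargs))  # INSERT succeeds: no duplicate key
        return kwargs, True
    # IntegrityError path: fall back to a lookup filtered by *every* kwarg,
    # including python_package_metadata_id.  The existing row differs in
    # exactly that field, so nothing matches and .one() raises NoResultFound.
    for row in table:
        if all(row.get(k) == v for k, v in kwargs.items()):
            return row, False
    raise LookupError("No row was found for one()")


def get_or_create_fixed(table, **kwargs):
    """Fallback lookup restricted to the unique-constraint columns."""
    key = tuple(kwargs[c] for c in UNIQUE_COLS)
    for row in table:
        if tuple(row[c] for c in UNIQUE_COLS) == key:
            return row, False
    table.append(dict(kwargs))
    return kwargs, True


# Row synced earlier without package metadata.
table = [{
    "package_name": "google-resumable-media", "package_version": "1.2.0",
    "python_package_index_id": 1, "os_name": "rhel", "os_version": "8",
    "python_version": "3.8", "python_package_metadata_id": None,
}]
# Re-solved result, now carrying a metadata id.
new = dict(table[0], python_package_metadata_id=313979)
```

Here get_or_create(table, **new) raises the stand-in for NoResultFound, while get_or_create_fixed(table, **new) returns the already-existing row.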
This issue got fixed with commit 47a2b66, with the help of PRs #2310 and #2602.
This particular issue is fixed.
Closing the issue.