toluaina/pgsync

`through_tables` does not work on grandchildren

jakent opened this issue · 3 comments

jakent commented

PGSync version: 2.5.0

Postgres version: 15.3

Elasticsearch version: 8.6.2

Redis version: 7.2

Python version: 3.8.9

Problem Description: though_tables does not work for grandchildren with a relationship field.

While it works as expected for direct child:

[
  {
    "database": "super_cool",
    "index": "users",
    "nodes": {
      "table": "users",
      "columns": [
        "id",
        "name"
      ],
      "children": [
        {
          "table": "group",
          "columns": [
            "id",
            "name"
          ],
          "relationship": {
            "variant": "object",
            "type": "one_to_many",
            "through_tables": [
              "users_groups"
            ]
          }
        }
      ]
    }
  }
]

It doesn’t work, when I do the same for a grandchild:

  {
    "database": "super_cool",
    "index": "job-managers",
    "nodes": {
      "table": "job",
      "children": [
        {
          "table": "users",
          "columns": [
            "id",
            "name"
          ],
          "relationship": { <---- I need this relationship because we also have created_by and last_modified_by which also join to this table
            "variant": "object",
            "type": "one_to_one",
            "foreign_key": {
              "child": ["id"],
              "parent": ["manager_id"]
            }
          },
          "children": [
            {
              "table": "group",
              "columns": [
                "id",
                "name"
              ],
              "relationship": {
                "variant": "object",
                "type": "one_to_many",
                "through_tables": [
                  "users_groups"
                ]
              }
            }
          ]
        }
      ]
    }
  }
]

Error Message (if any):

pgsync           | 2023-07-10 20:30:47.926:ERROR:pgsync.search_client: Exception 'super_cool.users_groups'
pgsync           | Traceback (most recent call last):
pgsync           |   File "/usr/local/lib/python3.8/site-packages/pgsync/search_client.py", line 133, in bulk
pgsync           |     self._bulk(
pgsync           |   File "/usr/local/lib/python3.8/site-packages/pgsync/search_client.py", line 188, in _bulk
pgsync           |     for _ in self.parallel_bulk(
pgsync           |   File "/usr/local/lib/python3.8/site-packages/elasticsearch/helpers/actions.py", line 472, in parallel_bulk
pgsync           |     for result in pool.imap(
pgsync           |   File "/usr/local/lib/python3.8/multiprocessing/pool.py", line 868, in next
pgsync           |     raise value
pgsync           |   File "/usr/local/lib/python3.8/multiprocessing/pool.py", line 125, in worker
pgsync           |     result = (True, func(*args, **kwds))
pgsync           |   File "/usr/local/lib/python3.8/multiprocessing/pool.py", line 144, in _helper_reraises_exception
pgsync           |     raise ex
pgsync           |   File "/usr/local/lib/python3.8/multiprocessing/pool.py", line 388, in _guarded_task_generation
pgsync           |     for i, x in enumerate(iterable):
pgsync           |   File "/usr/local/lib/python3.8/site-packages/elasticsearch/helpers/actions.py", line 155, in _chunk_actions
pgsync           |     for action, data in actions:
pgsync           |   File "/usr/local/lib/python3.8/site-packages/pgsync/sync.py", line 939, in sync
pgsync           |     self.query_builder.build_queries(
pgsync           |   File "/usr/local/lib/python3.8/site-packages/pgsync/querybuilder.py", line 879, in build_queries
pgsync           |     self._through(node)
pgsync           |   File "/usr/local/lib/python3.8/site-packages/pgsync/querybuilder.py", line 654, in _through
pgsync           |     for column in foreign_keys[through.name]:
pgsync           | KeyError: 'super_cool.users_groups'
pgsync           |  - super_cool.job
pgsync           |     - super_cool.users
pgsync           |        - super_cool.group
pgsync           |  0:01:20.856575 (80.86 sec)
jakent commented

to give an example with the test data, the following schema will fail with the same error:

{
            "table": "user",
            "columns": ["id", "name"],
            "children": [
                {
                    "table": "book",
                    "columns": ["isbn", "title", "description"],
                    "relationship": {
                        "variant": "object",
                        "type": "one_to_many",
                        "foreign_key": {
                            "parent": ["id"],
                            "child": ["buyer_id"],
                        },
                    },
                    "children": [
                        {
                            "table": "author",
                            "columns": ["id", "name"],
                            "label": "authors",
                            "relationship": {
                                "type": "one_to_many",
                                "variant": "object",
                                "through_tables": ["book_author"],
                            },
                        }
                    ]
                }
            ]
        }

taking a look at this PR thanks

This works for me in the newest version 3.1.0