collerek/ormar

prefetch_related merges entries when many-to-many fields point to the same table

teonat opened this issue · 1 comments

First of all, thanks for creating Ormar! I've just started testing it, and I really like what I see so far.
I have, however, run into some problems with many-to-many fields with Ormar. The problem described here is the first I've successfully been able to recreate in a fashion that's made me want to start an issue.

Describe the bug
Let's say I have a model called Book with two many-to-many fields pointing to the same model, Author.
The Book-fields pointing to Author is authors and co-authors.

I understand that there are mainly two ways to fetch the Books with their relations:
await Book.objects.select_related(['authors', 'co_authors']).all()
and
await Book.objects.prefetch_related(['authors', 'co_authors']).all()

The difference, as I understand it, is that the former builds one large query to fetch the data, while the latter executes one query per model.
The select_related becomes way too slow for the actual db-setup I've been testing with, so I've had to resort to the prefetch_related.
However, I've noticed that the prefetch_related joins authors and co-authors and adds all of them to both fields.

When select_related gives me:

{'authors': [{'id': 1, 'name': 'Veronica Porter'},
             {'id': 2, 'name': 'Charlene Diaz'}],
 'co_authors': [{'id': 3, 'name': 'John Turner'},
                {'id': 4, 'name': 'Pamela Nguyen'},
                {'id': 5, 'name': 'Stephanie Carr'}],
 'description': None,
 'id': 1,
 'name': 'Book titled something'}

prefetch_related gives me:

{'authors': [{'id': 1, 'name': 'Veronica Porter'},
             {'id': 2, 'name': 'Charlene Diaz'},
             {'id': 3, 'name': 'John Turner'},
             {'id': 4, 'name': 'Pamela Nguyen'},
             {'id': 5, 'name': 'Stephanie Carr'}],
 'co_authors': [{'id': 1, 'name': 'Veronica Porter'},
                {'id': 2, 'name': 'Charlene Diaz'},
                {'id': 3, 'name': 'John Turner'},
                {'id': 4, 'name': 'Pamela Nguyen'},
                {'id': 5, 'name': 'Stephanie Carr'}],
 'description': None,
 'id': 1,
 'name': 'Book titled something'}

select_related above is correct in respect to how I created the book-entry and added the related authors.
prefetch_related bunches them all together in both the authors and co_authors-field

To Reproduce
Models to reproduce:

from typing import Optional

import ormar
import sqlalchemy

from app.core.db import database


class BaseMeta(ormar.ModelMeta):
    database = database
    metadata = sqlalchemy.MetaData()


class Author(ormar.Model):
    class Meta(BaseMeta):
        tablename = "authors"

    id: int = ormar.Integer(primary_key=True)
    name: str = ormar.String(max_length=256)


class BookAuthor(ormar.Model):
    class Meta(BaseMeta):
        tablename = 'book_authors'

    id: int = ormar.Integer(primary_key=True)


class BookCoAuthor(ormar.Model):
    class Meta(BaseMeta):
        tablename = 'book_co_authors'

    id: int = ormar.Integer(primary_key=True)


class Book(ormar.Model):
    class Meta(BaseMeta):
        tablename = "books"

    id: int = ormar.Integer(primary_key=True)
    name: str = ormar.String(max_length=256)
    description: Optional[str] = ormar.String(max_length=256, nullable=True)
    authors: Optional[list[Author]] = ormar.ManyToMany(
        Author, related_name='author_books', through=BookAuthor
    )
    co_authors: Optional[list[Author]] = ormar.ManyToMany(
        Author, related_name='co_author_books', through=BookCoAuthor
    )

Test to reproduce:

from random import randint

from faker import Faker
import pytest

from app.core.db import database
from app.models.library import Author, Book


@pytest.mark.asyncio
async def test_db():
    fake = Faker()
    async with database:
        for _ in range(6):
            await Author.objects.create(name=fake.name())

        book = await Book.objects.create(name=fake.sentence(nb_words=randint(1, 4)))
        for i in range(1, 3):
            await book.authors.add(await Author.objects.get(id=i))
        for i in range(3, 6):
            await book.co_authors.add(await Author.objects.get(id=i))


        print('')
        prefetch_result = await Book.objects.prefetch_related(['authors', 'co_authors']).all()
        import json
        prefetch_dict_result = [x.dict() for x in prefetch_result if x.id == 1][0]
        print(json.dumps(prefetch_dict_result, indent=4))

        select_result = await Book.objects.select_related(['authors', 'co_authors']).all()

        import json
        select_dict_result = [
            x.dict(
                exclude={'authors': {'bookauthor': ...}, 'co_authors': {'bookcoauthor': ...}}
            ) for x in select_result if x.id == 1
        ][0]
        print(json.dumps(select_dict_result, indent=4))
        assert prefetch_dict_result == select_dict_result

Expected behavior
I would expect that the resulting members of authors and co_authors would be the same as select_related for prefetch_related.

Screenshots
diff-view after running the above test:
image

Versions (please complete the following information):

  • Database backend used: postgres
  • Python version: 3.10
  • ormar version: 0.11.3
  • pydantic version: 1.10.2
  • if applicable fastapi version: 0.85.0
  • alembic: 1.8.1

Additional context
Add any other context about the problem here.

Hey @collerek , any thoughts? 😊