jonra1993/fastapi-alembic-sqlmodel-async

Ability to omit table joining if relations are not needed by API consumer

bazylhorsey opened this issue · 10 comments

After turning on SQL echo, it seems that there can be quite a bit more querying than needs to necessarily happen.
Queries can be much faster if related tables are not unwrapped.

Example body param:

    expand_fields: Optional[bool] = Query(
        default=False, description="Expand relational fields"
    ),

It would be EVEN BETTER if you could only select from one of the related fields. This could easily be implemented with an additional Enum representing the base table.
Example body param:

    expand_fields: Optional[List[Enum[{SOMETHING}]] = Query(
        default={SOMETHING}, description="Expand relational fields"
    ),

In cases where the expand_fields list is empty, it should not show the unwrapped location, and more importantly not perform the selectin.

This also seems an appropriate time to ask why selectin was chosen as the lazy loading type?
Is there a way to change to a join at query time, this may be a good example if we can find an good spot for it.

I think cancel the relationship between all tables may be more flexible, and let the program itself handle the relationship.

By program you mean the client consuming the API? What if a person needed in some cases to see the hero with their team, and in other cases just needed hero, and the spending time on the join was useless?

Hello @bazylhorsey the selectin was used on relationships due to the async implementation of sqlmodel I was facing this issue fastapi/sqlmodel#74. I implemented selectin on the Relationship otherwise it should be done on the query. I agree that queries can be optimized so some joining can be omitted. What do you mean by "change to a join at query time"?

Do you want those body params to allow users to decide if they want to have the relationship response?

A sample with a custom query like this?

async def read_users_list_by_role_name(
    status: Optional[IUserStatus] = Query(
        default=IUserStatus.active,
        description="User status, It is optional. Default is active",
    ),
    role_name: str = Query(
        default="", description="String compare with name or last name"
    ),
    params: Params = Depends(),
    current_user: User = Depends(
        deps.get_current_user(required_roles=[IRoleEnum.admin])
    ),
):
    """
    Retrieve users by role name and status. Requires admin role
    """
    user_status = True if status == IUserStatus.active else False
    query = (
        select(User)
        .join(Role, User.role_id == Role.id)
        .where(and_(Role.name == role_name, User.is_active == user_status))
        .order_by(User.first_name)
    )
    users = await crud.user.get_multi_paginated(query=query, params=params)
    return create_response(data=users)

Hello @bazylhorsey lazy can also be joined, subquery, or selectin as described here https://docs.sqlalchemy.org/en/14/orm/loading_relationships.html

I have tested elapsed time with different lazy techniques but the time is similar, a couple of us

import timeit

    elapsed_time = timeit.repeat(lambda: (await crud.hero.get_multi_paginated(params=params) for _ in '_').__anext__(), number=10000, repeat=5)
    for index, exec_time in enumerate(elapsed_time, 1):
        m_secs = round(exec_time * 10 ** 2, 2)
        print(f"Case {index}: Time: {m_secs}µs")
    
    print(f"Mean: {round(max(elapsed_time)* 10 ** 6, 2)}")

https://stackoverflow.com/questions/31237042/whats-the-difference-between-select-related-and-prefetch-related-in-django-orm

Finally found a general guideline for loading techniques! 🥂
So it looks like M2M relations and reverse foreign keys (in your example anywhere you're doing team.heroes in crud) are best for using selectin. Anywhere you have one-to-one relations or extending a model with the foreign key handy joined is a better choice.

My understanding django -> fastapi:

  • prefetch_related == selectin
  • select_related == joined

A colleague of mine had a solution to your bug. In your crud base you can use response.unique().scalar_one_or_none()
I am not sure if this could have undesirable results, as I know joins are an entirely different strategy than selectin and could have different output.

Hello @bazylhorsey thanks for this reference I am going to try "response.unique().scalar_one_or_none()" and see what is its output

I'd go for using joined in all your many-to-ones (side with foreign key), and one-to-one (both sides).
Local performance is much better as compute goes to psql. You can utilize useList: False in the relationship attributes, to enforce joins don't create duplicate rows. I'm skeptical of unique(), because my intuition is this actually allows sql to make its own PK so it can avoid key errors during joining. This might create malformed returns, so in that case I think selectin is a superior choice in the places it caused the bug referenced. This is my reasoning for doing joined everywhere except the places that return a list with its relationship (ie team.heroes).

Hello @bazylhorsey can you please share your sample code you were testing?

@bazylhorsey Based on loading relationships and the insight of your link. One-One and One-Many relationships have been updated to use sa_relationship_kwargs={"lazy": "joined"} and Many-Many relationships use sa_relationship_kwargs={"lazy": "selectin"}

8470af9