SQL optimization: join UUID mapping on relation tuples when required

Question

zepatrik opened this issue 2 years ago · 4 comments

zepatrik commented 2 years ago

Describe your problem

Currently, we first query the tuples and then run a second query to map the UUIDs.

Describe your ideal solution

Depending on the API (list and expand), we should be able to apply the mapping in the database by joining the UUID mapping table onto the relation tuple table.
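To make the difference concrete, here is a minimal sketch of both approaches using Python's built-in `sqlite3` as a stand-in database. The table and column names (`relation_tuples`, `uuid_mappings`, `string_representation`) and the helper functions are simplified assumptions for illustration, not the project's actual schema or code.

```python
import sqlite3
import uuid

# In-memory database with a simplified, assumed schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE uuid_mappings (
        id TEXT PRIMARY KEY,
        string_representation TEXT NOT NULL
    );
    CREATE TABLE relation_tuples (
        namespace  TEXT NOT NULL,
        object     TEXT NOT NULL,  -- UUID, resolved via uuid_mappings
        relation   TEXT NOT NULL,
        subject_id TEXT NOT NULL   -- UUID, resolved via uuid_mappings
    );
""")


def add_tuple(namespace, obj, relation, subject):
    """Store the strings in the mapping table and the UUIDs in the tuple table."""
    ids = {v: str(uuid.uuid5(uuid.NAMESPACE_OID, v)) for v in (obj, subject)}
    for value, mapped in ids.items():
        conn.execute("INSERT OR IGNORE INTO uuid_mappings VALUES (?, ?)", (mapped, value))
    conn.execute(
        "INSERT INTO relation_tuples VALUES (?, ?, ?, ?)",
        (namespace, ids[obj], relation, ids[subject]),
    )


def list_two_queries(namespace):
    """Current approach: fetch the tuples, then resolve the UUIDs in a second query."""
    rows = conn.execute(
        "SELECT object, relation, subject_id FROM relation_tuples "
        "WHERE namespace = ? ORDER BY relation",
        (namespace,),
    ).fetchall()
    ids = sorted({u for obj, _, sub in rows for u in (obj, sub)})
    if not ids:
        return []
    placeholders = ",".join("?" * len(ids))
    mapping = dict(
        conn.execute(
            f"SELECT id, string_representation FROM uuid_mappings WHERE id IN ({placeholders})",
            ids,
        ).fetchall()
    )
    return [(mapping[obj], rel, mapping[sub]) for obj, rel, sub in rows]


def list_joined(namespace):
    """Proposed approach: let the database resolve the UUIDs with a join."""
    return conn.execute(
        """
        SELECT obj.string_representation, t.relation, sub.string_representation
        FROM relation_tuples AS t
        JOIN uuid_mappings AS obj ON obj.id = t.object
        JOIN uuid_mappings AS sub ON sub.id = t.subject_id
        WHERE t.namespace = ?
        ORDER BY t.relation
        """,
        (namespace,),
    ).fetchall()


add_tuple("files", "report.pdf", "viewer", "alice")
add_tuple("files", "report.pdf", "editor", "bob")
print(list_two_queries("files"))
print(list_joined("files"))
```

Both functions return the same rows; the joined variant saves one round trip but makes every list query touch the mapping table.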

Workarounds or alternatives

Keep as is.

Version

master

Answer 1 · 2023-01-20T19:05:36.000Z

I also thought about this. Continuously joining with the mapping table, especially if it has a different storage class in cockroach, will be very expensive.

Alternatively, we could add columns for the unmapped fields directly to the relation tuples table, and add another bool field to tuple creation that marks a tuple as containing PII. If set, the unmapped data will be written to the mapping table, as it is now. If unset, the unmapped data will be stored in the relation tuples table.

Unmapping means looking in the unmapped columns first, the mapping table second.

This way we have:

fast lookups for non-PII tuples
data optionally stored in a specific region for PII tuples, at the cost of performance
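A rough sketch of this idea, again with Python's `sqlite3` as a stand-in. The column names `object_unmapped` and `subject_unmapped`, the `contains_pii` parameter, and both helper functions are hypothetical names chosen for illustration; none of them exist in the codebase.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE uuid_mappings (
        id TEXT PRIMARY KEY,
        string_representation TEXT NOT NULL
    );
    CREATE TABLE relation_tuples (
        object           TEXT NOT NULL,  -- UUID
        relation         TEXT NOT NULL,
        subject_id       TEXT NOT NULL,  -- UUID
        object_unmapped  TEXT,           -- hypothetical: set only for non-PII tuples
        subject_unmapped TEXT            -- hypothetical: set only for non-PII tuples
    );
""")


def to_uuid(value):
    return str(uuid.uuid5(uuid.NAMESPACE_OID, value))


def write_tuple(obj, relation, subject, contains_pii):
    """PII tuples keep their strings in the mapping table; others store them inline."""
    if contains_pii:
        for value in (obj, subject):
            conn.execute(
                "INSERT OR IGNORE INTO uuid_mappings VALUES (?, ?)", (to_uuid(value), value)
            )
        unmapped = (None, None)
    else:
        unmapped = (obj, subject)
    conn.execute(
        "INSERT INTO relation_tuples VALUES (?, ?, ?, ?, ?)",
        (to_uuid(obj), relation, to_uuid(subject), *unmapped),
    )


def read_tuples():
    """Unmap from the inline columns first; query the mapping table only for the rest."""
    rows = conn.execute(
        "SELECT object, relation, subject_id, object_unmapped, subject_unmapped "
        "FROM relation_tuples ORDER BY relation"
    ).fetchall()
    missing = sorted(
        {
            mapped
            for obj, _, sub, obj_plain, sub_plain in rows
            for mapped, plain in ((obj, obj_plain), (sub, sub_plain))
            if plain is None
        }
    )
    mapping = {}
    if missing:  # non-PII-only result sets never touch the mapping table
        placeholders = ",".join("?" * len(missing))
        mapping = dict(
            conn.execute(
                f"SELECT id, string_representation FROM uuid_mappings WHERE id IN ({placeholders})",
                missing,
            ).fetchall()
        )
    return [
        (
            obj_plain if obj_plain is not None else mapping[obj],
            rel,
            sub_plain if sub_plain is not None else mapping[sub],
        )
        for obj, rel, sub, obj_plain, sub_plain in rows
    ]


write_tuple("report.pdf", "viewer", "alice@example.com", contains_pii=True)
write_tuple("report.pdf", "editor", "group:staff", contains_pii=False)
print(read_tuples())
```

The second query only runs when at least one returned tuple was written with the PII flag, which is where the performance cost would land.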

WDYT?

Answer 2 · 2023-01-24T11:43:17.000Z

> Continuously joining with the mapping table, especially if it has a different storage class in cockroach, will be very expensive.

I suggest we measure this first before we go the more complex route you sketched 😉

Answer 3 · 2023-01-24T13:12:14.000Z

Agreed, we can simply run the queries against a multi-region CRDB.

Answer 4 · 2024-01-25T00:15:05.000Z

Hello contributors!

I am marking this issue as stale as it has not received any engagement from the community or maintainers for a year. That does not imply that the issue has no merit! If you feel strongly about this issue, you can:

open a PR referencing and resolving the issue;
leave a comment on it and discuss ideas on how you could contribute towards resolving it;
leave a comment and describe in detail why this issue is critical for your use case;
open a new issue with updated details and a plan for resolving the issue.

Throughout its lifetime, Ory has received over 10,000 issues and PRs. To sustain that growth, we need to prioritize and focus on issues that are important to the community. A good indication of importance, and thus priority, is activity on a topic.

Unfortunately, burnout has become a topic of concern amongst open-source projects.

It can lead to severe personal and health issues as well as opening catastrophic attack vectors.

The motivation for this automation is to help prioritize issues in the backlog and not ignore, reject, or belittle anyone.

If this issue was marked as stale erroneously you can exempt it by adding the backlog label, assigning someone, or setting a milestone for it.

Thank you for your understanding and to anyone who participated in the conversation! And as written above, please do participate in the conversation if this topic is important to you!

Thank you 🙏✌️