Inquiring about JSONB column issues
naps62 opened this issue · 2 comments
Hi there
I noticed you have this rule as part of your PostgreSQL guides:
Avoid JSONB columns unless you have a strong reason to store an entire JSON document from an external source.
This is not an issue, but just me trying to understand the reasoning behind this. Is there an underlying issue that makes JSONB columns a bad idea? (maybe a performance issue for the table?)
I usually use these for exactly the case you mentioned, storing data from external sources, but I'm much more relaxed about it. Usually I'll store anything that is either impossible or hard to retrieve again later on, because I rarely know upfront what data I'll be needing
Hey @naps62,
This rule predates me but I can share some thoughts around the ideas behind it.
Performance (which you touched on) can certainly be an issue when querying large JSONB columns, and indexes on JSONB columns are also more expensive to generate are take up more space. Heap Analytics has a good article on the performance tradeoffs.
But the spirit of this rule is more about encouraging thoughtful data modeling. Storing data in a JSONB column results in a data model that is less well-defined and not as strongly typed. If any identifiers to other models or resources exist in that data, we lose referential integrity.
I'm much more relaxed about it. Usually I'll store anything that is either impossible or hard to retrieve again later on, because I rarely know upfront what data I'll be needing
It sounds like in some of the cases you're describing, you have a "strong reason" to store the entire response from an external source. That might be fine! In cases like that, I usually ask myself the following questions:
- Should my database be the source of truth for this data?
- Is the data really impossible to fetch again?
- Am I storing the data just in case I need it or is it providing value?
Not knowing what data you'll need or when/where you'll need it may indicate that you haven't fully defined the problem you're trying to solve, or haven't fully defined your data model. It could also mean you're storing data just in case, which can get expensive.
I believe this rule also makes the distinction "store an entire JSON document from an external source" to differentiate between the situation where you are storing (for example) an API response, and the situation where you know that a Customer
may have many dietary restrictions and you choose to store them as a JSONB column in the form {"vegan": true, "gluten_intolerant": false}
just in case you add more classifications in the future.
In the latter case, there are better ways to store that data in a relational database.
I hope these thoughts were helpful 🙂
@WilHall thank you for the response. I definitely helped yes 🙏
It does sound like we're on the same page. Perhaps saying "I'm much more relaxed about it" was a bit exaggerated on my part. Most of the use cases I was thinking about fit under your rationale as well. I was mostly afraid there was some obscure drawback of using JSONB columns that I should be aware of, other than the more obvious ones.
That article does provide some great insight though. Thanks for sharing!