jarulraj/sqlcheck

Poor explanation for “Metadata Tribbles”

vfaronov opened this issue · 4 comments

$ ./sqlcheck -v
-------------------------------------------------
> RISK LEVEL    :: ALL ANTI-PATTERNS
-------------------------------------------------
==================== Results ===================
create table employee_1 as select foo, bar, baz from employee;

-------------------------------------------------
SQL Statement: create table employee_1 as select foo, bar, baz from employee;
(MEDIUM RISK) (LOGICAL_DATABASE_DESIGN ANTI-PATTERN) Metadata Tribbles
● Store each value with the same meaning in a single column:
Creating multiple columns in a table indicates that you are trying to store a
multivalued attribute. This design makes it hard to add or remove values, to
ensure the uniqueness of values, and handling growing sets of values. The best
solution is to create a dependent table with one column for the multivalue
attribute. Store the multiple values in multiple rows instead of multiple
columns. Also, define a foreign key in the dependent table to associate the
values to its parent row.

● Breaking down a table or column by year:
You might be trying to split a single column into multiple columns, using column
names based on distinct values in another attribute. Each year, you will need to
add one more column or table. You are mixing metadata with data. You will now
need to make sure that the primary key values are unique across all the split
columns or tables. The solution is to use a feature called sharding or
horizontal partitioning. (PARTITION BY HASH ( YEAR(...) ). With this feature,
you can gain the benefits of splitting a large table without the drawbacks.
Partitioning is not defined in the SQL standard, so each brand of database
implements it in their own nonstandard way. Another remedy for metadata tribbles
is to create a dependent table. Instead of one row per entity with multiple
columns for each year, use multiple rows. Don't let data spawn metadata.
[Matching Expression: employee_1 ]

All this text doesn’t really explain what “Metadata Tribbles” is. I had to google it up, but there’s not a lot there, either. I can gather it but I’d suggest adding a clearer explanation.

Hah :) Do you have any other suggestions for naming this check?

BTW, the name comes from this StackOverflow post.

I don’t think the problem is with the name. I think the problem is with the explanation (or rather, lack thereof).

I see. Any suggestions for improving the explanation? A PR would be even better.

Updated the explanation in 4d2b389. Hope that it clarifies the anti-pattern better. Feel free to re-open if more details should be added.