ldbc/ldbc_snb_docs

Inquiry: Non-LDBC attributes in the data model

aMahanna opened this issue · 4 comments

Hi there, a small inquiry:

What is LDBC's policy on having custom node/edge attributes added to the LDBC data model?

For instance, adding ParentPostId: ID to the Comment model.

I have long assumed that the answer is no (for obvious reasons), but I would like to get confirmation.

Feel free to redirect me to https://ldbcouncil.org/ldbc_snb_docs/ldbc-snb-specification.pdf if there is a section that specifies this information.

Closing as no longer needed; new questions will be addressed in a separate issue.

Hi @aMahanna -- I wanted to reply to this today but got distracted. The answer is: yes, they are allowed and they count as a special case of precomputation.

The specification states that precomputation, including storing such attributes, is allowed as long as the stored values are always kept consistent with the rest of the database:

https://arxiv.org/pdf/2001.02299.pdf#page=105

Precomputation of query results (both interim and end results) is allowed. However, systems must ensure that precomputed results (e.g. materialized views) are kept consistent upon updates.
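As an illustration of what "kept consistent upon updates" could mean for an attribute like the one asked about, here is a minimal sketch. It assumes that `ParentPostId` refers to the post at the root of a comment's reply chain; the question does not say so explicitly. `CommentStore`, `root_post_id`, and the other names are hypothetical and not part of the LDBC data model. The redundant attribute is derived at insert time, so it never disagrees with the reply edges it was computed from.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Comment:
    id: int
    reply_of_post: Optional[int]     # set when the comment replies directly to a post
    reply_of_comment: Optional[int]  # set when the comment replies to another comment
    root_post_id: int                # precomputed, redundant attribute (hypothetical)


class CommentStore:
    """Toy in-memory store; a real system would do this inside a transaction."""

    def __init__(self) -> None:
        self.comments: Dict[int, Comment] = {}

    def insert_comment(
        self,
        comment_id: int,
        reply_of_post: Optional[int] = None,
        reply_of_comment: Optional[int] = None,
    ) -> Comment:
        # Exactly one of the two parents must be given.
        if (reply_of_post is None) == (reply_of_comment is None):
            raise ValueError("a comment needs exactly one parent")
        if reply_of_post is not None:
            root = reply_of_post
        else:
            # Derive the root from the parent's already-consistent value.
            root = self.comments[reply_of_comment].root_post_id
        comment = Comment(comment_id, reply_of_post, reply_of_comment, root)
        self.comments[comment_id] = comment
        return comment

    def root_by_traversal(self, comment_id: int) -> int:
        # Non-precomputed answer: walk the reply chain up to the post.
        comment = self.comments[comment_id]
        while comment.reply_of_post is None:
            comment = self.comments[comment.reply_of_comment]
        return comment.reply_of_post
```

The precomputed value is acceptable in this sense only if `root_post_id` and `root_by_traversal` agree after every update, including any deletes the workload performs (not handled in this sketch).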

Hi @szarnyasg

Could you give an example of what is considered an acceptable precomputed attribute? I would also have assumed the answer to this was no, so I am curious about it.

For example, could I precompute the tags associated with a post and store them in the post document?

Hi @cw00dw0rd, I'd rule that to be legal as well. For systems that support nested data (e.g. Spark), it's possible to store the tags as a nested structure (e.g. an array), as a junction table, or as both (which is redundant and is thus treated as a special case of precomputation).
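To make the "both" option concrete, here is a minimal sketch using SQLite from the Python standard library rather than Spark. The table and column names (`post`, `post_has_tag`, `tags`) and the `add_tag` helper are assumptions chosen for illustration, not taken from the specification. The junction table is treated as the source of truth and the nested array on the post is rebuilt in the same transaction, so the two copies cannot drift apart.

```python
import json
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # post.tags holds the redundant nested copy as a JSON array of tag ids.
    conn.executescript(
        """
        CREATE TABLE post (
            id   INTEGER PRIMARY KEY,
            tags TEXT NOT NULL DEFAULT '[]'
        );
        CREATE TABLE post_has_tag (
            post_id INTEGER NOT NULL,
            tag_id  INTEGER NOT NULL,
            PRIMARY KEY (post_id, tag_id)
        );
        """
    )


def add_tag(conn: sqlite3.Connection, post_id: int, tag_id: int) -> None:
    # "with conn" commits on success and rolls back on error, so the
    # junction row and the nested array are updated atomically.
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO post_has_tag (post_id, tag_id) VALUES (?, ?)",
            (post_id, tag_id),
        )
        rows = conn.execute(
            "SELECT tag_id FROM post_has_tag WHERE post_id = ? ORDER BY tag_id",
            (post_id,),
        ).fetchall()
        conn.execute(
            "UPDATE post SET tags = ? WHERE id = ?",
            (json.dumps([row[0] for row in rows]), post_id),
        )
```

Rebuilding the whole array on each update is simple but not cheap; a real system might append to the array instead, as long as it can still guarantee that both representations agree.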

One risk I can see with this decision is the scoring of the Interactive workload. There, the load time is reported but is not used for determining the final score (throughput, measured in operations per second). So this could be abused by shifting a lot of the complexity to the load phase. Currently, our only defense against this is that the auditor may deem such a setup to be unrealistic. But, of course, "unrealistic" is a subjective term.

(In the BI workload, the load time is used in the final scores, so doing extensive precomputations will affect the final score negatively.)