Is there a limit on the number of triples?

Question

raadjoe opened this issue 6 years ago · 2 comments

I'm trying to query over a large HDT file (LOD-a-lot, 28 billion triples), where the indexes are already generated.

When I load the HDT file, it does not return any error and it is relatively fast:
from hdt import HDTDocument
document = HDTDocument("data.hdt")

However, it seems that not all triples are loaded. Specifically, I expect the following attribute to report 28 billion triples, but it reports only around 2.5 billion:
document.total_triples

I also verified by querying the data that a number of triples were indeed not loaded.

Answer 1 · 2019-04-02T15:13:10.000Z

Hi,
At the software level, there is no limit. However, make sure you have enough RAM to hold all the indexes, since they are loaded into main memory. I don't know how HDT behaves if there is not enough RAM to store them. For reference, LOD-a-lot requires 300 GB of RAM to load the additional indexes (which is not mentioned on the website).

Answer 2 · 2019-05-04T09:56:03.000Z

A little update on that: I've added an option to disable indexing. Just pass False as the second argument to the HDTDocument constructor.

doc = HDTDocument("/path/to/hdt_file", False)
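
For anyone who wants to combine the two snippets in this thread, here is a minimal sketch. It assumes pyHDT is installed and importable as `hdt`. The helper names `open_without_indexes` and `missing_triples` are made up for illustration and are not part of the library. It opens the file with indexing disabled and compares the reported `total_triples` against the number you expect.

```python
def open_without_indexes(path):
    """Open an HDT file with indexing disabled (hypothetical helper)."""
    # Imported lazily so that missing_triples() below can be used
    # even on a machine where pyHDT is not installed.
    from hdt import HDTDocument

    # Passing False as the second argument disables indexing,
    # as described in the answer above.
    return HDTDocument(path, False)


def missing_triples(document, expected_total):
    """Return how many triples short of expected_total the document reports.

    Only relies on the `total_triples` attribute used in the question.
    """
    return expected_total - document.total_triples


# Example usage (the path is a placeholder; 28 billion is the rough
# figure quoted in the question):
# doc = open_without_indexes("/path/to/hdt_file")
# print(missing_triples(doc, 28_000_000_000))
```

A result of zero would mean the reported count matches the expected one. A large positive number reproduces the discrepancy described in the question.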