Is there a limit on the number of triples?
raadjoe opened this issue · 2 comments
I'm trying to query over a large HDT file (LOD-a-lot, 28 billion triples), where the indexes are already generated.
When I load the HDT file, it does not return any error and it is relatively fast:
document = HDTDocument("data.hdt")
However, it seems that not all triples are loaded. Specifically, I expect the following command to return 28 billion triples, but it returns around 2.5 billion:
document.total_triples
I also verified by querying the data that a number of triples were indeed not loaded.
Hi,
At the software level, there is no limit. However, make sure you have enough RAM to hold all the indexes, since they are loaded into main memory. I don't know how HDT behaves when there is not enough RAM for them. For reference, LOD-a-lot requires 300 GB of RAM to load the additional indexes (this is not mentioned on the website).
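As a rough illustration of the RAM caveat, here is a minimal sketch that compares the machine's physical memory with the roughly 300 gigabytes mentioned above before attempting an indexed load. The helper names `physical_ram_bytes` and `has_enough_ram` are hypothetical, and the check relies on POSIX `sysconf` values, so it assumes a Linux-like system. It only looks at installed memory, not at what is currently free.

```python
import os


def physical_ram_bytes():
    """Total physical memory in bytes, read via POSIX sysconf (assumes a Linux-like OS)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def has_enough_ram(required_bytes):
    """Return True if installed physical memory is at least `required_bytes`."""
    return physical_ram_bytes() >= required_bytes


# Figure taken from the comment above for LOD-a-lot's additional indexes.
REQUIRED_FOR_INDEXES = 300 * 10**9

if not has_enough_ram(REQUIRED_FOR_INDEXES):
    print("Less physical RAM than the indexes are reported to need; "
          "loading with indexes may not work as expected.")
```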
A little update on that: I've added an option to disable indexing. Just pass False as the second argument to the HDTDocument constructor.
doc = HDTDocument("/path/to/hdt_file", False)
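For anyone wanting to combine this with the count check from the original question, a minimal sketch follows. It assumes the bindings are importable as `hdt`; the helper names `load_without_indexes` and `count_matches` and the 5% tolerance are hypothetical, chosen only for illustration, and the expected value is the approximate 28 billion figure quoted in the question.

```python
def load_without_indexes(path):
    """Open an HDT file with the additional indexes disabled."""
    # Assumption: the Python bindings are installed and importable as `hdt`.
    from hdt import HDTDocument
    # False as the second argument disables indexing, as described above.
    return HDTDocument(path, False)


def count_matches(document, expected, tolerance=0.05):
    """Return True if document.total_triples is within a relative tolerance of `expected`."""
    actual = document.total_triples
    return abs(actual - expected) <= tolerance * expected
```

Usage would look like `doc = load_without_indexes("/path/to/hdt_file")` followed by `count_matches(doc, 28_000_000_000)`. The tolerance is there because "28 billion" is a rounded number, not an exact triple count.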