Instance Generator for the Lass Ontology (http://streamreasoning.org/ontologies/lass.owl), based on the code of the University Benchmark Artificial Data Generator (UBA) for the LUBM benchmark.
> ./generate.sh [options]
Run the following to see the usage summary:
> ./generate.sh --help
There are a number of parameters that can be used to tune the performance of the generator. The best combination will depend on the hardware on which you are generating the data.
We strongly suggest using --threads to set the number of threads. Typically you should set this to twice the number of processor cores (assuming hyper-threading is enabled). Using this option gives substantially better performance than not using it.
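As a minimal sketch of the rule of thumb above, the snippet below doubles a physical core count and prints the resulting invocation as a dry run instead of executing it. The suggest_threads helper and the 4-core machine are hypothetical. Other options, such as the number of universities, may also be required, so check ./generate.sh --help before running it for real.

```shell
#!/bin/sh
# Hypothetical helper (not part of generate.sh): suggest a --threads value
# of twice the physical core count, assuming hyper-threading is enabled.
suggest_threads() {
  echo $(( $1 * 2 ))
}

# Assumption: a machine with 4 physical cores.
threads=$(suggest_threads 4)

# Dry run: print the command instead of executing it.
echo ./generate.sh --threads "$threads"   # ./generate.sh --threads 8
```

Note that nproc (GNU coreutils) reports logical CPUs. With hyper-threading enabled, that count is usually already about twice the physical core count.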
Using consolidation reduces the number of files generated, though total IO stays roughly the same. With --consolidate Partial you get one file per university (which can still be a lot of files at scale), while --consolidate Full produces a single file per thread, which gives the fewest files while still allowing good parallel throughput.
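The sketch below gives a rough estimate of how many output files each consolidation mode produces. It relies only on the description above (Partial: one file per university, Full: one file per thread). The expected_files helper is hypothetical, and the generator's actual file layout may differ in edge cases, for example when there are fewer universities than threads.

```shell
#!/bin/sh
# Hypothetical helper: rough output file count per --consolidate mode,
# based only on the README's description of Partial and Full.
expected_files() {
  mode=$1; universities=$2; threads=$3
  case "$mode" in
    Partial) echo "$universities" ;;   # one file per university
    Full)    echo "$threads" ;;        # one file per thread
    *)       echo "unknown mode: $mode" >&2; return 1 ;;
  esac
}

# Example: 1000 universities generated with 8 threads.
echo "Partial: $(expected_files Partial 1000 8) files"   # Partial: 1000 files
echo "Full: $(expected_files Full 1000 8) files"         # Full: 8 files

# Dry run of the corresponding invocations.
echo ./generate.sh --threads 8 --consolidate Partial
echo ./generate.sh --threads 8 --consolidate Full
```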
The --compress option trades processing power for substantially reduced IO. The reduced IO is invaluable at larger scales. For example, with 1000 universities and --consolidate Full, the compressed N-Triples output file is 706 MB while the uncompressed output is 23 GB, a compression ratio of approximately 32x.
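The arithmetic behind the quoted ratio can be checked as follows, assuming decimal units (23 GB taken as 23000 MB). With binary units (23 GiB against 706 MiB), the ratio comes out nearer 33x. The final line is a dry run showing one plausible combination of the options described so far, with the thread count being an assumption.

```shell
#!/bin/sh
# Sizes quoted above for 1000 universities with --consolidate Full,
# in decimal megabytes.
compressed_mb=706
uncompressed_mb=23000

# POSIX shell arithmetic is integer-only, so this truncates.
ratio_int=$(( uncompressed_mb / compressed_mb ))
echo "ratio (integer): ${ratio_int}x"            # ratio (integer): 32x

# awk gives the fractional value.
awk -v u="$uncompressed_mb" -v c="$compressed_mb" \
  'BEGIN { printf "ratio: %.1fx\n", u / c }'     # ratio: 32.6x

# Dry run of an invocation combining the options discussed so far.
echo ./generate.sh --threads 8 --consolidate Full --compress --format NTRIPLES
```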
The value given for --format controls the output data format and can affect both the amount of IO done and the performance. TURTLE is the most compact format but the most expensive to produce, because reducing IRIs to prefixed-name form takes extra time. NTRIPLES and OWL are typically the fastest formats to produce.
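To illustrate why Turtle output is more compact, the snippet below rewrites full IRIs in one N-Triples line into prefixed names with sed. The http://example.org/lass# namespace and the resource names are hypothetical, and this is only a sketch of the idea, not how the generator serializes Turtle. A real Turtle file would also declare the prefix once, e.g. @prefix lass: <http://example.org/lass#> .

```shell
#!/bin/sh
# Illustration only: hypothetical namespace and resources.
nt='<http://example.org/lass#Sensor1> <http://example.org/lass#hasLocation> <http://example.org/lass#Room7> .'

# Replace each full IRI in the namespace with a lass: prefixed name.
ttl=$(printf '%s\n' "$nt" | sed -E 's|<http://example\.org/lass#([A-Za-z0-9_]+)>|lass:\1|g')

echo "$ttl"   # lass:Sensor1 lass:hasLocation lass:Room7 .
echo "N-Triples: ${#nt} chars, Turtle: ${#ttl} chars"
```

Every IRI has to be matched against the known namespaces before it can be shortened. That per-term work is the extra cost the README attributes to TURTLE output.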
The Semantic Web and Agent Technologies (SWAT) Lab, CSE Department, Lehigh University
Rob Vesse
L1G code: Riccardo Tommasini, Politecnico of Milan
riccardo dot tommasini at polimi dot it
Yuanbo Guo yug2@lehigh.edu