Halyard benchmarking -- how to improve?

Question

Opened this issue 7 years ago · 5 comments

I have benchmarked Halyard on a single-node setup (i7-3770 3.4 GHz, 32 GB RAM, regular HDD) running HDFS + YARN + HBase + Halyard. The queries were sent to the rdf4j-server SPARQL endpoint, e.g.:

wget -O - "http://halyard/rdf4j-server/repositories/benchmark50?query=select%20%2A%20%7B%3Fs%20%3Fp%20%3Fo%7D%20limit%2010"
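
For anyone reproducing this, the percent-encoded query string in the URL above can be generated rather than written by hand. This is a minimal sketch using only the Python standard library; the helper name `build_query_url` is hypothetical and not part of Halyard, RDF4J, or IGUANA.

```python
from urllib.parse import quote


def build_query_url(endpoint: str, sparql: str) -> str:
    """Append a percent-encoded SPARQL query to a repository endpoint URL.

    Hypothetical helper for illustration only.
    """
    # safe="" forces every reserved character (including "*", "{", "?")
    # to be percent-encoded, and spaces become %20 rather than "+".
    return f"{endpoint}?query={quote(sparql, safe='')}"


url = build_query_url(
    "http://halyard/rdf4j-server/repositories/benchmark50",
    "select * {?s ?p ?o} limit 10",
)
print(url)
```

The resulting URL can be passed to wget exactly as in the command above.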

I used the FEASIBLE [1] benchmark queries together with IGUANA [2]. The benchmarking configuration is available in the halyard docker repository [3] (iguana-config.tar.bz2).
As the benchmarking results show, for the smallest dataset size Halyard could answer only 6 queries; for the larger sizes (50 and 100) it answered 0 queries.

From preliminary discussions: it is possible to query Halyard through its Java interface, which should improve performance. Is there an example of how to do that?

[1] http://aksw.org/Projects/FEASIBLE.html
[2] http://aksw.org/Projects/IGUANA.html
[3] https://github.com/earthquakesan/docker-halyard

Answer 1 · 2017-10-23T10:45:19.000Z

Update: I did not add the benchmarking results to the GitHub repository; they are here: https://www.dropbox.com/s/st5sz0hu7eoxj8l/benchmark_results.tar.bz2?dl=0

Answer 2 · 2017-10-24T13:58:05.000Z

I'll take a look at it. There are many possible configuration reasons why HBase does not perform well on a single-node cluster. The cause might also lie in Halyard's query evaluation or in the benchmarking queries themselves.

Answer 3 · 2018-01-08T11:39:08.000Z

I have found better performance when the 'Push' option is not enabled. There are probably issues with some queries (such as path queries) with that option. Have you tested without the Push option enabled?

Answer 4 · 2018-01-08T16:25:23.000Z

Hi, I'm sorry for the late response. I've noticed that our mail system filtered out your mail with the data attached. Could you please send it to my Gmail address? I'll profile and compare both options.

Thanks, Adam

BTW, there might always be a bug in the push strategy. However, because of its multi-threaded architecture, significant performance degradation for paths is theoretically possible only on systems very low on resources (1 CPU core, low memory, etc.).

Answer 5 · 2018-01-22T16:00:59.000Z

@earthquakesan Hi Ivan, have you gotten it to work in a multi-node cluster as well? Thanks