basho/riak-php-client

Multiget via MapReduce and Riak.mapValues? [JIRA: CLIENTS-369]

Closed this issue · 7 comments

I found this very helpful Gist, https://gist.github.com/drewkerrigan/4218222, which implements multiget as a MapReduce job. My question (as you can also see in the gist comment) is whether the built-in JS function Riak.mapValues returns data in the same order as the keys array. If not, is it possible to retrieve an array of pairs, [key_1 -> data_1, key_2 -> data_2, ...], so that each key stays paired with its value?
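One way to keep keys and values together, sketched under the assumption that result order is not guaranteed, is to replace Riak.mapValues with a custom map function that emits [key, data] pairs and then re-align the pairs on the client. The sketch below is in Python rather than PHP purely for illustration. `build_multiget_job` and `order_results` are hypothetical helper names, and the not-found guard (`!v.values`) is an assumption about how missing keys reach the map function.

```python
import json

# JavaScript map function source, run by Riak once per input object.
# It emits a [key, data] pair instead of the bare value, so each value
# stays attached to its key regardless of the order results arrive in.
# Assumption: inputs that were not found carry no "values" field.
PAIR_MAP_SOURCE = (
    "function(v) {"
    " if (!v.values) { return []; }"
    " return [[v.key, v.values[0].data]];"
    " }"
)


def build_multiget_job(bucket, keys):
    """Build a MapReduce job body that fetches the given keys as pairs."""
    return {
        "inputs": [[bucket, key] for key in keys],
        "query": [
            {
                "map": {
                    "language": "javascript",
                    "source": PAIR_MAP_SOURCE,
                    "keep": True,
                }
            }
        ],
    }


def order_results(keys, pairs):
    """Re-align [key, data] pairs with the requested key order.

    Keys missing from the result are reported with a value of None.
    """
    by_key = {key: data for key, data in pairs}
    return [(key, by_key.get(key)) for key in keys]


if __name__ == "__main__":
    # The JSON printed here is what would be POSTed to the /mapred endpoint.
    print(json.dumps(build_multiget_job("users", ["alice", "bob"]), indent=2))
```

With this shape the client never relies on the server's ordering: whatever order the pairs come back in, `order_results` restores the order of the original keys array.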

Also, couldn't such a solution (if one exists) be implemented directly in the PHP client, to provide out-of-the-box multiget support?

That will work in most cases, but I agree with you that it should be done in the client (or as a native server operation). The best way forward would be to add an operation that uses curl's 'multi' interface to perform the fetches in parallel.
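The general shape of a client-side multiget, issuing one GET per key in parallel and collecting the results by key, can be sketched as follows. This is not the client's implementation: it is a Python illustration that uses a thread pool as a stand-in for curl's multi interface (in PHP the corresponding calls would be the `curl_multi_*` family). `fetch_object` and `multiget` are hypothetical names, and the `/buckets/<bucket>/keys/<key>` URL layout and the address in the usage note are assumptions about the target Riak node.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen


def fetch_object(base_url, bucket, key, timeout=5.0):
    """Fetch one object over HTTP; return its body, or None on 404."""
    url = "{}/buckets/{}/keys/{}".format(
        base_url.rstrip("/"), quote(bucket, safe=""), quote(key, safe="")
    )
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.read()
    except HTTPError as err:
        if err.code == 404:
            return None  # missing key: report it as None instead of failing
        raise


def multiget(keys, fetch_one, max_workers=8):
    """Run fetch_one(key) for every key in parallel.

    Returns a dict mapping each key to its result, in the order the keys
    were given (pool.map yields results in input order).
    """
    if not keys:
        return {}
    workers = min(max_workers, len(keys))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        values = list(pool.map(fetch_one, keys))
    return dict(zip(keys, values))
```

A call might look like `multiget(keys, lambda k: fetch_object("http://127.0.0.1:8098", "users", k))`. Passing the fetch function in as a parameter keeps the fan-out logic independent of the transport, so it can be exercised without a running node.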

Thanks for answering. So I could pretty much implement this with parallel cURL requests. I noticed that the fetching happens in Utils.php.

Would you be open to reviewing a pull request if I manage to add support for parallel fetches?

Either I or someone from Basho KK (maybe @kuenishi ?) will review it, yes.

Will do.

@georgepsarakis Did you ever come up with a solution for this?

We are releasing a rewrite of the client that supports 2.x features within the next few days. If this is still something you want in the client, reopen this issue and I will get it queued up.

@christophermancini No, I never found the time to test this solution. I also found this PHP extension, https://github.com/php-riak/php_riak, which uses Protocol Buffers instead of the HTTP API; perhaps you would like to have a look at it as well? I haven't really understood whether it supports parallel Get requests (the documentation mentions setting the number of persistent connections, but I don't believe that is necessary for parallelism), and I haven't tested it either, so it is just a suggestion :).

Thanks for the feedback.

@georgepsarakis Yeah, I have checked out the php_riak lib. Despite the performance boost and smaller packet sizes offered by PB, we chose not to use it because there is no solid PHP lib for it that does not depend on a PHP extension. Our goal was to have a lightweight client lib with minimal dependencies. PB is something we would like to offer in the future, but I think we would continue to ship with HTTP as the default and let those who want PB enable it through extra configuration.