Re-examine the behaviour of hex_ranges
seanhandley opened this issue · 10 comments
We're currently doing some manual manipulation of the data returned by hex_ranges
i.e. grouping them by distance. This is largely because other bindings (namely python and go) take this approach.
The C lib just returns an array without grouping the contents.
Before we release, we should agree what approach we wish to take.
Continuing the conversation in here:
@dfellis then, all C client code is repeating logic like
((1 + Math.sqrt(1 + 8 * (j / 6.0).ceil)) / 2).floor? That is what seems more suspicious to me, I'd be surprised if that was the case. On the other hand, the Go port looks simpler.
Unfortunately I am not familiar with H3, so can only reason in a formal way. Which is limited and have questions but can't anticipate answers :).
Your experience/feedback is going to be very helpful for us to help understand the rationale behind this implementation and move forward.
Yes, any C code wanting the sets of hexagons and their respective ring have to would do that (well, it doesn't have to do that last floor since the outer numeric operations can be integers instead).
The reasoning was as follows: anyone wanting to use C over another language certainly prioritizes performance much higher, so we're not forcing them to compute the ring level if they don't need it, also not spending the extra memory on it when it can be derived from the index number itself while iterating through the array.
It might make sense to provide the array index to ring level algorithm as a function to call, but we didn't because we also figured that the C users would want the computation inlined instead of through a function call, and guaranteeing that with a simple macro instead of relying on the C compiler to guess correctly that it can be inlined was what we figured a C developer would prefer. We also found that the C function if written and then used by the binding would slow down the operations here dramatically because of the FFI translation costs, and inlining it into the Python binding code was a significant advantage there, too.
Thanks for that excellent explanation @dfellis !
As a Ruby application developer, I'd expect the hexagons to be pre-grouped rather than having to calculate inner array boundaries manually.
I'm in favour of keeping the current implementation, though perhaps we should test it using a very large k value to ensure it's performant enough?
So I did a bit of cursory testing.
def self.hex_ranges2(h3_set, k)
max_out_size = h3_set.size * max_kring_size(k)
out = FFI::MemoryPointer.new(H3_INDEX, max_out_size)
pentagonal_distortion = false
FFI::MemoryPointer.new(H3_INDEX, h3_set.size) do |h3_set_ptr|
h3_set_ptr.write_array_of_ulong_long(h3_set)
pentagonal_distortion = hexRanges(h3_set_ptr, h3_set.size, k, out)
end
raise(ArgumentError, "One of the specified hexagon ranges contains a pentagon") if pentagonal_distortion
hexagons = out.read_array_of_ulong_long(max_out_size)
hexagons.reject(&:zero?)
end
with spec
context "when the number of hexes is very large" do
let(:h3_set) do
1000.times.map do |i|
offset = 0.0001 * i
H3.geo_to_h3([53.8809742, -2.2884528 + offset], 15)
end
end
let(:k) { 15 }
it "runs" do
hex_ranges
end
context "when the algorithm doesn't group indexes" do
subject(:hex_ranges) { H3.hex_ranges2(h3_set, k) }
it "runs" do
hex_ranges
end
end
end
The hex_ranges2
version runs in under 0.1 seconds on my machine consistently, and the version doing grouping takes around 0.3-0.4 seconds. Significantly slower, as expected.
During testing, I also uncovered this. The following spec fails:
context "the numbers match" do
let(:hex_ranges2) { H3.hex_ranges2(h3_set, k) }
it "has the same number of elements" do
expect(hex_ranges2.count).to eq(hex_ranges.values.flatten.count)
end
end
Like so
1) H3.hex_ranges when the number of hexes is very large the numbers match has the same number of elements
Failure/Error: expect(hex_ranges2.count).to eq(hex_ranges.values.flatten.count)
expected: 520562
got: 721000
(compared using ==)
# ./spec/h3_spec.rb:691:in `block (5 levels) in <top (required)>'
So something may be incorrect with the grouped implementation.
EDIT:
They don't even contain the same number of unique indexes 🤔
Failure/Error: expect(hex_ranges2.uniq.count).to eq(hex_ranges.values.flatten.uniq.count)
expected: 217255
got: 220638
I've checked the tests for the python bindings @dfellis and I notice they only test with a h3 index set of length 1. Does your implementation still behave as expected for sets with more than a single element?
I realised in the Ruby implementation we're not correctly reading the array because i * h3_set.size + j
doesn't get the correct index for cases where the h3_set.size is greater than 1.
EDIT:
I think this
hex_range_list[ring_index].add(
h3_to_string(krings[i * num_hexagons + j]))
should be
hex_range_list[ring_index].add(
h3_to_string(krings[i * max_kring_size + j]))
New implementation here https://github.com/StuartApp/h3_ruby/pull/25/files
I've checked the tests for the python bindings @dfellis and I notice they only test with a h3 index set of length 1. Does your implementation still behave as expected for sets with more than a single element?
I realised in the Ruby implementation we're not correctly reading the array because
i * h3_set.size + j
doesn't get the correct index for cases where the h3_set.size is greater than 1.EDIT:
I think this
hex_range_list[ring_index].add( h3_to_string(krings[i * num_hexagons + j]))should be
hex_range_list[ring_index].add( h3_to_string(krings[i * max_kring_size + j]))
I can confirm that's it's broken when I use multiple hexagons in the python bindings, too. This was likely my mistake, not Teng Fu's. Thank you very much for finding it. (I also discovered that the Python Sets refuse to work like Lists/Arrays and don't allow iteration, I guess because sets don't have a defined order? So I can't actually take the k-ring output, for instance, and plug it into hex_ranges.)
@seanhandley I've expanded the test suite in the Python code to multiple k-rings in both the input list and the output for each hexagon and it also caught another bug, so you may want to re-reference this PR as well.
@dfellis My pleasure - I'm glad the bug's resolved!
Regarding sets - I think for the Ruby implementation I'm going to use Ruby's uniq
method to remove duplicates.
With Python I believe there's an "ordered set" class that might be helpful.