seanhandley/h3_ruby

Re-examine the behaviour of hex_ranges

seanhandley opened this issue · 10 comments

We're currently doing some manual manipulation of the data returned by hex_ranges i.e. grouping them by distance. This is largely because other bindings (namely python and go) take this approach.

The C lib just returns an array without grouping the contents.

Before we release, we should agree what approach we wish to take.

Continuing the conversation in here:

@dfellis then, all C client code is repeating logic like

((1 + Math.sqrt(1 + 8 * (j / 6.0).ceil)) / 2).floor

? That is what seems more suspicious to me, I'd be surprised if that was the case. On the other hand, the Go port looks simpler.

Unfortunately I am not familiar with H3, so can only reason in a formal way. Which is limited and have questions but can't anticipate answers :).

Your experience/feedback is going to be very helpful for us to help understand the rationale behind this implementation and move forward.

Yes, any C code wanting the sets of hexagons and their respective ring have to would do that (well, it doesn't have to do that last floor since the outer numeric operations can be integers instead).

The reasoning was as follows: anyone wanting to use C over another language certainly prioritizes performance much higher, so we're not forcing them to compute the ring level if they don't need it, also not spending the extra memory on it when it can be derived from the index number itself while iterating through the array.

It might make sense to provide the array index to ring level algorithm as a function to call, but we didn't because we also figured that the C users would want the computation inlined instead of through a function call, and guaranteeing that with a simple macro instead of relying on the C compiler to guess correctly that it can be inlined was what we figured a C developer would prefer. We also found that the C function if written and then used by the binding would slow down the operations here dramatically because of the FFI translation costs, and inlining it into the Python binding code was a significant advantage there, too.

Thanks for that excellent explanation @dfellis !

As a Ruby application developer, I'd expect the hexagons to be pre-grouped rather than having to calculate inner array boundaries manually.

I'm in favour of keeping the current implementation, though perhaps we should test it using a very large k value to ensure it's performant enough?

So I did a bit of cursory testing.

  def self.hex_ranges2(h3_set, k)
    max_out_size = h3_set.size * max_kring_size(k)
    out = FFI::MemoryPointer.new(H3_INDEX, max_out_size)
    pentagonal_distortion = false
    FFI::MemoryPointer.new(H3_INDEX, h3_set.size) do |h3_set_ptr|
      h3_set_ptr.write_array_of_ulong_long(h3_set)
      pentagonal_distortion = hexRanges(h3_set_ptr, h3_set.size, k, out)
    end
    raise(ArgumentError, "One of the specified hexagon ranges contains a pentagon") if pentagonal_distortion

    hexagons = out.read_array_of_ulong_long(max_out_size)
    hexagons.reject(&:zero?)
  end

with spec

    context "when the number of hexes is very large" do
      let(:h3_set) do
        1000.times.map do |i|
          offset = 0.0001 * i
          H3.geo_to_h3([53.8809742, -2.2884528 + offset], 15)
        end
      end
      let(:k) { 15 }

      it "runs" do
        hex_ranges
      end

      context "when the algorithm doesn't group indexes" do
        subject(:hex_ranges) { H3.hex_ranges2(h3_set, k) }

        it "runs" do
          hex_ranges
        end
      end
    end

The hex_ranges2 version runs in under 0.1 seconds on my machine consistently, and the version doing grouping takes around 0.3-0.4 seconds. Significantly slower, as expected.

During testing, I also uncovered this. The following spec fails:

      context "the numbers match" do
        let(:hex_ranges2) { H3.hex_ranges2(h3_set, k) }

        it "has the same number of elements" do
          expect(hex_ranges2.count).to eq(hex_ranges.values.flatten.count)
        end
      end

Like so

  1) H3.hex_ranges when the number of hexes is very large the numbers match has the same number of elements
     Failure/Error: expect(hex_ranges2.count).to eq(hex_ranges.values.flatten.count)

       expected: 520562
            got: 721000

       (compared using ==)
     # ./spec/h3_spec.rb:691:in `block (5 levels) in <top (required)>'

So something may be incorrect with the grouped implementation.

EDIT:

They don't even contain the same number of unique indexes 🤔

     Failure/Error: expect(hex_ranges2.uniq.count).to eq(hex_ranges.values.flatten.uniq.count)

       expected: 217255
            got: 220638

I've checked the tests for the python bindings @dfellis and I notice they only test with a h3 index set of length 1. Does your implementation still behave as expected for sets with more than a single element?

I realised in the Ruby implementation we're not correctly reading the array because i * h3_set.size + j doesn't get the correct index for cases where the h3_set.size is greater than 1.

EDIT:

I think this

hex_range_list[ring_index].add(
                h3_to_string(krings[i * num_hexagons + j]))

should be

hex_range_list[ring_index].add(
                h3_to_string(krings[i * max_kring_size + j]))

I've checked the tests for the python bindings @dfellis and I notice they only test with a h3 index set of length 1. Does your implementation still behave as expected for sets with more than a single element?

I realised in the Ruby implementation we're not correctly reading the array because i * h3_set.size + j doesn't get the correct index for cases where the h3_set.size is greater than 1.

EDIT:

I think this

hex_range_list[ring_index].add(
                h3_to_string(krings[i * num_hexagons + j]))

should be

hex_range_list[ring_index].add(
                h3_to_string(krings[i * max_kring_size + j]))

I can confirm that's it's broken when I use multiple hexagons in the python bindings, too. This was likely my mistake, not Teng Fu's. Thank you very much for finding it. (I also discovered that the Python Sets refuse to work like Lists/Arrays and don't allow iteration, I guess because sets don't have a defined order? So I can't actually take the k-ring output, for instance, and plug it into hex_ranges.)

@seanhandley I've expanded the test suite in the Python code to multiple k-rings in both the input list and the output for each hexagon and it also caught another bug, so you may want to re-reference this PR as well.

@dfellis My pleasure - I'm glad the bug's resolved!

Regarding sets - I think for the Ruby implementation I'm going to use Ruby's uniq method to remove duplicates.

With Python I believe there's an "ordered set" class that might be helpful.