googleapis/gax-ruby

Investigate Path template performance

geigerj opened this issue · 25 comments

Recent investigation by @blowmage found that GAPIC utility methods that rely on the GAX path template code are very slow, to the point that they are the bottleneck for some google-cloud-ruby acceptance tests. This is likely due to the use of rly to implement template parsing.

Investigate these performance concerns. If rly is the issue, addressable may be a good substitute, although be cautious of differences between Google's path templates and RFC 6570. Another approach might be to drop parsing and use basic string interpolation, which @pongad has already done in the Go GAPICs.

Python also has just string interpolation (with a touch of validation). It's likely fairly trivial to port the Python implementation to Ruby: https://github.com/GoogleCloudPlatform/google-cloud-python/blob/master/api_core/google/api_core/path_template.py

Is there a way to detect if the full path template parsing is needed when the code is generated? For example, consider this method:

DATABASE_PATH_TEMPLATE = Google::Gax::PathTemplate.new(
  "projects/{project}/instances/{instance}/databases/{database}"
)

private_constant :DATABASE_PATH_TEMPLATE

# Returns a fully-qualified database resource name string.
# @param project [String]
# @param instance [String]
# @param database [String]
# @return [String]
def self.database_path project, instance, database
  DATABASE_PATH_TEMPLATE.render(
    :"project" => project,
    :"instance" => instance,
    :"database" => database
  )
end

There are no path template variables or value restrictions, so runtime performance would vastly improved if the following was generated instead:

# Returns a fully-qualified database resource name string.
# @param project [String]
# @param instance [String]
# @param database [String]
# @return [String]
def self.database_path project, instance, database
  "projects/#{project}/instances/#{instance}/databases/#{database}"
end

Is this approach possible?

Yes, this sounds like a better approach to me and I can't think of any cases where it wouldn't work.

On a related note, I am curious if you have gotten feedback on these path template helpers in general. I know @frankyn wasn't too enthusiastic about using them in sample code because they look strange, and other languages have dropped them entirely.

So, before we re-implement it might be worth considering if simply deprecating them is a better solution (i.e. let the client create the string directly). If we really do want to keep them then string interpolation seems like the way to go, but I actually prefer just using the string instead of a helper method. WDYT?

Python also dropped the use of a full-blown parser and its path template module is extremely small, simple, and well tested: https://github.com/GoogleCloudPlatform/google-cloud-python/blob/master/api_core/google/api_core/path_template.py Noticed that I said this above, but the point still stands - if we're going to change how we do path templates in Ruby, I would strongly prefer that it adopts Python's approach.

I don't think we should deprecated them, I think we should provide them as helpers to construct strings in the correct format.

@jonparrott I'd bet you'd prefer that all things follow Python's approach though 😄

I think we'll have a complication either way here. These are currently implemented as class methods (IIRC this was why @frankyn didn't like them), which I do think looks weird and often verbose. If we made them available as instance methods it would look exactly like Python, but we'd probably have to keep both around for awhile and it does introduce the (remote) possibility for things like name collisions.

I just have a hard time believing that folks really needs these things (i.e. in the pub sub docs only 3/7 languages use them). @blowmage do you think we'll get complaints if we don't have these, or have we gotten issues/complaints related to this in the past?

Let's also check with our fellow Python friend @lukesneeringer - he had a long conversation about this the other day and I'd be curious what he'd suggest we do.

@jonparrott I'd bet you'd prefer that all things follow Python's approach though 😄

Bias aside, the Python implementation was well-thought out and intentionally intended to provide a basis for other languages and was reviewed by several others (including @lukesneeringer). This actually goes for all of api core.

I just have a hard time believing that folks really needs these things (i.e. in the pub sub docs only 3/7 languages use them)

I agree. I don't think anyone needs them, but they are a bit useful to have around.

These are currently implemented as class methods (IIRC this was why @frankyn didn't like them), which I do think looks weird and often verbose. If we made them available as instance methods it would look exactly like Python, but we'd probably have to keep both around for awhile and it does introduce the (remote) possibility for things like name collisions.

I have no strong feelings one way or the other here.

do you think we'll get complaints if we don't have these, or have we gotten issues/complaints related to this in the past?

It depends on what you mean by "these". Do you mean the _path methods on the GAPIC client classes? Or the Google::Gax::PathTemplate class?

Both.

I think we have to do something about PathTemplate if it's so slow. Given that, I'm suggesting we deprecate and stop using the _path helpers as others have done rather than re-implement them.

It seems like something that's better solved by documentation rather than code, which has to be documented and explained in addition. Just to use a concrete example, what do you think of this?

https://github.com/GoogleCloudPlatform/google-cloud-ruby/blob/master/google-cloud-logging/lib/google/cloud/logging/v2/logging_service_v2_client.rb#L280

To me, dropping the helpers here would be an improvement and a useful simplification. However, I don't want to drop them if you're really sensing that folks need the helpers. Or, maybe there are more complex cases where they are genuinely useful?

If we have any GA clients using Path templates then we can not deprecate or remove them.

I'm certainly in favor of just changing the implemention. The Python one took less than a day.

I'm sure we have GA clients that use them.

The time to change is not an issue. To recap, there are two potential problems here:

  • poor performance
  • currently implemented as class methods

I'm still hung up on the second item and I've pinged @frankyn to weigh in. This issue is causing us to intentionally not use the helper methods in favor of strings in sample code. If we only consider the first issue then it's pretty simple and we could just change the internal implementation, but I'd like to find a way to solve both problems if we can. Duplicating the helpers methods is an obvious solution, but I'm not sure it's a good one.

FWIW, my thinking on helper methods in Ruby is that they are there as a convenience to the programmer. If the method is not in the most convenient place, then add it. Convenience is more important than duplication.

DATABASE_PATH_TEMPLATE = Google::Gax::PathTemplate.new(
  "projects/{project}/instances/{instance}/databases/{database}"
)

private_constant :DATABASE_PATH_TEMPLATE

# Returns a fully-qualified database resource name string.
# @param project [String]
# @param instance [String]
# @param database [String]
# @return [String]
def self.database_path project, instance, database
  DATABASE_PATH_TEMPLATE.render(
    :"project" => project,
    :"instance" => instance,
    :"database" => database
  )
end

# Returns a fully-qualified database resource name string.
# @param project [String]
# @param instance [String]
# @param database [String]
# @return [String]
def database_path project, instance, database
  self.class.database_path project, instance, database
end

Thanks @blowmage that's great input.

Let's wait to confirm w/ Frank that this will unblock the sample code issue. If that works out and you're okay with the duplication then I'm happy to push this through as outlined above.

@jbolinger As you said, there are really two issues. I believe they are orthogonal and can be solved independently. Therefore I propose we move the helper method convenience discussion to #125.

This afternoon @quartzmo and I paired on a spike of implementing the python code, to see if it was a viable option. Sadly, the approach in the python code cannot 100% replace the functionality provided by Google::Gax::PathTemplate. With this in mind, we have come to the conclusion that there are 3 major options to take:

  1. Replace Google::Gax::PathTemplate with a new class - this should deliver a boost in performance, but will also include introducing API incompatibility. Specifically, Google::Gax::PathTemplate#match doesn't seem to be easily provided by the regexp approach.
  2. Leave Google::Gax::PathTemplate alone, and simply stop using it for the simple templates used by most of the path helper methods. All of the path templates we have checked so far in the generated GAPIC code have used named single segments. It should be possible to use the simple ruby string interpolation proposed in an earlier comment in these cases. Any future helper method that used a template with multi segment patterns could use the slower Google::Gax::PathTemplate.
  3. Leave Google::Gax::PathTemplate alone, introduce a separate class based on the python regexp code. This would maintain backwards compatibility of the GA's google-gax library, while allowing helper methods to switch to using a different class with increased performance.

Responding to second item in #125.

As for performance concerns, removing the underlying implementation for the Python version sounds acceptable, but we may still want to verify performance.

@blowmage Could you explain what the problem was with match? IIUC, path template is equivalent to "named glob pattern" which should strictly be less powerful than regexp?

The python code implements a simple validation of a path string to a template. The Google::Gax::PathTemplate#match method validates a path string to the template, and returns the parameters taken from the path that look like they could be sent to Google::Gax::PathTemplate#render to generate the path again. I'm not sure the regexp approach will enable this behavior.

I say "look like they could be sent to", since we also noticed some unexpected behavior in #match, which is detailed in #126.

I find the behavior of Google::Gax::PathTemplate#match to be surprising in other ways as well. If the path doesn't match the method raises an error. I expected it would behave similar to Regexp#match, which will return nil when it doesn't match instead of raising.

I might be thinking about this too naively, but if you have a/{b}/c/{d}, isn't it just regexp a/[^/]*/c/[^/]*? You could use capture groups to take out the params right? I also resonded to #126.

I suspect that one of the reasons this gem has a dependency on the rly gem is because it needs a full lexer to do lookahead. I mean, I don't know that for sure, but that is what I suspect.

Can you use capture groups for a template like v1/*/{name=parent/**}?

I would be cool dropping features, ala option 1, but the gem is already GA, so API compatibility would be a consideration IMO. But you could mark things deprecated and remove the functionality and I bet nobody would care.

I can't comment on how disruptive things would be if we drop the feature, but the capture group for that template should be like v1/([^/]*)/(parent/.*) right?

I don't think any of these template actually needs full-featured LALR parsers. I think lex/yacc was used to parse the path template (v1/*/{name=parent/**}, etc) themselves, not the strings that our generated functions are supposed to match. I think it's possible to turn the path template into a regexp while generating the client, though I haven't thought much about it.

My understanding is this issue is still in progress.
(Replying to bring the issue back into compliance.)

@jbolinger Any updates?

I'm going to close this. The plan is to rewrite path template handling so there is no need to examine the performance any more than we already have.