cpp-netlib/uri

starts_with seems inefficient

Opened this issue · 1 comments

mrstux commented

I was reviewing the remove_dot_segments implementation and I noticed starts_with is implemented as:

inline bool starts_with(std::string const &str, const char *range) {
  return str.find(range) == 0;
}

Surely this is a very inefficient way to determine if a string starts with another string, as potentially it will search a very large haystack to find a needle at the very end, or not at all, when all it cares about is if the needle is at the start.

And the remove_dot_segments function calls this 5 times per segment of each uri.

perhaps the following alternative would be better, as you do know the size of each prefix upfront.

inline bool starts_with(std::string const &str, const char *prefix, size_t prefix_len) {
  return str.compare(0, prefix_len, prefix, prefix_len) == 0;
}

Of course, its better to just use a string_view, which could capture the literal string size at the callsite.

constexpr bool starts_with(std::string_view str, std::string_view prefix) noexcept {
	if (str.size() < prefix.size())
		return false;
	
	return std::string_view::traits_type::compare(str.data(), prefix.data(), prefix.size()) == 0;
}

return str.find(range) == 0;

glynos commented

Thanks for the review. I think the approach to use the string_view is the best of the options you provide.

I am travelling and won't have the opportunity to address this until January. I would be able to review a PR if you can prepare one.