openresty/lua-resty-string

Can we have str.from_hex() function?

didip opened this issue · 8 comments

didip commented

pretty much the opposite of:

function _M.to_hex(s)
    local len = #s * 2
    local buf = ffi_new(str_type, len)
    C.ngx_hex_dump(buf, s, #s)
    return ffi_str(buf, len)
end

@didip Patches welcome.

@didip
That will be helpful.
However, in some circumstances you can use base64 instead.

@didip Yeah, from_hex is good to have. Will you contribute a patch? Thanks!

didip commented

@agentzh Work has been crazy recently, but yes, I have been slowly working on the patch.

Hello!

I've added a from_hex() function to the string.lua module.
My original implementation was not stable, so I added other implementations and did a little research.

So, I have 3 variants of the from_hex() function:

  1. pure Lua
  2. FFI version based on strtol
  3. FFI version based on sscanf

And the results:

  1. stable but slow (54.3s for 10M strings on my PC)
  2. fast but unstable (8.19s for 10M strings, ~20% invalid results)
  3. fast, with fewer errors, but the errors are critical (core dumped, ~0.1%)
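
For readers following along, here is a minimal sketch of what a pure Lua variant could look like. It is not the code from the gist; the function name, the validation step, and the error message are assumptions made for illustration.

```lua
-- Hypothetical pure Lua from_hex: decodes a hex string into raw bytes.
-- Returns nil plus an error message on odd length or non-hex characters.
local function from_hex(s)
    -- %X matches any character that is NOT a hexadecimal digit
    if #s % 2 ~= 0 or s:find("%X") then
        return nil, "invalid hex string"
    end
    -- Replace every pair of hex digits with the byte it encodes.
    -- The outer parentheses drop gsub's second return value (the match count).
    return (s:gsub("%x%x", function(pair)
        return string.char(tonumber(pair, 16))
    end))
end

print(from_hex("48656c6c6f"))
```

A version like this allocates a new Lua string per call and invokes a Lua callback per byte, which is consistent with it being the slow but stable option.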

I added the files to a gist:
https://gist.github.com/realghost/1d99f6e80884831161713116dfe04d18

Please review them and test in your environment.
Are your results the same?
Please help me find the errors in the FFI versions.

Thank you!

I've found the error in the strtol version:
I forgot to add an extra byte to the tmp buffer. Because strtol expects a zero-terminated string, it continues scanning past the two hex digits whenever the third byte happens to be non-zero.
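
The terminator issue can be illustrated with a small LuaJIT FFI sketch. This is an assumption-laden illustration rather than the code from the gist or the PR: the function name, the up-front validation, and the separate output buffer are all hypothetical. The point is the 3-byte tmp buffer, whose last byte stays zero so that strtol stops after two digits.

```lua
-- Requires LuaJIT (uses the ffi library).
local ffi = require "ffi"

ffi.cdef[[
long strtol(const char *nptr, char **endptr, int base);
]]

local C = ffi.C

-- Hypothetical strtol-based from_hex.
local function from_hex_strtol(s)
    local len = #s
    if len % 2 ~= 0 or s:find("%X") then
        return nil, "invalid hex string"
    end
    if len == 0 then
        return ""
    end

    local n = len / 2
    local out = ffi.new("unsigned char[?]", n)
    -- 2 bytes for the hex digits plus 1 byte for the terminating zero.
    -- ffi.new zero-fills the array and tmp[2] is never written, so
    -- strtol always sees a properly terminated 2-character string.
    local tmp = ffi.new("char[3]")

    for i = 0, n - 1 do
        tmp[0] = s:byte(2 * i + 1)
        tmp[1] = s:byte(2 * i + 2)
        out[i] = tonumber(C.strtol(tmp, nil, 16))
    end

    return ffi.string(out, n)
end

print(from_hex_strtol("48656c6c6f"))
```

With a 2-byte tmp buffer instead, strtol would read whatever byte follows in memory, which would explain results that are only sometimes wrong.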

I also added an optimization: dst is removed and the results are written to src.
The current performance is 6.9s for 10M strings.

Please review the current strtol version; I've submitted it as a PR.

For anyone searching for a from_hex implementation, I have written one in my lua-resty-base-encoding library.

Unlike the methods listed above, I wrote the feature in C and provide a thin Lua binding.
Considering that Lua doesn't have a real buffer type and that interaction with C land is expensive, especially when the JIT is unavailable, I believe this approach could be fast.

It takes 0.93s for 10M strings (each string is 100 characters long) in my local benchmark.

Consider it resolved.