Ironholds/pystr

Vectorisation and compilation

Ironholds opened this issue · 5 comments

Heyo,

So, this is an AWESOME package to have. What would you think of using a compiled backend? That way it would be really trivial to have these functions vectorised, rather than primarily handling single strings, and it'd probably speed things up too.

I worked on an Rcpp module for a while that did things like alphanumeric comparisons and so I've got a lot of the code just sitting around. If it's of interest I'd love to integrate it.

I haven't written a single line of C++ but I love learning new things! Can you give me an example of what this would look like? How is it different than doing something like:

library(pystr)

strings = c("hello", "world")
# call pystr_capitalize() once per element
sapply(strings, function(s) pystr_capitalize(s))

#   hello   world 
# "Hello" "World" 

Function calls in R are hella-expensive (I mean, they don't feel that way, but it adds up): sapply() calls the R function once per element, so the overhead grows with the length of the vector. I'm happy to write the compiled backend bits. C++ is actually pretty pythonic in its string manipulation, so I'll throw together an example as a pull request once I'm out of today's meeting :). Hopefully that'll be explanatory enough to work from! Like I said, super happy to port things over meself, but more hands are always welcome.

Okay, so in the case of pystr_capitalize it actually is vectorised already, because all the underlying functions are. Despite this, switching to C++ still produces a pretty big speed improvement. Writing up a pull request now with benchmarks and timings :)
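
For anyone following along, here is a rough sketch of the shape a compiled, vectorised capitalise could take. It is not the code from the pull request: it is plain standard C++ with no Rcpp glue, the name capitalize_all is made up for illustration, and it assumes ASCII input and Python-style capitalize() semantics (first character upper-cased, the rest lower-cased).

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not pystr's actual source.
// Takes a whole vector of strings and returns a capitalised copy of each,
// so the per-element loop runs in compiled code rather than in R.
std::vector<std::string> capitalize_all(const std::vector<std::string>& input) {
    std::vector<std::string> output;
    output.reserve(input.size());

    for (const std::string& s : input) {
        std::string result = s;
        for (std::size_t i = 0; i < result.size(); ++i) {
            // Cast to unsigned char first: passing a negative char to
            // std::toupper/std::tolower is undefined behaviour.
            unsigned char ch = static_cast<unsigned char>(result[i]);
            // First character is upper-cased, every later one lower-cased.
            // Assumption: ASCII only; multi-byte UTF-8 needs more care.
            result[i] = static_cast<char>(i == 0 ? std::toupper(ch)
                                                 : std::tolower(ch));
        }
        output.push_back(result);
    }
    return output;
}
```

In an Rcpp version the same loop would presumably sit behind a function marked with `// [[Rcpp::export]]` that accepts and returns a character vector, so R makes a single call for the whole vector instead of one call per string.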

Just merged in your pull request. I think we should definitely implement this for the rest of the functions. I'll take a stab at a few of them as well, if you don't mind!

Totally! Let me know if you need any help; as someone who learned R as their first lang and then C++, I found the transition alternately glorious and frustrating. Happy to assist in the frustrating bits :)