rosetta-rs/string-rosetta-rs

Testing string implementations using real-world code

xfbs opened this issue · 2 comments

xfbs commented

Hey!

I'm the author of imstr. I just stumbled across this work and I think it is awesome to get some data on the performance of various string implementations. The more data we have, the better we can choose which approach works best.

One of the reasons I wrote imstr was that I had a specific use case in mind: parsing data. Specifically, parsing massive XML files and pulling content into structs without copying strings around all the time. Basically, the same thing Bytes can do (atomic, reference-counted, sliceable buffers).
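To make the "atomic, reference-counted, sliceable buffer" idea concrete, here is a minimal sketch built only on the standard library. The `SharedStr` type and its methods are hypothetical names made up for illustration; this is not the API of imstr or Bytes, just one way such a type could look.

```rust
use std::ops::Range;
use std::sync::Arc;

/// Hypothetical illustration type, NOT the imstr or Bytes API.
/// A view into a shared, atomically reference-counted string buffer.
#[derive(Clone, Debug)]
pub struct SharedStr {
    buf: Arc<str>,       // shared backing storage
    range: Range<usize>, // byte range of this view within `buf`
}

impl SharedStr {
    /// Copies `s` once into a new shared buffer.
    pub fn new(s: &str) -> Self {
        SharedStr { buf: Arc::from(s), range: 0..s.len() }
    }

    /// Returns a sub-view that shares the same buffer (no string copy).
    /// `r` is relative to this view. Panics if `r` is out of bounds or
    /// does not fall on UTF-8 character boundaries.
    pub fn slice(&self, r: Range<usize>) -> SharedStr {
        let start = self.range.start + r.start;
        let end = self.range.start + r.end;
        assert!(r.start <= r.end && end <= self.range.end, "slice out of bounds");
        assert!(
            self.buf.is_char_boundary(start) && self.buf.is_char_boundary(end),
            "slice not on a char boundary"
        );
        SharedStr { buf: Arc::clone(&self.buf), range: start..end }
    }

    /// Borrows the viewed text.
    pub fn as_str(&self) -> &str {
        &self.buf[self.range.clone()]
    }

    /// True if both views point at the same backing allocation.
    pub fn shares_buffer_with(&self, other: &SharedStr) -> bool {
        Arc::ptr_eq(&self.buf, &other.buf)
    }
}
```

With this sketch, `SharedStr::new("<name>Ferris</name>").slice(6..12)` yields a view of `Ferris` that only bumps a reference count instead of allocating a new string, which is the property that matters when pulling many fields out of one large document.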

Which brings me to my question: do you think it would be possible to take some representative real-world code, make it generic so it works with any string implementation, and test it here? There could be some interesting data lurking. Especially in places like frontend code, it is nice to be conservative with memory and avoid copying data around.
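As a rough sketch of what "generic over the string implementation" could mean, the workload below only asks that the string type can be built from a `&str` and read back as a `&str`. The function names and the `key=value` workload are assumptions chosen for illustration, not code from this repository; the three types exercised are standard-library types, and a third-party string type would fit if it implements the same two traits.

```rust
use std::sync::Arc;

/// Hypothetical workload: parse `key=value` lines into owned pairs of type `S`.
/// Lines without a `=` are skipped.
pub fn parse_pairs<S>(input: &str) -> Vec<(S, S)>
where
    S: for<'a> From<&'a str>,
{
    input
        .lines()
        .filter_map(|line| line.split_once('='))
        .map(|(k, v)| (S::from(k.trim()), S::from(v.trim())))
        .collect()
}

/// Reads the parsed values back through `AsRef<str>` and sums their lengths.
pub fn total_value_len<S: AsRef<str>>(pairs: &[(S, S)]) -> usize {
    pairs.iter().map(|(_, v)| v.as_ref().len()).sum()
}

/// Runs the same workload with three standard-library string types.
pub fn value_len_per_impl(input: &str) -> [usize; 3] {
    [
        total_value_len(&parse_pairs::<String>(input)),
        total_value_len(&parse_pairs::<Box<str>>(input)),
        total_value_len(&parse_pairs::<Arc<str>>(input)),
    ]
}
```

The bounds are deliberately small. Real-world code tends to also need things like hashing, comparison, or concatenation, and each extra bound narrows the set of string crates that can participate.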

Since I am quite interested in this, I would be available to help out.

epage commented

My main concern with "real world" code is: which use case should we represent?

One benefit of doing it, though, is that it would give more representative binary size and compile time benchmarks, which we currently lack.

xfbs commented

That is a good question, indeed.

My initial idea would be to pick, say, the top n binary crates from crates.io and see if any of those can be modified to be generic over the string implementation. But I don't imagine that will be simple :D
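One less invasive way to retrofit an existing crate, sketched here purely as an assumption, is to route every owned string through a single type alias and switch that alias with a Cargo feature, rather than threading a generic parameter through the whole codebase. The feature name `arc-str`, the `Str` alias, and the `Config` struct are all hypothetical.

```rust
// Hypothetical Cargo feature `arc-str` selects the string type at build time.
#[cfg(feature = "arc-str")]
pub type Str = std::sync::Arc<str>;
#[cfg(not(feature = "arc-str"))]
pub type Str = String;

/// Hypothetical struct standing in for a crate's real data types.
pub struct Config {
    pub name: Str,
    pub tags: Vec<Str>,
}

impl Config {
    /// Parses `name: tag1, tag2, ...`; returns `None` if there is no colon.
    pub fn parse(line: &str) -> Option<Config> {
        let (name, tags) = line.split_once(':')?;
        Some(Config {
            name: Str::from(name.trim()),
            tags: tags
                .split(',')
                .map(str::trim)
                .filter(|t| !t.is_empty()) // drop empty entries such as a trailing comma
                .map(|t| Str::from(t))
                .collect(),
        })
    }
}
```

Because each build uses exactly one string type, this layout would also lend itself to comparing binary size and compile time per implementation, at the cost of only working for call sites that stick to operations every candidate type supports.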