Bigstring on rosetta package
Currently, `rosetta` works on `Bytes.t`. Since it translates from an encoding to UTF-8, we chose this kind of buffer mostly because `uutf` works on `Bytes.t`. However, `angstrom` works on `bigstring` and, on the other side, `fe` (the internal encoder of `mrmime`) works with both.
So, because `rosetta` is under my responsibility, I can decide to provide a translation from a `bigstring` input. But the code will change a lot, and the internals will change too.
From my point of view, and mostly because I did a lot of benchmarks with `buffet`, we get the same big question: should we enforce `Bytes.t` or `Bigstring.t`, functorize, or use a (G)ADT over the input? From the benchmarks, the functor is the best (and `flambda` will be able to optimize it easily through specialization of the functor).
So we have different plans:
- (middle) functorize `rosetta` (and `pecu`, and `uuuu`, and `coin`, and `yuscii`).
This solution moves the boilerplate into `rosetta`; then we can apply the functor in `mrmime` and use only `bigstring`. From this point, we avoid most copies when we translate an input from an encoding to UTF-8. However, we still have copies for the `uutf` part (which uses only `Bytes.t`).
From the benchmarks, it's the best solution even if I don't like to functorize everything. `flambda` will then be able to optimize it, and the readability of the code is kept, unlike the second solution, which needs to pass a witness to every function that manipulates the input.
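A minimal sketch of what the functor plan could look like. Everything here is hypothetical: the `BUFFER` signature, the module names and the toy ISO-8859-1 decoder are only for illustration and are not the actual `rosetta` API.

```ocaml
(* Hypothetical sketch: none of these names come from rosetta itself. *)
type bigstring =
  (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

(* The minimal interface the decoder needs from its input buffer. *)
module type BUFFER = sig
  type t
  val length : t -> int
  val get : t -> int -> char
end

module Bytes_buffer : BUFFER with type t = Bytes.t = struct
  type t = Bytes.t
  let length = Bytes.length
  let get = Bytes.get
end

module Bigstring_buffer : BUFFER with type t = bigstring = struct
  type t = bigstring
  let length = Bigarray.Array1.dim
  let get = Bigarray.Array1.get
end

module Make (B : BUFFER) = struct
  (* Toy translation: ISO-8859-1 input to a UTF-8 string. *)
  let latin1_to_utf8 (src : B.t) : string =
    let out = Buffer.create (B.length src) in
    for i = 0 to B.length src - 1 do
      let c = Char.code (B.get src i) in
      if c < 0x80 then Buffer.add_char out (Char.chr c)
      else begin
        (* Code points 0x80..0xFF take two bytes in UTF-8. *)
        Buffer.add_char out (Char.chr (0xC0 lor (c lsr 6)));
        Buffer.add_char out (Char.chr (0x80 lor (c land 0x3F)))
      end
    done;
    Buffer.contents out
end

(* The application of the functor happens on the user side, e.g. in mrmime. *)
module On_bytes = Make (Bytes_buffer)
module On_bigstring = Make (Bigstring_buffer)

(* Helper for the example only: copies a string into a fresh bigstring. *)
let bigstring_of_string (s : string) : bigstring =
  let b =
    Bigarray.Array1.create Bigarray.char Bigarray.c_layout (String.length s)
  in
  String.iteri (fun i c -> Bigarray.Array1.set b i c) s;
  b
```

The decoder body is written once; the choice between `Bytes.t` and `bigstring` is made where `Make` is applied, which is the place where specialization of the functor can kick in.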
- (middle) (G)ADT (`decompress`'s solution)
Avoid the functor but add an argument, the witness, to every function that manipulates the input. `flambda` is not really able to optimize it, and specialization (even if we use a GADT) is hard.
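For comparison, a minimal sketch of the witness approach. The `witness` type, its constructors and the toy ISO-8859-1 decoder are hypothetical names for illustration; this is not the actual `decompress` code.

```ocaml
(* Hypothetical sketch of the (G)ADT plan; names are illustrative only. *)
type bigstring =
  (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

(* The witness tells each function which concrete buffer it received. *)
type 'a witness =
  | As_bytes : Bytes.t witness
  | As_bigstring : bigstring witness

let length : type a. a witness -> a -> int =
 fun w buf ->
  match w with
  | As_bytes -> Bytes.length buf
  | As_bigstring -> Bigarray.Array1.dim buf

let get : type a. a witness -> a -> int -> char =
 fun w buf i ->
  match w with
  | As_bytes -> Bytes.get buf i
  | As_bigstring -> Bigarray.Array1.get buf i

(* Every function touching the input has to thread the witness through. *)
let latin1_to_utf8 : type a. a witness -> a -> string =
 fun w src ->
  let out = Buffer.create (length w src) in
  for i = 0 to length w src - 1 do
    let c = Char.code (get w src i) in
    if c < 0x80 then Buffer.add_char out (Char.chr c)
    else begin
      Buffer.add_char out (Char.chr (0xC0 lor (c lsr 6)));
      Buffer.add_char out (Char.chr (0x80 lor (c land 0x3F)))
    end
  done;
  Buffer.contents out
```

Note the extra `w` argument on `length`, `get` and `latin1_to_utf8`, and the `match` on the witness at each access: this is the boilerplate and the dispatch that the functor version avoids.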
- move to `bigstring` (`angstrom`'s solution)
Like `angstrom`, which uses a `bigstring`, we can move to this solution and enforce the use of only `bigstring` in `rosetta` (and so on in the other packages). However, we lose the ability to use `Bytes.t` in some cases. But from a performance perspective, this is the best choice.
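A small sketch of the trade-off in the last plan, under the assumption of a hypothetical bigstring-only entry point (`count_non_ascii` is a toy stand-in, not a `rosetta` function): a caller that only holds a `Bytes.t` has to copy its data first.

```ocaml
(* Hypothetical sketch: a bigstring-only API and the copy it forces on
   callers that hold Bytes.t. Names are illustrative only. *)
type bigstring =
  (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

(* Toy bigstring-only function: counts the bytes outside the ASCII range. *)
let count_non_ascii (src : bigstring) : int =
  let n = ref 0 in
  for i = 0 to Bigarray.Array1.dim src - 1 do
    if Char.code (Bigarray.Array1.get src i) >= 0x80 then incr n
  done;
  !n

(* A Bytes.t caller must allocate a bigstring and copy byte by byte. *)
let bigstring_of_bytes (b : Bytes.t) : bigstring =
  let bs =
    Bigarray.Array1.create Bigarray.char Bigarray.c_layout (Bytes.length b)
  in
  Bytes.iteri (fun i c -> Bigarray.Array1.set bs i c) b;
  bs
```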
In my opinion, the first case should be the best but ... eh, another functor, and after my story with `ocaml-git` I'm a little bit sick of them. Anyway, I leave this issue open because the question is still open.