TS Optimization: avoid allocations and array slicing in byte pair encoding
connor4312 opened this issue · 0 comments
Currently `bytePairEncode` creates an array of arrays, `byteIndicesAndRanks`, whose entries are spliced out as data is deleted.
Instead, it may be faster to use and reuse two typed arrays: one for the indices in `byteIndicesAndRanks` and one for the ranks themselves. Splicing an item from the list would instead become calling `indices.set(indices.subarray(index + 1), index)` (or perhaps a manual shift would be faster).
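A rough sketch of what this could look like, to make the idea concrete. `PairList`, `reset`, `removeAt`, and `shiftLeft` are hypothetical names, not identifiers from the current implementation, and the sketch assumes ranks fit in 32 bits (a "no rank" sentinel such as `0xffffffff` would replace whatever the current code uses).

```typescript
// Sketch only: all names here are hypothetical, not taken from the codebase.
// Assumes indices and ranks both fit in unsigned 32-bit integers.
class PairList {
  indices: Uint32Array;
  ranks: Uint32Array;
  length = 0; // logical length; the buffers may be larger

  constructor(capacity: number) {
    this.indices = new Uint32Array(capacity);
    this.ranks = new Uint32Array(capacity);
  }

  // Reuse the buffers between calls; reallocate only when the input is larger.
  reset(length: number): void {
    if (length > this.indices.length) {
      this.indices = new Uint32Array(length);
      this.ranks = new Uint32Array(length);
    }
    this.length = length;
  }

  // Counterpart of splice(index, 1): move the tail one slot to the left.
  // set() copies correctly even though source and target overlap.
  removeAt(index: number): void {
    const end = this.length;
    this.indices.set(this.indices.subarray(index + 1, end), index);
    this.ranks.set(this.ranks.subarray(index + 1, end), index);
    this.length = end - 1;
  }
}

// The "manual shift" alternative: same effect on one array, no view object created.
function shiftLeft(arr: Uint32Array, index: number, end: number): void {
  for (let i = index + 1; i < end; i++) {
    arr[i - 1] = arr[i];
  }
}

const list = new PairList(8);
list.reset(4);
list.indices.set([0, 1, 2, 3]);
list.ranks.set([10, 5, 7, 9]);
list.removeAt(1);
// list.length is now 3; the live entries are indices 0, 2, 3 and ranks 10, 7, 9.
```

`subarray` still creates a small view object per call, so `arr.copyWithin(index, index + 1, end)` or the manual loop might be worth benchmarking against it. Which one wins likely depends on the engine and on typical list lengths.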