zouharvi/tokenization-scorer

Memory efficiency

zouharvi opened this issue · 1 comment

On ~10 GB of input data, memory consumption can spike to 150 GB while computing the unigram distribution and Rényi efficiency.

Some intermediate objects are probably no longer needed by that point and could be freed earlier.
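
To illustrate the kind of approach that keeps memory bounded, here is a minimal sketch that streams the input line by line and only holds the unigram counts in memory, so peak usage scales with vocabulary size rather than corpus size. This is not the package's actual code and not necessarily what the fix did. The function names `unigram_counts` and `renyi_efficiency` are hypothetical, the input is assumed to be whitespace-tokenized text, and Rényi efficiency is assumed to be the Rényi entropy of order `alpha` divided by the log of the vocabulary size.

```python
import math
from collections import Counter
from typing import Iterable


def unigram_counts(lines: Iterable[str]) -> Counter:
    """Accumulate token counts one line at a time (hypothetical helper).

    Assumes tokens are separated by whitespace. No list of all tokens
    is ever materialized; only the counter grows.
    """
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def renyi_efficiency(counts: Counter, alpha: float = 2.5) -> float:
    """Rényi entropy of order alpha, normalized by log(vocabulary size).

    Assumed definition: H_alpha(p) / log|V|, where p is the unigram
    distribution. alpha == 1 falls back to Shannon entropy.
    """
    vocab_size = len(counts)
    if vocab_size < 2:
        raise ValueError("need at least two distinct tokens")
    total = sum(counts.values())
    if alpha == 1.0:
        # Limit case: Shannon entropy.
        entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    else:
        # Generator expression: probabilities are never stored as a list.
        power_sum = sum((c / total) ** alpha for c in counts.values())
        entropy = math.log(power_sum) / (1.0 - alpha)
    return entropy / math.log(vocab_size)
```

Passing an open file handle (for example `unigram_counts(open(path, encoding="utf-8"))`, with `path` being your own file) iterates lazily over lines, so the raw text never has to fit in memory at once.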

Resolved thanks to #4 by @mcognetta.