glecorve/irisa-text-normalizer

large memory usage


Hello,

I'm using this tool to normalize a small subset of my data (600 MB out of 50 GB), and I found that the memory usage keeps growing. Ideally, the tool should read one line at a time, transform it, and output it; that way, memory usage would stay stable and negligible, along the lines of the sketch below.
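Here is a minimal sketch of what I have in mind (`normalize_line` is just a hypothetical placeholder, not the tool's actual code):

```perl
use strict;
use warnings;

# Hypothetical stand-in for the tool's actual normalization steps.
sub normalize_line {
    my ($line) = @_;
    $line =~ s/\s+/ /g;    # e.g. collapse runs of whitespace
    return $line;
}

# Streaming loop: only one line is held in memory at a time,
# so memory use stays flat regardless of input size.
while (my $line = <STDIN>) {
    chomp $line;
    print normalize_line($line), "\n";
}
```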

I tried replacing `foreach` with `while`, but that did not seem to solve the problem. I have never used Perl before, so I wonder if there is a simple workaround or fix to make this tool scale to big data.
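From what I've read, the difference may be how the filehandle is consumed rather than the loop keyword itself, though I may be wrong about this tool's internals (`input.txt` below is only illustrative):

```perl
use strict;
use warnings;

# Assumption: a large input file; the name is just a placeholder.
open my $fh, '<', 'input.txt' or die "open: $!";

# foreach evaluates <$fh> in list context: the entire file is read
# into a list before the loop starts, so memory grows with file size.
foreach my $line (<$fh>) {
    print $line;
}
close $fh;

open $fh, '<', 'input.txt' or die "open: $!";

# while evaluates <$fh> in scalar context: lines are read lazily,
# one at a time, so memory stays roughly constant.
while (my $line = <$fh>) {
    print $line;
}
close $fh;
```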

Thanks a lot,
Yuzhou