UniversalDependencies/tools

Limitation on the size of the text file for the conllu to text script


I tried to use conllu_to_text.pl on a sizeable (1 GB) CoNLL-U file, but it seems that the script has some limitations.

By running perl conllu_to_text.pl < C:\Folder/Blabla.conllu > C:\Folder\blabla.txt

It seems the system is only able to create a 70+ MB .txt file. I've tried other solutions on Windows, but it seems to be a limitation of the script.

Is there a way around this?

Thanks

Does it also happen when you double the size of the input, e.g., by concatenating two copies of your input file? Or is the output then 140 MB?
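One way to build the doubled input for this test, as a minimal sketch: the function name `concatenate_copies` and the idea of doing it in Python are assumptions for illustration, not part of the UD tools. The files are opened in binary mode so that Windows does not translate line endings, and the copy is streamed in chunks so the file size does not matter. It assumes the input ends with the blank line that terminates the last sentence, so the result is still valid CoNLL-U.

```python
import shutil
from pathlib import Path


def concatenate_copies(src: Path, dst: Path, copies: int = 2) -> int:
    """Write `copies` back-to-back copies of `src` into `dst`.

    Returns the size of `dst` in bytes. Hypothetical helper, not part of
    the UD tools.
    """
    with open(dst, "wb") as out:
        for _ in range(copies):
            with open(src, "rb") as inp:
                # copyfileobj streams in fixed-size chunks, so memory use
                # stays small even for multi-gigabyte inputs.
                shutil.copyfileobj(inp, out)
    return dst.stat().st_size
```

If the output of the script for the doubled file is again about 70 MB, the cut-off is an absolute limit; if it is about 140 MB, the output size simply tracks the input.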

In an attempt to reproduce your issue, I have just piped all .conllu files from the entire UD 2.2 release to the script; I got a .txt file of 106 MB and faced no issues. There should be no internal limitation in the script that would prevent it from processing any amount of data, provided you have enough disk space to store the output.
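To illustrate why a line-by-line filter has no inherent size limit, here is a simplified sketch. It is not the implementation of conllu_to_text.pl: it only copies the `# text = ...` sentence comments, one per output line, and the name `convert` is hypothetical. Each input line is handled and discarded immediately, so memory use does not grow with the file.

```python
from typing import IO

TEXT_PREFIX = "# text = "


def convert(in_stream: IO[str], out_stream: IO[str]) -> int:
    """Copy the sentence text of every `# text = ...` comment to out_stream.

    Returns the number of sentences written. Simplified illustration only;
    the real script does more than this.
    """
    count = 0
    for line in in_stream:  # reads one line at a time, never the whole file
        if line.startswith(TEXT_PREFIX):
            out_stream.write(line[len(TEXT_PREFIX):].rstrip("\r\n") + "\n")
            count += 1
    return count
```

A possible way to call it on real files is to open both with `encoding="utf-8"` and pass the two file objects to `convert`, which avoids depending on the console's default encoding on Windows.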

foxik commented

My personal guess is a 32-bit issue -- isn't the Windows version by any chance 32-bit only? Then some similarly small limit would be expected.
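A quick way to check this guess is to ask Perl for its build configuration with `perl -V:name`, which prints lines of the form `name='value';`. The sketch below wraps that in Python; the helper names are hypothetical, and it assumes `perl` is on the PATH. A `ptrsize` of 4 would indicate a 32-bit build, and `uselargefiles` shows whether that build was compiled with large-file support.

```python
import re
import subprocess


def parse_perl_config(output: str) -> dict:
    """Parse lines such as ivsize='8'; into a {name: value} dict."""
    return dict(re.findall(r"^(\w+)='([^']*)';", output, flags=re.MULTILINE))


def query_perl_config(names=("archname", "ivsize", "ptrsize", "uselargefiles")):
    """Run `perl -V:name` once per name and collect the results.

    Assumes a `perl` executable is available on the PATH.
    """
    config = {}
    for name in names:
        result = subprocess.run(
            ["perl", f"-V:{name}"], capture_output=True, text=True, check=True
        )
        config.update(parse_perl_config(result.stdout))
    return config
```

Calling `query_perl_config()` on the affected machine and posting the resulting dict would help confirm or rule out the 32-bit explanation.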