Limitation on the size of the text file for the conllu to text script
Closed this issue · 2 comments
I tried to use conllu_to_text.pl on a sizeable (1 GB) CoNLL-U file, but the script seems to have some limitations.
Running perl conllu_to_text.pl <C:\Folder/Blabla.conllu> C:\Folder\blabla.txt
only produces a .txt file of 70+ MB. I've tried other solutions on Windows, but it seems to be a limitation of the script.
Is there a way around this?
Thanks
Does it also happen when you double the size of the input, e.g., by concatenating two copies of your input file? Or is the output then 140 MB?
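In case it helps on Windows, where concatenating files from the command line can be awkward, here is a minimal Python sketch for building the doubled input. The function name `double_file` and the paths are hypothetical, and it assumes the input ends with the blank line that closes the last CoNLL-U sentence, so two copies back to back are still well-formed.

```python
import os
import shutil


def double_file(src_path, dst_path, chunk_size=1024 * 1024):
    """Write two back-to-back copies of src_path into dst_path.

    Copies in fixed-size binary chunks, so memory use stays small
    regardless of the input size. Returns (source size, result size) in bytes.
    """
    with open(dst_path, "wb") as dst:
        for _ in range(2):
            with open(src_path, "rb") as src:
                shutil.copyfileobj(src, dst, chunk_size)
    return os.path.getsize(src_path), os.path.getsize(dst_path)


if __name__ == "__main__":
    # Hypothetical file names; replace them with your own paths.
    if os.path.exists("input.conllu"):
        print(double_file("input.conllu", "doubled.conllu"))
```

Running the script on the doubled file and comparing the two output sizes should show whether the output grows with the input or stops at a fixed ceiling.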
In an attempt to reproduce your issue, I have just piped all .conllu files from the entire UD 2.2 release to the script; I got a .txt file of 106 MB and faced no issues. There should be no internal limitation in the script that would prevent it from processing any amount of data, provided you have enough disk space to store the output.
My personal guess is a 32-bit issue -- isn't the Windows version by any chance 32-bit only? Then some similarly small limit would be expected.
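Whatever the cause turns out to be, an independent cross-check of how much text the input contains may help narrow it down. The sketch below is not a reimplementation of conllu_to_text.pl: it only copies the `# text = ` sentence comments, one per line, so the layout will differ from the script's output. It assumes the file carries those comments (as UD releases do) and is UTF-8 encoded; the function name `extract_text_lines` and the paths are hypothetical.

```python
import os


def extract_text_lines(src_path, dst_path):
    """Copy the value of every '# text = ' comment to dst_path, one per line.

    Reads the input line by line, so memory use does not depend on
    the file size. Returns the number of sentences written.
    """
    prefix = "# text = "
    count = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if line.startswith(prefix):
                dst.write(line[len(prefix):].rstrip("\r\n") + "\n")
                count += 1
    return count


if __name__ == "__main__":
    # Hypothetical file names; replace them with your own paths.
    if os.path.exists("input.conllu"):
        print(extract_text_lines("input.conllu", "sentences.txt"))
```

If the resulting file is far larger than 70 MB, the input does contain more text than the script wrote, which would point to the environment rather than the data.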