Not an issue, but a large-file performance stat
I have compared nawk with frawk on macOS (Darwin): i7 with 16GB RAM.
I used a 17GB .csv file for the test, running:
frawk 'BEGIN{FS=","};{count[NF]++};END{for(i in count){print "With " i " fields - count is: " count[i]}}' aggregated.csv
results:
frawk - 2 min
nawk - 23 min
Impressive. Thanks for the work on frawk.
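For anyone curious what the one-liner actually reports, here is a minimal run on a tiny made-up sample (the file contents below are illustrative, not from the benchmark):

```shell
# Create a small sample CSV (illustrative data only).
printf 'a,b,c\n1,2,3\nx,y\n' > /tmp/sample.csv

# Same program as in the benchmark: tally records by their field count (NF).
awk 'BEGIN{FS=","};{count[NF]++};END{for(i in count){print "With " i " fields - count is: " count[i]}}' /tmp/sample.csv
# Prints one line per distinct field count, e.g.:
#   With 2 fields - count is: 1
#   With 3 fields - count is: 2
```

On a well-formed CSV every record has the same NF, so the output is a quick sanity check for ragged rows as well as a benchmark workload.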
Could you also give gawk a try? How does it behave?
gawk - just over 4 mins.
If you compare with gawk (https://www.gnu.org/software/gawk/), test with recent versions (e.g. 5.2.1); gawk 4.x.x, for example, is quite a bit slower.
`gawk -b` (or `gawk --characters-as-bytes`) will also speed up gawk quite a bit, as it then does not care about locale encoding.
`mawk` (https://invisible-island.net/mawk/) is in general also quite fast, although frawk will most likely still win.
In case you want to see different awk programs in action (before some frawk performance bugs were fixed), see:
#37
Sorry I missed this when you initially posted it (I do try to keep track of issues here, but August was a busy time). Thanks for the kind words (and thanks to the others who've weighed in!)