ezrosent/frawk

Not issue - but a large file performance stat

Closed this issue · 4 comments

I have compared nawk with frawk on macOS (Darwin): i7 with 16GB RAM.

I used a 17GB .csv file for the test, using:
frawk 'BEGIN{FS=","};{count[NF]++};END{for(i in count){print "With " i " fields - count is: " count[i]}}' aggregated.csv

results:
frawk - 2 min
nawk - 23 min

Impressive. Thanks for the work on frawk.

Could you also give gawk a try? How does it behave?

gawk - just over 4 mins.

ghuls commented

If you compare with gawk (https://www.gnu.org/software/gawk/), test with a recent version (e.g. 5.2.1). gawk 4.x.x, for example, is quite a bit slower.

gawk -b (or gawk --characters-as-bytes) will also speed up gawk quite a bit, as it treats characters as bytes and does not care about locale encoding.

mawk (https://invisible-island.net/mawk/) is in general also quite fast.

frawk will most likely still win.

In case you want to see different awk programs in action (before some frawk performance bugs were fixed):
#37
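
For anyone who wants to repeat the comparison suggested above, here is a minimal sketch that runs the field-count program from the original post under each awk implementation that happens to be installed. The `count_fields` helper and the three-line sample file are hypothetical, added only for illustration (the original test used a 17GB aggregated.csv), and the script assumes a POSIX-style shell such as sh or bash where an unquoted variable is split on spaces.

```shell
# Same program as in the original post.
prog='BEGIN{FS=","};{count[NF]++};END{for(i in count){print "With " i " fields - count is: " count[i]}}'

# Hypothetical helper: $1 is the awk command (may include a flag), $2 is the file.
count_fields() {
    # $1 is deliberately unquoted so that "gawk -b" splits into command and flag.
    # The output is sorted because "for (i in count)" has no defined order.
    $1 "$prog" "$2" | sort
}

# Tiny stand-in for the real data: two rows with 3 fields, one row with 2.
sample=$(mktemp)
printf 'a,b,c\nd,e,f\ng,h\n' > "$sample"

# Try each implementation mentioned in this thread; skip the ones not installed.
for impl in frawk "gawk -b" mawk nawk awk; do
    command -v "${impl%% *}" >/dev/null 2>&1 || continue
    echo "== $impl =="
    count_fields "$impl" "$sample"
done

rm -f "$sample"
```

Every implementation should report the same counts on the sample. To get timings like the ones quoted in this thread, point it at a large file and prefix the awk invocation with `time`.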

Sorry I missed this when you initially posted it (I do try to keep track of issues here, but August was a busy time). Thanks for the kind words (and thanks to the others who've weighed in!).