Toy clone of coreutils wc in Go
gowc is a toy reimplementation of wc in Go, written mainly for fun 😃. It's perfectly functional, well tested and correct, but there's no real benefit to using it over the original (aside from maybe the JSON flag).
The main reason I wrote it was that I discovered you can (sort of) abuse the io.Writer interface to count lines, words, etc. The primary benefit is that you can then use io.Copy from either a file or stdin (both of which implement io.Reader).
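A minimal sketch of the idea, assuming a hypothetical counter type (not gowc's actual code). It implements io.Writer by tallying each chunk passed to Write instead of storing it. Because a word or a multi-byte character can straddle two chunks, word state is carried between calls and characters are counted by skipping UTF-8 continuation bytes. Words are split on ASCII whitespace only, a simplification compared with coreutils wc.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// counter is a hypothetical io.Writer that tallies what is written to it
// instead of storing it. This is a sketch, not gowc's actual type.
type counter struct {
	bytes, chars, lines, words int
	inWord                     bool // word state carried across Write calls
}

// isSpace reports whether b is ASCII whitespace (a simplification:
// Unicode spaces such as U+00A0 are treated as word characters).
func isSpace(b byte) bool {
	switch b {
	case ' ', '\t', '\n', '\v', '\f', '\r':
		return true
	}
	return false
}

// Write satisfies io.Writer. It consumes every byte and never fails.
func (c *counter) Write(p []byte) (int, error) {
	c.bytes += len(p)
	for _, b := range p {
		// Each valid UTF-8 character has exactly one byte that is not a
		// continuation byte (10xxxxxx), so this stays correct even when a
		// character is split across two Write calls.
		if b&0xC0 != 0x80 {
			c.chars++
		}
		if b == '\n' {
			c.lines++
		}
		if isSpace(b) {
			c.inWord = false
		} else if !c.inWord {
			c.inWord = true
			c.words++
		}
	}
	return len(p), nil
}

func main() {
	c := &counter{}
	// os.Stdin is an *os.File, which implements io.Reader, so io.Copy can
	// stream it into the counter without loading it all into memory.
	if _, err := io.Copy(c, os.Stdin); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("Bytes", c.bytes, "Chars", c.chars, "Lines", c.lines, "Words", c.words)
}
```

Piping a file into the compiled sketch (for example `./counter < moby_dick.txt`) prints the four totals for whatever arrived on stdin.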
Using io.Copy means large files automatically get chunked into 32 KiB blocks and streamed through your program, so gowc works seamlessly on enormous files!
So this was a fun experiment to see how far you can take it.
Compiled binaries for all supported platforms can be found in the GitHub releases. There is also a Homebrew tap:
brew install FollowTheProcess/homebrew-tap/gowc
gowc < moby_dick.txt
# Or
cat moby_dick.txt | gowc
File Bytes Chars Lines Words
moby_dick.txt 1232922 1232922 23243 214132
gowc moby_dick.txt
File Bytes Chars Lines Words
moby_dick.txt 1232922 1232922 23243 214132
Multiple files are counted concurrently using a worker pool 🚀
gowc myfiles/*
File Bytes Chars Lines Words
myfiles/onemore.txt 460 460 2 63
myfiles/another.txt 608 608 2 80
myfiles/moby_dick.txt 1232922 1232922 23243 214132
gowc moby_dick.txt --json | jq
{
"name": "moby_dick.txt",
"lines": 23243,
"bytes": 1232922,
"words": 214132,
"chars": 1232922
}
JSON output also works with multiple files:
gowc myfiles/* --json
[
{
"name": "myfiles/onemore.txt",
"lines": 2,
"bytes": 460,
"words": 63,
"chars": 460
},
{
"name": "myfiles/another.txt",
"lines": 2,
"bytes": 608,
"words": 80,
"chars": 608
},
{
"name": "myfiles/moby_dick.txt",
"lines": 23243,
"bytes": 1232922,
"words": 214132,
"chars": 1232922
}
]
I haven't put much effort into optimisation. There are potentially some gains to be had, but it's fast enough that you wouldn't notice the difference from the original.
Counting multiple files happens concurrently in a worker pool across all your cores, so it performs well even on very large numbers of files. In one run, it read 9261 files and counted their words, lines, bytes and UTF-8 characters in just over 18ms 🚀
This package was created with copier and the FollowTheProcess/go_copier project template.