Pipe from Stdin instead of fetching the bytes
clipperhouse opened this issue · 2 comments
Currently, the `jargon` command line takes its input by specifying the source via flags:
- `-f string`: A file path to lemmatize
- `-s string`: A (quoted) string to lemmatize
- `-u string`: A URL to fetch and lemmatize
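The `-f string` style of these flags is how Go's standard `flag` package describes string flags in its usage output. Here is a minimal sketch of how flags like these could be declared. The `inputFlags` type and `parseFlags` function are hypothetical names and are not taken from the jargon source.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// inputFlags holds the three input-source flags.
// Hypothetical type; field names are illustrative only.
type inputFlags struct {
	file string
	str  string
	url  string
}

// parseFlags declares -f, -s and -u on a fresh FlagSet and parses args.
func parseFlags(args []string) (inputFlags, error) {
	var in inputFlags
	fs := flag.NewFlagSet("jargon", flag.ContinueOnError)
	fs.StringVar(&in.file, "f", "", "A file path to lemmatize")
	fs.StringVar(&in.str, "s", "", "A (quoted) string to lemmatize")
	fs.StringVar(&in.url, "u", "", "A URL to fetch and lemmatize")
	err := fs.Parse(args)
	return in, err
}

func main() {
	in, err := parseFlags(os.Args[1:])
	if err != nil {
		// The flag package has already printed the error and usage.
		os.Exit(2)
	}
	fmt.Printf("%+v\n", in)
}
```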
It occurs to me that `jargon` would play better with other tools simply by accepting Stdin. There are already fine tools for reading files (`cat`) and fetching URLs (`curl`). `jargon` should just accept bytes piped from them.
- Files: `cat file.txt | jargon` replaces `jargon -f file.txt`
- URLs: `curl https://example.com | jargon` replaces `jargon -u https://example.com`
- Strings: `echo "I luv Rails" | jargon` replaces `jargon -s "I luv Rails"`
@kevin-montrose suggested leaving both options open: support Stdin but also keep the flags. The theory is that piping through the shell might be a performance hit compared with `jargon` reading the file directly.
On my machine, I tried it both ways with a 22MB file. I ran each command twice under `time`, with output discarded:

- Flag: `time jargon -f ~/Downloads/cities1000.txt > /dev/null`
- Pipe: `time cat ~/Downloads/cities1000.txt | jargon > /dev/null`

| Input | Run | real | user | sys |
|---|---|---|---|---|
| `-f` flag | 1 | 0m2.470s | 0m2.478s | 0m0.038s |
| `-f` flag | 2 | 0m2.460s | 0m2.476s | 0m0.034s |
| Stdin pipe | 1 | 0m2.443s | 0m2.466s | 0m0.049s |
| Stdin pipe | 2 | 0m2.450s | 0m2.473s | 0m0.049s |
I don't see a significant difference: all four runs land between 2.44s and 2.47s real. Of course this is just my machine, and not super rigorous.
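As a rough check on the timings above, here are the averages of the two `real` times for each method. There are only two samples per method, so this is indicative at best:

$$
\bar{t}_{\text{flag}} = \frac{2.470 + 2.460}{2} = 2.465\ \text{s}, \qquad
\bar{t}_{\text{pipe}} = \frac{2.443 + 2.450}{2} = 2.4465\ \text{s}
$$

$$
\bar{t}_{\text{flag}} - \bar{t}_{\text{pipe}} = 0.0185\ \text{s} \approx 0.75\%\ \text{of}\ \bar{t}_{\text{flag}}
$$

In these runs, the piped version was marginally faster in wall-clock time, by well under 1%.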
New branch that allows both Stdin and flags: https://github.com/clipperhouse/jargon/compare/stdin-flags