diff <(tail -n+2 Human_HighQuality.txt) <(tail -n+2 Human_HighQuality.txt | sort)
thy@tde-ws:~/usr/matylde$ make input
tail -n+2 Human_HighQuality.txt | expand -t 1 > input.txt
May require sudo apt install make
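A rough Python equivalent of the make input step, as a sketch only. It assumes Human_HighQuality.txt has one header line and tab-separated columns, and prepare_input is a hypothetical name. With -t 1 every column is a tab stop, so expand turns each tab into exactly one space.

```python
from typing import Iterable, Iterator


def prepare_input(lines: Iterable[str]) -> Iterator[str]:
    """Approximate `tail -n+2 | expand -t 1` (assumes tab-separated input)."""
    it = iter(lines)
    next(it, None)  # drop the header line, like `tail -n+2`
    for line in it:
        # With a tab stop at every column (`-t 1`), each tab becomes exactly one space.
        yield line.replace("\t", " ")
```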
Fast and memory-efficient (no need to hold the whole data set in memory), but uses an imperative style. Works only because the data file is sorted.
thy@tde-ws:~/usr/matylde$ make awk
< input.txt time factor-prefix-simple.awk > dict-with-awk.txt
0.02user 0.00system 0:00.02elapsed 96%CPU (0avgtext+0avgdata 3900maxresident)k
0inputs+776outputs (0major+217minor)pagefaults 0swaps
May require sudo apt install gawk
The GNU Awk User’s Guide
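The awk approach is a streaming group-by over sorted input. The sketch below is Python, not the actual factor-prefix-simple.awk (whose source is not shown here), and assumes each line of input.txt is "key value". Only the current run of equal keys is held in memory; on unsorted input a key would simply be emitted more than once.

```python
import itertools
from typing import Iterable, Iterator


def factor_prefix_sorted(lines: Iterable[str]) -> Iterator[str]:
    """Emit one "key v1 v2 ..." line per run of consecutive equal keys.

    Assumes every non-blank line is "key value". Only the current run is held
    in memory, which is why the input must be sorted by key.
    """
    pairs = (line.split() for line in lines if line.strip())
    for key, run in itertools.groupby(pairs, key=lambda fields: fields[0]):
        yield " ".join([key] + [fields[1] for fields in run])
```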
Still quite fast, as long as the data set fits in memory.
< input.txt time python factor-prefix-with-dict.py > dict-with-python.json
0.06user 0.01system 0:00.07elapsed 100%CPU (0avgtext+0avgdata 19432maxresident)k
0inputs+1000outputs (0major+3994minor)pagefaults 0swaps
< dict-with-python.json jq -r 'to_entries | map([.key] + .value | join(" "))[]' > dict-with-python.txt
Uses json to output the Python dict (Introducing JSON).
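A minimal sketch of the dictionary approach, assuming the same "key value" lines on stdin. factor_prefix_with_dict and to_json are hypothetical names, not necessarily what factor-prefix-with-dict.py contains. Unlike the streaming version it does not need sorted input, but the whole mapping lives in memory.

```python
import json
from typing import Dict, Iterable, List


def factor_prefix_with_dict(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Collect every value under its key; input order does not matter."""
    groups: Dict[str, List[str]] = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # assumption: blank or one-field lines are skipped
        groups.setdefault(fields[0], []).append(fields[1])
    return groups


def to_json(lines: Iterable[str]) -> str:
    # The whole dict is built in memory before anything is written.
    return json.dumps(factor_prefix_with_dict(lines))
```

Usage, under the same assumptions: sys.stdout.write(to_json(sys.stdin)).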
Slower (under 5 seconds), same algorithm as the Python version but more succinct.
Source (factor-prefix-with-dict.jq):
[inputs] | map(split(" ")) | reduce .[] as $i ({}; .[$i[0]] += [$i[1]])
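To unpack the jq program, here is a step-by-step Python transcription (hypothetical names). The split(" ") and the stripped newline mirror what jq -nR hands to inputs. The key detail is that .[$i[0]] is null for a new key, and in jq null + [x] gives [x], so no initialization is needed.

```python
from functools import reduce
from typing import Dict, Iterable, List


def add_pair(acc: Dict[str, List[str]], row: List[str]) -> Dict[str, List[str]]:
    # jq: .[$i[0]] += [$i[1]]  (a missing key reads as null, and null + [x] == [x])
    acc[row[0]] = acc.get(row[0], []) + [row[1]]
    return acc


def factor_prefix_reduce(lines: Iterable[str]) -> Dict[str, List[str]]:
    # jq: [inputs] | map(split(" "))  (jq -R strips the trailing newline)
    rows = [line.rstrip("\n").split(" ") for line in lines]
    # jq: reduce .[] as $i ({}; ...)  folds the rows into an empty object
    return reduce(add_pair, rows, {})
```

The Python version builds a new list at each step to stay close to the jq expression; an in-place append would be the more idiomatic Python choice.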
thy@tde-ws:~/usr/matylde$ make jq
< input.txt time jq -nR -f factor-prefix-with-dict.jq > dict-with-jq.json
3.04user 0.00system 0:03.05elapsed 99%CPU (0avgtext+0avgdata 14564maxresident)k
0inputs+1288outputs (0major+3487minor)pagefaults 0swaps
< dict-with-jq.json jq -r 'to_entries | map([.key] + .value | join(" "))[]' > dict-with-jq.txt
May require sudo apt install jq
(jq is a lightweight and flexible command-line JSON processor.)
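The to_entries post-processing used by the python and jq targets turns the JSON object back into "key v1 v2 ..." lines. A Python equivalent, sketched with hypothetical names: to_entries keeps the object's key order, and json.loads preserves it in the resulting dict as well.

```python
import json
from typing import Dict, List


def dict_to_lines(obj: Dict[str, List[str]]) -> List[str]:
    # jq: to_entries | map([.key] + .value | join(" "))[]
    # Python dicts, like jq objects, keep insertion order.
    return [" ".join([key] + values) for key, values in obj.items()]


def json_to_lines(text: str) -> List[str]:
    return dict_to_lines(json.loads(text))
```

Since input.txt is sorted, first-appearance order should coincide with the order in which the awk script emits its keys, which is presumably why the plain diff of the .txt files reports nothing.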
thy@tde-ws:~/usr/matylde$ make check
diff <(tail -n+2 Human_HighQuality.txt) <(tail -n+2 Human_HighQuality.txt | sort)
diff input.txt <(sort input.txt)
< dict-with-awk.txt jq -nR '[inputs] | map(split(" ") | { (.[0]): .[1:] }) | add' > dict-with-awk.json
diff dict-with-{awk,jq}.txt
diff dict-with-{jq,python}.txt
diff dict-with-{awk,jq}.json
jsondiff dict-with-{jq,python}.json | jq
{}
May require sudo apt install python3-jsondiff
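The two properties make check relies on can be sketched in Python too (hypothetical helpers, same "key v1 v2 ..." format assumed). is_sorted approximates diff input.txt <(sort input.txt) printing nothing, although sort follows the current locale while Python compares code points, so the two can disagree unless sort runs with LC_ALL=C. lines_to_dict mirrors the jq conversion of dict-with-awk.txt. Comparing two such dicts with == plays the role of jsondiff printing {}.

```python
from typing import Dict, List, Sequence


def is_sorted(lines: Sequence[str]) -> bool:
    # Approximates `diff input.txt <(sort input.txt)` printing nothing.
    # Caveat: compares code points; sort(1) uses the locale unless LC_ALL=C.
    return all(a <= b for a, b in zip(lines, lines[1:]))


def lines_to_dict(lines: Sequence[str]) -> Dict[str, List[str]]:
    # jq: [inputs] | map(split(" ") | { (.[0]): .[1:] }) | add
    # `add` merges the one-key objects; a repeated key keeps its last value.
    result: Dict[str, List[str]] = {}
    for line in lines:
        fields = line.rstrip("\n").split(" ")
        result[fields[0]] = fields[1:]
    return result
```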