stevejenkins/postwhite

Sorting isn't perfect

stevejenkins opened this issue · 1 comments

Currently the final sort is done with a simple sort -u "${tmp3}" > "${tmp4}"

sort -V works better on Linux systems, but the -V option isn't available on OS X (and possibly other systems).

Every -n option I've tried results in valid data being removed. See https://github.com/stevejenkins/postwhite/tree/master/testdata for examples and further discussion.

For now, I'm choosing a more complete whitelist over a prettier sort. But any suggestions for better sorting without losing data are appreciated.

Improved things somewhat by doing a sort first via:

sort -t. -k1,1n -k2,2n -k3,3n -k4,4n

Then performing a uniq against the output as a separate step.
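For anyone wondering why the separate uniq step matters: when sort is given -k keys, -u keeps only one line from each set of lines whose keys compare equal, and with a numeric key a field like 0/24 and a field like 0/25 both evaluate to 0. That is a likely explanation for the lost data, though I haven't confirmed it against every case in testdata. A minimal sketch of the two-step approach follows; the function name sort_then_uniq and the sample addresses are made up for illustration, and the real script works on "${tmp3}" and "${tmp4}" instead.

```shell
#!/bin/sh
# Hypothetical helper: numeric sort on the four dot-separated fields,
# then drop exact duplicate lines in a separate step.
# LC_ALL=C keeps the tie-breaking byte comparison predictable.
sort_then_uniq() {
    LC_ALL=C sort -t. -k1,1n -k2,2n -k3,3n -k4,4n | uniq
}

# Made-up sample data: one exact duplicate, plus two CIDR blocks that
# differ only in the prefix length.
printf '%s\n' \
    '10.0.0.0/24' \
    '9.1.2.0/24' \
    '10.0.0.0/25' \
    '9.1.2.0/24' | sort_then_uniq
# prints:
# 9.1.2.0/24
# 10.0.0.0/24
# 10.0.0.0/25
```

Because uniq compares whole lines, 10.0.0.0/24 and 10.0.0.0/25 both survive, whereas adding -u to the keyed sort would keep only one of them.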

That seems to get the sort close to perfect. The only remaining sort issue (which is minor) is that IPv6 addresses are split into two groups: those that start with letters and numbers (2a00, 2a01, etc.) appear at the top, and those that start with numbers only (2001, 2004, etc.) appear at the bottom. All the IPv4 addresses in the middle are sorted properly.

That seems as close as we're going to get without some serious sed or awk Kung Fu.
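The IPv6 split is most likely a side effect of the numeric key: an IPv6 line contains no dots, so the whole line becomes field 1, and -n reads only the leading digits (2a00 is treated as 2, 2001 as 2001). One option that avoids sed or awk is to sort the two address families separately. This is only a sketch under assumptions: it treats any line containing a colon as IPv6, it sorts IPv6 lines as plain text rather than numerically per group, and the function name sort_by_family is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: sort IPv4 and IPv6 lines separately.
# Assumption: a line is IPv6 if and only if it contains a colon.
sort_by_family() {
    all=$(mktemp) || return 1
    cat > "$all"
    # IPv4: numeric sort on each dot-separated field, then drop duplicates.
    grep -v ':' "$all" | LC_ALL=C sort -t. -k1,1n -k2,2n -k3,3n -k4,4n | uniq
    # IPv6: plain byte-order sort, then drop duplicates.
    grep ':' "$all" | LC_ALL=C sort | uniq
    rm -f "$all"
}

# Made-up sample data mixing both families.
printf '%s\n' \
    '2a00:1450::/32' \
    '10.0.0.0/8' \
    '2001:4860::/32' \
    '9.0.0.0/8' \
    '10.0.0.0/8' | sort_by_family
# prints:
# 9.0.0.0/8
# 10.0.0.0/8
# 2001:4860::/32
# 2a00:1450::/32
```

This puts all IPv4 entries first and all IPv6 entries in one block after them. Whether that ordering is preferable to the current output is a judgment call.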