dirname is not equivalent to the utility
calimeroteknik opened this issue · 16 comments
Here are a few counter-examples:
~ $ dirname_pbb() {
# Usage: dirname "path"
printf '%s\n' "${1%/*}/"
}
~ $ dirname dir1/dir2/
dir1
~ $ dirname_pbb dir1/dir2/
dir1/dir2/
~ $ dirname dir1/dir2
dir1
~ $ dirname_pbb dir1/dir2
dir1/
~ $ dirname dir1
.
~ $ dirname_pbb dir1
dir1/
In fact I already tried to reimplement that one with variable substitution and found a few corner cases like this, making me unsure I was up to the task without a serious read of… the POSIX spec, I suppose.
Great catch.
This passes your above tests:
dirname() {
# Usage: dirname "path"
dir=${1%%/}
[[ $dir == */* ]] || dir=.
printf '%s\n' "${dir%/*}"
}
I'll keep digging to see if there are any other cases in which it fails. 👍
I can't find an input where the two outputs differ now. 👍
The fix for the corner cases I mentioned introduced a regression:
~ $ dirname_pbb() {
# Usage: dirname "path"
dir=${1%%/}
[[ $dir == */* ]] || dir=.
printf '%s\n' "${dir%/*}"
}
~ $ dirname_pbb /
.
~ $ dirname /
/
Working on a fix. Great catch.
Fixed.
dirname() {
# Usage: dirname "path"
dir=${1%%/}
[[ $dir ]] || dir=//
[[ $dir == */* ]] || dir=.
printf '%s\n' "${dir%/*}"
}
Tests find errors, but unfortunately do not prove correctness:
~ $ dirname_pbb() {
# Usage: dirname "path"
dir=${1%%/}
[[ $dir ]] || dir=//
[[ $dir == */* ]] || dir=.
printf '%s\n' "${dir%/*}"
}
~ $ dirname_pbb /foo
~ $ dirname /foo
/
I believe that this would warrant formal verification rather than testing, because it seems tricky.
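Short of formal verification, one middle ground is exhaustive testing over a tiny alphabet: enumerate every path built from / and a up to some length, and compare the candidate with the installed dirname utility. The following is a minimal sketch, assuming bash and a dirname binary on PATH. The helper names gen_paths and find_mismatches are hypothetical, and the installed utility is only a stand-in reference, not the standard itself.

```shell
#!/usr/bin/env bash
# Candidate under test: the version quoted just above.
dirname_pbb() {
    dir=${1%%/}
    [[ $dir ]] || dir=//
    [[ $dir == */* ]] || dir=.
    printf '%s\n' "${dir%/*}"
}

# Hypothetical helper: print every non-empty string over {/, a}
# of length 1..$1, one per line.
gen_paths() {
    local max=$1 len p c
    local -a cur=("") next
    for ((len = 1; len <= max; len++)); do
        next=()
        for p in "${cur[@]}"; do
            for c in / a; do
                next+=("$p$c")
            done
        done
        printf '%s\n' "${next[@]}"
        cur=("${next[@]}")
    done
}

# Hypothetical helper: list the inputs on which function $1 and the
# external dirname utility print different results.
find_mismatches() {
    local fn=$1 max=$2 p want got
    while IFS= read -r p; do
        want=$(command dirname "$p")   # "command" bypasses shell functions
        got=$("$fn" "$p")
        [[ $got == "$want" ]] || printf '%s\n' "$p"
    done < <(gen_paths "$max")
}

find_mismatches dirname_pbb 4
```

An empty result would only mean that no counter-example exists among the enumerated inputs; it says nothing about longer paths or other characters.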
Try this one:
dirname() {
# Usage: dirname "path"
dir=${1%%/}
dir=${dir%/*}
[[ $1 == */* ]] || dir=.
printf '%s\n' "${dir:-/}"
}
It passes my ever growing list of testcases. Either way, it's a lot more correct than what was previously in the book. :)
For something/ it just strips the ending slash rather than returning dot.
Here we go again:
dirname() {
# Usage: dirname "path"
dir=${1%%/}
[[ "${dir##*/*}" ]] && dir=.
dir=${dir%/*}
printf '%s\n' "${dir:-/}"
}
To still find problems we now need to go into non-normalised paths, for example, something//
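A minimal sketch of that failure, assuming bash. The function body is the version quoted above; the name dirname_v5 is hypothetical and only avoids shadowing the real utility. The cause is that ${1%%/} uses the literal pattern /, so it removes at most one trailing slash.

```shell
# The version quoted above, under a hypothetical name.
dirname_v5() {
    dir=${1%%/}
    [[ "${dir##*/*}" ]] && dir=.
    dir=${dir%/*}
    printf '%s\n' "${dir:-/}"
}

dirname_v5 something/    # prints "."
dirname_v5 something//   # prints "something"; the second slash survives the first step
```

With two trailing slashes, one slash remains after the first step, so the "contains no slash" test no longer fires and the name itself is printed.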
I'm not even sure that the original dirname(1) is actually POSIX-conformant by all accounts anyway, so this might end up being a bug-for-bug reimplementation of the GNU coreutils version: I assume that the standard would be a better reference past this point.
This being said, dirname(1) from the GNU coreutils acts as if it normalised all but the prefix slashes (edit: unless the prefix slashes are the trailing slashes); I assume they considered it safer to keep them as-is, seeing as on some systems //something is not the same thing as /something.
I'm testing against busybox dirname as I don't use glibc/GNU coreutils. I'm not sure if busybox dirname is more POSIX compliant or not.
Yes, as noted in my previous comment, non-normalised paths are still problematic… but at this point, isn't it safer to just read POSIX rather than reverse-engineer busybox through test cases?
The standard describes an algorithm:
https://pubs.opengroup.org/onlinepubs/009695399/utilities/dirname.html
Here's an implementation of the algorithm described in the standard:
dirname() {
# Usage: dirname "path"
local tmp=${1:-.}
[[ $tmp != *[!/]* ]] && {
printf '/\n'
return
}
tmp=${tmp%%"${tmp##*[!/]}"}
[[ $tmp != */* ]] && {
printf '.\n'
return
}
tmp=${tmp%/*}
tmp=${tmp%%"${tmp##*[!/]}"}
printf '%s\n' "${tmp:-/}"
}
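The expansion ${tmp%%"${tmp##*[!/]}"} appears twice above and is the least obvious part. The inner ${tmp##*[!/]} removes the longest prefix that ends in a non-slash character, leaving only the trailing run of slashes; the outer %% then removes that run as a literal suffix. A minimal sketch isolating the idiom, assuming bash; the helper name strip_trailing_slashes is hypothetical.

```shell
# Hypothetical helper: remove every trailing slash from $1.
strip_trailing_slashes() {
    local s=$1
    local tail=${s##*[!/]}          # trailing run of slashes (may be empty)
    printf '%s\n' "${s%%"$tail"}"   # quoted, so $tail is matched literally
}

strip_trailing_slashes 'dir1/dir2///'   # prints "dir1/dir2"
strip_trailing_slashes 'dir1'           # prints "dir1"
strip_trailing_slashes '///'            # prints an empty line
```

The last case shows why a string made only of slashes needs separate handling: the inner pattern never matches, so the whole string counts as the "tail" and is removed. The function above returns / for that case before the idiom runs, and also falls back to / with ${tmp:-/} at the end.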
Here's another implementation which also passes all of my tests. It's also effectively the same algorithm as above.
dirname() {
# Usage: dirname "path"
local tmp=${1:-.}
tmp=${tmp%%"${tmp##*[!/]}"}
[[ ${tmp##*/*} ]] && tmp=.
tmp=${tmp%/*}
tmp=${tmp%%"${tmp##*[!/]}"}
printf '%s\n' "${tmp:-/}"
}
The direct implementation does look standard-conformant to me (beyond reasonable doubt, although without machine-assisted checking).
I did not verify the second implementation; unlike basename, the problem is a little too large for me to mentally check that the second implementation is equivalent to the first. We must recognize our limits as programmers: validating this one without machine assistance is most likely out of my league…
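One cheap form of machine assistance is to run both versions on every short path over a small alphabet and report any disagreement. The following is a minimal sketch, assuming bash. The names dirname_direct, dirname_short and diff_impls are hypothetical; the two function bodies are the implementations quoted above. This is bounded testing, not a proof of equivalence.

```shell
#!/usr/bin/env bash
# Direct transcription of the standard's steps (first version above).
dirname_direct() {
    local tmp=${1:-.}
    [[ $tmp != *[!/]* ]] && { printf '/\n'; return; }
    tmp=${tmp%%"${tmp##*[!/]}"}
    [[ $tmp != */* ]] && { printf '.\n'; return; }
    tmp=${tmp%/*}
    tmp=${tmp%%"${tmp##*[!/]}"}
    printf '%s\n' "${tmp:-/}"
}

# Shorter variant (second version above).
dirname_short() {
    local tmp=${1:-.}
    tmp=${tmp%%"${tmp##*[!/]}"}
    [[ ${tmp##*/*} ]] && tmp=.
    tmp=${tmp%/*}
    tmp=${tmp%%"${tmp##*[!/]}"}
    printf '%s\n' "${tmp:-/}"
}

# Hypothetical helper: run both on every string over {/, a, .} of
# length 1..$1 and print the inputs on which they disagree.
diff_impls() {
    local max=$1 len p c
    local -a cur=("") next
    for ((len = 1; len <= max; len++)); do
        next=()
        for p in "${cur[@]}"; do
            for c in / a .; do
                next+=("$p$c")
                [[ $(dirname_direct "$p$c") == "$(dirname_short "$p$c")" ]] ||
                    printf 'differs on: %s\n' "$p$c"
            done
        done
        cur=("${next[@]}")
    done
}

diff_impls 5
```

No output means the two functions agree on all 363 enumerated inputs. Raising the length bound or widening the alphabet extends the coverage at an exponential cost in run time.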
Playing it heuristic-style, if this codegolf passed all tests first time, it's not proof but it's an encouraging sign.
Historical anecdote: hidden files in UNIX were "invented" by accident due to a subtle bug in the, I'd say "codegolf" implementation of ls, where it was supposed to ignore the special directories . and .. but in fact ignored all filenames starting in a dot. Oops… But this was (ab?)used and became the standard.
I'll push the direct implementation to the bible. 👍
Regardless of whether or not it is 100% compatible with the standard, this version is a lot "more" compatible with it. At some point I'll set up some proper tests so I can "safely" say "POSIX conformant".
Historical anecdote: hidden files in UNIX were "invented" by accident due to a subtle bug in the, I'd say "codegolf" implementation of ls, where it was supposed to ignore the special directories . and .. but in fact ignored all filenames starting in a dot. Oops… But this was (ab?)used and became the standard.
I knew this! Very interesting fact.
Thank you for all the help thus far!