dylanaraps/pure-bash-bible

dirname is not equivalent to the utility

calimeroteknik opened this issue · 16 comments

Here are a few counter-examples:

~ $ dirname_pbb() {
    # Usage: dirname "path"
    printf '%s\n' "${1%/*}/"
}

~ $ dirname dir1/dir2/
dir1
~ $ dirname_pbb dir1/dir2/
dir1/dir2/

~ $ dirname dir1/dir2
dir1
~ $ dirname_pbb dir1/dir2
dir1/

~ $ dirname dir1
.
~ $ dirname_pbb dir1
dir1/

In fact I already tried to reimplement that one with variable substitution and found a few corner cases like this, making me unsure I was up to the task without a serious read of… the POSIX spec, I suppose.

Great catch.

This passes your above tests:

dirname() {
    # Usage: dirname "path"
    dir=${1%%/}

    [[ $dir == */* ]] || dir=.

    printf '%s\n' "${dir%/*}"
}

I'll keep digging to see if there are any other cases in which it fails. 👍

I can't reproduce a case where their outputs differ now. 👍

The fix for the corner cases I mentioned introduced a regression:

~ $ dirname_pbb() {
    # Usage: dirname "path"
    dir=${1%%/}

    [[ $dir == */* ]] || dir=.

    printf '%s\n' "${dir%/*}"
}
~ $ dirname_pbb /
.
~ $ dirname /
/

Working on a fix. Great catch.

Fixed.

dirname() {
    # Usage: dirname "path"
    dir=${1%%/}

    [[ $dir ]] || dir=//
    [[ $dir == */* ]] || dir=.

    printf '%s\n' "${dir%/*}"
}

Tests find errors, but unfortunately do not prove correctness:

~ $ dirname_pbb() {
    # Usage: dirname "path"
    dir=${1%%/}

    [[ $dir ]] || dir=//
    [[ $dir == */* ]] || dir=.

    printf '%s\n' "${dir%/*}"
}
~ $ dirname_pbb /foo

~ $ dirname /foo
/

I believe that this would warrant formal verification rather than testing, because it seems tricky.

Try this one:

dirname() {
    # Usage: dirname "path"
    dir=${1%%/}
    dir=${dir%/*}

    [[ $1 == */* ]] || dir=.

    printf '%s\n' "${dir:-/}"
}

It passes my ever growing list of testcases. Either way, it's a lot more correct than what was previously in the book. :)

For something/ it just strips the ending slash rather than returning dot.
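The cases reported so far in this thread can be collected into a small regression table, so that each new attempt is checked against all of them at once. The following is only a sketch: `dirname_candidate` is the function posted just above (renamed, and with `local` added, so it neither shadows the real utility nor leaks a global), `run_cases` is a hypothetical helper, and the expected values are the dirname(1) outputs quoted in this thread rather than anything derived from the standard.

```shell
#!/usr/bin/env bash

# Version under test: the most recent attempt from the thread.
dirname_candidate() {
    local dir=${1%%/}
    dir=${dir%/*}

    [[ $1 == */* ]] || dir=.

    printf '%s\n' "${dir:-/}"
}

# Hypothetical helper: run a function over "input expected" pairs,
# print one line per failing case, return non-zero if any case failed.
run_cases() {
    local func=$1 input expected actual failures=0
    shift
    while (($# >= 2)); do
        input=$1 expected=$2
        shift 2
        actual=$("$func" "$input")
        if [[ $actual != "$expected" ]]; then
            printf 'FAIL %s: got %s, want %s\n' "$input" "$actual" "$expected"
            failures=$((failures + 1))
        fi
    done
    return "$((failures > 0))"
}

# Pairs of: input, output of dirname(1) as reported in this thread.
cases=(
    dir1/dir2/ dir1
    dir1/dir2  dir1
    dir1       .
    /          /
    /foo       /
    something/ .
)

run_cases dirname_candidate "${cases[@]}" ||
    printf 'candidate is not yet equivalent\n'
```

With this table, only the something/ case fails for this version. Passing every row would still say nothing about inputs that are not in the table.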

Here we go again:

dirname() {
    # Usage: dirname "path"
    dir=${1%%/}
    [[ "${dir##*/*}" ]] && dir=.
    dir=${dir%/*}

    printf '%s\n' "${dir:-/}"
}

To still find problems we now need to go into non-normalised paths, for example, something//

I'm not even sure that the original dirname(1) is actually POSIX-conformant by all accounts anyway, so this might end up being a bug-for-bug reimplementation of the GNU coreutils version: I assume that the standard would be a better reference past this point.

This being said, dirname(1) from the GNU coreutils acts as if it normalised all but the prefix slashes (edit: unless the prefix slashes are the trailing slashes); I assume they considered it safer to keep them as-is, seeing as on some systems //something is not the same thing as /something.

I'm testing against busybox dirname as I don't use glibc/GNU coreutils. I'm not sure if busybox dirname is more POSIX compliant or not.

Yes, as noted in my previous comment, non-normalised paths are still problematic… but at this point, isn't it safer to just read POSIX rather than reverse-engineer busybox through test cases?

The standard describes an algorithm:
https://pubs.opengroup.org/onlinepubs/009695399/utilities/dirname.html

Here's an implementation of the algorithm described in the standard:

dirname() {
    # Usage: dirname "path"
    local tmp=${1:-.}

    [[ $tmp != *[!/]* ]] && {
        printf '/\n'
        return
    }

    tmp=${tmp%%"${tmp##*[!/]}"}

    [[ $tmp != */* ]] && {
        printf '.\n'
        return
    }

    tmp=${tmp%/*}
    tmp=${tmp%%"${tmp##*[!/]}"}

    printf '%s\n' "${tmp:-/}"
}
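The least obvious part of the function above is the nested expansion `${tmp%%"${tmp##*[!/]}"}`, which removes every trailing slash. A minimal sketch that isolates the idiom, using a hypothetical helper name `strip_trailing_slashes` and an intermediate variable that the original does not have:

```shell
#!/usr/bin/env bash

# Hypothetical helper isolating the idiom: strip every trailing slash.
strip_trailing_slashes() {
    local path=$1

    # Inner expansion: delete the longest prefix that ends in a non-slash
    # character. What is left is exactly the run of trailing slashes.
    local trailing=${path##*[!/]}

    # Outer expansion: remove that run from the end of the string. The
    # quotes make it match literally rather than as a glob pattern.
    printf '%s\n' "${path%%"$trailing"}"
}

strip_trailing_slashes 'dir1/dir2///'   # dir1/dir2
strip_trailing_slashes 'dir1'           # dir1
strip_trailing_slashes '///'            # prints an empty line
```

When the string contains nothing but slashes, the inner pattern never matches, so the whole string counts as "trailing" and the result is empty. That is why the function either handles the all-slashes case up front or falls back to `${tmp:-/}` when printing.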

Here's another implementation which also passes all of my tests. It's also effectively the same algorithm as above.

dirname() {
  # Usage: dirname "path"
  local tmp=${1:-.}

  tmp=${tmp%%"${tmp##*[!/]}"}

  [[ ${tmp##*/*} ]] && tmp=.

  tmp=${tmp%/*}
  tmp=${tmp%%"${tmp##*[!/]}"}

  printf '%s\n' "${tmp:-/}"
}

The direct implementation does indeed look standard-conformant to me (beyond reasonable doubt, although without machine-assisted checking).

I did not verify the second implementation; unlike basename, the problem is a little too large for me to mentally check that the second implementation is equivalent to the first. We must recognize our limits as programmers: validating this one without machine assistance is most likely out of my league…
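Short of a proof, one cheap form of machine assistance is bounded exhaustive testing: run both functions on every string up to some length over a small alphabet and report any disagreement. The sketch below assumes that the alphabet `/`, `a` and `.` covers the interesting shapes; the names `dirname_direct`, `dirname_short` and `compare_exhaustive` are made up here so that the two implementations from the thread can coexist in one script. It is evidence, not verification: it says nothing about longer inputs or about characters outside the alphabet.

```shell
#!/usr/bin/env bash

# Direct implementation of the steps in the standard (from the thread).
dirname_direct() {
    local tmp=${1:-.}
    [[ $tmp != *[!/]* ]] && { printf '/\n'; return; }
    tmp=${tmp%%"${tmp##*[!/]}"}
    [[ $tmp != */* ]] && { printf '.\n'; return; }
    tmp=${tmp%/*}
    tmp=${tmp%%"${tmp##*[!/]}"}
    printf '%s\n' "${tmp:-/}"
}

# Shorter variant (from the thread).
dirname_short() {
    local tmp=${1:-.}
    tmp=${tmp%%"${tmp##*[!/]}"}
    [[ ${tmp##*/*} ]] && tmp=.
    tmp=${tmp%/*}
    tmp=${tmp%%"${tmp##*[!/]}"}
    printf '%s\n' "${tmp:-/}"
}

# Hypothetical helper: compare both functions on every string of length
# 0..max_len over the alphabet, print each mismatch and a final count.
compare_exhaustive() {
    local max_len=$1 len p c mismatches=0
    local -a alphabet=(/ a .) current=('') next
    for ((len = 0; len <= max_len; len++)); do
        next=()
        for p in "${current[@]}"; do
            if [[ $(dirname_direct "$p") != "$(dirname_short "$p")" ]]; then
                printf 'mismatch: %q\n' "$p"
                mismatches=$((mismatches + 1))
            fi
            # Build the strings of the next length, unless we are done.
            if ((len < max_len)); then
                for c in "${alphabet[@]}"; do next+=("$p$c"); done
            fi
        done
        current=("${next[@]}")
    done
    printf 'mismatches: %d\n' "$mismatches"
}

compare_exhaustive 4   # 121 strings; prints "mismatches: 0"
```

Each comparison forks two subshells, so the bound has to stay small in practice; the number of strings triples with every extra character.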

Playing it heuristic-style, if this codegolf passed all tests first time, it's not proof but it's an encouraging sign.

Historical anecdote: hidden files in UNIX were "invented" by accident due to a subtle bug in the, I'd say "codegolf" implementation of ls, where it was supposed to ignore the special directories . and .. but in fact ignored all filenames starting in a dot. Oops… But this was (ab?)used and became the standard.

I'll push the direct implementation to the bible. 👍

Regardless of whether or not it is 100% compatible with the standard, this version is a lot "more" compatible with it. At some point I'll set up some proper tests so I can "safely" say "POSIX conformant".

Historical anecdote: hidden files in UNIX were "invented" by accident due to a subtle bug in the, I'd say "codegolf" implementation of ls, where it was supposed to ignore the special directories . and .. but in fact ignored all filenames starting in a dot. Oops… But this was (ab?)used and became the standard.

I knew this! Very interesting fact.

Thank you for all the help thus far!