Path globbing syntax is not documented
PerMildner opened this issue · 2 comments
Looking at `README.md` and `s5cmd --help`, I see no details about the glob syntax.
In particular, it seems `s5cmd` understands the "double star" `**` syntax for matching any folder depth, but this is not mentioned in the `README.md` examples.
Even a single star matches any folder depth. The asterisk is not bound by path separators:
$ s5cmd ls "s3://foo/*.txt"
2024/04/13 09:38:00 3 bar/baz.txt
`**` works because every occurrence of `*` is replaced by `.*` in a regular expression, so `**` simply becomes `.*.*` (as you can see, `?` is also supported and matches a single character):
Lines 63 to 68 in c1c7ee3
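To make the effect concrete, here is a minimal Python sketch of the replacement as described above. It is not the s5cmd source (s5cmd is written in Go); `wildcard_to_regex` and the sample keys are hypothetical names chosen for illustration, and the anchoring with `^`/`$` is an assumption.

```python
import re

def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Hypothetical helper: '*' becomes '.*', '?' becomes '.',
    and every other character is matched literally."""
    escaped = re.escape(pattern)
    # re.escape turns '*' into '\*' and '?' into '\?'; swap those
    # escaped forms for their regex equivalents.
    regex = escaped.replace(r"\*", ".*").replace(r"\?", ".")
    # Assumption: the whole key must match, so anchor both ends.
    return re.compile("^" + regex + "$")

# Made-up object keys, relative to the bucket root.
keys = ["baz.txt", "bar/baz.txt", "bar/qux/baz.txt", "bar/baz.csv"]

single = wildcard_to_regex("*.txt")
double = wildcard_to_regex("**.txt")

single_matches = [k for k in keys if single.match(k)]
double_matches = [k for k in keys if double.match(k)]

print(single_matches)  # ['baz.txt', 'bar/baz.txt', 'bar/qux/baz.txt']
print(double_matches)  # same list: '.*.*' matches exactly what '.*' does
```

Because `.` in the regex also matches `/`, a single star already crosses folder boundaries, which is why `*` and `**` behave the same here.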
And s5cmd gives the S3 API an empty delimiter, instead of `/`, when the URL in question contains a `*` or `?`:
Lines 264 to 270 in c1c7ee3
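The delimiter matters because an S3-style listing with a delimiter rolls deeper keys up into common prefixes, while an empty delimiter returns every key under the prefix. The following is a toy Python model of that listing behavior, not a call to the real API; `list_objects` and the sample keys are hypothetical.

```python
def list_objects(keys, prefix="", delimiter=""):
    """Toy model of an S3-style listing.

    With a non-empty delimiter, any key that contains the delimiter after
    the prefix is rolled up into a single common prefix. With an empty
    delimiter, every key under the prefix is returned as-is.
    """
    contents, common_prefixes = [], []
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            rolled = prefix + rest.split(delimiter, 1)[0] + delimiter
            if rolled not in common_prefixes:
                common_prefixes.append(rolled)
        else:
            contents.append(key)
    return contents, common_prefixes

# Made-up object keys for illustration.
keys = ["a.txt", "bar/baz.txt", "bar/qux/deep.txt"]

with_slash = list_objects(keys, delimiter="/")
flat = list_objects(keys, delimiter="")

print(with_slash)  # (['a.txt'], ['bar/'])
print(flat)        # (['a.txt', 'bar/baz.txt', 'bar/qux/deep.txt'], [])
```

With the empty delimiter the client sees `bar/baz.txt` directly, so a pattern like `*.txt` has a chance to match it at any depth.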
This could be enhanced so that `u.Delimiter` is set to `/` for the `else` branch as well, unless the URL contains `**`, but I think that'd be crude and incomplete: you might have URLs with several combinations of `*` and `**` wildcards, so it probably needs some more logic in other places.
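As a rough Python sketch of why the delimiter rule alone would not be enough: the first function below is the crude rule, and the second is one possible shape for the extra matching logic a mixed pattern would need. Neither reflects actual s5cmd behavior; `choose_delimiter`, `segment_aware_regex`, and the sample keys are hypothetical, and treating `*` as `[^/]*` is an assumption about what a stricter syntax might look like.

```python
import re

def choose_delimiter(url: str) -> str:
    """Crude rule: list recursively only when the URL contains '**'."""
    return "" if "**" in url else "/"

def segment_aware_regex(pattern: str) -> re.Pattern:
    """Hypothetical stricter translation: '**' crosses '/' boundaries,
    while '*' and '?' stay inside one path segment."""
    # The capturing group keeps the wildcards in the split result;
    # '**' is listed first so it is not read as two single stars.
    parts = re.split(r"(\*\*|\*|\?)", pattern)
    out = []
    for part in parts:
        if part == "**":
            out.append(".*")
        elif part == "*":
            out.append("[^/]*")
        elif part == "?":
            out.append("[^/]")
        else:
            out.append(re.escape(part))
    return re.compile("^" + "".join(out) + "$")

# A mixed pattern: the crude rule picks a recursive listing because of
# '**', so the single '*' can only be held to one segment by the matcher.
mixed = "logs/*/2024/**.txt"
print(repr(choose_delimiter("s3://foo/" + mixed)))  # ''

rx = segment_aware_regex(mixed)
print(bool(rx.match("logs/app/2024/04/13/x.txt")))  # True
print(bool(rx.match("logs/app/extra/2024/x.txt")))  # False
```

Under this stricter reading, `*.txt` would match `baz.txt` but no longer `bar/baz.txt`, which would change the behavior shown in the `s5cmd ls` example earlier in the thread.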
Thanks for looking at this.
I think the thing I did not see in the documentation was something that explicitly and clearly says "Even a single star matches any folder depth". Perhaps this is what Usage means by "s5cmd supports multiple-level wildcards for all S3 operations" but it is not clear.
Personally I prefer clear specification-style descriptions in --help and README.md before showing the examples, rather than just relying on the user to guess meaning from examples, but I am sure not everyone would agree.