Supporting non-UTF-8 paths
Byron opened this issue · 8 comments
I love argh
for its simplicity, compile performance and leanness, and I find myself migrating more and more projects to it.
Just now it became evident that despite supporting PathBuf
in struct fields, it will always assume valid UTF-8 in the arguments it parses. This might be an assumption that holds in Fuchsia, but may make it impossible to use for general purpose linux tools that consume paths in arguments.
I wonder if there is any chance to allow parsing PathBuf
fields without making assumptions about their encoding?
Thank you
PS: This question was triggered by this post on reddit/r/rust.
Perhaps os_str_bytes
could be of use.
Thank you for the kind words!
Like Freaky said, you can likely use std::ffi::OsString
instead of a PathBuf
. Is that sufficient for your use case?
Fuchsia does indeed have everything as UTF-8, but I'm 100% willing to add more general support.
That sounds like a bug. PathBuf
is implemented as a newtype around OsString
, so why does using it detour through String
?
argh
is fundamentally &str
-driven:
Line 188 in e59f2c8
I mentioned os_str_bytes
because it looks useful for argument parsing, though I suspect the simplified approach argh
takes makes it less important.
Thanks so much, @benbrittain, for your openness towards this improvement! With that added correctness, I think this will put argh
right into the sweet spot between low-enough compile times, convenience, and low/no binary overhead.
Right now I am actually implementing a tool supporting both structopt
and argh
CLIs, and it's clear that the declaration of argh
is easier to remember and less cluttered, so that opinionatedness does hit a sweet spot for me indeed.
My thoughts are as follows:
- work with
OsStr|OsString
instances internally, and usestd::env::args_os()
instead ofstd::env::args()
- when placing an
OsStr
into into a field that requires UTF-8 encoding, try to convert or fail gracefully - when wanting to call
FromArgValue
orfrom_str_fn(…)
, try to convert or fail gracefully - introduce a new
FromArgValueOS
orfrom_os_str_fn(…)
(working titles :D) for those who want to support non-UTF-8 fully
Is there anything fundamental I am missing?
And how much effort do you think it is?
I am asking because maybe we sketch out the implementation a little and attract contributors :).
Hi @benbrittain, I think if I would hear your ideas about my initial thoughts stated above, I would consider giving the implementation a shot. That way I hope to avoid going off in the wrong direction.
Thanks a lot!
That sounds great, thank you! Regarding binary overhead, I would think that there is close to none, but then again I don't really know how Argh
is working. As long as the code generator won't have to support two versions of everything, i.e. OsStr
+ str
, I am optimistic supporting non-utf8 won't be detrimental.
Let's see how it goes :).