Feature Request: Add `--exclude-chars` option to urlenc command
sha5010 opened this issue · 8 comments
Feature Description
I would like to propose an enhancement for the `urlenc` subcommand. It would be great to have an `-e` or `--exclude-chars` option that allows users to specify characters that should not be URL encoded.
Use Case
Often, there's a need to URL encode a string but keep certain characters as they are, for example when they act as delimiters or have a special meaning in a specific context such as a URL. The proposed `-e` option would make it possible to exclude such characters from being encoded.
Proposed Behavior
When the `-e` option is used, any characters that match the ones specified in the argument would bypass the URL encoding process. For instance:
Command: `urlenc 'hello world. / rust!' -e './'`
Output: `hello%20world.%20/%20rust%21`
In this example, the dot `.` and the forward slash `/` are not encoded, as they are specified in the `-e` option.
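The proposed behavior can be sketched roughly as follows. This is only an illustration, not the tool's actual code: the function name `urlenc_excluding` is hypothetical, and it assumes the default is to percent-encode every byte that is not an ASCII letter or digit.

```rust
/// Hypothetical helper (not part of the real tool): percent-encode `input`,
/// leaving ASCII alphanumerics and any ASCII character listed in `excluded` as-is.
/// Assumption: everything else, including non-ASCII bytes, gets encoded.
fn urlenc_excluding(input: &str, excluded: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for b in input.bytes() {
        let c = b as char;
        // Only ASCII bytes can be excluded; bytes of multi-byte UTF-8
        // sequences are always percent-encoded.
        if b.is_ascii_alphanumeric() || (b.is_ascii() && excluded.contains(c)) {
            out.push(c);
        } else {
            out.push_str(&format!("%{:02X}", b));
        }
    }
    out
}
```

Calling `urlenc_excluding("hello world. / rust!", "./")` under these assumptions yields `hello%20world.%20/%20rust%21`, matching the example above.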
Thank you for considering this feature request. Additionally, this tool is very useful with the `:!` feature in Vim and with command substitution in the shell. It is really great that it supports both stdin and command arguments!
Good idea, I'll look into it. It seems feasible without too much hassle using the `AsciiSet` parameter.
Actually, the `AsciiSet` parameter is kind of broken, as it needs a `&'static` lifetime, so one can only work with `const` exclusion sets. I'll see how I can work around this.
Done in `master`. I reimplemented URL encoding and changed the default charset to the one from RFC 3986 (the one used for URLs).
I'll probably do a release soon.
Thank you for the reimplementation!
The `--exclude-chars` option seems to work just fine, but the change in encoding method raises some concerns for me. Some programming languages and online tools seem to encode everything except unreserved characters.
Result of URL encoding of characters from 0x20 to 0x7e
urllib.parse.quote in Python
%20%21%22%23%24%25%26%27%28%29%2A%2B%2C-./0123456789%3A%3B%3C%3D%3E%3F%40ABCDEFGHIJKLMNOPQRSTUVWXYZ%5B%5C%5D%5E_%60abcdefghijklmnopqrstuvwxyz%7B%7C%7D~
urlencode in PHP
+%21%22%23%24%25%26%27%28%29%2A%2B%2C-.%2F0123456789%3A%3B%3C%3D%3E%3F%40ABCDEFGHIJKLMNOPQRSTUVWXYZ%5B%5C%5D%5E_%60abcdefghijklmnopqrstuvwxyz%7B%7C%7D%7E
URL Encode and Decode - Online
%20%21%22%23%24%25%26%27%28%29%2A%2B%2C-.%2F0123456789%3A%3B%3C%3D%3E%3F%40ABCDEFGHIJKLMNOPQRSTUVWXYZ%5B%5C%5D%5E_%60abcdefghijklmnopqrstuvwxyz%7B%7C%7D~
As it is now, if we wanted to URL encode a `"`, for example, there is no option to do so. How about encoding everything except unreserved characters?
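    if !c.is_ascii_graphic() {
        // Spaces and control characters are always encoded.
        table[i as usize] = true;
    } else {
        // Encode everything except unreserved and explicitly excluded characters.
        table[i as usize] = !(
            c.is_ascii_alphanumeric() ||
            matches!(c, '-' | '.' | '_' | '~') ||
            self.excluded.contains(c)
        );
    }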
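For context, here is a minimal self-contained sketch of how such a lookup table might be built and used. The `Encoder` struct, its `table` field, and the `new`/`encode` methods are hypothetical names chosen for illustration; the real implementation may be organized differently. It assumes non-ASCII bytes are always encoded and that, as in the fragment above, non-graphic characters cannot be excluded.

```rust
/// Hypothetical encoder: `table[b]` is true when ASCII byte `b` must be percent-encoded.
struct Encoder {
    table: [bool; 128],
}

impl Encoder {
    /// Build the table from a runtime list of excluded characters,
    /// which sidesteps the need for a `const` exclusion set.
    fn new(excluded: &str) -> Self {
        // Start with "encode everything"; non-graphic bytes keep this value.
        let mut table = [true; 128];
        for i in 0u8..128 {
            let c = i as char;
            if c.is_ascii_graphic() {
                // Keep unreserved characters and anything explicitly excluded.
                table[i as usize] = !(c.is_ascii_alphanumeric()
                    || matches!(c, '-' | '.' | '_' | '~')
                    || excluded.contains(c));
            }
        }
        Encoder { table }
    }

    /// Percent-encode `input` byte by byte using the precomputed table.
    fn encode(&self, input: &str) -> String {
        let mut out = String::with_capacity(input.len());
        for b in input.bytes() {
            if b < 128 && !self.table[b as usize] {
                out.push(b as char);
            } else {
                // Assumption: bytes outside ASCII are always encoded.
                out.push_str(&format!("%{:02X}", b));
            }
        }
        out
    }
}
```

With this sketch, `Encoder::new("").encode("a\"b c~")` gives `a%22b%20c~`, so a `"` is encoded while `~` is kept, and `Encoder::new("/").encode("a/b")` leaves the slash untouched.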
I'll probably set this as default and add an option to set it to the RFC.
I pushed an update in `master` which restores the old default behaviour (non-alphanumeric characters are encoded) and adds a `-u` option for URLs.
What do you think?
That sounds like a great update.
Giving users the ability to specify characters they want to exclude from encoding offers a lot of flexibility, which is fantastic. Also, thank you for adding the `-u` option for URLs. It's very convenient.
I've also added a `-c` or `--custom` option to specify a custom list of chars to encode.