trou/rsbkb

Feature Request: Add `--exclude-chars` option to urlenc command

sha5010 opened this issue · 8 comments

Feature Description

I would like to propose an enhancement for the urlenc subcommand. It would be great to have an -e or --exclude-chars option that allows users to specify characters that should not be URL encoded.

Use Case

Often there is a need to URL-encode a string while keeping certain characters as they are, for example when those characters serve as delimiters or have a special meaning in a specific context such as a URL. The proposed -e option would make it possible to exclude such characters from encoding.

Proposed Behavior

When the -e option is used, any characters that match the ones specified in the argument would bypass the URL encoding process. For instance:

Command: urlenc 'hello world. / rust!' -e './'
Output: hello%20world.%20/%20rust%21

In this example, the dot (.) and the forward slash (/) are not encoded because they are specified in the -e option.
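The proposed behaviour can be sketched in Rust, the language rsbkb is written in. This is a minimal sketch, not rsbkb's code: the function name `urlenc` is hypothetical, and it assumes the then-default character set in which every non-alphanumeric byte is encoded.

```rust
/// Hypothetical sketch: percent-encodes `input`, leaving ASCII alphanumerics
/// and any ASCII byte listed in `excluded` untouched. Every other byte
/// (including the bytes of multi-byte UTF-8 characters) becomes `%XX`.
fn urlenc(input: &str, excluded: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for b in input.bytes() {
        let is_excluded = b.is_ascii() && excluded.as_bytes().contains(&b);
        if b.is_ascii_alphanumeric() || is_excluded {
            out.push(b as char);
        } else {
            // Uppercase hex digits, as in the expected output from the issue.
            out.push_str(&format!("%{:02X}", b));
        }
    }
    out
}

fn main() {
    // Mirrors the example: urlenc 'hello world. / rust!' -e './'
    println!("{}", urlenc("hello world. / rust!", "./"));
}
```

Run as-is, this prints `hello%20world.%20/%20rust%21`, matching the expected output above.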

Thank you for considering this feature request. This tool is also very useful with the :! feature in Vim and with command substitution in the shell. It's really great that it supports both stdin and command-line arguments!

trou commented

Good idea, I'll look into it. It seems feasible without too much hassle using the AsciiSet parameter.

trou commented

Actually, the AsciiSet parameter is kind of broken: it requires a &'static lifetime, so only const exclusion sets can be used. I'll see how I can work around this.
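The `&'static` constraint can be illustrated with std-only Rust. `ByteSet`, `encode_with` and `leaked_set` are hypothetical stand-ins, not the percent-encoding crate's API: `encode_with` accepts only a `&'static` set, like the parameter described in the comment. One general workaround is to build the set at runtime and leak it with `Box::leak`, accepting a small allocation that is never freed. This is a sketch of that general Rust pattern, not what rsbkb did.

```rust
/// Hypothetical stand-in for an ASCII set: bit `b` set means "encode byte b".
struct ByteSet {
    bits: u128,
}

impl ByteSet {
    /// Const-constructible set that encodes every non-alphanumeric ASCII byte.
    const fn non_alphanumeric() -> Self {
        let mut bits = 0u128;
        let mut b = 0u8;
        while b < 128 {
            if !b.is_ascii_alphanumeric() {
                bits |= 1u128 << b;
            }
            b += 1;
        }
        ByteSet { bits }
    }

    /// Returns a copy of the set with ASCII `byte` removed (no longer encoded).
    fn remove(&self, byte: u8) -> Self {
        ByteSet { bits: self.bits & !(1u128 << byte) }
    }

    /// Non-ASCII bytes are always encoded.
    fn contains(&self, byte: u8) -> bool {
        byte >= 128 || (self.bits & (1u128 << byte)) != 0
    }
}

/// Hypothetical encoder that, like the API discussed above, only accepts a
/// set with a `'static` lifetime.
fn encode_with(input: &str, set: &'static ByteSet) -> String {
    input
        .bytes()
        .map(|b| {
            if set.contains(b) {
                format!("%{:02X}", b)
            } else {
                (b as char).to_string()
            }
        })
        .collect()
}

/// Builds the set from user input at runtime, then leaks it to get `&'static`.
fn leaked_set(excluded: &str) -> &'static ByteSet {
    let set = excluded
        .bytes()
        .filter(|b| b.is_ascii())
        .fold(ByteSet::non_alphanumeric(), |s, b| s.remove(b));
    Box::leak(Box::new(set))
}

fn main() {
    let set = leaked_set("./");
    println!("{}", encode_with("hello world. / rust!", set));
}
```

Leaking is acceptable for a short-lived CLI process that builds one set, but it would not suit code that creates many sets over time; reimplementing the encoder so it takes a borrowed, runtime-built set avoids the leak entirely.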

trou commented

Done in master. I reimplemented URL encoding and changed the default charset to the one from RFC 3986 (the one used for URLs).
I'll probably do a release soon.
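For reference, the character classes RFC 3986 defines (reserved in section 2.2, unreserved in section 2.3) can be written down directly. This sketch only encodes the RFC's definitions; the helper names are illustrative, and it does not claim which of these sets rsbkb's implementation chose to encode.

```rust
/// RFC 3986 section 2.3: unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
fn is_unreserved(c: char) -> bool {
    c.is_ascii_alphanumeric() || matches!(c, '-' | '.' | '_' | '~')
}

/// RFC 3986 section 2.2: gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
fn is_gen_delim(c: char) -> bool {
    matches!(c, ':' | '/' | '?' | '#' | '[' | ']' | '@')
}

/// RFC 3986 section 2.2: sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
///                                  / "*" / "+" / "," / ";" / "="
fn is_sub_delim(c: char) -> bool {
    matches!(c, '!' | '$' | '&' | '\'' | '(' | ')' | '*' | '+' | ',' | ';' | '=')
}

/// reserved = gen-delims / sub-delims
fn is_reserved(c: char) -> bool {
    is_gen_delim(c) || is_sub_delim(c)
}

/// Printable ASCII characters (0x20..=0x7e) that are neither unreserved nor
/// reserved; to appear as data in a URI they must be percent-encoded.
fn neither_reserved_nor_unreserved() -> String {
    (0x20u8..=0x7e)
        .map(char::from)
        .filter(|&c| !is_unreserved(c) && !is_reserved(c))
        .collect()
}

fn main() {
    println!("{}", neither_reserved_nor_unreserved());
}
```

Under these definitions, a character such as `"` falls in neither class, whereas reserved characters like `/` or `?` are only encoded when they must not act as delimiters, which is why URL-oriented and "encode everything" modes can differ.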

Thank you for the reimplementation!

The --exclude-chars option seems to work just fine, but the change in encoding method raises some concerns for me. Some programming languages and online tools seem to encode everything except the unreserved characters.

Results of URL-encoding the characters 0x20 to 0x7e:

urllib.parse.quote in Python (its default safe='/' also leaves / unencoded)

%20%21%22%23%24%25%26%27%28%29%2A%2B%2C-./0123456789%3A%3B%3C%3D%3E%3F%40ABCDEFGHIJKLMNOPQRSTUVWXYZ%5B%5C%5D%5E_%60abcdefghijklmnopqrstuvwxyz%7B%7C%7D~

urlencode in PHP

+%21%22%23%24%25%26%27%28%29%2A%2B%2C-.%2F0123456789%3A%3B%3C%3D%3E%3F%40ABCDEFGHIJKLMNOPQRSTUVWXYZ%5B%5C%5D%5E_%60abcdefghijklmnopqrstuvwxyz%7B%7C%7D%7E

URL Encode and Decode - Online

%20%21%22%23%24%25%26%27%28%29%2A%2B%2C-.%2F0123456789%3A%3B%3C%3D%3E%3F%40ABCDEFGHIJKLMNOPQRSTUVWXYZ%5B%5C%5D%5E_%60abcdefghijklmnopqrstuvwxyz%7B%7C%7D~
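The three outputs above can be reproduced with one small encoder and three different keep-sets. This is an illustrative sketch inferred from the outputs shown, not a port of those implementations: Python's `urllib.parse.quote` keeps the unreserved characters plus `/` (its default `safe='/'`), PHP's `urlencode` keeps only alphanumerics and `-_.` and writes a space as `+`, and the online tool keeps exactly the RFC 3986 unreserved set. The names `encode` and `unreserved` are illustrative.

```rust
/// Percent-encodes every byte for which `keep` returns false. When
/// `space_as_plus` is set, a space is written as `+` (form encoding).
fn encode(input: &str, keep: impl Fn(u8) -> bool, space_as_plus: bool) -> String {
    let mut out = String::new();
    for b in input.bytes() {
        if b == b' ' && space_as_plus {
            out.push('+');
        } else if keep(b) {
            out.push(b as char);
        } else {
            out.push_str(&format!("%{:02X}", b));
        }
    }
    out
}

/// RFC 3986 unreserved characters: ALPHA / DIGIT / "-" / "." / "_" / "~"
fn unreserved(b: u8) -> bool {
    b.is_ascii_alphanumeric() || matches!(b, b'-' | b'.' | b'_' | b'~')
}

fn main() {
    let printable: String = (0x20u8..=0x7e).map(char::from).collect();
    // Python-like: unreserved plus '/'.
    let python = encode(&printable, |b| unreserved(b) || b == b'/', false);
    // PHP-like: alphanumerics and "-_." kept, space as '+', '~' encoded.
    let php = encode(
        &printable,
        |b| b.is_ascii_alphanumeric() || matches!(b, b'-' | b'_' | b'.'),
        true,
    );
    // Online-tool-like: everything except the unreserved set is encoded.
    let unreserved_only = encode(&printable, unreserved, false);
    println!("{}\n{}\n{}", python, php, unreserved_only);
}
```

The differences come down to three characters: whether `/` and `~` are kept, and whether a space becomes `%20` or `+`.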

As it stands, there is no way to URL-encode a double quote ("), for example. How about encoding everything except the unreserved characters?

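```rust
// Lookup table: `true` means the ASCII byte must be percent-encoded;
// `excluded` holds the characters passed via --exclude-chars.
fn build_table(excluded: &str) -> [bool; 128] {
    let mut table = [false; 128];
    for i in 0u8..128 {
        let c = i as char;
        table[i as usize] = if !c.is_ascii_graphic() {
            true // control characters and space are always encoded
        } else {
            // keep RFC 3986 unreserved characters and user-excluded ones
            !(c.is_ascii_alphanumeric()
                || matches!(c, '-' | '.' | '_' | '~')
                || excluded.contains(c))
        };
    }
    table
}
```
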
trou commented

I'll probably make this the default and add an option to switch to the RFC 3986 set.

trou commented

I pushed an update to master that restores the old default behaviour (all non-alphanumeric characters are encoded) and adds a -u option for URLs.
What do you think?

That sounds like a great update.

Giving users the ability to specify characters they want to exclude from encoding offers a lot of flexibility, which is fantastic. Also, thank you for adding the -u option for URLs. It's very convenient.

trou commented

I've also added a -c or --custom option to specify a custom list of characters to encode.
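Taken together, the default set, -e and -c could be modelled roughly as follows. This is a hypothetical sketch: `Charset`, `should_encode` and `urlenc` are illustrative names, the -u mode is omitted because the thread does not spell out its exact character set, and two behaviours are assumptions (exclusions from -e always win, and non-ASCII bytes are always encoded in custom mode).

```rust
/// Hypothetical model of the character-set options described in the thread.
enum Charset {
    /// Default: encode every byte that is not an ASCII alphanumeric.
    NonAlphanumeric,
    /// -c/--custom: encode only the listed ASCII characters (assumed semantics).
    Custom(String),
}

/// Decides whether byte `b` must be percent-encoded.
fn should_encode(b: u8, charset: &Charset, excluded: &str) -> bool {
    // Assumption: characters given to -e/--exclude-chars are never encoded.
    if b.is_ascii() && excluded.as_bytes().contains(&b) {
        return false;
    }
    match charset {
        Charset::NonAlphanumeric => !b.is_ascii_alphanumeric(),
        // Assumption: non-ASCII bytes are always encoded, even in custom mode.
        Charset::Custom(list) => !b.is_ascii() || list.as_bytes().contains(&b),
    }
}

fn urlenc(input: &str, charset: &Charset, excluded: &str) -> String {
    input
        .bytes()
        .map(|b| {
            if should_encode(b, charset, excluded) {
                format!("%{:02X}", b)
            } else {
                (b as char).to_string()
            }
        })
        .collect()
}

fn main() {
    let custom = Charset::Custom(" &".to_string());
    println!("{}", urlenc("a b&c=d", &custom, ""));
    println!("{}", urlenc("a b&c=d", &Charset::NonAlphanumeric, "="));
}
```

Both calls print `a%20b%26c=d`: the custom list encodes only the space and `&`, while the default set encodes every non-alphanumeric character except the excluded `=`.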