Feature Request: Add `--exclude-chars` option to urlenc command

Question

sha5010 opened this issue 8 months ago · 8 comments

Feature Description

I would like to propose an enhancement for the urlenc subcommand. It would be great to have an -e or --exclude-chars option that allows users to specify characters that should not be URL encoded.

Use Case

Often, there's a need to URL encode a string but keep certain characters as they are, for example when they serve as delimiters or have a special meaning in a specific context such as a URL. The proposed -e option would make it possible to exclude such characters from being encoded.

Proposed Behavior

When the -e option is used, any characters that match the ones specified in the argument would bypass the URL encoding process. For instance:

Command: urlenc 'hello world. / rust!' -e './'
Output: hello%20world.%20/%20rust%21

In this example, the dot . and the forward slash / are not encoded because they are specified in the -e option.
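To make the proposed behavior concrete, here is a minimal Rust sketch. It is not the tool's actual code: the function name encode_excluding is hypothetical, and it assumes that everything except ASCII alphanumerics is encoded unless it is listed in the exclusion string.

```rust
/// Percent-encode every byte of `input` except ASCII alphanumerics and the
/// characters listed in `excluded`.
/// Hypothetical helper for illustration; not taken from the urlenc source.
fn encode_excluding(input: &str, excluded: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for &b in input.as_bytes() {
        let c = b as char;
        // Only ASCII bytes can be kept verbatim; the bytes of multi-byte
        // UTF-8 sequences are always percent-encoded.
        if b.is_ascii() && (c.is_ascii_alphanumeric() || excluded.contains(c)) {
            out.push(c);
        } else {
            out.push_str(&format!("%{:02X}", b));
        }
    }
    out
}
```

Under these assumptions, encode_excluding("hello world. / rust!", "./") yields hello%20world.%20/%20rust%21, which is the output shown in the example.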

Thank you for considering this feature request. Additionally, this tool is very useful with the :! feature in Vim and with command substitution in the shell. It is really great that it supports both stdin and command arguments!

Answer 1 · 2024-05-16T21:28:55.000Z

Good idea, I'll look into it. It seems feasible without too much hassle using the AsciiSet parameter.

Answer 2 · 2024-05-17T10:52:49.000Z

Actually the AsciiSet parameter is kind of broken as it needs a &'static lifetime, so one can only work with const exclusion sets. I'll see how I can work around this.

Answer 3 · 2024-05-18T11:10:15.000Z

Done in master. I reimplemented URL encoding and changed the default charset to the one from RFC 3986 (the one used for URLs).
I'll probably do a release soon.

Answer 4 · 2024-05-18T16:26:20.000Z

Thank you for the reimplementation!

The --exclude-chars option seems to work just fine, but the change in encoding method raises some concerns for me. Some programming languages and online tools seem to encode everything except unreserved characters.

Results of URL encoding the characters from 0x20 to 0x7e:

urllib.parse.quote in Python

%20%21%22%23%24%25%26%27%28%29%2A%2B%2C-./0123456789%3A%3B%3C%3D%3E%3F%40ABCDEFGHIJKLMNOPQRSTUVWXYZ%5B%5C%5D%5E_%60abcdefghijklmnopqrstuvwxyz%7B%7C%7D~

urlencode in PHP

+%21%22%23%24%25%26%27%28%29%2A%2B%2C-.%2F0123456789%3A%3B%3C%3D%3E%3F%40ABCDEFGHIJKLMNOPQRSTUVWXYZ%5B%5C%5D%5E_%60abcdefghijklmnopqrstuvwxyz%7B%7C%7D%7E

URL Encode and Decode - Online

%20%21%22%23%24%25%26%27%28%29%2A%2B%2C-.%2F0123456789%3A%3B%3C%3D%3E%3F%40ABCDEFGHIJKLMNOPQRSTUVWXYZ%5B%5C%5D%5E_%60abcdefghijklmnopqrstuvwxyz%7B%7C%7D~

As it stands, there is no option to URL encode a " character, for example. How about encoding everything except unreserved characters?

if !c.is_ascii_graphic() {
    table[i as usize] = true;
} else {
    table[i as usize] = !(
        c.is_ascii_alphanumeric() ||
        matches!(c, '-' | '.' | '_' | '~') ||
        self.excluded.contains(c)
    );
}
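For context, the fragment above could sit inside something like the following self-contained sketch. The Encoder struct, its new and encode methods, and the choice to pass the exclusion set as a &str are all assumptions made for illustration, not the project's real types. Building the lookup table at runtime is one possible way to sidestep a &'static set.

```rust
/// Hypothetical encoder holding a 128-entry lookup table built at runtime.
/// `table[i] == true` means "percent-encode ASCII byte i".
struct Encoder {
    table: [bool; 128],
}

impl Encoder {
    /// Build the table: keep RFC 3986 unreserved characters and any
    /// graphic character listed in `excluded`; encode everything else.
    fn new(excluded: &str) -> Self {
        let mut table = [true; 128];
        for i in 0u8..128 {
            let c = i as char;
            let unreserved =
                c.is_ascii_alphanumeric() || matches!(c, '-' | '.' | '_' | '~');
            // As in the fragment above, non-graphic characters (space,
            // control characters) are always encoded, even if excluded.
            let keep = c.is_ascii_graphic() && (unreserved || excluded.contains(c));
            table[i as usize] = !keep;
        }
        Encoder { table }
    }

    /// Percent-encode `input` byte by byte using the table.
    /// Non-ASCII bytes are always encoded.
    fn encode(&self, input: &str) -> String {
        let mut out = String::with_capacity(input.len());
        for &b in input.as_bytes() {
            if b < 128 && !self.table[b as usize] {
                out.push(b as char);
            } else {
                out.push_str(&format!("%{:02X}", b));
            }
        }
        out
    }
}
```

With an empty exclusion set, this sketch should reproduce the "URL Encode and Decode - Online" result listed above for the range 0x20 to 0x7e, and Encoder::new("/").encode("a/b\"c") would give a/b%22c.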

Answer 5 · 2024-05-18T21:20:12.000Z

I'll probably set this as default and add an option to set it to the RFC.

Answer 6 · 2024-05-19T14:15:09.000Z

I pushed an update to master which restores the old default behaviour (non-alphanumeric characters are encoded) and adds a -u option for URLs.
What do you think?

Answer 7 · 2024-05-20T01:14:53.000Z

That sounds like a great update.

Giving users the ability to specify characters they want to exclude from encoding offers a lot of flexibility, which is fantastic. Also, thank you for adding the -u option for URLs. It's very convenient.

Answer 8 · 2024-05-20T18:58:14.000Z

I've also added a -c or --custom option to specify a custom list of characters to encode.
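The thread does not spell out the exact semantics of the custom list. A rough sketch, assuming that only the listed ASCII characters are encoded and everything else (including non-ASCII text) passes through unchanged, might look like this. The function name encode_only is hypothetical.

```rust
/// Percent-encode only the ASCII characters listed in `custom`.
/// Assumption: all other characters, including non-ASCII ones, are left
/// untouched. Hypothetical helper; the real -c option may behave differently.
fn encode_only(input: &str, custom: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        if c.is_ascii() && custom.contains(c) {
            // An ASCII char is a single byte, so its code point is its byte value.
            out.push_str(&format!("%{:02X}", c as u32));
        } else {
            out.push(c);
        }
    }
    out
}
```

Under that assumption, encode_only("say \"hi\" now", "\" ") would return say%20%22hi%22%20now, while an empty list leaves the input as it is.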