shomali11/commander

Parsing text containing a non-breaking space produces incorrect results.

kostrahb opened this issue · 4 comments

I have written a Slack bot using slacker; however, during testing we encountered strange issues when parsing text containing URLs. After a bit of tinkering, I pinpointed the issue: in the command text, Slack automatically replaces the normal space right before a URL with a non-breaking one. I raised a question with Slack support and they replied that Slack expects the application to be able to handle UTF-8 characters. Therefore I would like to ask whether it would be possible to replace these characters, either in this library just before command parsing or in slacker before the text is sent to this library.

An example that results in incorrect parsing:

package main

import (
	"fmt"
	"github.com/shomali11/commander"
)

func main() {
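	// Note: \u00A0 in the command text below is the non-breaking space that Slack inserts before the URL.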
	properties, isMatch := commander.NewCommand("set <component> <environment> <xpath> <value>").Match("set be approval xpath-expression\u00A0https://some-url/")
	fmt.Println(isMatch)
	fmt.Println(properties.StringParam("xpath", ""))
}
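
If it helps, a minimal caller-side workaround is to normalize the non-breaking space to a regular one before calling Match. This is only a sketch based on the example above, not something the library does today:

package main

import (
	"fmt"
	"strings"

	"github.com/shomali11/commander"
)

func main() {
	text := "set be approval xpath-expression\u00A0https://some-url/"
	// Replace the non-breaking space (U+00A0) with a regular space before parsing.
	normalized := strings.ReplaceAll(text, "\u00A0", " ")
	properties, isMatch := commander.NewCommand("set <component> <environment> <xpath> <value>").Match(normalized)
	fmt.Println(isMatch)
	fmt.Println(properties.StringParam("xpath", ""))
}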

Good question. I think this is something that slacker should be handling, but I am open to hearing your thoughts.

My only concern would be knowing when UTF-8 characters need to be handled and when they should not be; for example, when the user enters a UTF-8 character in the message.

I am not sure how widely this library is used and how much trouble the change would cause. However, IMHO it is a backward-compatible change that might even benefit this library by expanding its possible uses to the UTF-8 world. In any case, the decision is up to you, I guess, but this Stack Overflow answer might help you with identifying the whitespace characters: https://stackoverflow.com/a/46637343/1869278
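
For illustration, a small helper along the lines of that answer could map every Unicode whitespace rune to a plain space before the text reaches the parser (unicode.IsSpace treats U+00A0, the non-breaking space, as whitespace). The name normalizeSpaces is just for this sketch and is not part of commander or slacker:

package main

import (
	"fmt"
	"strings"
	"unicode"
)

// normalizeSpaces replaces every Unicode whitespace rune, including the
// non-breaking space U+00A0, with a plain ASCII space.
func normalizeSpaces(s string) string {
	return strings.Map(func(r rune) rune {
		if unicode.IsSpace(r) {
			return ' '
		}
		return r
	}, s)
}

func main() {
	fmt.Printf("%q\n", normalizeSpaces("xpath-expression\u00A0https://some-url/"))
	// Prints "xpath-expression https://some-url/"
}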

I'm just now noticing that this issue comes up when you copy text containing an HTML link from a Slack channel and paste it back into Slack as a message to the bot. If there is a hostname in the copied text, Slack applies its HTML link logic to it. According to the "debug" output, this results in normal spaces being converted to non-breaking spaces, e.g.:

remote\u00a0<http:\/\/www.domain.com|www.domain.com>

... instead of what comes up when you manually type it...

remote <http:\/\/www.domain.com|www.domain.com>

I'd be happy with the ability to modify/filter the incoming text before it is parsed, so I could manually rip out and replace the spaces myself.