mccormicka/string-argv

Escaped space not handled

Closed this issue · 8 comments

STEPS TO REPRODUCE
$ node
> var stringArgv = require('string-argv');
undefined
> stringArgv('one\ one two')
EXPECTED RESULTS

[ 'one one', 'two' ]

ACTUAL RESULTS

[ 'one', 'one', 'two' ]

I'm not sure how to achieve this since 'one\ one two' === 'one one two'

I can think of two ways off the top of my head.

  1. Giving '\ ' special treatment in your regex. i.e. a space is a space UNLESS it's preceded by a backslash.

  2. A pre-processing step that places quotes around strings separated with '\ '.

In javascript there is a difference between one\ one and one\\ one the first one outputs one one the second outputs one\ one.
\ is a single character in a string, not 2.
It is not possible to differentiate the character ' ' and the character '\ ' in a regex.
If I do "\ ".indexOf("\\") it returns -1, therefore, I cannot search for backslash as it is not present in the string.
If what you're asking is to support stringArgv('one\\ one two') I'm afraid it falls out of scope for this project. This would be a specific use case.

Might I ask what is the exact use case for this request so I can better understand where it comes from.

I've written a library that uses child_process.spawn which requires command line arguments to be presented as a list. This library also uses ssh2.exec which requires a command line string.

I'd like the interface to the my functions that use these library routines to match and I've settled on a command line string as opposed to a list because it's simpler. This means I need to convert the command line string to a list before I call child_process.spawn. This is what I'm using string-argv to accomplish.

This isn't a deal-breaker for me. Your library works well as is (thanks!). My library simply doesn't support '\ '. I filed this ticket because for completeness sake, I felt thatstring-argv might want to support this.

https://nodejs.org/api/child_process.html#child_process_child_process_spawn_command_args_options

There is a String.raw() function in JS and in Node. Could this help?

let str = String.raw`one\ one`
let str2 = 'one\ one'

console.log(str)   // one\ one
console.log(str2)   // one one

@juergba unfortunately String.raw doesn't support \ character, it simply didn't escape the \.
So in the this case it would be equivalent of doing const str = "one\\ one"

Unfortunately, the backslash space is exactly the same as a regular space in the string.

" ".charCodeAt(0) // 32
"\ ".charCodeAt(0) // 32

So most likely the orginal demand wanted to support "\\ " which is actually 2 characters.
The only thing I can see to support this would be to change the regex to
([^\s'"]([^\s'"]*(['"])([^\3]*?)\3)+[^\s'"]*)|([^\s'"]+(\\ )[^\s'"]+)+|[^\s'"]+|(['"])([^\5]*?)\5 (https://regexr.com/4oht2)
But I wouldn't add explicit support for these 2 characters in the library by default as it would be a breaking change.
I can try to see if I can add options for people to opt-in

Actually, my solution doesn't work because it doesn't match one\ two\ three (strings with multiple \ ).
I would need a negative set [^\s'"(\\ )] but that's not a thing.
Or find a way to detect word\ word with respect to "quoted words" "quoted nested with\ and 'nested". All these edge cases make this much more complicated

The only alternative I can thing of is to implement a state machine with language parsing with a lookahead of 2 for token matching.
I don't see the value in implementing that whole thing in the library.

I will close the issue for now because after a year of thinking about this, I can't see a way to easily support it without either breaking compatibility or changing the implementation significantly.
If someone can come up with a reasonable way to implement it using regex, I'll reconsider it