parsing of strings containing flag notation

Question

derhuerst opened this issue 4 years ago · 6 comments

#!/usr/bin/env node
// Print the raw argv, then mri's parse of the user-supplied arguments.
const mri = require('mri')
console.log(process.argv)
console.log(mri(process.argv.slice(2)))

./args.js -a -b '-c foo'

[
  '/usr/local/Cellar/node/16.0.0_1/bin/node',
  '/Users/j/web/pev2-cli/foo.js',
  '-a',
  '-b',
  '-c foo'
]
{
  _: [],
  a: true,
  b: true,
  c: true,
  ' ': true,
  f: true,
  o: [ true, true ]
}

Is that intended?

I would have expected this:

{
  _: ['-c foo'],
  a: true,
  b: true
}

Or maybe this (but I find the result above much more intuitive):

{
  _: [],
  a: true,
  b: true,
  c: 'foo'
}

Answer 1 · 2021-05-25T22:59:46.000Z

Hey, sorry, I haven't had a chance to look at this yet.

I think you'd have to do this: ./args.js -a -b -- '-c foo' or maybe ./args.js -a -b -- -c foo

Anything after a -- is always thrown into the _ key space.
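The `--` rule described above can be sketched as follows. This is a hypothetical helper (`splitAtTerminator` is not part of mri) that only illustrates the idea: everything after the first `--` is positional, no matter what it looks like.

```javascript
// Hypothetical helper (not mri's code): split an argv array at the first "--".
// Tokens before it are still parsed as options; tokens after it are positional,
// even if they start with "-".
function splitAtTerminator(args) {
  const idx = args.indexOf('--');
  if (idx === -1) return { options: args, positional: [] };
  return { options: args.slice(0, idx), positional: args.slice(idx + 1) };
}

// options: ['-a', '-b'], positional: ['-c foo']
console.log(splitAtTerminator(['-a', '-b', '--', '-c foo']));
// options: ['-a', '-b'], positional: ['-c', 'foo']
console.log(splitAtTerminator(['-a', '-b', '--', '-c', 'foo']));
```

As the reply below points out, this works only because nothing after `--` is treated as an option any more, which is exactly the trade-off being objected to.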

Answer 2 · 2021-05-26T10:11:08.000Z

> I think you'd have to do this: ./args.js -a -b -- '-c foo' or maybe ./args.js -a -b -- -c foo
>
> Anything after a -- is always thrown into the _ key space.

Unfortunately, that doesn't solve my problem: I want to treat -b like any other option/flag and still be able to put other flags after it. Using -- defeats the point of using a flag and leveraging mri's parsing logic.

Answer 3 · 2021-12-31T00:30:30.000Z

I would also like to add a simpler case that mri parses incorrectly:

./args.js -

A lone hyphen (-) is commonly used to indicate stdin or stdout, but mri parses it as:

{ _: [] }

whereas - should end up in the _ key.
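One way a parser could keep a lone `-` is to test for the special tokens before the generic "starts with a dash" check. The sketch below uses a hypothetical `classifyToken` function (not part of mri) to show that ordering.

```javascript
// Hypothetical token classifier (not mri's code). Special tokens are checked
// before the generic "starts with -" test, so a lone "-" stays positional
// instead of being read as an empty flag list.
function classifyToken(arg) {
  if (arg === '--') return 'terminator'; // end of options
  if (arg === '-') return 'positional';  // conventional stdin/stdout marker
  if (arg.startsWith('-')) return 'option';
  return 'positional';
}

for (const arg of ['-', '--', '-c', 'notes.txt']) {
  console.log(JSON.stringify(arg), classifyToken(arg));
}
```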

Answer 4 · 2022-01-24T11:48:47.000Z

I'm not sure there's a way to interpret that string the way you want. As you can see from process.argv, the shell has already interpreted the quote marks and removed them, and the quote marks are the only reliable way to tell that the argument should be treated differently.

./args.js -a -b '-c foo'
[
  '-a',
  '-b',
  '-c foo'
]

MRI just sees a string that starts with a dash, which is how it identifies an option. It could peek further into the string to check for characters that can't appear in a flag (such as the space), but I don't think that would cover every case, for example:

./args.js -a -b '-abc'
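The "peek for illegal characters" heuristic mentioned above might look like the sketch below. Both the function name `looksLikeValue` and the set of characters assumed to be legal in a flag cluster (ASCII letters, digits, `-` and `=`) are assumptions for illustration; mri does not implement this.

```javascript
// Hypothetical heuristic (not implemented by mri): decide whether the token
// following a flag should be taken as that flag's value.
// Assumption: a flag cluster only ever contains ASCII letters, digits, "-" and "=".
function looksLikeValue(next) {
  if (next === undefined) return false;   // nothing left to consume
  if (!next.startsWith('-')) return true; // ordinary value such as "foo"
  // Starts with "-": only treat it as a value if it contains a character
  // that could never appear in a flag cluster (e.g. a space).
  return /[^A-Za-z0-9=-]/.test(next.slice(1));
}

console.log(looksLikeValue('-c foo')); // true: the space rules out a flag cluster
console.log(looksLikeValue('-abc'));   // false: indistinguishable from -a -b -c
```

The second call is the case described above: once the quotes are gone, `'-abc'` looks exactly like three short flags, so no amount of peeking can recover the user's intent.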

You can still pass a string like that, but it requires you to use the foo=bar syntax instead. Depending on your use case that might be sufficient.

./args.js -a -b='-c foo'
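The reason the `=` form works is that the shell hands over one token and the key/value split happens inside that token. A minimal sketch of such a split follows; `splitKeyValue` is a hypothetical name, and it ignores short-flag clusters like `-ab=x`, which mri handles separately.

```javascript
// Minimal sketch (not mri's code): split a "-key=value" or "--key=value" token
// at the first "=". The value is taken verbatim, even if it starts with "-",
// because it arrived inside the same argument as the key.
function splitKeyValue(arg) {
  const m = /^-+/.exec(arg);
  const dashes = m ? m[0].length : 0;
  const eq = arg.indexOf('=');
  if (eq === -1) return { key: arg.slice(dashes), value: undefined };
  return { key: arg.slice(dashes, eq), value: arg.slice(eq + 1) };
}

console.log(splitKeyValue('-b=-c foo'));  // key 'b', value '-c foo'
console.log(splitKeyValue('--name=x=y')); // key 'name', value 'x=y' (only the first "=" splits)
```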

Alternatively, you could pass your arguments with a different syntax (or prefix) and transform them before passing them on.

./args.js -a -b 'c=foo&d=bar'
./args.js -a -b '$ -c foo -d bar'

This requires a bit more thought, but it is perhaps less confusing to an end user than requiring that one option MUST use an equals sign.
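Decoding the two alternative syntaxes could be as simple as the sketch below. Both helpers are hypothetical and would be applied to the string that mri returns for -b; the `'$ '` prefix marker and plain whitespace splitting (no nested quoting) are assumptions. `URLSearchParams` is the standard WHATWG API available globally in Node.js.

```javascript
// Hypothetical decoders for the alternative value syntaxes above (not part of mri).

// 'c=foo&d=bar' -> { c: 'foo', d: 'bar' }
// Note: URLSearchParams also percent-decodes and turns "+" into a space.
function decodeQueryStyle(value) {
  return Object.fromEntries(new URLSearchParams(value));
}

// '$ -c foo -d bar' -> ['-c', 'foo', '-d', 'bar'], which could then be
// parsed a second time (for example with another mri call).
// Assumes the "$ " marker and simple whitespace splitting.
function decodePrefixed(value, prefix = '$ ') {
  if (!value.startsWith(prefix)) return null; // not in the prefixed syntax
  return value.slice(prefix.length).trim().split(/\s+/).filter(Boolean);
}

console.log(decodeQueryStyle('c=foo&d=bar'));
console.log(decodePrefixed('$ -c foo -d bar'));
```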

Answer 5 · 2022-01-24T16:46:57.000Z

Maybe I'm missing something here, but there should be a way to support even the use case I proposed: implement parsing with state, as in "Do I expect a potential argument now?".

Am I wrong?

Answer 6 · 2022-01-25T11:19:37.000Z

Okay, let's quickly walk through the parsing state to explain how it makes that decision.

/example -b '-d foo'
# argv = ['node', '/example', '-b', '-d foo']

We drop the first two args (the node binary and the script path) because they aren't useful to us, then pass the rest to MRI. Internally, the state flow goes like this:

1. argv[0] starts with -, so we parse the first arg as a flag list.
2. Remove the - and split argv[0] to get our flag list [ 'b' ].
3. Inspect the next argument, argv[1], to decide whether it's a value for our flag or another option.
4. It exists (good), but it starts with a -, so it is treated as the start of another option, not as a value for b.
5. Option b is stored with the default value true.
6. Return to the start of the loop.
7. argv[1] starts with -, so we parse the second arg as a flag list.
8. Remove the - and split argv[1] to get our flag list [ 'd', ' ', 'f', 'o', 'o' ].
9. Inspect the next argument, argv[2], to decide whether it's a value for our flag or another option.
10. It doesn't exist, so we use the default value for the flag.
11. Options [ 'd', ' ', 'f', 'o', 'o' ] are stored with the default value true.
12. o is seen twice, so its value changes from true to [true, true].
13. We have no more arguments, so the loop ends.
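The walkthrough above can be reproduced with a simplified re-implementation. `parseShortFlags` is a hypothetical sketch, not mri's source: it handles only short flags (no `--`, `=`, aliases, long options or number coercion), but it follows the same decisions and yields the same objects shown in this thread.

```javascript
// Simplified sketch of the short-flag path described above (NOT mri's source).
function parseShortFlags(args) {
  const out = { _: [] };
  // Store a value; a repeated key turns into an array of values.
  const store = (key, value) => {
    const old = out[key];
    if (old === undefined) out[key] = value;
    else out[key] = Array.isArray(old) ? old.concat(value) : [old, value];
  };

  for (let i = 0; i < args.length; i++) {
    const arg = args[i];
    if (!arg.startsWith('-')) { // not a flag list: positional argument
      out._.push(arg);
      continue;
    }
    // Strip the "-" and split every remaining character into its own flag,
    // which is why "-d foo" yields d, " ", f, o, o.
    const flags = arg.slice(1).split('');
    // The next argument is a value only if it exists and does not start
    // with "-"; otherwise the default value `true` is used.
    const next = args[i + 1];
    let value = true;
    if (next !== undefined && !next.startsWith('-')) {
      value = next;
      i++; // the value has been consumed
    }
    // Every flag but the last gets `true`; the last one gets `value`.
    flags.forEach((flag, idx) => store(flag, idx < flags.length - 1 ? true : value));
  }
  return out;
}

console.log(parseShortFlags(['-b', '-d foo']));
console.log(parseShortFlags(['-a', '-b', '-c foo'])); // same shape as the original report
```

Passing `['-']` returns `{ _: [] }` in this sketch, because the flag list is empty after the dash is stripped, which is consistent with the lone-hyphen report earlier in the thread.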

The troublesome step for you is step 4, because it decides that what you want to be the option's value isn't a valid value. It does this by checking whether the first character is a -. The shell removed the quote marks before we received the argument, so we cannot tell it was quoted. The space implies that it might have been quoted, but a subtly different input (-b "-abc") would defeat that check.

I mentioned that using -b='-abc' works differently. That's because the shell passes it as a single argument and MRI splits it into flag and value at the = character, so it doesn't need to guess whether the next argument is a value.

A possible option would be to tell MRI that this option is always a dumb string field and should be parsed as such.

mri(['-b', '-abc'], { string: ['b'] })

But this doesn't match how MRI currently reads and interprets values: they are read as plain text and interpreted afterwards, so they don't affect how the command itself is parsed. Adding that effect would cause strange side effects when a user makes a mistake, as in the example below.

example -b '-d val'
# argv [ '-b', '-d val' ]
# old { b: true, d: true, ' ': true, v: true, a: true, l: true }
# proposed { b: '-d val' }

example -b -d val
# argv ['-b', '-d', 'val']
# old { b: true, d: 'val' }
# proposed { b: '-d', _: [ 'val' ] }

Note that the second command contains no quote marks at all, yet the modification still changes how it is interpreted.
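The proposed behavior (which is also the stateful "Do I expect a potential argument now?" idea raised earlier) can be sketched to make the trade-off concrete. `parseWithStringOpts` and its `stringOpts` parameter are hypothetical; they only loosely mirror mri's `string` option and are not its implementation. The sketch handles short flags only and does not collect repeated keys into arrays.

```javascript
// Hypothetical sketch of the *proposed* behavior (not mri): any flag listed in
// `stringOpts` always consumes the next argument as its value, even when that
// argument starts with "-". Other flags keep the "value only if it doesn't
// start with -" rule. Short flags only; repeated keys simply overwrite.
function parseWithStringOpts(args, stringOpts = []) {
  const out = { _: [] };
  for (let i = 0; i < args.length; i++) {
    const arg = args[i];
    // Non-flags and a lone "-" (stdin/stdout) are positional.
    if (!arg.startsWith('-') || arg === '-') {
      out._.push(arg);
      continue;
    }
    const flags = arg.slice(1).split('');
    const last = flags.pop();          // only the last flag can take a value
    for (const f of flags) out[f] = true;
    const next = args[i + 1];
    // State: does this flag expect a value right now?
    const expectsValue = stringOpts.includes(last);
    if (next !== undefined && (expectsValue || !next.startsWith('-'))) {
      out[last] = next; // consume the next argument verbatim
      i++;
    } else {
      out[last] = true;
    }
  }
  return out;
}

console.log(parseWithStringOpts(['-b', '-d val'], ['b']));      // b: '-d val'
console.log(parseWithStringOpts(['-b', '-d', 'val'], ['b']));   // b: '-d', _: ['val']
console.log(parseWithStringOpts(['-a', '-b', '-c foo'], ['b'])); // a: true, b: '-c foo'
```

The second call shows the side effect described above: without any quotes, `-d` is swallowed as the value of b and `val` falls through to `_`.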