Originate/scriptkeeper

ability to test commands with generated/variable arguments

Closed this issue · 8 comments

i have a script that writes to a logfile and it includes the date, so right now i'm not able to write a protocol that passes. here is an example of the error i get:

error:
  expected: /bin/sh -c "echo \"2019-03-06T22:15:12+00:00 - start: /directory - filename - test\" >> /Users/testuser/tmp/unrar.log"
  received: /bin/sh -c "echo \"2019-03-06T22:16:05+00:00 - start: /directory - filename - test\" >> /Users/testuser/tmp/unrar.log"

i imagine the ability to place a regex in my command is a straightforward way to solve this, any other ideas?
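To make the problem concrete, here is a rough Python sketch (not scriptkeeper code) of how a regex could absorb the changing timestamp while the rest of the command stays literal. The names `TIMESTAMP`, `PREFIX`, `SUFFIX` and `LOG_COMMAND` are made up for this illustration; the command text is copied from the error output above.

```python
import re

# ISO 8601 timestamp with a numeric UTC offset, e.g. 2019-03-06T22:15:12+00:00
TIMESTAMP = r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}[+-]\d{2}:\d{2}"

# Literal text around the timestamp, copied from the error output.
PREFIX = '/bin/sh -c "echo \\"'
SUFFIX = ' - start: /directory - filename - test\\" >> /Users/testuser/tmp/unrar.log"'

# Escape the literal parts so characters like "." are not read as regex syntax.
LOG_COMMAND = re.compile(re.escape(PREFIX) + TIMESTAMP + re.escape(SUFFIX))

expected = PREFIX + "2019-03-06T22:15:12+00:00" + SUFFIX
received = PREFIX + "2019-03-06T22:16:05+00:00" + SUFFIX

print(expected == received)                         # False: plain equality fails
print(LOG_COMMAND.fullmatch(expected) is not None)  # True
print(LOG_COMMAND.fullmatch(received) is not None)  # True
```

Only the timestamp is loosened here; every other character still has to match exactly.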

@soenkehahn some open questions before this can be tackled:

  • how should users indicate a regular expression in their protocol? what characters should signify to our parser/tokenizer that the following string should be taken as a regex? since $(...) and `...` will usually be interpreted by bash before arguments are passed to the syscall, maybe those are viable options?
  • i think we'll have to rethink how commands are parsed and matched, since argument-for-argument equality would no longer work. for example:
    - command: echo $(\d\s*\d)
    should match echo 1 2 and echo 12, but the way those two commands are currently represented when they come from the script (as separate arguments) is not structured well for testing the match. i'm curious if people have thoughts on how to adjust the way arguments are stored and matched.
  • can a regex be adjacent to text or should it be surrounded by whitespace/its own token? for example:
    - command: echo a$(\d)
    should this be allowed, and match echo a1? or should users have to do
    - command: echo $(a\d)

@soenkehahn bump for opinion on the above questions

First I thought we could mark regexes by putting them into slashes, like in javascript. But then I realized that there's actually a good chance that normal commands will start and end with a slash, for example /usr/local/bin/ruby ./showDirectory.rb ./myDir/. So what about this:

protocols:
  - protocol:
      - regex: /bin/cp .*.txt .*.md

Re: how the arguments are matched: Could we match the regex against our canonical rendering of the whole command that is executed? (That would be Command::format, i think.) E.g. like this:

#!/usr/bin/env bash
/bin/cp foo.txt bar.md

protocols:
  - protocol:
      - regex: \/bin\/cp .*.txt .*.md

So then - regex: echo \d\s*\d would match both echo 1 2 and echo 12. @matthandlersux: What do you think about that?
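A minimal Python sketch of the whole-line idea, assuming the command is first rendered to a single string and the regex is then run against that string. `render` is a hypothetical stand-in for a canonical rendering like Command::format (it ignores quoting), and `regex_step_matches` is an invented name. The pattern is written as a plain regex with no `$(...)` delimiter.

```python
import re
from typing import Sequence

def render(argv: Sequence[str]) -> str:
    # Hypothetical stand-in for a canonical rendering of a command:
    # join executable and arguments with single spaces (no quoting handled).
    return " ".join(argv)

def regex_step_matches(pattern: str, argv: Sequence[str]) -> bool:
    # Unanchored search; whether to anchor is a separate decision.
    return re.search(pattern, render(argv)) is not None

print(regex_step_matches(r"echo \d\s*\d", ["echo", "1", "2"]))  # True
print(regex_step_matches(r"echo \d\s*\d", ["echo", "12"]))      # True
print(regex_step_matches(r"echo \d\s*\d", ["echo", "a", "b"]))  # False
print(regex_step_matches(r"/bin/cp .*\.txt .*\.md",
                         ["/bin/cp", "foo.txt", "bar.md"]))     # True
```

Because the match runs on the rendered line, it no longer matters how many arguments the digits were split across.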

#124, which is about to be merged, is also somewhat relevant, since it changes the canonical representation from /bin/cp foo bar to cp foo bar. CC: @hallettj

sounds good... i spiked this out by initially trying to embed regexes into the command (using backticks as a delimiter), and while that isn't too complicated, it does not allow matching on different variants of spacing. i like your suggestion of just having the whole line be a regex... makes the parsing simpler and avoids complicated escaping logic.

new questions:

  • i would assume that with a format like you suggest, you could only have either a command or regex but not both, does that make sense?
  • should the regex assume that it is anchored on either end? ie. is - regex: /bin/cp .*.txt .*.md interpreted as - regex: ^/bin/cp .*.txt .*.md$?

i would assume that with a format like you suggest, you could only have either a command or regex but not both, does that make sense?

Yes, exactly. So a Step is either a yaml string, or an object, and that object has to have either command or regex, but not both.
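A sketch of that rule in Python, operating on values as a YAML parser would hand them over (a plain string or a mapping). `Step` and `parse_step` are hypothetical names for this illustration, not scriptkeeper's actual types, and other keys a step might carry are ignored here.

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class Step:
    command: Optional[str] = None  # exact command to expect
    regex: Optional[str] = None    # pattern to match the command against

def parse_step(raw: Any) -> Step:
    # Shorthand form: a bare YAML string is an exact command.
    if isinstance(raw, str):
        return Step(command=raw)
    # Object form: exactly one of `command` or `regex` must be present.
    if isinstance(raw, dict):
        has_command = "command" in raw
        has_regex = "regex" in raw
        if has_command == has_regex:
            raise ValueError("a step needs exactly one of 'command' or 'regex'")
        if has_command:
            return Step(command=raw["command"])
        return Step(regex=raw["regex"])
    raise ValueError("a step must be a string or an object")

print(parse_step("/bin/cp foo.txt bar.md"))
print(parse_step({"regex": r"/bin/cp .*\.txt .*\.md"}))
```

Passing both keys, or neither, raises a `ValueError` instead of silently picking one.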

should the regex assume that it is anchored on either end? ie. is - regex: /bin/cp .*.txt .*.md interpreted as - regex: ^/bin/cp .*.txt .*.md$?

Good question and I'm not sure. Maybe not, since that's what other tools do, e.g. grep. Maybe @astampoulis has an opinion?

I'm not sure! I would probably go for avoiding the anchoring-by-default, since that's more general, but I'm not sure if I can find a good use case.

One not very convincing case I came up with is -- let's say I want to say that some git command will run but don't care about which one, we could just do regex: ^/usr/bin/git, but something similar is doable in the anchored case too.

my vote would be yes for automatic anchoring, only because it forces people to be explicit. ie, they need to put in .* if they actually want that. without anchoring, regex: /usr/bin/git matches something like chmod 700 /usr/bin/git, which would be confusing to debug. if you did want to match that, with anchoring, you'd have to specify .*/usr/bin/git before the test would pass, and the explanation of the failure would make that clear.
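The difference between the two options can be seen in a short Python sketch, using `re.search` for the unanchored behaviour and `re.fullmatch` for automatic anchoring. The helper names are invented for this example, and Python's `re` only stands in for whatever regex engine scriptkeeper would use.

```python
import re

def matches_unanchored(pattern: str, command: str) -> bool:
    # grep-like: the pattern may match anywhere in the command line.
    return re.search(pattern, command) is not None

def matches_anchored(pattern: str, command: str) -> bool:
    # Automatic anchoring: the pattern must cover the entire command line.
    return re.fullmatch(pattern, command) is not None

cmd = "chmod 700 /usr/bin/git"
print(matches_unanchored("/usr/bin/git", cmd))   # True: the surprising match
print(matches_anchored("/usr/bin/git", cmd))     # False
print(matches_anchored(".*/usr/bin/git", cmd))   # True: the user opted in with .*

# "Some git command, don't care which" works in both styles.
print(matches_unanchored("^/usr/bin/git", "/usr/bin/git status"))  # True
print(matches_anchored("/usr/bin/git.*", "/usr/bin/git status"))   # True
```

With anchoring, loosening a match is always an explicit `.*` in the protocol, which is the point made above about debuggability.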

@matthandlersux: Sounds good to me.