dloscutoff/pip

Set I/O formatting options in the code, not with flags

dloscutoff opened this issue · 3 comments

Context

WHEREAS, when Pip was first created, its command-line flags were intended to count as +1 byte each, according to the then-current scoring rules of Code Golf StackExchange;

WHEREAS, Code Golf StackExchange no longer adds flags to the bytecount;

WHEREAS, however, using free flags to obtain a different output format is considered unsporting by several CGCC members;

WHEREAS, after considering their arguments as regards other languages, I have come to agree that setting the I/O format ought to cost bytes;

BE IT RESOLVED, THAT all Pip flags affecting input and output behavior be removed, and syntax be added to perform the function of each such flag in 1 byte of code.

Details:

The flags in question are -x, -r, -p, -P, -s, -S, -l, and -n. It will be possible to replace each of them with a single character appearing at the beginning of the program, because there are several characters that are always a syntax error when they appear at the beginning of the program: currently, ~&*=|:<>?)]}. It's possible ? could get a unary definition at some point, so that leaves 11 characters available to cover 8 flags.

For example, suppose = replaces the -l flag. Instead of a solution in Pip -l with code z@_X_.z@>_M,26, the solution in the new syntax would be =z@_X_.z@>_M,26. In the original conception of Pip scoring, this would have been counted as 14+1 bytes; now it will be counted as 15 bytes.
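A minimal Python sketch of the byte arithmetic in this example, under stated assumptions: split_leading_config, old_score, and new_score are hypothetical helpers, not part of Pip, and the candidate set is the 11 characters listed above (the 12 syntax-error characters minus ?). It inspects only the first character, so it does not handle the operator-prefix problems listed later in this issue.

```python
from typing import Optional, Tuple

# Hypothetical: the 11 leading characters that are currently syntax errors
# at the start of a program, excluding "?" (which may get a unary meaning).
START_CONFIG_CHARS = frozenset("~&*=|:<>)]}")

def split_leading_config(code: str) -> Tuple[Optional[str], str]:
    """Naively split off a leading config character: (config_char, body)."""
    if code and code[0] in START_CONFIG_CHARS:
        return code[0], code[1:]
    return None, code

def old_score(body: str, flags: int) -> int:
    # Original Pip scoring: code bytes plus 1 byte per flag.
    return len(body.encode("utf-8")) + flags

def new_score(code: str) -> int:
    # Proposed scoring: the config character is just another byte of code.
    return len(code.encode("utf-8"))

config, body = split_leading_config("=z@_X_.z@>_M,26")
print(config, body)                                       # = z@_X_.z@>_M,26
print(old_score(body, 1), new_score("=z@_X_.z@>_M,26"))   # 15 15
```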

Another possibility would be to put the I/O format specifier at the end of the program, or possibly input specifiers at the beginning and output specifiers at the end. Placement at the end of the program has the advantage that there are a lot more characters that would work: ~!@#$%^&*-+=|\:'<>?,./({[ are all syntax errors at the end of the program. The example above would become z@_X_.z@>_M,26= in this case.

As part of this change, it may make sense to also change the default List output formatting to match the repr (currently -p flag) and require an I/O format specifier for "concatenate all values." I'm not sure whether this would affect more answers negatively or positively, since challenges that ask for a list of outputs are not uncommon; but in any case, it would cause less surprise for new users and be easier to read while working on a solution.

This change breaks backwards compatibility, especially if the default format is changed. Therefore, it will not happen any earlier than version 2.0.

Possible mapping from flags to I/O configuration characters, if the config is at the beginning of the code:

--  |
-l  =
-p  (new default)
-P  *
-s  &
-S  :
-n  ~
-r  >
-x  )

Since -rl is a somewhat common combination, perhaps it could be represented by < (which would also avoid any scanning issues with >= being an operator).
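The beginning-of-code table, including the suggested < for -rl, can be pictured as a lookup from flag sets to prefix characters. This is a hypothetical sketch: the mapping is tentative, to_prefix_syntax is an invented helper, and the -- row is read as "no flags", i.e. the current concatenating default that would now need an explicit specifier.

```python
from typing import Dict, FrozenSet

# Tentative beginning-of-code mapping (hypothetical): set of old flags -> prefix.
PREFIX_FOR_FLAGS: Dict[FrozenSet[str], str] = {
    frozenset(): "|",            # "--" row, read here as plain concatenation
    frozenset({"l"}): "=",
    frozenset({"P"}): "*",
    frozenset({"s"}): "&",
    frozenset({"S"}): ":",
    frozenset({"n"}): "~",
    frozenset({"r"}): ">",
    frozenset({"x"}): ")",
    frozenset({"r", "l"}): "<",  # suggested single character for -rl
}

def to_prefix_syntax(flags: str, code: str) -> str:
    """Convert old-style flags (e.g. "-rl") plus code into prefix form."""
    key = frozenset(flags.lstrip("-"))  # "-rl" and "-lr" give the same key
    if key == frozenset({"p"}):
        return code  # -p would become the default: no character needed
    # Unlisted flag combinations raise KeyError in this sketch.
    return PREFIX_FOR_FLAGS[key] + code

print(to_prefix_syntax("-l", "z@_X_.z@>_M,26"))  # =z@_X_.z@>_M,26
```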


Possible mapping from flags to I/O configuration characters, if the config is at the end of the code:

--  .
-l  =
-p  (new default)
-P  (maybe don't keep; for outputting, this is basically just P*)
-s  |
-S  +
-n  (maybe don't keep)
-r  >
-x  !
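The end-of-code table can likewise be read as a lookup from old flags to suffix characters. Assumptions in this sketch: to_suffix_syntax is a hypothetical helper, only single flags are handled, and the two "maybe don't keep" rows (-P and -n) are treated as dropped.

```python
from typing import Dict, Optional

# Tentative end-of-code mapping (hypothetical). None marks flags the proposal
# suggests dropping; "p" maps to "" because it would become the default.
SUFFIX_FOR_FLAG: Dict[str, Optional[str]] = {
    "": ".",    # "--" row: concatenate all values
    "l": "=",
    "p": "",
    "P": None,  # for output, roughly P* in code instead
    "s": "|",
    "S": "+",
    "n": None,
    "r": ">",
    "x": "!",
}

def to_suffix_syntax(flag: str, code: str) -> str:
    """Append the end-of-code config character for a single old flag."""
    name = flag.lstrip("-")
    suffix = SUFFIX_FOR_FLAG[name]
    if suffix is None:
        raise ValueError(f"-{name} has no proposed end-of-code character")
    return code + suffix

print(to_suffix_syntax("-l", "z@_X_.z@>_M,26"))  # z@_X_.z@>_M,26=
```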

Because more characters are available, it would be easy to add new options: ~ for -rl; , for join with commas (or comma-space); @ for pretty-print; # for display as grid with vertically aligned columns; * for run program on each line of stdin; etc.
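A hypothetical sketch of how some end-of-code output characters might format a list result. The semantics here are assumptions for illustration only: , joins with a bare comma rather than comma-space, + puts each sublist on its own line with spaces between items, and # left-aligns columns. @ and * are omitted, and none of this is Pip's actual output code.

```python
from typing import Any, Callable, Dict, List

def _grid(rows: List[List[Any]]) -> str:
    # Pad every cell to the width of its column's widest entry so columns line up.
    cells = [[str(c) for c in row] for row in rows]
    ncols = max((len(r) for r in cells), default=0)
    widths = [max((len(r[i]) for r in cells if i < len(r)), default=0)
              for i in range(ncols)]
    return "\n".join(
        " ".join(c.ljust(widths[i]) for i, c in enumerate(r)).rstrip()
        for r in cells)

# Hypothetical config character -> list formatter.
FORMATTERS: Dict[str, Callable[[list], str]] = {
    ".": lambda xs: "".join(map(str, xs)),    # concatenate all values
    "=": lambda xs: "\n".join(map(str, xs)),  # one value per line
    "|": lambda xs: " ".join(map(str, xs)),   # space-separated
    ",": lambda xs: ",".join(map(str, xs)),   # comma-separated
    "+": lambda rows: "\n".join(" ".join(map(str, r)) for r in rows),
    "#": _grid,                               # aligned grid
}

def format_output(value: list, config: str) -> str:
    return FORMATTERS[config](value)

print(format_output([[1, 22, 3], [444, 5, 6]], "#"))
```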


Adopting a hybrid strategy is also possible...

Beginning-of-code config (affecting input and execution):

-r  =
-x  &
-rl ~
    * (run on each line)

End-of-code config (affecting output):

--  .
-l  =
-s  |
-S  +
    , (comma-separated)
    @ (pretty-print)
    # (grid)
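Under the hybrid scheme, a program could carry one config character at each end. A naive sketch using the two hybrid tables above; split_hybrid is a hypothetical helper that checks raw characters, not tokens. Note that = appears in both tables, so only its position tells the two meanings apart.

```python
from typing import Optional, Tuple

# Hybrid tables from above (tentative): input/execution config at the start,
# output config at the end.
INPUT_CONFIG = {"=": "-r", "&": "-x", "~": "-rl", "*": "run on each line"}
OUTPUT_CONFIG = {".": "concatenate", "=": "-l", "|": "-s", "+": "-S",
                 ",": "comma-separated", "@": "pretty-print", "#": "grid"}

def split_hybrid(code: str) -> Tuple[Optional[str], str, Optional[str]]:
    """Naively peel one config char off each end: (input, body, output)."""
    inp = code[0] if code and code[0] in INPUT_CONFIG else None
    if inp is not None:
        code = code[1:]
    out = code[-1] if code and code[-1] in OUTPUT_CONFIG else None
    if out is not None:
        code = code[:-1]
    return inp, code, out

print(split_hybrid("~a|b#"))  # ('~', 'a|b', '#')
```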

Potential problems for config characters at the beginning of the program:

  • Valid programs can in fact start with | if they begin with one of the unary operators ||, |>, or |<.
  • Valid programs can start with * if they begin with the unary operator ** (though I am thinking of phasing this operator out).
  • Valid programs can start with < if they begin with one of the unary operators <| or <>.

Potential problems for config characters at the end of the program:

  • Valid programs can end with any of $'[( due to regex match variables. Other two-byte variables starting with $ could potentially be added in the future.
  • Valid programs can actually end with any character due to character literals (or comments, for that matter).

I think the best approach is to scan the program first and then check if the first/last token is a config character. This militates in favor of putting config characters at the end, since programs can't end with operators but they can begin with them. E.g. | at the beginning followed by |> would scan incorrectly as || followed by >, but | at the end would be unambiguous.
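The scan-then-check approach can be illustrated with a toy maximal-munch scanner. Only a handful of operators mentioned in this issue are modeled, plus 'c character literals; comments, regex variables such as $', and the rest of Pip's grammar are ignored. scan and trailing_config are hypothetical, not Pip's actual scanner, and the end-of-code set is borrowed from the hybrid output table.

```python
from typing import List, Optional, Tuple

# Hypothetical subset of two-character operators; all are the same length,
# so trying them in any order still yields the longest match.
MULTI_CHAR_OPS = ["||", "|>", "|<", "<|", "<>", "**", ">="]
END_CONFIG_CHARS = set(".=|+,@#")

def scan(code: str) -> List[str]:
    tokens, i = [], 0
    while i < len(code):
        if code[i] == "'" and i + 1 < len(code):
            tokens.append(code[i:i + 2])  # character literal such as '=
            i += 2
            continue
        for op in MULTI_CHAR_OPS:
            if code.startswith(op, i):
                tokens.append(op)
                i += len(op)
                break
        else:
            tokens.append(code[i])  # any other character is its own token
            i += 1
    return tokens

def trailing_config(code: str) -> Tuple[Optional[str], List[str]]:
    """Scan first, then treat the last token as config only if it is a lone config char."""
    tokens = scan(code)
    if tokens and tokens[-1] in END_CONFIG_CHARS:
        return tokens[-1], tokens[:-1]
    return None, tokens
```

Here scan("||>x") gives ['||', '>', 'x'], reproducing the misscan of a leading | followed by |>, while trailing_config("|>x|") returns ('|', ['|>', 'x']), and trailing_config("x'=") finds no config because '= is a character literal.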