borkdude/edamame

Continue parsing after finding error

NoahTheDuke opened this issue · 2 comments

Is your feature request related to a problem? Please describe.
It's possible to recover from encountering mismatched brackets using :edamame/expected-delimiter, but not from other kinds of errors. I would like to be able to recover from other kinds of errors and continue parsing. I encountered this with non-octal numbers padded with zeros (08 and 09), but it would be helpful in all other contexts as well (keywords or symbols with multiple /, maps with an uneven number of entries, etc).

Describe the solution you'd like
Some mechanism to provide a "fix" and continue to parse. This could be a flag set in the parser when it's first called, or it could be an alternate code flow, or it could even be a side-effecting top-level function that alters the state of all parsers (like (set! *warn-on-reflection* true)), or the problems could be accrued in some sort of "broken state" map and returned alongside the correct code. Here are a couple of ideas for how to solve this, after thinking about it for 5 seconds:

Idea: Errors could be thrown with the correctly parsed parts and the broken parts attached in some fashion. For example, uneven maps could be {:edamame/uneven-map {:map {:a 1 :b 2} :leftovers :c}} and duplicates could be {:edamame/duplicate-map-entry {:map {:a 1 :b 2} :duplicates [{:a 3}]}}. This would allow for granularity in how each is tackled.
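
To make the first idea concrete, here is a rough sketch in plain Clojure, with no edamame involved. The names build-map and recover-map are hypothetical, and the ex-data shape simply mirrors the :edamame/uneven-map example above; nothing here is existing edamame behavior.

```clojure
;; Hypothetical helper: builds a map from a flat vector of forms and, when the
;; count is odd, throws with the parsed part and the leftover form attached.
(defn build-map
  [forms]
  (let [pairs    (partition-all 2 forms)
        complete (filter #(= 2 (count %)) pairs)
        leftover (first (remove #(= 2 (count %)) pairs))
        m        (into {} (map vec) complete)]
    (if leftover
      (throw (ex-info "Map literal contains an odd number of forms."
                      {:edamame/uneven-map {:map m
                                            :leftovers (first leftover)}}))
      m)))

;; Hypothetical caller: recovers by keeping the part that parsed correctly.
(defn recover-map
  [forms]
  (try
    (build-map forms)
    (catch clojure.lang.ExceptionInfo e
      (if-some [m (get-in (ex-data e) [:edamame/uneven-map :map])]
        m
        (throw e)))))

(recover-map [:a 1 :b 2 :c]) ; keeps {:a 1 :b 2} and drops the dangling :c
```

The caller decides what to do with :leftovers, which is where the granularity would come from.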

Idea: Errors in code could be replaced with gensym-like keywords so they can be replaced as desired. For example, (parse-string-all "(list 1 2 08 {:a 1 :b 2 :c})" {:gather-errors true}) would return [[(list 1 2 :edamame/error-1 :edamame/error-2)] {:edamame/error-1 {:type :edamame/incorrect-number :string "08"} :edamame/error-2 {:type :edamame/uneven-map :string "{:a 1 :b 2 :c}"}}].
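
A toy version of the placeholder idea, as a sketch only: it works on a vector of number-like token strings instead of real source text, and parse-token and gather-errors are made-up names. The :gather-errors option above is a proposal, not something edamame supports today.

```clojure
;; Hypothetical: parses one decimal integer token, or returns ::invalid.
;; Zero-padded tokens such as "08" are treated as invalid, as in the issue.
(defn parse-token
  [s]
  (if (re-matches #"0|[1-9][0-9]*" s)
    (Long/parseLong s)
    ::invalid))

;; Hypothetical: returns [forms errors]. Each invalid token is replaced by a
;; numbered placeholder keyword, and the details are recorded under that key.
(defn gather-errors
  [tokens]
  (reduce (fn [[forms errors] token]
            (let [v (parse-token token)]
              (if (= ::invalid v)
                (let [k (keyword "edamame" (str "error-" (inc (count errors))))]
                  [(conj forms k)
                   (assoc errors k {:type :edamame/incorrect-number
                                    :string token})])
                [(conj forms v) errors])))
          [[] {}]
          tokens))

(gather-errors ["1" "2" "08"])
;; => [[1 2 :edamame/error-1]
;;     {:edamame/error-1 {:type :edamame/incorrect-number, :string "08"}}]
```

The placeholders keep the shape of the parsed forms intact, so the rest of the file can still be analyzed while the error map is reported separately.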

Idea: Error-fixing functions can be included in the parser options, and the parser throws if a function returns nil: (parse-string-all "(list 1 2 08 {:a 1 :b 2 :c})" {:incorrect-number (fn [s] (when (str/starts-with? s "0") (subs s 1))) :uneven-map (fn [entries] (conj entries :splint/missing-value))}) would return [(list 1 2 8 {:a 1 :b 2 :c :splint/missing-value})].
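
Sketching the fixer-function idea for the number case only, in plain Clojure. read-number is a hypothetical stand-in for the parser's number reader, and :incorrect-number is the proposed option name from above, not an existing edamame option. The assumption here is that the fixer returns a replacement string, which is then validated again.

```clojure
(require '[clojure.string :as str])

;; Hypothetical stand-in for the number reader: accepts plain decimal integers,
;; and otherwise asks the :incorrect-number fixer for a replacement string.
(defn read-number
  [s {:keys [incorrect-number]}]
  (let [valid? #(boolean (re-matches #"0|[1-9][0-9]*" %))]
    (cond
      (valid? s)
      (Long/parseLong s)

      incorrect-number
      (let [fixed (incorrect-number s)]
        ;; A nil or still-invalid result means the fixer gave up, so throw.
        (if (and (string? fixed) (valid? fixed))
          (Long/parseLong fixed)
          (throw (ex-info "Invalid number" {:string s :fixed fixed}))))

      :else
      (throw (ex-info "Invalid number" {:string s})))))

;; The fixer from the example above: strip one leading zero.
(def strip-leading-zero
  (fn [s] (when (str/starts-with? s "0") (subs s 1))))

(read-number "08" {:incorrect-number strip-leading-zero}) ; => 8
```

Validating the fixer's output again keeps a sloppy fixer from smuggling an invalid token back into the result.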

Describe alternatives you've considered

  1. Do nothing. Can't know what was intended so must exit immediately.
  2. Don't fix the parsing state, just delete the offending token and move on.

Additional context
The goal is to be able to analyze a whole file and provide feedback even when it's not exactly correct, because it's still worthwhile to check the rest of the file. It's annoying to only see one broken piece of code at a time, instead of being able to review/fix them all at once.

Lots of ideas and possibilities here. It would help (and save time) if you could make a table of anything that could go wrong during parsing (e.g. unbalanced parens, uneven amount of key/vals in map, duplicate set elements) and how this would be solved on a case by case basis (and/or by configuration).

Excluding all of the feature throws ("Syntax quote not allowed." etc) and unmatched delimiters (those are already handled):

| fn | msg | kind |
| --- | --- | --- |
| read-num | Invalid number | :invalid-char |
| parse-string | EOF while reading, expected X to match | :eof |
| parse-to-delimiter | EOF while reading, expected X to match | :eof |
| read-regex-pattern | Error while parsing regex | :eof |
| parse-set | X literal contains duplicate key | :duplicate |
| parse-first-matching-condition | Feature should be a keyword | :invalid-type |
| parse-first-matching-condition | EOF while reading, expected X | :eof |
| read-symbol | Invalid symbol | :invalid-char |
| parse-namespaced-map | namespaced map must specify a namespace | :syntax |
| parse-sharp | Unexpected EOF | :eof |
| parse-sharp | EOF while reading | :eof |
| parse-map | Map literal contains odd forms. | :uneven-pairs |
| parse-keyword | Invalid token | :invalid-char |
| dispatch | EOF while reading | :eof |

  • :eof is hard to know how to handle. Maybe punt for now for simplicity (shouldn't happen during most usages).
  • :invalid-char is my catch-all for "used a wrong character for one of the literals": disallowed letters in a number, disallowed characters in a keyword or symbol, etc. Maybe the string so far and the type ({:type :symbol :string "cool-symbol:"}) could be passed to a provided function and the valid type must be returned.
  • :duplicate is pretty easy: if the fixer fn is provided, pass the vector of forms to the function and let it create a valid object of the required type.
  • :syntax felt less specific than :namespaced-map-error, but it's the only one lol. I don't know how to fix this except to pass the string plus following map to the function and let it fix it. Maybe punt? Most people don't use namespaced maps.
  • :uneven-pairs can be fixed in the same way as :duplicate: pass the vector of forms to the provided function and then assert it returns a valid map.
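
For the :duplicate and :uneven-pairs cases, the "pass the vector of forms to the fixer and check what comes back" flow could look roughly like this. It is a sketch under assumptions: parse-set-forms and parse-map-forms are hypothetical names, and the option keys just reuse the kind names from the table.

```clojure
;; Hypothetical: builds a set from forms, delegating to the :duplicate fixer
;; when the forms contain duplicates. The fixer must return a set.
(defn parse-set-forms
  [forms {:keys [duplicate]}]
  (cond
    (= (count forms) (count (set forms)))
    (set forms)

    duplicate
    (let [fixed (duplicate forms)]
      (if (set? fixed)
        fixed
        (throw (ex-info "Fixer must return a set"
                        {:kind :duplicate :forms forms}))))

    :else
    (throw (ex-info "Set literal contains duplicate key"
                    {:kind :duplicate :forms forms}))))

;; Hypothetical: same flow for maps with an odd number of forms.
;; The fixer must return a map.
(defn parse-map-forms
  [forms {:keys [uneven-pairs]}]
  (cond
    (even? (count forms))
    (apply hash-map forms)

    uneven-pairs
    (let [fixed (uneven-pairs forms)]
      (if (map? fixed)
        fixed
        (throw (ex-info "Fixer must return a map"
                        {:kind :uneven-pairs :forms forms}))))

    :else
    (throw (ex-info "Map literal contains odd forms."
                    {:kind :uneven-pairs :forms forms}))))

(parse-set-forms [1 2 2] {:duplicate set})
;; => #{1 2}
(parse-map-forms [:a 1 :b 2 :c] {:uneven-pairs #(apply hash-map (butlast %))})
;; => {:a 1, :b 2}
```

Without a fixer, both functions throw as before, so the default behavior would stay unchanged.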

What do you think?