clj-commons/rewrite-clj

Location is not always populated in parse exceptions

Opened this issue · 3 comments

Version
1.1.47

Platform
All

Symptom
The :row and :col map entries are not always populated in parsing exceptions data.

Exhibit 1
Case where :row and :col are absent for exception data:

(require '[rewrite-clj.parser :as p])

(try
  (p/parse-string-all "goodsym badsym/ oksym")
  (catch Throwable e
    {:msg (ex-message e)
     :data (ex-data e)}))
;; => {:msg "Invalid symbol: badsym/.",
;;     :data {:type :reader-exception, :ex-kind :reader-error}}

Exhibit 2
Case where position is available in exception data, but notice :line instead of :row.

(try
  (p/parse-string-all ":goodkw :badkw: :okkw")
  (catch Throwable e
    {:msg (ex-message e)
     :data (ex-data e)}))
;; => {:msg "[line 1, col 16] Invalid keyword: badkw:.",
;;     :data
;;     {:type :reader-exception, :ex-kind :reader-error, :file nil, :line 1, :col 16}}

Expected behavior
Here's a case that works as expected:

(try                   
  (p/parse-string-all "#:{:a 1}")
  (catch Throwable e
    {:msg (ex-message e)
     :data (ex-data e)}))
;; => {:msg "namespaced map expects a namespace [at line 1, column 3]",
;;     :data {:msg "namespaced map expects a namespace", :row 1, :col 3}}

Diagnosis
For default token handling, we delegate to clojure.tools.reader's read-string in these cases, row/col info is not available.
In the case of the keyword, we internalized and used clojure.tools.reader read-keyword and it throws using clojure.tools.reader throw mechanisms.

Action
I'll take a peek.

It's probably worth studying, absent of rewrite-clj, what positional data tools.reader v1.4.2 throws.

(require '[clojure.tools.reader.edn :as edn]
         '[clojure.tools.reader.reader-types :as rt])


(defn read-all [s]
  (let [rdr (-> s
                rt/string-push-back-reader
                rt/indexing-push-back-reader)]
    (loop [acc []]
      (let [n (edn/read {:eof nil} rdr)]
        (if n
          (recur (conj acc n))
          acc)))))

1 col after the problematic token:

          ;123456789
(read-all "boo bah/ bang")
;; => Execution error (ExceptionInfo) at clojure.tools.reader.impl.errors/throw-ex (errors.clj:34).
;;    [line 1, col 9] Invalid symbol: bah/.

But when at eof, col at the end of the problematic token:

          ;1234
(read-all "bah/")
;; => Execution error (ExceptionInfo) at clojure.tools.reader.impl.errors/throw-ex (errors.clj:34).
;;    [line 1, col 4] Invalid symbol: bah/.

Unclosed vector at eof, col at end of last item:

          ;1234
(read-all "[boo")
;; => Execution error (ExceptionInfo) at clojure.tools.reader.impl.errors/throw-ex (errors.clj:34).
;;    [line 1, col 4] Unexpected EOF while reading item 1 of vector, starting at line 1 and column 1.

...but not consistently, here we are 1 col after eof:

          ;1234567
(read-all "[boo  ")
;; => Execution error (ExceptionInfo) at clojure.tools.reader.impl.errors/throw-ex (errors.clj:34).
;;    [line 1, col 7] Unexpected EOF while reading item 1 of vector, starting at line 1 and column 1.

1 char after invalid keyword:

          ;12345678901
(read-all ":boo :bah: :bang")
;; => Execution error (ExceptionInfo) at clojure.tools.reader.impl.errors/throw-ex (errors.clj:34).
;;    [line 1, col 11] Invalid keyword: :bah:.

1 char after eof for unterminated string at eof:

          ;1 234
(read-all "\"ab")
;; => Execution error (ExceptionInfo) at clojure.tools.reader.impl.errors/throw-ex (errors.clj:34).
;;    [line 1, col 4] Unexpected EOF reading string starting ""ab.

At end col of invalid number:

          ;123
(read-all "42Z")
;; => Execution error (ExceptionInfo) at clojure.tools.reader.impl.errors/throw-ex (errors.clj:34).
;;    [line 1, col 3] Invalid number: 42Z.

          ;1234
(read-all "42Z2")
;; => Execution error (ExceptionInfo) at clojure.tools.reader.impl.errors/throw-ex (errors.clj:34).
;;    [line 1, col 4] Invalid number: 42Z2.

Namespaced map errors might try to be a little more targeted, but still seems off by 1 col:

          ;1234567
(read-all "#:foo[fah]")
;; => Execution error (ExceptionInfo) at clojure.tools.reader.impl.errors/throw-ex (errors.clj:34).
;;    [line 1, col 7] Namespaced map with namespace foo does not specify a map.

Note that the above does not imply what types of exceptions rewrite-clj does/will throw, it is just a glance at what tools.reader does today.

So tools.reader points you at the end of the problematic token/code and sometimes 1 col after the problematic token/code and sometimes 1 col after eof. Which is probably "good enough" for it.

Since rewrite-clj is carefully adding positions at parse time, I think I'll explore the possibility of doing better.
I'm thinking it might be better to point at the start of the problematic token/code. But I'll also explore including both start and end positions. I'll keep in mind that this would technically be a breaking change for folks who rely on locations in exception data.

One thing to note is that I am talking about parsing only here.
Tools.reader is also employed when sexpr is called on a string node.
I don't have plans to provide positional data on throws for sexpr.

Note that work like this has also been done in the rewrite-clj + tools reader fork in clj-kondo

Oh handy to know, thanks!