Location is not always populated in parse exceptions
Opened this issue · 3 comments
Version
1.1.47
Platform
All
Symptom
The :row
and :col
map entries are not always populated in parsing exceptions data.
Exhibit 1
Case where :row
and :col
are absent for exception data:
(require '[rewrite-clj.parser :as p])
(try
(p/parse-string-all "goodsym badsym/ oksym")
(catch Throwable e
{:msg (ex-message e)
:data (ex-data e)}))
;; => {:msg "Invalid symbol: badsym/.",
;; :data {:type :reader-exception, :ex-kind :reader-error}}
Exhibit 2
Case where position is available in exception data, but notice :line
instead of :row
.
(try
(p/parse-string-all ":goodkw :badkw: :okkw")
(catch Throwable e
{:msg (ex-message e)
:data (ex-data e)}))
;; => {:msg "[line 1, col 16] Invalid keyword: badkw:.",
;; :data
;; {:type :reader-exception, :ex-kind :reader-error, :file nil, :line 1, :col 16}}
Expected behavior
Here's a case that works as expected:
(try
(p/parse-string-all "#:{:a 1}")
(catch Throwable e
{:msg (ex-message e)
:data (ex-data e)}))
;; => {:msg "namespaced map expects a namespace [at line 1, column 3]",
;; :data {:msg "namespaced map expects a namespace", :row 1, :col 3}}
Diagnosis
For default token handling, we delegate to clojure.tools.reader's read-string
in these cases, row/col info is not available.
In the case of the keyword, we internalized and used clojure.tools.reader read-keyword
and it throws using clojure.tools.reader throw mechanisms.
Action
I'll take a peek.
It's probably worth studying, absent of rewrite-clj, what positional data tools.reader
v1.4.2 throws.
(require '[clojure.tools.reader.edn :as edn]
'[clojure.tools.reader.reader-types :as rt])
(defn read-all [s]
(let [rdr (-> s
rt/string-push-back-reader
rt/indexing-push-back-reader)]
(loop [acc []]
(let [n (edn/read {:eof nil} rdr)]
(if n
(recur (conj acc n))
acc)))))
1 col after the problematic token:
;123456789
(read-all "boo bah/ bang")
;; => Execution error (ExceptionInfo) at clojure.tools.reader.impl.errors/throw-ex (errors.clj:34).
;; [line 1, col 9] Invalid symbol: bah/.
But when at eof, col at the end of the problematic token:
;1234
(read-all "bah/")
;; => Execution error (ExceptionInfo) at clojure.tools.reader.impl.errors/throw-ex (errors.clj:34).
;; [line 1, col 4] Invalid symbol: bah/.
Unclosed vector at eof, col at end of last item:
;1234
(read-all "[boo")
;; => Execution error (ExceptionInfo) at clojure.tools.reader.impl.errors/throw-ex (errors.clj:34).
;; [line 1, col 4] Unexpected EOF while reading item 1 of vector, starting at line 1 and column 1.
...but not consistently, here we are 1 col after eof:
;1234567
(read-all "[boo ")
;; => Execution error (ExceptionInfo) at clojure.tools.reader.impl.errors/throw-ex (errors.clj:34).
;; [line 1, col 7] Unexpected EOF while reading item 1 of vector, starting at line 1 and column 1.
1 char after invalid keyword:
;12345678901
(read-all ":boo :bah: :bang")
;; => Execution error (ExceptionInfo) at clojure.tools.reader.impl.errors/throw-ex (errors.clj:34).
;; [line 1, col 11] Invalid keyword: :bah:.
1 char after eof for unterminated string at eof:
;1 234
(read-all "\"ab")
;; => Execution error (ExceptionInfo) at clojure.tools.reader.impl.errors/throw-ex (errors.clj:34).
;; [line 1, col 4] Unexpected EOF reading string starting ""ab.
At end col of invalid number:
;123
(read-all "42Z")
;; => Execution error (ExceptionInfo) at clojure.tools.reader.impl.errors/throw-ex (errors.clj:34).
;; [line 1, col 3] Invalid number: 42Z.
;1234
(read-all "42Z2")
;; => Execution error (ExceptionInfo) at clojure.tools.reader.impl.errors/throw-ex (errors.clj:34).
;; [line 1, col 4] Invalid number: 42Z2.
Namespaced map errors might try to be a little more targeted, but still seems off by 1 col:
;1234567
(read-all "#:foo[fah]")
;; => Execution error (ExceptionInfo) at clojure.tools.reader.impl.errors/throw-ex (errors.clj:34).
;; [line 1, col 7] Namespaced map with namespace foo does not specify a map.
Note that the above does not imply what types of exceptions rewrite-clj does/will throw, it is just a glance at what tools.reader does today.
So tools.reader points you at the end of the problematic token/code and sometimes 1 col after the problematic token/code and sometimes 1 col after eof. Which is probably "good enough" for it.
Since rewrite-clj is carefully adding positions at parse time, I think I'll explore the possibility of doing better.
I'm thinking it might be better to point at the start of the problematic token/code. But I'll also explore including both start and end positions. I'll keep in mind that this would technically be a breaking change for folks who rely on locations in exception data.
One thing to note is that I am talking about parsing only here.
Tools.reader is also employed when sexpr
is called on a string node.
I don't have plans to provide positional data on throws for sexpr
.
Note that work like this has also been done in the rewrite-clj + tools reader fork in clj-kondo
Oh handy to know, thanks!