clojure-mode locks up when font-locking strings with many escaped characters
plexus opened this issue · 3 comments
Steps to reproduce the problem
Run cider-pprint-eval-last-sexp on something that returns a large string (mine was about ~140k characters), with multiple escaped characters.
This completely freezes up my Emacs, the only way to get out of it is with a killall -SIGUSR2 emacs, which then gives you a stracktrace of where it was at the time.
Debugger entered--entering a function:
* #f(compiled-function () #<bytecode 0x22a293b5c0344f>)()
clojure-string-start(t)
clojure--font-locked-as-string-p()
clojure-font-lock-escaped-chars(149240)
font-lock-fontify-keywords-region(1 149240 nil)
font-lock-default-fontify-region(57 557 nil)
apply(font-lock-default-fontify-region 57 557 nil)
#f(compiled-function (beg end &rest rest) #<bytecode 0x1cea7771e9712d09>)(57 557 nil)
font-lock-fontify-region(57 557)
#f(compiled-function (fun) #<bytecode -0x154a37361c2bbd80>)(font-lock-fontify-region)
run-hook-wrapped(#f(compiled-function (fun) #<bytecode -0x154a37361c2bbd80>) font-lock-fontify-region)
jit-lock--run-functions(57 557)
jit-lock-fontify-now(57 557)
jit-lock-function(57)
redisplay_internal\ \(C\ function\)()
Having a look at those top few functions, it seems that clojure-font-lock-escaped-chars walks through the string
(defun clojure-font-lock-escaped-chars ...
...
(while (and (not found)
(re-search-forward "\\\\." bound t))
(setq found (clojure--font-locked-as-string-p)))For each backslach it calls clojure--font-locked-as-string-p, which calls clojure-string-start (twice if it's not a regex), which does a backwards regex search
(defun clojure-string-start ...
...
;; Find a quote that appears immediately after whitespace,
;; beginning of line, hash, or an open paren, brace, or bracket
(re-search-backward "\\(\\s-\\|^\\|#\\|(\\|\\[\\|{\\)\\(\"\\)")I'm not familar enough with this code or with font-locking to judge exactly what it's doing, but it does look to be doing something rather inefficient. Maybe clojure-font-lock-escaped-chars can precompute the end of the string, instead of checking the font-locking to see if it's iterated past the end of the string? Or does clojure--font-locked-as-string-p need to clojure-string-start check? it seems to check if it's font locked as string AND inside a string, which seems a bit redundant.
Environment & Version information
clojure-mode version
Include here the version string displayed by M-x clojure-mode-display-version. Here's an example:
;; Version: 5.13.0-snapshot
Emacs version
GNU Emacs 28.0.50 (build 2, x86_64-pc-linux-gnu, GTK+ Version 3.24.20, cairo version 1.16.0) of 2021-02-19
Operating system
Description: Ubuntu 20.04.2 LTS
@plexus Can you see if this fixes your issue? The function was indeed written in a strange way, I couldn't figure out why from the commit history. Emacs has an issue in general with long strings, it might not even be a clojure-mode problem.
(defun clojure-string-start (&optional regex)
"Return the position of the \" that begins the string at point.
If REGEX is non-nil, return the position of the # that begins the
regex at point. If point is not inside a string or regex, return
nil."
(when (nth 3 (syntax-ppss)) ;; Are we really in a string?
(let* ((beg (nth 8 (syntax-ppss)))
(hash (eq ?# (char-before beg))))
(if regex
(and hash (1- beg))
(and (not hash) beg)))))This does seem to help tremendously!
To try this at home, do a cider-pprint-eval-last-sexp on
(slurp "https://www.youtube.com/playlist?list=PLZdCLR02grLoeorpcxc4vZKWjWI_O2KXf")With this patch it remains pretty usable, without it you have to kill or SIGUSR2 emacs.