clojure-emacs/clojure-mode

clojure-mode locks up when font-locking strings with many escaped characters

plexus opened this issue · 3 comments

Steps to reproduce the problem

Run cider-pprint-eval-last-sexp on something that returns a large string (mine was about ~140k characters), with multiple escaped characters.

This completely freezes up my Emacs, the only way to get out of it is with a killall -SIGUSR2 emacs, which then gives you a stracktrace of where it was at the time.

Debugger entered--entering a function:
* #f(compiled-function () #<bytecode 0x22a293b5c0344f>)()
  clojure-string-start(t)
  clojure--font-locked-as-string-p()
  clojure-font-lock-escaped-chars(149240)
  font-lock-fontify-keywords-region(1 149240 nil)
  font-lock-default-fontify-region(57 557 nil)
  apply(font-lock-default-fontify-region 57 557 nil)
  #f(compiled-function (beg end &rest rest) #<bytecode 0x1cea7771e9712d09>)(57 557 nil)
  font-lock-fontify-region(57 557)
  #f(compiled-function (fun) #<bytecode -0x154a37361c2bbd80>)(font-lock-fontify-region)
  run-hook-wrapped(#f(compiled-function (fun) #<bytecode -0x154a37361c2bbd80>) font-lock-fontify-region)
  jit-lock--run-functions(57 557)
  jit-lock-fontify-now(57 557)
  jit-lock-function(57)
  redisplay_internal\ \(C\ function\)()

Having a look at those top few functions, it seems that clojure-font-lock-escaped-chars walks through the string

(defun clojure-font-lock-escaped-chars ...
  ...
  (while (and (not found)
    (re-search-forward "\\\\." bound t))
    (setq found (clojure--font-locked-as-string-p)))

For each backslach it calls clojure--font-locked-as-string-p, which calls clojure-string-start (twice if it's not a regex), which does a backwards regex search

(defun clojure-string-start ...
  ...
        ;; Find a quote that appears immediately after whitespace,
        ;; beginning of line, hash, or an open paren, brace, or bracket
        (re-search-backward "\\(\\s-\\|^\\|#\\|(\\|\\[\\|{\\)\\(\"\\)")

I'm not familar enough with this code or with font-locking to judge exactly what it's doing, but it does look to be doing something rather inefficient. Maybe clojure-font-lock-escaped-chars can precompute the end of the string, instead of checking the font-locking to see if it's iterated past the end of the string? Or does clojure--font-locked-as-string-p need to clojure-string-start check? it seems to check if it's font locked as string AND inside a string, which seems a bit redundant.

Environment & Version information

clojure-mode version

Include here the version string displayed by M-x clojure-mode-display-version. Here's an example:

;; Version: 5.13.0-snapshot

Emacs version

GNU Emacs 28.0.50 (build 2, x86_64-pc-linux-gnu, GTK+ Version 3.24.20, cairo version 1.16.0) of 2021-02-19

Operating system

Description:    Ubuntu 20.04.2 LTS

@plexus Can you see if this fixes your issue? The function was indeed written in a strange way, I couldn't figure out why from the commit history. Emacs has an issue in general with long strings, it might not even be a clojure-mode problem.

(defun clojure-string-start (&optional regex)
  "Return the position of the \" that begins the string at point.
If REGEX is non-nil, return the position of the # that begins the
regex at point.  If point is not inside a string or regex, return
nil."
  (when (nth 3 (syntax-ppss)) ;; Are we really in a string?
    (let* ((beg (nth 8 (syntax-ppss)))
           (hash (eq ?# (char-before beg))))
      (if regex
          (and hash (1- beg))
        (and (not hash) beg)))))

This does seem to help tremendously!

To try this at home, do a cider-pprint-eval-last-sexp on

(slurp "https://www.youtube.com/playlist?list=PLZdCLR02grLoeorpcxc4vZKWjWI_O2KXf")

With this patch it remains pretty usable, without it you have to kill or SIGUSR2 emacs.

Thanks a lot @yuhan0 for figuring out the fix and @bbatsov for reviewing and merging! This is something I would run into occasionally every few weeks or months, and every time it made me want to throw my computer into the ocean. Super happy to see this fixed!