purcell/package-lint

New `package-lint-symbol-info` causes errors with old Emacs versions

Closed this issue · 16 comments

I own the elisp-check project, which is a GitHub Action for simple to use Emacs Lisp CI. One of the checks we implement depends on package-lint.

Since my last commit, my CI for the Emacs versions 24.2 and 24.1 has been failing, ostensibly because of changes to package-lint that are not compatible with those versions of Emacs. I've included the relevant debugger output as well as a link to my failing CI.

2020-08-31T22:06:45.2260920Z Debugger entered--Lisp error: (void-variable --cl---cl-var--85100--)
2020-08-31T22:06:45.2262640Z   byte-code("\302\303!r\211q\210\304\305\306\307\310\311��!\312\"\313\314%DC\216\315\316\317�\203%\0\320�!\202&\0	\"!\210\321p!*\262�\322 \323\305\324\325\310\311��!\326\"\327\330%\262��\211\203\264\0\211@\211@�A\211��\331\332�\236A\236A\333\332�\236A\236A\331\334�\236A\236A\333\334�\236A\236A\331\335��\236A\236A\333\335��\236A\236A\336J��\337�\n#\210\336J�\340�\n#\210\336J�\341�\n#\210\336J�\342�\n#\210\336J�\343�\n#\210\336J�\344�\n#\266\f�A\266\202\202B\0\266�\207" [load-file-name default-directory generate-new-buffer " *temp*" funcall make-byte-code 0 "\301\300!\205	\0\302\300!\207" vconcat vector [buffer-name kill-buffer] 2 "\n\n(fn)" insert-file-contents expand-file-name "data/stdlib-changes" file-name-directory read make-hash-table nil 771 "�\211\205�\0\211@\301���B\302�\300\"B\300#\210�A\266\202\202�\0\207" [puthash gethash] 11 "\n\n(fn SYMS ACTION VERSION)" added variables removed features functions --cl---cl-var--85100-- variable-added variable-removed library-added library-removed function-added function-removed] 20)
2020-08-31T22:06:45.2264436Z   (defconst package-lint-symbol-info (byte-code "\302\303!r\211q\210\304\305\306\307\310\311��!\312\"\313\314%DC\216\315\316\317�\203%\0\320�!\202&\0	\"!\210\321p!*\262�\322 \323\305\324\325\310\311��!\326\"\327\330%\262��\211\203\264\0\211@\211@�A\211��\331\332�\236A\236A\333\332�\236A\236A\331\334�\236A\236A\333\334�\236A\236A\331\335��\236A\236A\333\335��\236A\236A\336J��\337�\n#\210\336J�\340�\n#\210\336J�\341�\n#\210\336J�\342�\n#\210\336J�\343�\n#\210\336J�\344�\n#\266\f�A\266\202\202B\0\266�\207" [load-file-name default-directory generate-new-buffer " *temp*" funcall make-byte-code 0 "\301\300!\205	\0\302\300!\207" vconcat vector [buffer-name kill-buffer] 2 "\n\n(fn)" insert-file-contents expand-file-name "data/stdlib-changes" file-name-directory read make-hash-table nil 771 "�\211\205�\0\211@\301���B\302�\300\"B\300#\210�A\266\202\202�\0\207" [puthash gethash] 11 "\n\n(fn SYMS ACTION VERSION)" added variables removed features functions --cl---cl-var--85100-- variable-added variable-removed library-added library-removed function-added function-removed] 20))
2020-08-31T22:06:45.2264944Z   require(package-lint)
2020-08-31T22:06:45.2265248Z   mapc(require (bytecomp package-lint checkdoc))
2020-08-31T22:06:45.2265540Z   elisp-check-run("melpa" "elisp-check.el" t)
2020-08-31T22:06:45.2265834Z   eval((elisp-check-run "melpa" "elisp-check.el" t))
2020-08-31T22:06:45.2266391Z   command-line-1(("--eval" "(setq debug-on-error t)" "--load" "/home/runner/work/elisp-check/elisp-check/elisp-check.el" "--eval" "(setq elisp-check-ignore-warnings t)" "--eval" "(setq elisp-check-warnings-as-errors nil)" "--eval" "(elisp-check-run \"melpa\" \"elisp-check.el\" t)"))
2020-08-31T22:06:45.2266701Z   command-line()
2020-08-31T22:06:45.2266955Z   normal-top-level()
2020-08-31T22:06:45.2267021Z 

I am by no means an expert on Emacs Lisp internals, but from what I can gather the new package-lint-symbol-info variable is the likely culprit here. It seems a bit strange that this error occurs in my CI setup but not yours, so maybe we can work together to find out what causes this discrepancy.

(defconst package-lint-symbol-info
(let* ((stdlib-changes (with-temp-buffer
(insert-file-contents
(expand-file-name "data/stdlib-changes"
(if load-file-name
(file-name-directory load-file-name)
default-directory)))
(read (current-buffer))))
(info (make-hash-table)))
(cl-labels ((add-info (syms action version)
(dolist (sym syms)
(puthash sym (cons (cons action version) (gethash sym info)) info))))
(pcase-dolist (`(,version . ,info) stdlib-changes)
(let-alist info
(add-info .variables.added 'variable-added version)
(add-info .variables.removed 'variable-removed version)
(add-info .features.added 'library-added version)
(add-info .features.removed 'library-removed version)
(add-info .functions.added 'function-added version)
(add-info .functions.removed 'function-removed version))))
info))

Thank you for creating package-lint!

As you say, it's strange that we don't see this in our own CI setup. My first guess based on the backtrace would be that you're loading incompatible bytecode.

My Emacs Lisp code simply fetches the latest version of package-lint from MELPA using package.el. I also add the ELPA and ORG ELPA package archives, but this should be irrelevant because they don't contain package-lint.

Your CI output says:

Checking /home/runner/.emacs.d/elpa/package-lint-20200816.24/data

just before the error. I think that's a clue, because perhaps something is expecting it to be executable elisp (thought it isn't). I wonder if we need to put a comment in that file to force the byte compiler to ignore it...

Maybe #189 will make a difference for you, but it's just a guess. Perhaps give it another try when MELPA has rebuilt.

Your change unfortunately does not fix my issue. In fact, the line about checking the non-executable Emacs Lisp file still persists.

I did confirm that MELPA had rebuilt package-lint with your new commit before rerunning my CI setup.

Hmm. This is a mystery to me, then, sorry. I haven't seen any sign of this outside your setup.

That does surprise me, I suppose very few people care about supporting Emacs 24 nowadays.

In any case, I have been able to verify that the following minimal example (executed locally using Emacs 24.1 built using your own nix-emacs-ci project) results in the same error. I have tested both loading the file in batch mode and running eval-buffer in interactive mode.

;;; -*- lexical-binding: t; -*-

(package-initialize)
(add-to-list 'package-archives '("melpa" . "http://melpa.org/packages/"))
(package-refresh-contents)
(package-install 'package-lint)
(require 'package-lint)

;;; test.el ends here

I think that should rule out the possibility of the issue stemming from any weirdness in my CI setup.

When you run that locally, are you also doing anything to make sure it isn't re-using an ~/.emacs.d/elpa directory containing a package-lint.elc compiled with a newer version?

When you run that locally, are you also doing anything to make sure it isn't re-using an ~/.emacs.d/elpa directory containing a package-lint.elc compiled with a newer version?

This is a good point. It would be safer to set a different package-user-dir value for each Emacs version (at least for each major version), like this: https://github.com/kaushalmodi/.emacs.d/blob/master/early-init.el

Yep - I've reproduced with locally with a separate package-user-dir. To be honest, the fact that this works in every other version indicates to me that cl-lib 0.6.1 in ELPA isn't fully compatible with Emacs 24.1.

In #190 I've changed our CI so that it reproduces this issue. I'm still not sure why it happens, though. The code can happily be evaluated in Emacs 24.1, just not loaded from its compiled equivalent.

I didn't exactly figure out why it was breaking, but I reformulated that portion of the code to avoid cl-labels, and it seems fine now: see #190.

Thanks, your help and work on package-lint is much appreciated!

It would be safer to set a different package-user-dir value for each Emacs version (at least for each major version), like this: https://github.com/kaushalmodi/.emacs.d/blob/master/early-init.el

Even beyond this, it's worth noting that package-lint should give the same results regardless of the host Emacs inside which it is run, e.g. even in old Emacsen it should be able to warn you about use of symbols added in Emacs 27.1.

However, generally speaking, we ideally shouldn't have to keep package-lint working in every Emacs version -- it would therefore be good practice (and also more efficient) for users to run it once in a recent Emacs during their CI process, rather than incorporate it into the runs for all the Emacs versions.

I've set a bad example in this regard when I've set up CI in my own projects, because it's easier to just run package-lint unconditionally, but it's something to consider.

It would therefore be good practice (and also more efficient) for users to run it once in a recent Emacs during their CI process, rather than incorporate it into the runs for all the Emacs versions.

In this regard, I will rethink about how elinter should work. It currently runs both linting and byte-compilation on every Emacs version, but it would be possible to run linting only on the latest version. Also, checkdoc, which is shipped with Emacs, can be incompatible between different Emacs versions, so it would make sense to skip it on old versions.

Nonetheless, the cost of static checking is negligible compared to that of installing Emacs and package dependencies, so the current common practice of elisp CI is actually not bad. It would be more costly to add another phase of installing Nix and Emacs.

Yep, totally agree. Just wanted to mention it because I know this topic is something you're thinking about. I'm not thinking about dropping backwards compatibility for package-lint in the foreseeable future, but I guess at a certain point, if package-lint were able to use a new feature in Emacs to give significantly more useful feedback, we'd definitely consider doing so.