lastquestion/explain-pause-mode

Why does `xterm--report-background-handler` and `version-handler` take 2 seconds to return on at least iterm2 and alacritty? Investigate emacs-bug. (was Doesn't detect pause when starting "emacsclient -nw")

nolanl opened this issue · 17 comments

I'm trying to track down a 2 second pause where running "emacsclient -nw SOMEFILE" shows emacs immediately (either on the scratch buffer or whatever buffer I was last interacting with in another window/tty) before it shows SOMEFILE.

I start explain-pause-mode in init.el at "emacs --daemon" start time. I tried to open a file 6 times in a row, but explain-pause-top just showed some normal ~70ms GC pauses, nothing close to 2000ms.

Dunno if this use-case is intended or even possible...

Yes, that is the intended use case. But not all the file IO code or the buffer init code OR the xterm init code is captured yet. Also, there’s a lot of native frames that are currently hidden because I don’t have a good way to display them, and probably your 2 secs might be found there IF it’s covered.

Undocumented feature exists where you can grab the event steam live and that might help.

Do

(explain-pause-log-to-socket "./some-socket-file-you-know-where-it-is")
(explain-pause-mode t) ;; turn the mode on

Before doing that, somewhere else in bash, run

socat -u UNIX-RECV:./that-file-you-named-earlier STDOUT

This will print a live stream of every command that your Emacs is running, even before executing the command.

If you can’t find the time there, it must be somewhere where I haven’t captured yet. I too get slow file opens sometimes and I really want this use case to work so I’m happy to investigate w you if you’re game!

Ran "socat -u UNIX-RECV:/tmp/sock STDOUT | ts | tee ~/slow.log" in one terminal, and "emacsclient -e '(explain-pause-log-to-socket "/tmp/sock")'; e customize.el ; killall socat", then did a C-x C-c as fast as I could once the file showed up.

Jul 04 16:06:42 ("enter" "server-sentinel" "server-process-filter" "process-sentinel" nil 0 nil nil nil 3)
Jul 04 16:06:42 ("exit" "server-sentinel" "process-sentinel" 0 nil)
Jul 04 16:06:42 ("exit" "server-process-filter" "process-filter" 2 nil)
Jul 04 16:06:42 ("enter" "server-sentinel" "root-emacs" "process-sentinel" nil 0 nil nil nil 3)
Jul 04 16:06:42 ("exit" "server-sentinel" "process-sentinel" 0 nil)
Jul 04 16:06:42 ("enter" "server-process-filter" "root-emacs" "process-filter" nil 0 nil nil nil 3)
Jul 04 16:06:42 ("enter" "#<subr read-event>" "server-process-filter" "server-process-filter" t 0 nil nil nil 4)
Jul 04 16:06:43 ("enter" "auto-revert-buffers" "#<subr read-event>" "timer" nil 0 nil nil nil 2)
Jul 04 16:06:43 ("exit" "auto-revert-buffers" "timer" 0 nil)
Jul 04 16:06:43 ("enter" "undo-auto--boundary-timer" "#<subr read-event>" "timer" nil 0 nil nil nil 2)
Jul 04 16:06:43 ("exit" "undo-auto--boundary-timer" "timer" 0 nil)
Jul 04 16:06:44 ("exit" "#<subr read-event>" "server-process-filter" 1999 nil)
Jul 04 16:06:44 ("exit" "server-process-filter" "process-filter" 88 nil)
Jul 04 16:06:44 ("enter" "<bytecode> (references: (eldoc-mode global-eldoc-mode eldoc--supported-p eldoc-print-current-symbol-info))" "root-emacs" "idle-timer" nil 0 nil nil nil 5)
Jul 04 16:06:44 ("exit" "<bytecode> (references: (eldoc-mode global-eldoc-mode eldoc--supported-p eldoc-print-current-symbol-info))" "idle-timer" 0 nil)
Jul 04 16:06:44 ("enter" "show-paren-function" "root-emacs" "idle-timer" nil 0 nil nil nil 2)
Jul 04 16:06:44 ("exit" "show-paren-function" "idle-timer" 0 nil)
Jul 04 16:06:44 ("enter" "save-buffers-kill-terminal" "root-emacs" "root-emacs" nil 0 nil nil nil 1)
Jul 04 16:06:44 ("enter" "server-sentinel" "save-buffers-kill-terminal" "process-sentinel" nil 0 nil nil nil 3)
Jul 04 16:06:44 ("exit" "server-sentinel" "process-sentinel" 0 nil)
Jul 04 16:06:44 ("exit" "save-buffers-kill-terminal" "root-emacs" 0 nil)

Nothing jumps out at me.

Check it out, you spent two seconds in read-event:

Jul 04 16:06:44 ("exit" "#<subr read-event>" "server-process-filter" 1999 nil)

1999ms exactly, that's the timeout firing.

read-event documentation:

(read-event &optional PROMPT INHERIT-INPUT-METHOD SECONDS)
Read an event object from the input stream.
If the optional argument PROMPT is non-nil, display that as a prompt.
If the optional argument INHERIT-INPUT-METHOD is non-nil and some
input method is turned on in the current buffer, that input method
is used for reading a character.
If the optional argument SECONDS is non-nil, it should be a number
specifying the maximum number of seconds to wait for input. If no
input arrives in that time, return nil. SECONDS may be a
floating-point value.

Conclusion: somewhere in someone's code, it's literally waiting for some kind of input for 2 seconds. It might be trying to read keyboard input, or it might be trying to read mouse input, or maybe - and this is my suspicion - it is the read-event that is in xterm.el.

That happens here-ish, at least in emacs HEAD: https://github.com/emacs-mirror/emacs/blob/master/lisp/term/xterm.el#L764

Check out that the default value of xterm-query-timeout is 2 seconds exactly, so I suspect it's here. Before we dig further, you need to tell me which emacs version you're running 😀

But certainly, for this package, there needs to be some way to see "native frames" somehow, and especially "native frames that timeout". I'm not sure yet how to show that :( I suspect I need to implement #48 or something similar so I can generate a tree view in top before I can show native frames.

But I should also productionalize this secret feature (which I intend to do) by providing a way for a separate emacs process to read it and highlight things and stuff. You can see I intended to do that since it spits out elisp that is parseable.

GNU Emacs 26.1
from debian buster:
1:26.1+1-3.2+deb10u1

This commit in 27+ decreases the xterm wait emacs-mirror/emacs@c67befd

Before 27, there was a hardcoded wait of 2 seconds, and it was not controllable.

Suggestion:

  1. Try 27+, and at least you have configurable values for the xterm init timeout
  2. What terminal driver are you running? From initial browsing really fast, I can see it's doing fancy stuff trying to ask how many colors it supports and term caps. Also check what your TERM is and what it's advertising. Run infocmp
  1. Thanks, I'll try that and see if changing it helps.
  2. Alacritty
$ echo $TERM
xterm-256color
$ infocmp
#	Reconstructed via infocmp from file: /lib/terminfo/x/xterm-256color
xterm-256color|xterm with 256 colors,
	am, bce, ccc, km, mc5i, mir, msgr, npc, xenl,
	colors#0x100, cols#80, it#8, lines#24, pairs#0x10000,
	acsc=``aaffggiijjkkllmmnnooppqqrrssttuuvvwwxxyyzz{{||}}~~,
	bel=^G, blink=\E[5m, bold=\E[1m, cbt=\E[Z, civis=\E[?25l,
	clear=\E[H\E[2J, cnorm=\E[?12l\E[?25h, cr=\r,
	csr=\E[%i%p1%d;%p2%dr, cub=\E[%p1%dD, cub1=^H,
	cud=\E[%p1%dB, cud1=\n, cuf=\E[%p1%dC, cuf1=\E[C,
	cup=\E[%i%p1%d;%p2%dH, cuu=\E[%p1%dA, cuu1=\E[A,
	cvvis=\E[?12;25h, dch=\E[%p1%dP, dch1=\E[P, dim=\E[2m,
	dl=\E[%p1%dM, dl1=\E[M, ech=\E[%p1%dX, ed=\E[J, el=\E[K,
	el1=\E[1K, flash=\E[?5h$<100/>\E[?5l, home=\E[H,
	hpa=\E[%i%p1%dG, ht=^I, hts=\EH, ich=\E[%p1%d@,
	il=\E[%p1%dL, il1=\E[L, ind=\n, indn=\E[%p1%dS,
	initc=\E]4;%p1%d;rgb\:%p2%{255}%*%{1000}%/%2.2X/%p3%{255}%*%{1000}%/%2.2X/%p4%{255}%*%{1000}%/%2.2X\E\\,
	invis=\E[8m, is2=\E[!p\E[?3;4l\E[4l\E>, kDC=\E[3;2~,
	kEND=\E[1;2F, kHOM=\E[1;2H, kIC=\E[2;2~, kLFT=\E[1;2D,
	kNXT=\E[6;2~, kPRV=\E[5;2~, kRIT=\E[1;2C, kb2=\EOE, kbs=^?,
	kcbt=\E[Z, kcub1=\EOD, kcud1=\EOB, kcuf1=\EOC, kcuu1=\EOA,
	kdch1=\E[3~, kend=\EOF, kent=\EOM, kf1=\EOP, kf10=\E[21~,
	kf11=\E[23~, kf12=\E[24~, kf13=\E[1;2P, kf14=\E[1;2Q,
	kf15=\E[1;2R, kf16=\E[1;2S, kf17=\E[15;2~, kf18=\E[17;2~,
	kf19=\E[18;2~, kf2=\EOQ, kf20=\E[19;2~, kf21=\E[20;2~,
	kf22=\E[21;2~, kf23=\E[23;2~, kf24=\E[24;2~,
	kf25=\E[1;5P, kf26=\E[1;5Q, kf27=\E[1;5R, kf28=\E[1;5S,
	kf29=\E[15;5~, kf3=\EOR, kf30=\E[17;5~, kf31=\E[18;5~,
	kf32=\E[19;5~, kf33=\E[20;5~, kf34=\E[21;5~,
	kf35=\E[23;5~, kf36=\E[24;5~, kf37=\E[1;6P, kf38=\E[1;6Q,
	kf39=\E[1;6R, kf4=\EOS, kf40=\E[1;6S, kf41=\E[15;6~,
	kf42=\E[17;6~, kf43=\E[18;6~, kf44=\E[19;6~,
	kf45=\E[20;6~, kf46=\E[21;6~, kf47=\E[23;6~,
	kf48=\E[24;6~, kf49=\E[1;3P, kf5=\E[15~, kf50=\E[1;3Q,
	kf51=\E[1;3R, kf52=\E[1;3S, kf53=\E[15;3~, kf54=\E[17;3~,
	kf55=\E[18;3~, kf56=\E[19;3~, kf57=\E[20;3~,
	kf58=\E[21;3~, kf59=\E[23;3~, kf6=\E[17~, kf60=\E[24;3~,
	kf61=\E[1;4P, kf62=\E[1;4Q, kf63=\E[1;4R, kf7=\E[18~,
	kf8=\E[19~, kf9=\E[20~, khome=\EOH, kich1=\E[2~,
	kind=\E[1;2B, kmous=\E[M, knp=\E[6~, kpp=\E[5~,
	kri=\E[1;2A, mc0=\E[i, mc4=\E[4i, mc5=\E[5i, meml=\El,
	memu=\Em, oc=\E]104\007, op=\E[39;49m, rc=\E8, rev=\E[7m,
	ri=\EM, rin=\E[%p1%dT, ritm=\E[23m, rmacs=\E(B,
	rmam=\E[?7l, rmcup=\E[?1049l\E[23;0;0t, rmir=\E[4l,
	rmkx=\E[?1l\E>, rmm=\E[?1034l, rmso=\E[27m, rmul=\E[24m,
	rs1=\Ec\E]104\007, rs2=\E[!p\E[?3;4l\E[4l\E>, sc=\E7,
	setab=\E[%?%p1%{8}%<%t4%p1%d%e%p1%{16}%<%t10%p1%{8}%-%d%e48;5;%p1%d%;m,
	setaf=\E[%?%p1%{8}%<%t3%p1%d%e%p1%{16}%<%t9%p1%{8}%-%d%e38;5;%p1%d%;m,
	sgr=%?%p9%t\E(0%e\E(B%;\E[0%?%p6%t;1%;%?%p5%t;2%;%?%p2%t;4%;%?%p1%p3%|%t;7%;%?%p4%t;5%;%?%p7%t;8%;m,
	sgr0=\E(B\E[m, sitm=\E[3m, smacs=\E(0, smam=\E[?7h,
	smcup=\E[?1049h\E[22;0;0t, smir=\E[4h, smkx=\E[?1h\E=,
	smm=\E[?1034h, smso=\E[7m, smul=\E[4m, tbc=\E[3g,
	u6=\E[%i%d;%dR, u7=\E[6n, u8=\E[?%[;0123456789]c,
	u9=\E[c, vpa=\E[%i%p1%dd,

It's either xterm--report-background-handler or xterm--version-handler. Directly run it in *Scratch* and see which one it is. On my machine on 26.4, (xterm--report-background-handler) takes 2 seconds and xterm--version-handler is faster but still slow as molasses. This feels like possibly a bug in Emacs which that commit "hides" - or both iTerm and Alacritty have implemented the same behavior.

It's also possible emacs is querying something that no modern term implements as that code has been there for a long time.

OK, I lied, both are timing out:

(require  'benchmark)
benchmark

(benchmark-elapse (xterm--report-background-handler))
2.000109
(benchmark-elapse (xterm--version-handler))
2.000795

on 26.1:

(benchmark-elapse (xterm--report-background-handler))
2.001629141
(benchmark-elapse (xterm--version-handler))
2.00161142

on emacs mainline:

(benchmark-elapse (xterm--report-background-handler))
2.001123153
(benchmark-elapse (xterm--version-handler))
2.001618543

xterm is very quick.

That's crazy. I get that too. Does this PR even work?! emacs-mirror/emacs@c67befd#diff-89047cdd70633a563eda4e02a5fb2e13L674

Very strange:

(require 'benchmark)
benchmark
(benchmark-elapse (xterm--report-background-handler))
2.005282

xterm-query-redisplay-timeout
0.2
xterm-query-timeout
2
(setq xterm-query-timeout 0.1)
0.1
(benchmark-elapse (xterm--report-background-handler))
0.206694

I conclude there is probably an emacs-lisp bug in that PR above. This is on 28 HEAD.

However, I think you can set both values very low and see what happens. I believe it will fix it assuming that PR even half works.

Also, I was thinking how annoying it is to recompile emacs for 28+, if you want to stay on 26 I'd honestly just define xterm--read-event-for-query to just return nil without trying to read. It's what it does anyway when it timesout.

I am very curious what exactly it is querying for, though, and I will take a look after finishing #56 ...

Wait, I understand that code now. The total time it waits is the combination of both values. So the PR does work. Just set both values to 0.01. But still doesn't explain what the hell it's querying for...and why all these modern term emulators don't reply to it.

(defun xterm--read-event-for-query () nil) works for me, thanks for taking the time to help me debug this, it has been annoying me for months. Weird terminal interactions were definitely not where I thought we'd end up here.

I saw that too. And this has been hitting me too, but I've been thinking that I've been doing some GC during startup and I was going to get "to it" once I did #27 and track GC pauses. LOL. Well thank you because it has been also frustrating me.

I am probably gonna see if I can figure out what the root cause is, and try to report a bug to emacs core... I suspect this is likely hitting lots and lots of people and it wouldn't count as part of init-time either. I see lots of posts on reddit saying "oh my init code was 0.1 seconds but I spent many seconds before emacs started up".

Maybe make a post in reddit about this and help the community? There are at least dozens of us still using -nw :p HAHA

I'm going to open new enhancement issues for the two things I outlined so it makes it easier to find and debug these issues. They were things I wanted to do already, but it's good to get affirmation they are useful features.

Thank you for trying explain-pause out and let me know if you find other bugs! I'm going to leave this open until I get time to discover with upstream what's going on....

Awesome, thanks again for the debugging help and for making explain-pause-mode!