glibc 2.33 does not change select's timeout, hence breaks topline
Closed this issue · 8 comments
SELECT (2) ...
On Linux, select() modifies timeout to reflect the amount of time not slept; most other implementations
do not do this. (POSIX.1 permits either behavior.) This causes problems both when Linux code which
reads timeout is ported to other operating systems, and when code is ported to Linux that reuses a struct
timeval for multiple select()s in a loop without reinitializing it. Consider timeout to be undefined af‐
ter select() returns
@rbalint Why do you think that select
in glibc 2.33 does not update the timeout value? The implementation has code to do exactly that. Do you see problems on specific architectures? Thanks.
Yes topline broke on all architectures with glibc 2.33 in Ubuntu CI. Amd64:
https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-hirsute/hirsute/amd64/t/topline/20210210_153219_c271c@/log.gz
autopkgtest [15:32:08]: test run-and-signal: - - - - - - - - - - results - - - - - - - - - -
run-and-signal FAIL non-zero exit status 1
There's a new bug in glibc. The code snippet is:
if (r == 0 || errno != ENOSYS)
{
if (timeout != NULL)
TIMEVAL_TO_TIMESPEC (timeout, &ts64);
return r;
}
Two pages down the result is handled:
int r = __select64 (nfds, readfds, writefds, exceptfds, ptv64);
if (r >= 0 && timeout != NULL)
/* The remanining timeout will be always less the input TIMEOUT. */
*timeout = valid_timeval64_to_timeval (tv64);
So... it depends what the (undefined!) value of errno happens to be.
Hmm, that's not what the description of this issue suggests.
On amd64, the ENOSYS
fallback code is not actually compiled in. (SYSCALL_CANCEL
in glibc sets errno
.)
SYSCALL_CANCEL
sets errno
only on failure; here we have a successful return with r>0
I don't think that's something that has changed in glibc. The old select
did not set errno
unconditionally, either.
Do you know what the run-and-signal
test is doing that is reported as failing?
@fweimer-rh The test does this basically:
$ topline sleep 2
loop(⠀⠀⠀⠀⠀⠀⠀⠀)sd(⠀)dm(⠀⠀⠀)loop(⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀) (▁ )
With glibc 2.33 the line does now show up.
I've added a minimal debugging patch on top of topline:
diff --git a/topline.c b/topline.c
index 361d02a..480a53a 100644
--- a/topline.c
+++ b/topline.c
@@ -326,13 +326,16 @@ int main(int argc, char **argv)
fds|=1<<linebuf[i].fd;
if (select(fds?NFDS:0, (void*)&fds, 0, 0, &delay)==-1)
{
+ printf("(select() == -1) delay: {%ld, %ld}\n", delay.tv_sec, delay.tv_usec);
if (errno!=EINTR)
die("select: %m\n");
}
- else for (int i=0; i<ARRAYSZ(linebuf); i++)
+ else {
+ printf("(select() != -1) delay: {%ld, %ld}\n", delay.tv_sec, delay.tv_usec);
+ for (int i=0; i<ARRAYSZ(linebuf); i++)
if (linebuf[i].fd && linebuf[i].fd<sizeof(fds)*8 && fds&1<<linebuf[i].fd)
copy_line(&linebuf[i]);
- }
+ }}
if (child_pid)
{
With glibc 2.32 this is the output:
./topline sleep 2
(select() != -1) delay: {0, 0}
loop(⠀⠀⠀⠀⠀⠀⠀⠀)sd(⠀)dm(⢀⢀⠀)loop(⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀) ( )
(select() == -1) delay: {0, 3588}
with glibc 2.33:
./topline sleep 2
(select() != -1) delay: {1, 0}
(select() == -1) delay: {1, 0}