golang/go

runtime: windows system clock resolution issue

defia opened this issue · 96 comments

What does 'go version' print?

go version go1.3.1 windows/amd64

actually it's just the same as the issue in chrome:
https://code.google.com/p/chromium/issues/detail?id=153139

What steps reproduce the problem?

1.download and run system clock resolution tool provided by the link in
https://code.google.com/p/chromium/issues/detail?id=153139
it will show your current timer interval
2.if current timer interval smaller than 15.625ms,try to close running programs until it
goes to 15.625ms
3.run any program compiled by go

What happened?
current timer interval goes to 1ms when go compiled bins running,and goes to 15.625
after quiting

What should have happened instead?
current timer interval should not change

at least,with no time package used, the timer setting should not change, or there should
be a method to control this

Comment 1:

Labels changed: added repo-main, release-go1.4, os-windows.

Comment 2:

I confirm what DefiaQ said.
The problem is that we call timeBeginPeriod Windows API. It was introduced as part of
pprof implementation
changeset:   9786:9c5c0cbadb4d                           
user:        Hector Chu <hectorchu@gmail.com>            
date:        Sat Sep 17 17:57:59 2011 +1000              
summary:     runtime: implement pprof support for windows
but I view it as "windows timer precision is very very low (about 15ms) comparing to
other OSes, so lets get best precision whenever we can (1ms)".
timeBeginPeriod has many problems. It does not work on some systems (it does not work on
my oldish windows xp). I suspect, it fails, if you don't have correct access rights on
the system (it works for administrator). When successfully set timeBeginPeriod will
drain laptop batteries faster (I read about it everywhere, I didn't have chance to
experience it on practice). But, I suspect, many other applications call timeBeginPeriod
too, so Go is not the only one at fault here.
Personally I don't care if timeBeginPeriod is set, but others might feel different.
Perhaps there is case here for some way to control it from within Go program.
ALternatively, people who care, can just remove timeBeginPeriod call in their own
runtime.
Alex

Comment 3:

it will succeed even without administrator privilege.
so far i didn't find any applications other than chrome change timer precesion

Comment 4:

Well, I am not running chrome but my timer interval is set to 1ms.
Also, why did you said "... if current timer interval smaller than 15.625ms,try to close
running programs until it goes to 15.625ms .. "? You should have said " ... close chrome
...".
Alex

Comment 5:

i also suspect other applications will change the setting , though i havn't found any so
far
rsc commented

Comment 6:

Too late for 1.4. Maybe for a future release, if someone wants to do the work to prove
that it matters and that removing this is safe. I think without a clear report about bad
implications we won't do that work ourselves.

Labels changed: added release-none, removed release-go1.4.

Status changed to Accepted.

i made a package to call timeEndPeriod to reset the timer to normal on initialization.

if the program doesn't need much performance, but runs on a windows laptop for a long time, just import the package.
i've using this for some days, everything seems to be well so far

I've done some benchmarks today.
At first I ran a cpu-heavy program and modified it to use hundreds of goroutines. To my surprise,it actually ran ~5% faster if I set the system clock resolution to 15.6ms(that is default value)

Then I ran this benchmark
I didn't look into the code, I guess it use only one goroutine, still pure cpu thing, and get still the same result.

Then I wrote a http benchmark here
This time it ran a lot faster(~50%) with 1ms system clock resolution.

So I guess, if your program don't use much IO (cause less context switches of goroutine), you can discard the system clock resoluton change.

Could we use QueryPerformanceCounter for pprof support?

And how are you going to use QueryPerformanceCounter in pprof support?

Alex

cyclesPerSecond uses systime in os1_windows.go

https://github.com/golang/go/blob/master/src/runtime/os1_windows.go#L314-L316

https://github.com/golang/go/blob/master/src/runtime/os1_windows.go#L286-L306

This part is possible to be replaced with QueryUnbiasedInterruptTime (not QueryPerformanceCounter).

I've blogged about this issue in the past (https://randomascii.wordpress.com/2013/07/08/windows-timer-resolution-megawatts-wasted/) and I wanted to share what I have learned:

As defia pointed out, having a higher timer frequency wastes CPU time and can make some CPU heavy programs run more slowly. This is because the OS is waking up ~15 times more frequently, consuming CPU time each time it does. It also means that other programs may wake up more frequently and waste CPU time - some software sits in Sleep(1) loops and will wake up ~15 times more frequently when the timer resolution is changed. So, raising the timer frequency can waste performance.

Raising the timer frequency will almost always waste some power and will harm battery life. This is the kicker. This is why no program should ever raise the timer frequency unless it needs the timer frequency raised. The timer frequency is a global setting and raising it without a good reason is poor citizenship.

If you have a work loop that relies on Sleep(1) or timeouts from WaitForSingleObject(handle, 1) then raising the timer frequency will make your loop run faster. Presumably that is why the http test ran faster. Most software should not be relying on timeouts, it should be running based on events. But, it is reasonable for specific go programs to raise the timer frequency if needed.

It is not reasonable for the go runtime to unilaterally raise timer frequency.

With the background information out of the way...

I discovered this issue because I noticed (using clockres.exe) that the timer frequency on my computer was always raised. I used "powercfg /energy /duration 5" to identify the culprit, a go program that was running in the background. This go program was monitoring system status and was idle about 99.9% of the time but because of the go runtime's decision to raise the timer frequency for all go programs this background program was measurably affecting power consumption and battery life. This 99.9% idle go program was affecting my computer more than a program that was actively running.

This was clearly a case where the program did not need a higher timer frequency. Raising the timer frequency is a reasonable option for the go runtime, but it is absolutely not a reasonable default for the go runtime.

Chrome has been fixed to not leave the timer frequency high, so has SDL, and so have many other products. Go needs to do likewise. Please.

@randomascii

I appreciate and share your concerns. So I tried to measure the impact of removing timeBeginPeriod call on my windows 7 amd64 computer. (see bench.txt in https://gist.github.com/alexbrainman/914885b57518cb4a6e39 ). Unfortunately I can see quite significant slowdowns here and there. The change looks like a no go for me. Maybe gophers who know more about Go scheduler can explain the slowdowns and suggest ways to improve the situation.

For the moment, if someone cares about this, they can comment out call to timeBeginPeriod.

Alex

I know that some video applications (games, Chrome, a few others) have to leave the frequency high in order to accurately schedule tasks and meet frame rates. For other programs I don't understand the need to have the timer frequency high. I would imagine that tasks would typically be happening as quickly as possible (reading files, doing calculations, passing messages) and I'm curious as to what sort of timeouts, or waits, or sleeps are being affected by the timer frequency.

Anyway, if some Go programs need the timer frequency then those Go programs should raise the timer frequency. I have no problem with that. But if the Go runtime raises the timer frequency then that is a problem. We could debate whether raising the timer frequency should be opt-in (my strong vote) or opt-out, but it should be a choice. Otherwise long-running low-activity Go programs will be blacklisted by anybody who wants good battery life.

Go deserves better than that.

I would imagine that tasks would typically be happening as quickly as possible ...

Things are never simple, it is always about trade-offs. But I don't know enough about Go runtime to give you explanation. And I am hopeful that other gophers will comment. All I can say is that Windows makes the task much harder when everything is measured in 15 milliseconds (comparing to nanoseconds on linux) - imagine running scheduler that performs equally well on both. But maybe Go runtime can be improved on windows, and we can disable timeBeginPeriod call.

Alex

FWIW, an ETW trace (see https://github.com/google/UIforETW/releases for the easiest way to record one) can be used to record, among other things, all context switches. This makes it possible to determine where applications are waiting.

The Windows timer resolution is a continuing source of frustration to be sure, but it only slows down code that calls Sleep(n) or WaitForSingleObject(handle, n) (where 'n' is not INFINITE).

Looking at bench.txt I see that *OneShotTimeout-2 runs slower with a lower timer frequency. That sounds like it is by-design and shouldn't be used to justify all Go programs running with a high timer frequency. Ditto with BenchmarkGoroutineIdle-2.

BenchmarkAppendGrowByte-2 runs 368% slower which suggests a benchmark bug. Either the code being tested is waiting (which it should not do) or the code that times is using timeGetTime and is stopping and starting the timer frequently and is therefore hitting aliasing errors.

I strongly suspect that the latter is the problem. timeGetTime's precision is increased when the timer frequency is increased. The benchmarks with the lower timer precision are mostly not slower they are just less accurate. The correct fix is not to raise the timer frequency (the timers will still be inaccurate) but to use accurate timers such as QueryPerformanceCounter.

It is a frustrating problem I'll agree, but raising the timer frequency so that benchmarks can measure the execution time better is not a great solution.

If it turns out that the benchmarks such as BenchmarkAppendGrowByte-2 are actually slower when the timer frequency is lower then I will be surprised and might actually investigate. Fixing slow benchmarks requires understanding why they are slow.

@randomascii I agree, we need to understand why benchmarks are slow. I looked at BenchmarkAppendGrowByte, and it does not do much - appending bytes to a slice. How do you propose I understand why BenchmarkAppendGrowByte is slow?

For the record, I started with d5fe165. I ran:

go test -run=none -bench=AppendGrowByte -cpu=1,1,1,1,1,2,2,2,2,2,3,3,3,3,3 runtime >old

command. Then I made this change:

diff --git a/src/runtime/os1_windows.go b/src/runtime/os1_windows.go
index f608b4a..be5a4b2 100644
--- a/src/runtime/os1_windows.go
+++ b/src/runtime/os1_windows.go
@@ -154,7 +154,7 @@ func osinit() {

        stdcall2(_SetConsoleCtrlHandler, funcPC(ctrlhandler), 1)

-       stdcall1(_timeBeginPeriod, 1)
+       //stdcall1(_timeBeginPeriod, 1)

        ncpu = getproccount()

and I ran the same test again:

go test -run=none -bench=AppendGrowByte -cpu=1,1,1,1,1,2,2,2,2,2,3,3,3,3,3 runtime >new

And then I compared results with rsc.io/benchstat command:

c:\Users\Alex\go\go\src\runtime>%GOPATH%\bin\benchstat old new
name              old time/op  new time/op   delta
AppendGrowByte    5.10ms ± 2%   4.94ms ± 0%    -3.19%  (p=0.016 n=5+4)
AppendGrowByte-2  7.59ms ±14%  36.40ms ±12%  +379.45%  (p=0.008 n=5+5)
AppendGrowByte-3  7.10ms ± 4%  39.54ms ± 7%  +457.18%  (p=0.008 n=5+5)

Here https://gist.github.com/alexbrainman/badfbe3b1e20d3f2d1b2 are my old and new files (for reference). I use Windows 7 with Intel Core2 Duo CPU E8400 (pretty average pc these days).

I suspect it has something to do with GC. But I don't even know where to start. @aclements can you suggest what to try? Thank you.

Alex

Some day I might download the code and run the tests. but it might be quite a while before I try that.

One possibility would be to record ETW traces (using https://github.com/google/UIforETW/releases). Ideally there would be some markers (using ETWMark from UIforETW's etwprof.h) inserted to indicate the start and stop of each benchmark, but just calling Sleep(150) between tests works pretty well. I could probably figure out from the traces what is going on, or at least narrow it down a bit.

The benchmarks with the lower timer precision are mostly not slower they are just less accurate.

@randomascii, what do you mean by this? Each benchmark runs in a loop for at least a second and we record the time at the beginning and end. From what I understand, you're saying that the time measurement can be off by 15ms (so the delta could be off by 30ms?), but that's only 3% on the scale of a second.

BenchmarkAppendGrowByte-2 runs 368% slower which suggests a benchmark bug. Either the code being tested is waiting (which it should not do) or the code that times is using timeGetTime and is stopping and starting the timer frequently and is therefore hitting aliasing errors.

Both the garbage collector and the scheduler do contain wait loops (e.g.,

if i < 10 {
) that after a few spins will perform several OS yields and eventually fall back to short sleeps (e.g., 100us). The fact that this hits AppendGrowByte only when GOMAXPROCS>1 is telling, since with GOMAXPROCS=1 we should never hit these wait loops. If I'm understanding correctly, these fall backs are much more expensive on Windows than they would seem. Is this right?

If building from source, it would be easy to print out the counters from these loops at the end of a GC and see what they look like in these benchmarks. Something like this at the end of func gc in runtime/mgc.go should do it:

for mp := allm; mp != nil; mp = mp.alllink {
    println(mp.gcstats.nprocyield, mp.gcstats.nosyield, mp.gcstats.nsleep)
}

I'm hoping to eliminate these wait loops (#12041). If this is tied in, that's more impetus to do so.

As to your second hypothesis, I don't see timeGetTime anywhere in the Go codebase, so I'm not sure what exactly you're referring to. In the runtime, it looks like we get the time on Windows by reading from some magic address:

func systime(addr uintptr) int64 {

I thought that the benchmark might be stopping/starting the timers multiple times within the benchmark. If not then the maximum error should, as you observed, be quite small.

fall back to short sleeps (e.g., 100us)

Unfortunately I am not aware of a way to sleep for 100us on Windows. Sleep(0) (or WaitForSingleObject with a zero timeout) returns immediately. Sleep(1) returns in somewhere between 1 ms and 15.625 ms, depending on OS version and timer frequency and where you are relative to the next interrupt. Specifically, it will return on the next interrupt or perhaps the one after that. It's a mess and yes, it can be a lot more expensive than it would appear. And I don't know of a way around it other than don't call Sleep(1) if you care about running fast, or Sleep(0) ever.

I suspect that reading from that magic address is equivalent to timeGetTime() - it reads from a counter that is updated by the timer interrupt.

In short, it sounds like #12041 will solve the problem. An ETW trace records every context switch of every thread in every process so it is easy to see if a CPU core is running continuously or constantly sleeping. It sounds like the GC is calling Sleep(1) frequently in both cases, and those are more expensive with a lower timer frequency. Avoiding that will cause the benchmark to run faster in both cases.

Okay. It sounds like we should revisit this once I figure out how to eliminate that wait loop. :) I think it would still be worth checking the nsleep counter to lend hard evidence to this hypothesis (@alexbrainman, would that be convenient for you to do?)

Unfortunately I am not aware of a way to sleep for 100us on Windows.

It looks like we use NtWaitForSingleObject. Remarkably, as far as I can tell, the Windows documentation doesn't specify the units for the timeout argument, but the runtime code claims it takes multiples of 100ns.

@alexbrainman, would that be convenient for you to do?

Alternatively, could you collect an ETW trace of that benchmark and dump it here?

The units probably are multiples of 100 ns. However the wakeups (unless the event you wait on is signaled) are done by the scheduler which runs on the timer interrupt. So, if you specify 100 us then the function either has to return immediately (round down) or on the next timer interrupt (round up). That's the fundamental problem. If you lower the timer interval to 0.5 ms then you still get longer sleeps than desired, and wasted electricity.

Yes, Windows needs nanosleep. But it don't got it.

@alexbrainman if you record an ETW trace (see UIforETW link above) I'll analyze it.

If building from source, it would be easy to print out the counters from these loops at the end of a GC and see what they look like in these benchmarks.

See my output:

c:\Users\Alex\go\go\src\runtime>git diff
diff --git a/src/runtime/mgc.go b/src/runtime/mgc.go
index 848e8f6..eef12dd 100644
--- a/src/runtime/mgc.go
+++ b/src/runtime/mgc.go
@@ -1210,6 +1210,10 @@ func gc(mode int) {
        // give the queued finalizers, if any, a chance to run
        Gosched()
    }
+
+   for mp := allm; mp != nil; mp = mp.alllink {
+       println(mp.gcstats.nprocyield, mp.gcstats.nosyield, mp.gcstats.nsleep)
+   }
 }

 // gcBgMarkStartWorkers prepares background mark worker goroutines.
diff --git a/src/runtime/os1_windows.go b/src/runtime/os1_windows.go
index f608b4a..be5a4b2 100644
--- a/src/runtime/os1_windows.go
+++ b/src/runtime/os1_windows.go
@@ -154,7 +154,7 @@ func osinit() {

    stdcall2(_SetConsoleCtrlHandler, funcPC(ctrlhandler), 1)

-   stdcall1(_timeBeginPeriod, 1)
+   //stdcall1(_timeBeginPeriod, 1)

    ncpu = getproccount()

c:\Users\Alex\go\go\src\runtime>go test -run=none -bench=AppendGrowByte
PASS
BenchmarkAppendGrowByte-2   0 0 0
8 0 0
0 0 0
0 0 0
10 1 0
0 0 0
8 0 0
0 0 0
0 0 0
20 2 0
0 0 0
30 1 0
0 0 0
0 0 0
20 2 0
10 1 0
30 1 0
0 0 0
0 0 0
20 2 0
20 2 0
30 1 0
0 0 0
0 0 0
20 2 0
20 2 0
40 2 0
0 0 0
0 0 0
20 2 0
20 2 0
40 2 0
0 0 0
0 0 0
30 3 0
20 2 0
50 3 0
0 0 0
0 0 0
30 3 0
20 2 0
50 3 0
0 0 0
0 0 0
40 4 0
20 2 0
60 4 0
0 0 0
0 0 0
40 4 0
30 3 0
60 4 0
0 0 0
0 0 0
40 4 0
30 3 0
60 4 0
0 0 0
0 0 0
50 5 0
30 3 0
70 5 0
0 0 0
0 0 0
50 5 0
30 3 0
80 6 0
0 0 0
0 0 0
50 5 0
30 3 0
90 7 0
0 0 0
0 0 0
50 5 0
30 3 0
100 8 0
0 0 0
0 0 0
50 5 0
40 4 0
100 8 0
0 0 0
0 0 0
50 5 0
40 4 0
100 8 0
0 0 0
0 0 0
60 6 0
40 4 0
100 8 0
0 0 0
0 0 0
70 7 0
40 4 0
110 9 0
0 0 0
0 0 0
70 7 0
50 5 0
110 9 0
0 0 0
0 0 0
70 7 0
50 5 0
110 9 0
0 0 0
0 0 0
80 8 0
50 5 0
120 10 0
0 0 0
0 0 0
80 8 0
50 5 0
120 10 0
0 0 0
0 0 0
90 9 0
50 5 0
130 11 0
0 0 0
0 0 0
90 9 0
60 6 0
130 11 0
0 0 0
0 0 0
90 9 0
60 6 0
140 12 0
0 0 0
0 0 0
90 9 0
60 6 0
140 12 0
0 0 0
0 0 0
100 10 0
70 7 0
140 12 0
0 0 0
0 0 0
100 10 0
70 7 0
150 14 0
0 0 0
0 0 0
100 10 0
70 7 0
150 14 0
0 0 0
0 0 0
110 11 0
70 7 0
160 15 0
0 0 0
0 0 0
110 11 0
80 8 0
160 15 0
0 0 0
0 0 0
110 11 0
80 8 0
160 15 0
0 0 0
0 0 0
120 12 0
80 8 0
170 16 0
0 0 0
0 0 0
120 12 0
80 8 0
180 17 0
0 0 0
0 0 0
120 12 0
80 8 0
180 17 0
0 0 0
0 0 0
130 13 0
80 8 0
190 18 0
0 0 0
0 0 0
130 13 0
80 8 0
200 19 0
0 0 0
0 0 0
130 13 0
80 8 0
200 19 0
0 0 0
0 0 0
140 14 0
90 9 0
200 19 0
0 0 0
0 0 0
140 14 0
90 9 0
210 20 0
0 0 0
0 0 0
140 14 0
90 9 0
220 21 0
0 0 0
0 0 0
140 14 0
90 9 0
220 21 0
0 0 0
0 0 0
150 15 0
100 10 0
220 21 0
0 0 0
0 0 0
150 15 0
100 10 0
230 22 0
0 0 0
0 0 0
150 15 0
100 10 0
230 22 0
0 0 0
0 0 0
160 16 0
100 10 0
240 23 0
0 0 0
0 0 0
160 16 0
110 11 0
240 23 0
0 0 0
0 0 0
160 16 0
110 11 0
250 24 0
0 0 0
0 0 0
160 16 0
110 11 0
260 25 0
0 0 0
0 0 0
160 16 0
120 12 0
260 25 0
0 0 0
0 0 0
160 16 0
120 12 0
260 25 0
0 0 0
0 0 0
170 17 0
130 13 0
260 25 0
0 0 0
0 0 0
170 17 0
130 13 0
270 26 0
0 0 0
0 0 0
170 17 0
130 13 0
270 26 0
0 0 0
0 0 0
180 18 0
130 13 0
270 26 0
0 0 0
0 0 0
190 19 0
140 14 0
270 26 0
0 0 0
0 0 0
190 19 0
150 15 0
270 26 0
0 0 0
0 0 0
190 19 0
150 15 0
270 26 0
0 0 0
0 0 0
200 20 0
150 15 0
280 27 0
0 0 0
0 0 0
200 20 0
160 16 0
280 27 0
0 0 0
0 0 0
200 20 0
160 16 0
290 28 0
0 0 0
0 0 0
200 20 0
160 16 0
290 28 0
0 0 0
0 0 0
210 21 0
160 16 0
300 29 0
0 0 0
0 0 0
210 21 0
170 17 0
300 29 0
0 0 0
0 0 0
210 21 0
170 17 0
300 29 0
0 0 0
0 0 0
220 22 0
180 18 0
300 29 0
0 0 0
0 0 0
220 22 0
180 18 0
300 29 0
0 0 0
0 0 0
230 24 0
190 19 0
300 29 0
0 0 0
0 0 0
230 24 0
190 19 0
310 30 0
0 0 0
0 0 0
230 24 0
200 20 0
310 30 0
0 0 0
0 0 0
230 24 0
200 20 0
310 30 0
0 0 0
0 0 0
240 25 0
200 20 0
320 31 0
0 0 0
0 0 0
240 25 0
210 21 0
320 31 0
0 0 0
0 0 0
240 25 0
210 21 0
330 32 0
0 0 0
0 0 0
240 25 0
220 22 0
330 32 0
0 0 0
0 0 0
240 25 0
220 22 0
340 33 0
0 0 0
0 0 0
240 25 0
220 22 0
340 33 0
0 0 0
0 0 0
250 26 0
220 22 0
350 34 0
0 0 0
0 0 0
250 26 0
230 23 0
350 34 0
0 0 0
0 0 0
250 26 0
230 23 0
360 35 0
0 0 0
0 0 0
250 26 0
230 23 0
360 35 0
0 0 0
0 0 0
260 27 0
230 23 0
360 35 0
0 0 0
0 0 0
270 28 0
230 23 0
370 36 0
0 0 0
0 0 0
270 28 0
230 23 0
370 36 0
0 0 0
0 0 0
280 29 0
240 24 0
370 36 0
0 0 0
0 0 0
280 29 0
240 24 0
380 37 0
0 0 0
0 0 0
280 29 0
240 24 0
390 38 0
0 0 0
0 0 0
280 29 0
250 25 0
390 38 0
0 0 0
0 0 0
280 29 0
260 26 0
390 38 0
0 0 0
0 0 0
280 29 0
270 27 0
390 38 0
0 0 0
0 0 0
280 29 0
270 27 0
400 39 0
0 0 0
0 0 0
280 29 0
280 28 0
400 39 0
0 0 0
0 0 0
280 29 0
280 28 0
410 40 0
0 0 0
0 0 0
280 29 0
290 29 0
410 40 0
0 0 0
0 0 0
280 29 0
290 29 0
410 40 0
0 0 0
0 0 0
290 30 0
290 29 0
420 41 0
0 0 0
0 0 0
290 30 0
300 30 0
420 41 0
0 0 0
0 0 0
290 30 0
300 30 0
430 42 0
0 0 0
0 0 0
290 30 0
300 30 0
430 42 0
0 0 0
0 0 0
300 31 0
300 30 0
440 43 0
0 0 0
0 0 0
300 31 0
310 31 0
440 43 0
0 0 0
0 0 0
300 31 0
310 31 0
440 43 0
0 0 0
0 0 0
310 32 0
310 31 0
440 43 0
0 0 0
0 0 0
320 33 0
310 31 0
450 44 0
0 0 0
0 0 0
320 33 0
320 32 0
450 44 0
0 0 0
0 0 0
320 33 0
320 32 0
460 45 0
0 0 0
0 0 0
320 33 0
330 33 0
460 45 0
0 0 0
0 0 0
320 33 0
330 33 0
460 45 0
0 0 0
0 0 0
330 34 0
330 33 0
460 45 0
0 0 0
0 0 0
340 35 0
330 33 0
470 46 0
0 0 0
0 0 0
340 35 0
340 34 0
470 46 0
0 0 0
0 0 0
340 35 0
340 34 0
470 46 0
0 0 0
0 0 0
350 36 0
340 34 0
480 47 0
0 0 0
0 0 0
350 36 0
350 35 0
480 47 0
0 0 0
0 0 0
350 36 0
350 35 0
480 47 0
0 0 0
0 0 0
360 37 0
350 35 0
490 48 0
0 0 0
0 0 0
360 37 0
360 36 0
490 48 0
0 0 0
0 0 0
360 37 0
360 36 0
490 48 0
0 0 0
0 0 0
370 38 0
360 36 0
500 49 0
0 0 0
0 0 0
370 38 0
360 36 0
500 49 0
0 0 0
0 0 0
380 39 0
360 36 0
500 49 0
0 0 0
0 0 0
390 40 0
360 36 0
500 49 0
0 0 0
0 0 0
400 41 0
370 37 0
500 49 0
0 0 0
0 0 0
400 41 0
370 37 0
510 50 0
0 0 0
0 0 0
400 41 0
370 37 0
510 50 0
0 0 0
0 0 0
410 42 0
370 37 0
520 51 0
0 0 0
0 0 0
410 42 0
370 37 0
520 51 0
0 0 0
0 0 0
420 43 0
380 38 0
520 51 0
0 0 0
0 0 0
420 43 0
380 38 0
520 51 0
0 0 0
0 0 0
430 44 0
390 39 0
520 51 0
0 0 0
0 0 0
430 44 0
390 39 0
520 51 0
0 0 0
0 0 0
440 45 0
390 39 0
520 51 0
0 0 0
0 0 0
450 46 0
400 40 0
520 51 0
0 0 0
0 0 0
450 46 0
400 40 0
520 51 0
0 0 0
0 0 0
460 48 0
400 40 0
530 52 0
0 0 0
0 0 0
460 48 0
400 40 0
540 53 0
0 0 0
0 0 0
460 48 0
410 41 0
540 53 0
0 0 0
0 0 0
460 48 0
410 41 0
550 54 0
0 0 0
0 0 0
460 48 0
410 41 0
550 54 0
0 0 0
0 0 0
470 49 0
420 42 0
550 54 0
0 0 0
0 0 0
470 49 0
430 43 0
550 54 0
0 0 0
0 0 0
470 49 0
440 44 0
550 54 0
0 0 0
0 0 0
470 49 0
440 44 0
560 55 0
0 0 0
0 0 0
470 49 0
440 44 0
560 55 0
0 0 0
0 0 0
480 50 0
450 45 0
560 55 0
0 0 0
0 0 0
480 50 0
460 46 0
560 55 0
0 0 0
0 0 0
480 50 0
460 46 0
570 56 0
0 0 0
0 0 0
480 50 0
460 46 0
570 56 0
0 0 0
0 0 0
490 51 0
460 46 0
570 56 0
0 0 0
0 0 0
500 52 0
460 46 0
580 57 0
0 0 0
0 0 0
500 52 0
470 47 0
580 57 0
0 0 0
0 0 0
500 52 0
480 48 0
580 57 0
0 0 0
0 0 0
500 52 0
480 48 0
580 57 0
0 0 0
0 0 0
510 53 0
480 48 0
590 58 0
0 0 0
0 0 0
510 53 0
490 49 0
590 58 0
0 0 0
0 0 0
510 53 0
490 49 0
590 58 0
0 0 0
0 0 0
520 54 0
500 50 0
590 58 0
0 0 0
0 0 0
520 54 0
500 50 0
600 59 0
0 0 0
0 0 0
520 54 0
500 50 0
600 59 0
0 0 0
0 0 0
530 55 0
510 51 0
600 59 0
0 0 0
0 0 0
530 55 0
510 51 0
600 59 0
0 0 0
0 0 0
540 56 0
510 51 0
610 60 0
0 0 0
0 0 0
540 56 0
510 51 0
610 60 0
0 0 0
0 0 0
550 57 0
510 51 0
620 61 0
0 0 0
0 0 0
550 57 0
520 52 0
620 61 0
0 0 0
0 0 0
550 57 0
520 52 0
620 61 0
0 0 0
0 0 0
560 58 0
520 52 0
630 62 0
0 0 0
0 0 0
560 58 0
520 52 0
630 62 0
0 0 0
0 0 0
570 59 0
530 53 0
630 62 0
0 0 0
0 0 0
570 59 0
540 54 0
630 62 0
0 0 0
0 0 0
570 59 0
540 54 0
630 62 0
0 0 0
0 0 0
580 60 0
540 54 0
640 63 0
0 0 0
0 0 0
580 60 0
550 55 0
640 63 0
0 0 0
0 0 0
580 60 0
550 55 0
650 64 0
0 0 0
0 0 0
580 60 0
550 55 0
650 64 0
0 0 0
0 0 0
590 61 0
550 55 0
660 65 0
0 0 0
0 0 0
590 61 0
550 55 0
660 65 0
0 0 0
0 0 0
600 62 0
560 56 0
660 65 0
0 0 0
0 0 0
600 62 0
560 56 0
670 66 0
0 0 0
0 0 0
600 62 0
560 56 0
670 66 0
0 0 0
0 0 0
610 63 0
560 56 0
680 67 0
0 0 0
0 0 0
610 63 0
570 57 0
680 67 0
0 0 0
0 0 0
610 63 0
580 58 0
680 67 0
0 0 0
0 0 0
610 63 0
580 58 0
690 68 0
0 0 0
0 0 0
610 63 0
580 58 0
690 68 0
0 0 0
0 0 0
620 64 0
590 59 0
690 68 0
0 0 0
0 0 0
620 64 0
590 59 0
700 69 0
0 0 0
0 0 0
620 64 0
590 59 0
700 69 0
0 0 0
0 0 0
630 65 0
590 59 0
710 70 0
0 0 0
0 0 0
630 65 0
590 59 0
710 70 0
0 0 0
0 0 0
640 66 0
600 60 0
710 70 0
0 0 0
0 0 0
640 66 0
600 60 0
710 70 0
0 0 0
0 0 0
650 67 0
600 60 0
720 71 0
0 0 0
0 0 0
650 67 0
600 60 0
730 72 0
0 0 0
0 0 0
650 67 0
600 60 0
730 72 0
0 0 0
0 0 0
660 68 0
600 60 0
730 72 0
0 0 0
0 0 0
670 69 0
600 60 0
740 73 0
0 0 0
0 0 0
670 69 0
610 61 0
740 73 0
0 0 0
0 0 0
670 69 0
610 61 0
740 73 0
0 0 0
0 0 0
680 70 0
610 61 0
750 74 0
0 0 0
0 0 0
680 70 0
620 62 0
750 74 0
0 0 0
0 0 0
680 70 0
620 62 0
750 74 0
0 0 0
0 0 0
690 71 0
620 62 0
760 75 0
0 0 0
0 0 0
690 71 0
620 62 0
760 75 0
0 0 0
0 0 0
700 72 0
620 62 0
760 75 0
0 0 0
0 0 0
710 73 0
620 62 0
760 75 0
0 0 0
0 0 0
720 74 0
630 63 0
760 75 0
0 0 0
0 0 0
720 74 0
630 63 0
770 76 0
0 0 0
0 0 0
720 74 0
630 63 0
770 76 0
0 0 0
0 0 0
730 75 0
630 63 0
770 76 0
0 0 0
0 0 0
740 76 0
640 64 0
770 76 0
0 0 0
0 0 0
740 76 0
640 64 0
780 77 0
0 0 0
0 0 0
740 76 0
     100      20334220 ns/op
ok      runtime 3.765s

c:\Users\Alex\go\go\src\runtime>

I think it would still be worth checking the nsleep counter to lend hard evidence to this hypothesis (@alexbrainman, would that be convenient for you to do?)

I am not sure what you want.

It looks like we use NtWaitForSingleObject. Remarkably, as far as I can tell, the Windows documentation doesn't specify the units for the timeout argument, but the runtime code claims it takes multiples of 100ns.

I don't believe there is a way to wait for 100ns. I can try NtWaitForSingleObject, but I think we've tried that. I think you have a choice of 15ms (default) or 1ms (supported on most modern Windows versions, but you would have to call timeBeginPeriod WinAPI to indicate that you want 1ms precision). Go runtime calls timeBeginPeriod to have 1ms waits, but that drains people's laptop batteries faster. And timeBeginPeriod is a global resource - one single program calls it and whole computer gets it. And you need to be admin (or similar) for timeBeginPeriod to succeed - so it is a hit and miss affair - some Go programs runs with timeBeginPeriod on and some with off. So it would be nice to stop using this facility altogether. If we can make Go runtime perform adequately regardless of this setting.

Alternatively, could you collect an ETW trace of that benchmark and dump it here?

I am not sure what to download and run there. And I don't like running things I download from Internet on my computer if I can avoid it. I hope my output above gives you what you need.

@alexbrainman if you record an ETW trace (see UIforETW link above) I'll analyze it.

Will do if @aclements wants me to do that.

Thank you both for your input.

Alex

CL https://golang.org/cl/16297 mentions this issue.

At this point most, though not all, sleep loops have been eliminated on master. @alexbrainman or @defia, is this still an issue? What's the effect of returning to the default timer resolution with these fixes in?

@aclements calling timeBeginPeriod really makes no difference now on my computer. Thank you very much! I compared benchmark changes in runtime, time and net, and I see no changes after I remove call to timeBeginPeriod. What other benchmarks should I try?

Alex

Great! There are two other benchmarks I'd run before considering this closed: The go1 benchmarks in $GOROOT/test/bench/go1 and the garbage benchmark from golang.org/x/benchmarks/bench (not the standard framework; run bench with -bench garbage and look for the GOPERF-METRIC:time line, which is the benchmark duration in ns.)

@aclements, I have checked both $GOROOT/test/bench/go1 and the garbage benchmark from golang.org/x/benchmarks/bench (in addition to all benchmarks in runtime, net and time). I also checked all these on few different computers. I don't see any difference if I call timeBeginPeriod or not. So, I propose, we remove the call: https://go-review.googlesource.com/17164

Alex

CL https://golang.org/cl/17164 mentions this issue.

That's great news.

Eliminating these sleep loops should also have improved the benchmark times has that been seen? The effect may be small - I would guess it would be ~1/16th of the difference that used to be seen when enabling/disabling timeBeginPeriod.

Eliminating these sleep loops should also have improved the benchmark times has that been seen?

I don't see any difference in benchmarks after I apply https://go-review.googlesource.com/17164 I don't know enough about scheduler to comment on your claim about "improved benchmarks". Maybe Austin will comment, but I am happy to proceed with https://go-review.googlesource.com/17164 as is.

Thank you for pushing this along.

Alex

Unfortunately we discovered some cases where not calling timeBeginPeriod makes Go scheduler considerably slower. So https://go-review.googlesource.com/#/c/20834/ restored original behavior - Go runtime is calling timeBeginPeriod again. Therefore I am reopening this issue.

Current plan is to resolve #7876 to get rid of calling timeBeginPeriod again. But nobody is working on #7876 at this moment.

Alex

Would "undoing" this revert by calling timeEndPeriod from app code have any negative effects outside of decreased performance in some cases? I develop an app that can run on laptops and having the app use less power is more important than performance.

Would "undoing" this revert by calling timeEndPeriod from app code have any negative effects outside of decreased performance in some cases?

As far as I know, bad performance is the only downside.

Alex

To be clear, in most cases any slowdowns caused by the timer frequency being lower are actually bugs.

With the exception of games or perhaps video-playback there should be no need to rely on timeouts and Sleep(n), which are the things affected. Any software that just does its work as quickly as possible and then exits will run either at exactly the same speed or slightly faster when the timer frequency is lower.

The main uses of Go that I see are long-running background apps. These apps consume about 0.01% CPU or less. However, because they are written in Go and because the Go runtime raises the timer interrupt frequency whether it helps or hurts this means that these idle apps waste more power than many more active apps.

Any progress on either fixing the scheduler so that it doesn't require the higher timer interrupt frequency, or giving Go applications a way to opt out?

Until then the recommendation I am giving is that Go should not be used for long-running apps that may be sitting idle in the background.

Any progress ... ?

I am not working on this. As far as I know what I said here #8687 (comment) is still true. Maybe @dvyukov can add more. Sorry.

Alex

defia commented

@randomascii you can call timeEndPeriod to revert changes by go runtime
https://github.com/defia/trf

I have nothing to add.

Dunno how expensive it is, but sysmon already tracks the necessary condition, so the patch could be just:

--- a/src/runtime/proc.go
+++ b/src/runtime/proc.go
@@ -3710,6 +3710,7 @@ func sysmon() {
        lastscavenge := nanotime()
        nscavenge := 0
 
+       highRes := false
        lasttrace := int64(0)
        idle := 0 // how many cycles in succession we had not wokeup somebody
        delay := uint32(0)
@@ -3722,6 +3723,13 @@ func sysmon() {
                if delay > 10*1000 { // up to 10ms
                        delay = 10 * 1000
                }
+               if !highRes && delay < 10*1000 {
+                       timeBeginPeriod()
+                       highRes = true
+               } else if highRes && delay >= 10*1000 {
+                       timeEndPeriod()
+                       highRes = false
+               }
                usleep(delay)
                if debug.schedtrace <= 0 && (sched.gcwaiting != 0 || atomic.Load(&sched.npidle) == uint32(gomaxprocs)) {
                        lock(&sched.lock)

As far as I understand, kernel has a global counter of timeBeginPeriod across all processes, so if machine runs a bunch of Go processes and one of them does frequent timeBeginPeriod/timeEndPeriod, then it will be just atomic inc/dec of the counter since it never drops to zero.

@dvyukov, don't we want to timeEndPeriod()/timeBeginPeriod() around the notetsleep(&sched.sysmonnote, maxsleep)?

Calling timeBeginPeriod/timeEndPeriod is quite cheap. There is a brief delay before the changes take effect (probably the next timer interrupt, so ~16 ms or so) but the actual overhead of the calls is effectively zero. Chrome does this so that it can do precise timing when a web page is doing animation or otherwise requesting lots of short timeouts, while being power efficient when the loaded pages are closer to idle.

I agree that if the Go program is consuming CPU anyway then the consequences of having the timer raised are quite small. The power consequences would be essentially zero. That would be a greatly appreciated fix.

How expensive is it to call timeBeginPeriod/timeEndPeriod?

I changed:

diff --git a/src/runtime/os_windows.go b/src/runtime/os_windows.go
index 44c982b..249d5b4 100644
--- a/src/runtime/os_windows.go
+++ b/src/runtime/os_windows.go
@@ -50,6 +50,7 @@ const (
 //go:cgo_import_dynamic runtime._WriteConsoleW WriteConsoleW%5 "kernel32.dll"
 //go:cgo_import_dynamic runtime._WriteFile WriteFile%5 "kernel32.dll"
 //go:cgo_import_dynamic runtime._timeBeginPeriod timeBeginPeriod%1 "winmm.dll"
+//go:cgo_import_dynamic runtime._timeEndPeriod timeBeginPeriod%1 "winmm.dll"
 
 type stdFunction unsafe.Pointer
 
@@ -93,6 +94,7 @@ var (
 	_WriteConsoleW,
 	_WriteFile,
 	_timeBeginPeriod,
+	_timeEndPeriod,
 	_ stdFunction
 
 	// Following syscalls are only available on some Windows PCs.
@@ -260,6 +262,14 @@ var useLoadLibraryEx bool
 
 var timeBeginPeriodRetValue uint32
 
+func TimeBeginPeriod() uint32 {
+	return uint32(stdcall1(_timeBeginPeriod, 1))
+}
+
+func TimeEndPeriod() uint32 {
+	return uint32(stdcall1(_timeEndPeriod, 1))
+}
+
 func osinit() {
 	asmstdcallAddr = unsafe.Pointer(funcPC(asmstdcall))
 	usleep2Addr = unsafe.Pointer(funcPC(usleep2))
diff --git a/src/runtime/syscall_windows_test.go b/src/runtime/syscall_windows_test.go
index 11e67df..6da6325 100644
--- a/src/runtime/syscall_windows_test.go
+++ b/src/runtime/syscall_windows_test.go
@@ -1066,3 +1066,10 @@ package main
 func main() {
 }
 `
+
+func BenchmarkTimeBeginEndPeriod(b *testing.B) {
+	for i := 0; i < b.N; i++ {
+		runtime.TimeEndPeriod()
+		runtime.TimeBeginPeriod()
+	}
+}

And I see (on my windows/386 Windows XP):

C:\>u:\test -test.v -test.run=none -test.bench=Begin
goos: windows
goarch: 386
pkg: runtime
BenchmarkTimeBeginEndPeriod-2           10000000               144 ns/op
PASS

and (on my windows/amd64 Windows 7):

C:\>u:\test -test.v -test.run=none -test.bench=Begin
goos: windows
goarch: 386
pkg: runtime
BenchmarkTimeBeginEndPeriod     20000000               104 ns/op
PASS

Alex

the patch could be just:

I am not criticizing your patch. Just a note from https://msdn.microsoft.com/en-us/library/windows/desktop/dd757626(v=vs.85).aspx:

" ... You must match each call to timeBeginPeriod with a call to timeEndPeriod, specifying the same minimum resolution in both calls. An application can make multiple timeBeginPeriod calls as long as each call is matched with a call to timeEndPeriod. ..."

Alex

Could someone give https://go-review.googlesource.com/38403 a try? It implements the idea of turning down the timer resolution when the Go process is idle.

CL https://golang.org/cl/38403 mentions this issue.

Can the semasleep() be changed to adaptive to duration? If it is larger than the normal timer resolution, call WaitForSingleObject with the duration, otherwise call WaitForSingleObject with zero timeout (the waited event may also need to be changed to manual-reset event), and yield the thread and check for current time in a spin loop like what nanosleep() in linux does. This way we can have high resolution timer too.

To get high resolution time stamps in windows, QueryPerformanceCounter can be used, for newer version of windows there is GetSystemTimePreciseAsFileTime, it is like clock_gettime() in linux. Currently the Interrupt Time seems to be used in Go under windows, it is in assembly after this change e4371fb, so may be it can be changed to use high resolution time stamp like Go under linux does. Then time.Now() will also has high resolution under windows too. The other way is using rdtsc directly and synchronize it across cores.

Spinning for up to 15.6 ms (the default timer interval is 15.625 ms) seems like a bad idea. Most applications should not have a need to wake up on a ms or sub-ms precision timer. The few exceptions I can think of are VR and maybe some industrial control applications. Other applications should either be running fool speed (each task triggering the next) or being triggered by events such as disk I/O completions, network read completions, v-blank, etc.

@noblehng unfortunately using GetSystemTimePreciseAsFileTime directly has a high performance penalty compared to the current hack of reading the time directly out of memory.

We needed this additional precision in CockroachDB, so we added a call to GetSystemTimePreciseAsFileTime in cockroachdb/cockroach@3fc5c81, but as the commit message says:

Note that this implementation is significantly slower than time.Now:

name   old time/op  new time/op   delta
Now-2  14.0ns ± 3%  339.4ns ± 4%  +2332.32%  (p=0.000 n=9+9)

I suspect this is because of the explicit syscall required. time.Now
avoids the syscall by reading the system clock directly[1][2], so if
we need better performance, we can do something similar.

I believe the implementation of GetSystemTimePreciseAsFileTime is similar to that of KeQuerySystemTimePrecise, which is (maybe) described in enough detail to permit an implementation in Go that avoids a syscall.

@randomascii As most applications will not need high resolution timer, they will not trigger the spinning code path. And for those applications that do need a high resolution timer, they can be supported under the same interface. And if we are spinning around a yield to the kernel scheduler, we are not burning cpu that much. I'm not sure the overhead of calling yield in Go in a loop (it will be better if WaitForSingleObject support high resolution and do this inside the kernel directly), but I assume it will be good enough for the current Go scheduler, it seems to need a resolution of tens to hundreds of microseconds. The Go scheduler can still use a hybrid approach that relax the resolution when idle.

@tamird If syscall is the bottleneck, it seems that it can also be derived from the _KUSER_SHARED_DATA struct, virtualbox dose this. One can find the struct definition in ntddk.h, or the offsets of each Qpc* corresponding fields of each version of windows can be found here. And RtlQueryPerformanceCounter is basically doing a (rdtsc + QpcBias) >> QpcShift with a fallback of a syscall back to kernel, there is a disassembly analysis of the win7 version here. There is also the overhead of some hypervisors trapping rdtsc.

If the spinning only happens for applications that need high resolution timers then that seems reasonable. How would applications indicate that they need a high resolution timer? Right now the Go runtime assumes that all applications need high resolution timers (hence the decision to raise the system-global timer interrupt frequency) and I'd hate to see that repeated with spinning.

@randomascii A Go user program using timers will use a Go timer (or maybe runtime.SetCPUProfileRate too), which is tickless and doesn't directly map to OS timer, a handler routine will sleep till next timer expiry or new timer be added, that sleep is using semasleep() so WaitForSingleObject under windows. So an application can indicate whether they need a high resolution timer by using different duration when construct the Go timer, which will translate to duration passing to semasleep(), then we can chose to do spinning or not dynamically based on the duration. It seems that dotnet use a similar approach here, it use a set of thresholds to do spinning, yield(SwitchToThread and Sleep(0)) and wait with the regular timer dynamically, those thresholds may also be used in Go. The benefit here is that we don't need to touch the systemwide interrupt tick timer and we will only spin several times when needed.

The Go runtime problem here is a different one. As far as I can see, the main problem currently is that the Go scheduler itself need a somewhat 'high' resolution timer in the monitor thread to retake processor(s) blocking on syscall (and probably cgocall too) to run other user codes, as #14790 and here. Other things the monitor thread does are already working with a low resolution tick of at least 10ms, and syscalls inside the runtime are already working with completions or annotated. Problem is user code can still call arbitrary blocking syscall. The monitor thread try to dectect that by using a period between 20µs to 10ms, which is 1ms to 10ms under windows currently. I suspect that small sleep there can be replace with a few yield to the OS using some thresholds similar to above, and sleep larger than 1ms can just be 15.6ms under windows. Because small sleeps will only happen when not idle, so those few spinning with yields will only happen when not idle. Further improves are also possible, like tracking the number of processors in syscall then only do spinning when that become positive, or we can make the monitor thread sleep even longer by only wake it up when there are runerable goroutines or gc jobs or io waiters.

Edit:
Now I think it is a bad idea to do spinning in semasleep(), as Go doesn't coalesce timers, so two timer could fire very closely, then we will spin too often. It is better leave the choice to user to change the systemwide interrupt timer or do spinning if they need sub-millisecond resolution under windows. Linux achieves this by dynamic reprogramming the clock event device for the next firing timer, that is local APIC timer on x86.

I'm still looking for feedback on whether https://go-review.googlesource.com/c/38403/ is an acceptable fix for Go always running the timer at high resolution. The 1.9 freeze is May 1st, so if it's going to go in, it should go in before that.

@aclements has any consideration been given to using https://godoc.org/golang.org/x/sys/windows#GetSystemTimePreciseAsFileTime when available? I believe that wouldn't require changing the timer resolution, but perhaps I'm missing something.

That function is available starting with Windows 8.

@aclements has any consideration been given to using https://godoc.org/golang.org/x/sys/windows#GetSystemTimePreciseAsFileTime when available?

My understanding (please correct me if I'm wrong; I don't have a good understanding of the Windows APIs) is that GetSystemTimePreciseAsFileTime is relatively expensive. Also, the problem that requires high timer resolution (at least in the runtime) isn't getting the precise time stamps, it's sleeping for short durations, which is currently done using either NtWaitForSingleObject (runtime.usleep2) or WaitForSingleObject (runtime.semasleep). If I understand correctly, these are limited by the system timer resolution.

@aclements your understanding is correct. Sleep(), NtWaitForSingleObject(), etc. will return either when the condition is satisfied (handle signaled) or when the timeout expires. However timeout expiry can only happen when a timer interrupt happens so their resolution is affected by the timer interrupt frequency.

If the problem was a need to get precise times then it is unlikely that GetSystemTimePreciseAsFileTime would help. Most functions in Windows that get the time ultimately rely on timeGetTime() or GetTickCount() and those both ultimately read from a counter that is incremented by (you guessed it) the timer interrupt. GetTickCount() is designed (for compatibility reasons) to only change its results ~64 times a second, regardless of the timer interrupt frequency, whereas timeGetTime() reads a timer value (basically an OS global variable) that is incremented by the timer interrupt and therefore has precision somewhere between 15.6 ms and 1.0 ms. I documented some of this behavior here:

https://randomascii.wordpress.com/2013/05/09/timegettime-versus-gettickcount/

In short, hunting for different functions to get the time is unlikely to help because they are mostly all driven by the same underlying timer interrupt.

If you want high-precision timer information that is not dependent on the timer interrupt then you probably want QueryPerformanceCounter(), which typically has < 0.1 microsecond precision. However this is generally not as long-term stable as the timer interrupt - precision and accuracy are not the same thing - so be careful when using it.

Time is hard.

Linking in a relevant comment here:
https://go-review.googlesource.com/c/38403#message-96d2d5bb23216c271f8f8c7bd7d471e0ce999fae

Windows does use separates linked lists internally for it's active threads, making the scheduler quite efficient - but it does take a toll, when it's running with higher resolution for the kernel's book keeping. It's evident from Chrome, which does the same trick, while Edge does a 4ms resolution only when needed, making it over 30-40% more energy efficient as per Microsoft's claims. On the other hand Chrome simply raises the resolution to 1ms every time - even if you just move the mouse around.

I think the goal should be to remove the reliance on multimedia timers entirely. While this likely may not be possible until UMS itself is implemented, it can be made consistent with other Windows programs on how it uses time.

The multimedia timers are for programs to decide on the resolution so they can choose effectively, not for the language runtime itself to make a decision. In the Windows world - where you don't have a very reliable nanosleep - it's generally accepted that time resolution is 15ms (unless you're doing realtime/multimedia application, in which case the program explicitly asks for it). So, it's not generally a surprise to the programmers that this is as low as you go, and it's consistent with the behavior of the platform.

And it's really not hard for programs that want to support windows to manually do the resolution switch if at all they need. So, I think Go tried too hard, and is now breaking things by forcing every Go program to not be a good citizen on the platform.

Any progress on this? The previous comment did a good job of explaining why the Go runtime should not raise the timer frequency by default. It needs to be an opt-in option.

I'm using Windows 10, how can I see my System Timer interval? Is there any application for it?

The sysinternals' tool clockres is the easiest way to display the current timer interval, if you're comfortable with using the command prompt. Another alternative is this GUI tool:
https://github.com/tebjan/TimerTool

More resources are available here:
https://randomascii.wordpress.com/2013/07/08/windows-timer-resolution-megawatts-wasted/

But the best option is running this batch file from an admin prompt (after installing WPT, downloading and running the latest UIforETW release will do it):
https://github.com/google/UIforETW/blob/master/bin/trace_timer_intervals.bat

This will tell you every program that has raised the timer interrupt frequency over the time period recorded. All of the other tools just give you a snapshot so they can give severely misleading results for programs (like recent Go programs, or Chrome) which frequently change the timer interrupt frequency.

If you can open an administrator command prompt then you can run this command:

powercfg /energy /duration 5

This will create a file called energy-report.html. If you load that into a web browser and search for "Outstanding Timer Request" you can see who has raised the timer interrupt frequency. That is how I initially found that the Go runtime was doing this.

I am a Windows developer working at Google. I checked recently and I found ten (10!) different programs on my work machine that were all raising the Windows system-global timer interrupt frequency to 1 kHz. I raised this with our winops team who is responsible for creating and pushing these tools and they said that it is likely that these are all Go tools.

This means that my machine is essentially always running with a raised timer interrupt frequency. This then makes it quite difficult to do any testing of Chrome's behavior with the timer interrupt frequency not raised.

The winops team is not thrilled about having to apply mitigations to every tool which they write in order to stop the Go Runtime from doing this. In addition, all that they can do is revert the timer frequency change after it happens, which means that it still happens briefly whenever a Go programs is launched.

Can we please get the Go Runtime to stop doing this? It's a bad default. If Go ever becomes popular on Windows then it will need to be fixed to avoid globally increasing power usage and it would be better to fix it now.

Any regressions from fixing it necessarily come from inappropriately designed benchmarks which are inserting delays and hoping that they will be short.

If specific Go programs need the timer frequency raised then they should raise it themselves.

The internal Google bug number that I filed is 133354039.

This is the script that I have written in order to record these timer frequency transitions is:
https://github.com/google/UIforETW/blob/master/bin/trace_timer_intervals.bat

It was just pointed out to me that the Go Runtime was changed to only raise the timer interrupt frequency on demand.

11eaf42

This is an improvement, to be sure, but it is still not ideal. Because the Windows timer interrupt is a system global resource, any changes which Go makes to the timer interrupt frequency will affect other programs. That is, due to the many Go programs which our winops team runs on my workstation I can never test what Chrome behaves like when the timer interrupt frequency is "normal". Our winops team doesn't want or need the raised timer frequency and would appreciate a way to opt-out (or, better yet if the timer adjustments were opt-in).

Note that if the osRelax() implementation was behind a flag then this would also give an easy fix to this issue:

#28255

TL;DR - automatically changing the system-global timer interrupt frequency is not universally desirable and should be opt-in (ideally) or opt-out (next best). Either solution, but especially opt-in, would fix both this issue and issue 28255.

Thanks!

Let's revisit in the 1.15 release cycle.

We started using a higher resolution in https://golang.org/cl/4960057, which added support for profiling on Windows. Perhaps we could at least switch to the higher resolution only while profiling, not all the time.

I think we're all in agreement that the runtime shouldn't be lowering the system timer frequency. The question is what we should be doing instead.

We tried removing the timeBeginPeriod in 2016 and it caused severe regressions in some benchmarks (#14790) because it affected sysmon's ability to promptly retake Ps from system calls and C calls. We're not doing this because specific Go applications depend on the higher frequency. If that were the case, I'd be happy to give them the API to ask for it. We're doing this because the Go runtime itself depends on the higher frequency. This doesn't affect all Go applications, but it's hard to predict what it will affect; it's clearly not just the games and multimedia applications that typically depend on a higher timer frequency.

I'd be happy to told this is no longer the case, but as far as I know the solution is still to use UMS to detect blocked system calls rather than short timers (this would be great, in fact; UMS is a far better solution than short timers, just nobody has tackled that complexity yet). Until that's implemented, my understanding is that simply removing the timeBeginPeriod calls isn't viable.

Have we tried changing Windows so that entersyscall acts like entersyscallblock? It seems that then the sysmon timer frequency would be less important, though of course there would be a performance penalty for every non-blocking system call.

I've been out of the loop for a year or two, so forgive me for not being up to date on recent changes.

If specific Go programs need the timer frequency raised then they should raise it themselves.

It depends on your priorities. Some need the higher resolution a faster clock offers and aren't as concerned about the extra overhead it costs.

Some other languages do this, too. Perl comes to mind (last time I checked it was about 5 years ago). And there are some other programs which set it fast, sometimes dynamically (as Go does now), for example Chrome.

As far as I can remember: Initially Go set it fast, always. But the user could slow it down. Most were happy.

To make the others happy, Go set it slow after start up. This made them happy but it upset thye other camp. For example, this broke 2 of my programs. But the user could force it fast, so it was easy to patch.

Then the Go runtime decided it needed to be fast from time to time for internal reasons, so it dynamically changed it. But this wasn't coordinated with any user i/f, so the user lost the ability to slow it. Some from both camps ( fast and slow) were upset.

Then Austin came up with the great idea to set it dynamically, when sleep and some other time related functions were called. This make most happy. The slow camp only saw occasional unwanted speedups which automatically reverted about 50ms later. The first time called, however, there could be a long (2 x 15.6ms, if memory serves) delay. This troubled the fast group but they found ways to keep it always fast (my favorite is to make a goroutine that does a 10ms sleep in a tight loop). Almost everyone was happy.

If a stable environment for testing is what you need, then one way is to run a go "utility" which stays in fast mode (as I just described) for the duration of the test, because if ANYONE on the system asks for fast then EVERYONE is affected. On the other hand, if your program runs in a pristine environment and is certain that nobody will do this, then I'd think there is no problem at test time either, as long as you test in the same environment. :)

A work-around might be to give the user the ability to enable (default) or disable the dynamic speedup done based on time-related calls. This wouldn't block what the runtime does for itself, however.

Side note: Why do some want the slow clock? Embedded systems trying to save power and high performance systems who don't want the extra OS overhead. Am I missing any? Have you done an estimate of what percentage effect is incurred by fast vs slow? I realize that situations vary, but some idea of the average and worst case would be interesting, I think.

Another side note: An additional consideration is a problem I mentioned a few years ago. Portable code might require more resolution that Windows is capable of (1ms). The more the disparity (15.6ms) the higher the odds that something bad might happen. The 2 programs I mentioned above both exhibited this. They ran fine on a Linux system, and fine on Windows fast (1ms) - mainly because I had allowed for a clock as slow as 1ms. But they both failed on Windows sloe (15.6ms). So the Go compatibility goals are a consideration also.

Side note: The Go runtime does internal timing much faster than 1ms in about a number places. I asked then if they were safe with a slower (either 1 or 15.6ms) clock. I don't recall anyone ever looking into that.

And then there's the other issue that "always longer than requested" is incorrect. It should say "from 1 tick (whatever that is on your system) less to something longer". This is most troublesome at the 1 tick boundary, of course. For example, on a fast (1ms) Windows system, if you ask for a 1ms sleep, it might sleep anywhere from 0 to 2ms (or longer if it's a busy system). It's the 0ms side that scares me.

I second the power saving/battery life issue. We ship a Windows program built in Go that is long-running and in the background most of the time. It spends most of the time idle or only waking up and doing very small and quick processing before going idle again. Our use-case would 100% take the power saving over faster performance that a faster clock might bring.

I like the idea of an environment/runtime variable. Some (all?) of the problems between the Go runtime and the Go user could be resolved this way. It seems like the best we can hope for.

I found another data point to answer the question of why this matters. I just found that the performance measurement test machines used at Google to test Chromium have long-running Go programs on them. This means that the timer interrupt frequency is sometimes/frequently/always raised, which means that Chrome behaves slightly differently on those machines compared to normal consumer machines. This then makes it more difficult for us to validate changes that we make to Chrome to make it less dependent on a raised timer interrupt frequency, because the timer interrupt frequency will still be raised.

FWIW.

I just reviewed this issue from the top and my understanding (which hasn’t really changed since December) is that this remains blocked on implementing UMS for blocked syscall detection (#7876). Once we’ve done that, we believe the runtime itself will no longer depend on the high timer frequency, and we can drop timeBeginPeriod from the runtime (and probably expose them for applications that need them).

Unfortunately, we have very little Windows expertise on the runtime team at Google (in a somewhat ridiculous twist, our Windows laptop is in a drawer in my office, which remains inaccessible for the foreseeable future) and it’s difficult to solicit complex changes like this from open source contributors (though that would be extremely welcome!). I’m looking for solutions to this.

Ian suggested using entersyscallblock for all syscalls on Windows, though this would come at a performance penalty for all system calls. I don’t believe this has been tried.

History up to now

The call to timeBeginPeriod was added in 2011 as part of CPU profiling support. However, the runtime had several tight sleep loops and, as a result, increasing the timer frequency became important to the performance of the runtime and hence many Go programs.

In 2015, we'd eliminated nearly all of these sleep loops and it appeared the runtime no longer depended on higher timer frequency, so the frequency adjustment was removed. Unfortunately, it turned out that some benchmarks still suffered (#14790), so timeBeginPeriod was added back. @jstarks's analysis showed that this was because of the sysmon loop that detects blocked system calls. Hence, @alexbrainman suggested that the way to resolve this is to resolve #7876 by switching to user-mode scheduling to detect blocked system calls.

We made some progress on this in 2017 by reducing the timer frequency when the whole process goes idle. This improved the situation, but is far from ideal because the timer frequency is still regularly raised, causing battery drain and interfering with testing of other applications that also need to manipulate the timer frequency (e.g., Chrome).

However, the fundamental issue with detecting blocked system calls in the runtime that led to #14790 remains. Windows UMS is the right way to fix this (indeed, I wish other OSes had an equivalent to UMS; our blocked system call detection just targets the lowest common denominator right now). It seems likely that UMS support will obviate the need for a high timer frequency in the Go runtime itself.

Some quick thoughts, from my perspective inside the Windows organization. Although UMS seems like a good fit, I don't believe it will actually work that well for this use case. I can't recall the details right now, but for example I have found that UMS thread switches that should be cheap sometimes result in expensive cross-processor context switches. I looked into fixing this in Windows, but in general UMS is considered deprecated technology, with support only on a subset of x86_64 processors (no planned support for x86 or ARM32 or ARM64).

So while I'd love to make UMS work for this case, it would likely require Windows changes to be effective, and we're not likely to make these Windows changes soon.

However, some good news: newer versions of Windows (from roughly 2019 onward) support high-resolution timers without increasing the timer frequency. There is a (currently undocumented, working to fix this) flag to CreateWaitableTimerExW, CREATE_WAITABLE_TIMER_HIGH_RESOLUTION, that essentially creates a timer that expires when you ask it to instead of aligning to a tick. So a less invasive fix to this problem might be to use this on operating systems that support it.

From the public SDK:

#define CREATE_WAITABLE_TIMER_MANUAL_RESET  0x00000001
#if (_WIN32_WINNT >= _NT_TARGET_VERSION_WIN10_RS4)
#define CREATE_WAITABLE_TIMER_HIGH_RESOLUTION 0x00000002
#endif

@jstarks

There is a (currently undocumented, working to fix this) flag to CreateWaitableTimerExW, CREATE_WAITABLE_TIMER_HIGH_RESOLUTION, that essentially creates a timer that expires when you ask it to instead of aligning to a tick.

I take it you propose to change runtime.usleep2 function

// Runs on OS stack. duration (in 100ns units) is in BX.
// The function leaves room for 4 syscall parameters
// (as per windows amd64 calling convention).
TEXT runtime·usleep2(SB),NOSPLIT|NOFRAME,$48
MOVQ SP, AX
ANDQ $~15, SP // alignment as per Windows requirement
MOVQ AX, 40(SP)
// Want negative 100ns units.
NEGQ BX
LEAQ 32(SP), R8 // ptime
MOVQ BX, (R8)
MOVQ $-1, CX // handle
MOVQ $0, DX // alertable
MOVQ runtime·_NtWaitForSingleObject(SB), AX
CALL AX
MOVQ 40(SP), SP
RET

Am I correct?

How would I use CreateWaitableTimerExW to do that? CreateWaitableTimerExW just creates a handle. How do I use the handle to sleep for intervals of less than 1 ms?

Thank you.

Alex

Unfortunately you will now need multiple syscalls, which will increase the overhead of this call (but maybe that's OK since you're sleeping anyway). You'll need to call SetWaitableTimer on the timer handle with the desired due time, and then you can call WaitForSingleObject (or NtWaitForSingleObject, doesn't matter) without a timeout.

Of course, you'll need a separate waitable timer pre-allocated for each thread that might be doing this. Is that one per M?

@zx2c4 may be interested in this thread.

zx2c4 commented

Thanks for @-ing me in @jstarks.

Right, it looks like usleep2 is executed on an m using its g0 stack. In this case, we'd just add the pre-allocated handle to mOS, where we already have some semaphores and locks and whatnot. The usleep2 code is already invoked via function address. So, in the event that CreateWaitableTimerExW(CREATE_WAITABLE_TIMER_HIGH_RESOLUTION) is successful and populates the member in mOS, we'd just update the value of usleep2Addr to point to usleep2_highres or whatever. Seems doable.

Biggest issue potentially is extra overhead from two syscalls. But the timer period would start with the first call to SetWaitableTimer anyway, so maybe indeed it doesn't really matter. After all, sleeps are always expensive since they tend to hit a scheduler.

You'll need to call SetWaitableTimer on the timer handle with the desired due time, and then you can call WaitForSingleObject (or NtWaitForSingleObject, doesn't matter) without a timeout.

Thank you. We already use SetWaitableTimer in runtime profiler. So that should be fine.

Unfortunately you will now need multiple syscalls, which will increase the overhead of this call (but maybe that's OK since you're sleeping anyway).

Lets implement the code first, and then we can decide about this. We might do some optimisations, for example, we could call old code for longer wait time, and only call new code for short waits. Or something like that.

I tried running your code https://play.golang.org/p/Zh8F6iqJOE from #14790 (comment) to reproduce #14790. I had to use set GOMAXPROCS=1, otherwise I could not reproduce #14790.

If I run #14790 test against current tip, I see this:

c:\Users\Alex\dev\src\issue\go\14790>set GOMAXPROCS=1

c:\Users\Alex\dev\src\issue\go\14790>go test -bench=. -count=10 > b && %GOPATH%\bin\benchstat b
name               time/op
DefaultResolution  1.08ms ± 2%
1ms                1.09ms ± 1%

c:\Users\Alex\dev\src\issue\go\14790>

But if I change runtime.osRelax, like this

diff --git a/src/runtime/os_windows.go b/src/runtime/os_windows.go
index a584ada702..a485571003 100644
--- a/src/runtime/os_windows.go
+++ b/src/runtime/os_windows.go
@@ -407,6 +407,8 @@ const osRelaxMinNS = 60 * 1e6
 // if we're already using the CPU, but if all Ps are idle there's no
 // need to consume extra power to drive the high-res timer.
 func osRelax(relax bool) uint32 {
+       return 0
+
        if relax {
                return uint32(stdcall1(_timeEndPeriod, 1))
        } else {

I see test output this

c:\Users\Alex\dev\src\issue\go\14790>go test -bench=. -count=10 > b && %GOPATH%\bin\benchstat b
name               time/op
DefaultResolution  9.35ms ±100%
1ms                 1.09ms ± 2%

c:\Users\Alex\dev\src\issue\go\14790>

So, yes, we can still reproduce #14790, if we stop calling timeBeginPeriod from runtime.

Next we should replace timeBeginPeriod with code that uses CreateWaitableTimerExW(CREATE_WAITABLE_TIMER_HIGH_RESOLUTION) to sleep, and see if #14790 is still broken.

I will do that when I have time, unless someone beats me to it.

Did I miss anything?

Alex

Change https://golang.org/cl/248699 mentions this issue: runtime: use CreateWaitableTimerEx to implement usleep

@jstarks

https://golang.org/cl/248699 stops calling timeBeginPeriod on computers with support for CREATE_WAITABLE_TIMER_HIGH_RESOLUTION. Can you, please, verify that #14790 still stays fixed.

Thank you.

Alex

stepped on this issue searching for windows and time resolution issue for golang, not sure I see a workaround or solution though:

Is there a better way to measure time spent than

	start := time.Now()
        // ... do stuff...
	elapsed := time.Since(start)

which works fine on linux and macos but not on windows (~1ms resolution vs ns)
what is the workaround or setting to get same resolution in a cross platform way?

but not on windows (~1ms resolution vs ns)
what is the workaround or setting to get same resolution in a cross platform way?

There is no way to set ns time resolution on Windows.

Alex

Adding to what Alex said ...

There is a way to get better resolution, but it's outside of what Go does for you. You can use the Windows Query Performance Counter function (Windows XP or later). It returns the duration since the program started. So you get a start and a stop time, then subtract to get the interval.

Testing on my laptop showed about 550ns resolution with about 132ns call overhead.

While 1us resolution isn't quite 1ns, it's better than 1ms. :)

What does time.Now() call internally?

Testing on my laptop showed about 550ns resolution

Correct. I posted

https://go-review.googlesource.com/c/go/+/248699/

that fixes this issue. You get about 500ns resolution.

Alex

There is no way to set ns time resolution on Windows.

Sorry I was referring to smallest "waiting / sleeping" time. But if you just want to read time, time.Now is not the best source for "high res" time. You could call Windows APIs with better high res time. (There are so many I don't remember name)

Alex

@jstarks in case this is related... #35482 (comment)

There is a way to get better resolution, but it's outside of what Go does for you. You can use the Windows Query Performance Counter function (Windows XP or later)

Thanks but how do I do that staying in golang?
I'd be happy with usec elapsed time resolution or even 10th or 100th of millisecond; but millisecond is too coarse

ps: I'm not a windows dev just trying to get my golang software to work on windows about the same as it runs on linux and mac, so ideally not making os dependant calls

We should just fix this in Go rather than teach Windows developers to use Windows-specific code. Since Go uses CLOCK_MONOTONIC on Linux, and not CLOCK_MONOTONIC_COARSE, should we update Windows to use QueryUnbiasedInterruptTimePrecise/QueryInterruptTimePrecise to power time.Now()?

See also #36141 - we need a new real-time timer API and to change the default policy on Windows as it differs from unix.

What does time.Now() call internally?

This is how it is implemented

TEXT time·now(SB),NOSPLIT,$0-24

There is even version of that code that calls QPC. But version only runs on WINE.

We should just fix this in Go rather than teach Windows developers to use Windows-specific code. Since Go uses CLOCK_MONOTONIC on Linux, and not CLOCK_MONOTONIC_COARSE, should we update Windows to use QueryUnbiasedInterruptTimePrecise/QueryInterruptTimePrecise to power time.Now()?

I tried using ...Precise Windows APIs and they appear much slower on my tests. See #31160 (comment) for details.

Alex

This fix looks great! The test code was very handy. I modified it to check the current timer resolution before running (since otherwise the results may not be completely valid) and made some other changes and checked it in here:

https://github.com/randomascii/blogstuff/blob/main/timer_interval/waitable_timer.cpp

I verified your results with the timer check present..

@jstarks - any progress on getting this documented? I just checked in CreateWaitableTimerEx still doesn't mention this. The hader makes it look like this is available in 1803 and above but it would be great to confirm this. Chromium might be interested in this.

Change https://golang.org/cl/271907 mentions this issue: runtime: allow for usleep2HighRes to run without TLS setup