The function AcpiOsStall doesn't work properly
duselguy opened this issue · 11 comments
Describe the bug
According to the comment in hwsleep.c:
...
/*
* We wanted to sleep > S3, but it didn't happen (by virtue of the
* fact that we are still executing!)
*
* Wait ten seconds, then try again. This is to get S4/S5 to work on
* all machines.
*
* We wait so long to allow chipsets that poll this reg very slowly
* to still read the right value. Ideally, this block would go
* away entirely.
*/
AcpiOsStall (10 * ACPI_USEC_PER_SEC);
...
It looks like we don't wait. I changed the code in hwsleep.c to check it:
...
ACPI_DEBUG_PRINT ((ACPI_DB_INIT,
"Entering sleep state [S%u]\n", SleepState));
AcpiOsStall (10 * ACPI_USEC_PER_SEC); /* Temp changes */
ACPI_DEBUG_PRINT ((ACPI_DB_INIT, /* Temp changes */
"The second entering sleep state [S%u]\n", SleepState)); /* Temp changes */
while (true) {}; /* Temp changes */
/* Clear the SLP_EN and SLP_TYP fields */
...
Host configuration
Linux Veriton 5.13.0-51-generic #58~20.04.1-Ubuntu SMP Tue Jun 14 11:29:12 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.1)
Toolchain configuration
Standard.
Tilck build configuration
- Tilck commit: see screenshot above
Type of issue (select one)
- Incorrect behavior
Reproduction details
Reproducible on different hardware with code changes above.
Expected behavior
Wait requested time.
Screenshots
Screenshot is above.
Additional context
It looks like borrowed modules (for example, utosi.c, actypes.h, hwsleep.c, ... are out of date. Do you have any ideas/plans to synchronize these modules time to time?
Hi Vladimir! Sorry, but it's not clear to me how you triggered that code path for userspace.
By searching in hwsleep.c (ACPICA), I found that code is located in AcpiHwLegacySleep(). It should not be possible to end up there from user space without changing the kernel itself.
If you changed the kernel and wanted to test if ACPI can be used with Tilck to enter in the S3 sleep state, well... I didn't even being to work on that at all. So, it's not supposed to work at the moment. Therefore, this is not a bug. Entering and exiting from sleep requires a considerable amount of work around power management of devices and none of that has been done on Tilck. I don't plan to do that for the foreseeable future.
The AcpiOsStall() function per se should work, because it simply calls delay_us(), which works and I have a test for that. But, in that context (never tested), everything is possible. You're practically in an error case in the AcpiHwLegacySleep() function. AcpiOsEnterSleep() didn't work as expected (of course!) and ACPICA is trying to handle that somehow by busy-waiting for 10 seconds and then trying again. And, that's even in the poweroff code path, after calling kernel_shutdown(). At that point, most of the assumptions simply won't hold. Even if at some point I decide to support ACPI sleep, it won't be in the shutdown code path. It will have a separate code path, where tasks are simply paused (stopped).
In conclusion, I have to close this because it's not really a bug, but the lack of a feature (support for ACPI sleep). The us_delay() function can be tested individually using a selftest on real hardware by running on Tilck the following command:
selftest delay
You should see something like:
root@tilck:/# selftest delay
[devshell] Executing built-in command 'selftest'
[ 4.863] Running self-test: delay
[ 4.915] Expected in ticks: 12
[ 4.915] Actual ticks: 13
Note: I run this in a VM. On real HW it should be precise.
Given that AcpiOsStall() is implemented like this:
void
AcpiOsStall(UINT32 Microseconds)
{
disable_preemption();
{
delay_us(Microseconds);
}
enable_preemption_nosched();
}You can be sure it works in general ;-)
- I checked S5 not S3.
- If you try my code changes in hwsleep.c then you will see that AcpiOsStall doesn't work as expected.
- It may be HW depended, because the loop count in asm depends from it.
- I don't see also your comments why there is no time gap (10 seconds) between 2 last messages in the screenshot.
- In QEMU I don't see 10 seconds too:

Regards,
Vladimir
I don’t understand: have you changed anything else other than adding log messages in hwsleep.c? poweroff works great on QEMU and on many HW machines as well. At least on QEMU we can test exactly the same thing. After you run poweroff, the VM powers off, doesn’t it? It doesn’t matter if the sleep function works or not in that context, what matters to me is if poweroff itself works. On VMs it should work perfectly. Can you reproduce the problem (VM not powering off) with a clean build without any changes? If that’s the case, I’ll need a lot of info about your setup, because it never happened on QEMU for me.
Vlad
I'm working on the problem #99 - power led after poweroff. I hope that may be this problem will be solved if "sleep 10 seconds" will work correctly. If poweroff works "correctly" (power is off, no power led after it) in QEMU, your books and one (from two) of my books, it doesn't mean that the problem with "sleep 10 seconds" (if it exists) should not be investigated.
If you insert few lines of the code in hwsleep.c in the right place to test "sleep 10 seconds" problem, you will see even in QEMU that the system doesn't sleep these 10 seconds. At least it is what I see in the screenshots above.
The changes are after:
ACPI_DEBUG_PRINT ((ACPI_DB_INIT, "Entering sleep state [S%u]\n", SleepState));
in hwsleep.c (see above).
Regards,
Vladimir
@duselguy Vladimir, I really appreciate your effort and interest in this project :-)
Just, in this case I'm not convinced you're investigating in the right direction. The fact that poweroff doesn't work correctly on all the HW machines, has probably nothing to do with AcpiOsStall() not working as expected: if you check the code you'll see that the function is called only if execution continues after AcpiHwWritePm1Control(). ACPICA makes a desperate attempt: wait 10 seconds and then re-try the register write, hoping it will work. I don't believe it will work even after 10 seconds. Making poweroff to work on the machines where currently it doesn't will probably require weeks of hard-work and the implementation of the address space for the Embedded Controller. That will depend on the specific hardware and several implementations will be required to support different machines. In other words, weeks or months of research and work. I don't have the time for that, but I know that has to be done at some point.
Still, for the sake of the experiment, just replace AcpiOsStall (10 * ACPI_USEC_PER_SEC); with the following code, which should work for sure, even if it won't be precise:
for (unsigned j = 0; j < 10; j++) {
for (unsigned i = 0; i < 1000 * 1000 * 1000; i++)
asmVolatile("nop");
}If that changes the outcome and poweroff really succeed on the same machine where previously it didn't, then I'll agree to investigate why AcpiOsStall() does not work as expected in that specific context and will fix it as well. Otherwise, it doesn't make sense to spend time improving the handling of an error-case given that the main code path won't work anyways. Of course, if you believe you'll have fun debugging the problem by yourself and proposing a fix, I would appreciate that.
Anyway, I have to ask again the same question of above: can you really reproduce a problem with poweroff on QEMU? Does QEMU in any case hang instead of powering off the machine and closing the window? I'm really curious to know how you got the QEMU screenshot you attached here. I expect at least on QEMU poweroff to work in 100% of the cases, because it's impossible for me to test Tilck on all the HW machines, but at least on QEMU where the (virtual) machine should be always the same, Tilck must behave correctly. In case it doesn't, I'll fix the bug, if you give me enough context to reproduce it.
Thanks,
Vlad
@duselguy Btw, I spent a few minutes taking a look at delay_us(). It turns about that the function doesn't work simply because it has not been designed to work for too big values for of us. If the value is too big (e.g. 10 seconds = 10 million microseconds), when us is multiplied by loops_per_us, the unsigned wraps around, so we end up waiting way less than 10 seconds.
I'll add some ASSERTs and fix AcpiOsStall() to work. I'll fix this just because it's extremely simple. But, it was a (missed) good opportunity for you to spend some time debugging and discover the problem by yourself.
It took me longer yesterday to figure out the same thing. An open issue doesn't mean I've stopped working on it. Or do you prefer that the issue can only be opened when the issuer has already found (and code) a solution? Not sure if this is the correct approach.
In any case, my solution cannot be better than the author's solution. Thanks for the fix, I'll check it today later.
Regards,
Vladimir
If you have a patch for a bug, please open a pull request instead of an issue. Issues are for the case you just discovered a bug and don’t know how to fix it. Ultimately, only if enough people start contributing to the project by fixing bugs and implementing features, the project will have a chance to become something more than a toy.
Anyway, could you tell me how how did you reproduce an issue with poweroff on qemu?
- I didn't have the patch, I only saw a possible problem in the loops count.
- The reproducing doesn't depend from QEMU/HW. Simply: issue message, sleep 10 seconds, issue message (as I described in my previous comments above).
- OK, no worries ;-)
- Aha. So it's not true that poweroff hanged the machine. You just did a screenshot at the right time or put the sleep call before the actual poweroff call (
AcpiHwWritePm1Control()).