volatile inline assembly

Question

jacob-carlborg opened this issue 3 months ago · 8 comments

I have some inline assembly that LDC optimizes away when using the -O3 flag. The documentation of LDC's inline assembly refers to GDC's documentation, which refers to GCC's. With GCC's extended inline assembly you can use asm volatile to avoid optimizing the inline assembly. I tried asm volatile {, but that resulted in syntax errors. Is there a corresponding way to do that in D?

Here's a reduced test case:

import ldc.attributes;

@naked extern (C) void foo()
{
  bar();

  asm
  {
    q"ASM
      cli
1:    hlt
      jmp 1b
ASM";
  }
}

noreturn bar()
{
  while (true) {}
}

Compiling the above code with ldc2 --output-s main.d -c -betterC -mtriple i386-freestanding produces the following assembly:

	.text
	.file	"main.d"
	.section	.text.foo,"ax",@progbits
	.globl	foo
	.p2align	4, 0x90
	.type	foo,@function
foo:
	.cfi_startproc
	calll	_D6foobar3barFZNn@PLT
	#APP
	cli
.Ltmp0:
	hlt
	jmp	.Ltmp0

	#NO_APP
.Lfunc_end0:
	.size	foo, .Lfunc_end0-foo
	.cfi_endproc

	.section	.text._D6foobar3barFZNn,"ax",@progbits
	.globl	_D6foobar3barFZNn
	.p2align	4, 0x90
	.type	_D6foobar3barFZNn,@function
_D6foobar3barFZNn:
	.cfi_startproc
	pushl	%ebp
	.cfi_def_cfa_offset 8
	.cfi_offset %ebp, -8
	movl	%esp, %ebp
	.cfi_def_cfa_register %ebp
	jmp	.LBB1_1
.LBB1_1:
	movb	$1, %al
	testb	$1, %al
	jne	.LBB1_2
	jmp	.LBB1_4
.LBB1_2:
	jmp	.LBB1_3
.LBB1_3:
	jmp	.LBB1_1
.LBB1_4:
	popl	%ebp
	.cfi_def_cfa %esp, 4
	retl
.Lfunc_end1:
	.size	_D6foobar3barFZNn, .Lfunc_end1-_D6foobar3barFZNn
	.cfi_endproc

	.ident	"ldc version 1.39.0"
	.section	".note.GNU-stack","",@progbits

Adding the -O3 flag produces this assembly:

	.text
	.file	"main.d"
	.section	.text.foo,"ax",@progbits
	.globl	foo
	.p2align	4, 0x90
	.type	foo,@function
foo:
	.cfi_startproc
	.p2align	4, 0x90
.LBB0_1:
	jmp	.LBB0_1
.Lfunc_end0:
	.size	foo, .Lfunc_end0-foo
	.cfi_endproc

	.section	.text._D6foobar3barFZNn,"ax",@progbits
	.globl	_D6foobar3barFZNn
	.p2align	4, 0x90
	.type	_D6foobar3barFZNn,@function
_D6foobar3barFZNn:
	.p2align	4, 0x90
.LBB1_1:
	jmp	.LBB1_1
.Lfunc_end1:
	.size	_D6foobar3barFZNn, .Lfunc_end1-_D6foobar3barFZNn

	.ident	"ldc version 1.39.0"
	.section	".note.GNU-stack","",@progbits

With optimizations enabled, the cli and hlt instructions from the inline assembly have been removed and the call to bar has been inlined.

It seems I can add @optStrategy("none") to foo as a workaround.
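For reference, the workaround applied to the reduced test case might look like the sketch below. It assumes the @optStrategy attribute from ldc.attributes and changes only the attribute list on foo; everything else is copied from the test case above. The exact output may vary between LDC versions.

```d
import ldc.attributes;

// Assumption: @optStrategy("none") disables optimization for foo only,
// so the asm block is emitted even when compiling with -O3.
@optStrategy("none") @naked extern (C) void foo()
{
  bar();

  asm
  {
    q"ASM
      cli
1:    hlt
      jmp 1b
ASM";
  }
}

noreturn bar()
{
  while (true) {}
}
```

Note that this only keeps the instructions in the object file. As long as bar never returns, they are still not executed at run time.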

Answer 1 · 2024-09-30T13:05:08.000Z

I doubt the asm itself is being optimized. It seems to be optimized away simply because it comes after an infinite loop.
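A minimal sketch of that distinction, reusing foo and bar from the test case: an asm block placed before the call to bar is reachable, while one placed after it is dead code the optimizer is free to delete. Splitting the instructions into two blocks is purely illustrative and not taken from the original code.

```d
import ldc.attributes;

noreturn bar()
{
  while (true) {}
}

@naked extern (C) void foo()
{
  // Reachable: control passes through here before entering bar().
  asm
  {
    "cli";
  }

  bar(); // declared noreturn and loops forever

  // Unreachable: nothing after a noreturn call can execute,
  // so the optimizer may drop this block at -O3.
  asm
  {
    "hlt";
  }
}
```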

Answer 2 · 2024-09-30T13:29:44.000Z

Yes, that seems to be the case.

Answer 3 · 2024-09-30T15:33:35.000Z

Indeed, I think the asm code is optimized away because of the infinite loop in `bar` or its noreturn annotation.

To answer your question about the volatile equivalent, I think specifying "memory" as clobber for the asm code may do the trick (add : : : "memory" to your asm sequence). https://stackoverflow.com/questions/14449141/the-difference-between-asm-asm-volatile-and-clobbering-memory
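As a sketch of what that suggestion looks like in LDC's GCC-style asm syntax: the clobber list is the third colon-separated section after the instruction string. The function name halt is hypothetical and chosen only so the asm block is reachable.

```d
// Hypothetical standalone function, not part of the original test case.
extern (C) void halt()
{
  asm
  {
    // Sections: instructions : outputs : inputs : clobbers.
    // "memory" tells the compiler the asm may read or write arbitrary memory.
    q"ASM
      cli
1:    hlt
      jmp 1b
ASM" : : : "memory";
  }
}
```

A "memory" clobber constrains how the compiler moves memory accesses around the asm statement. It does not make an unreachable statement reachable.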

Answer 4 · 2024-10-01T07:54:39.000Z

Adding : : : "memory" did not help unfortunately.

Answer 5 · 2024-10-01T09:36:41.000Z

What do you expect with that infinite loop? Is there a clang equivalent where the asm is kept? I very much doubt so.

Answer 6 · 2024-10-01T19:08:36.000Z

> Is there a clang equivalent where the asm is kept? I very much doubt so.

You are correct. Both Clang and GCC remove the inline assembly regardless of whether volatile and/or : : : "memory" is used. Clang even removed it without optimizations enabled when I used _Noreturn.

> What do you expect with that infinite loop?

It's part of an OS kernel. I'm not an expert in this subject but, as far as I understand, an interrupt can break/pause the infinite loop.

Answer 7 · 2024-10-01T19:10:59.000Z

But I probably won't need the infinite loop anyway. I think I can close this issue. Thanks for the input.

Answer 8 · 2024-10-01T19:25:15.000Z

Yeah I guess the @optStrategy("none") workaround is the best option here (and sufficient).