Not filling branch delay slot by moving CHERI instructions or other instructions across CHERI instructions

Question

jonwoodruff opened this issue 7 years ago · 10 comments

When this is built:

int partition( int a[], int l, int r) {
  int pivot, i, j, t;
  pivot = a[l];
  i = l; j = r+1;
		
  while( 1)
  {
    do ++i; while( a[i] <= pivot && i <= r );
    do --j; while( a[j] > pivot );
    if( i >= j ) break;
    t = a[i]; a[i] = a[j]; a[j] = t;
  }
  t = a[l]; a[l] = a[j]; a[j] = t;
  return j;
}

void quickSort( int a[], int l, int r)
{
  int j;

  if( l < r ) {
    j = partition( a, l, r);
    quickSort( a, l, j-1);
    quickSort( a, j+1, r);
  }
}

When built for purecap, the instructions around the only "JALR" are:

	sll	$5, $1, 0
	cmove	$c13,  $c19
	cgetpccsetoffset	$c12, $3
	cjalr	$c12, $c17
	nop

The cmove or the sll could be in the branch delay slot.
When built for MIPS, the branch delay slot is filled:

	move	 $4, $17
	move	 $5, $2
	sw	$1, 0($8)
	addiu	$1, $21, -1
	jalr	$25
	sll	$6, $1, 0

Answer 1 · 2017-11-09T11:49:17.000Z

Slightly simplified test case:

__attribute__((always_inline))
static int partition( int a[], int l, int r) {
  int pivot, i, j, t;
  pivot = a[l];
  i = l; j = r+1;
		
  while(i >= j)
  {
    do --j; while( a[j] > pivot );
    a[i] = a[j]; a[j] = t;
  }
  return j;
}

void quickSort( int a[], int l, int r)
{
  int j;

    j = partition( a, l, r);
    quickSort( a, l, j-1);
    quickSort( a, j+1, r);
}

n64 version:

	addiu	$1, $20, -1
	ld	$25, %call16(quickSort)($gp)
	sll	$6, $1, 0
	move	 $4, $17
	jalr	$25
	move	 $5, $2
	addiu	$5, $20, 1

Pure-cap version:

	addiu	$1, $18, -1
	ld	$3, %call16(quickSort)($gp)
	sll	$5, $1, 0
	cgetpccsetoffset	$c12, $3
	cmove	$c3,  $c18
	move	 $4, $2
	cmove	$c13,  $c19
	cjalr	$c12, $c17
	nop
	addiu	$4, $18, 1

Answer 2 · 2017-11-09T12:33:39.000Z

It looks as if cmove isn't being put in the delay slot because it's marked as having unmodelled side effects. This is also preventing any instructions from being reordered across it.
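To illustrate the effect described above, here is a minimal sketch of a backwards delay-slot search, written as a toy model in C. It is not the LLVM delay slot filler: the `Insn` struct, the register numbering, and the function names are all hypothetical, and the `side_effects` field only stands in for the "unmodelled side effects" marking. The link register `$c17` is treated as a def of `cjalr`, and call argument registers are ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy register numbering for this sketch only (not LLVM's). */
enum { R1, R3, R5, C12, C13, C17, C19 };
#define BIT(r) (1u << (r))

typedef struct {
    const char *text;   /* assembly text, for readability only */
    unsigned defs;      /* bitmask of registers written */
    unsigned uses;      /* bitmask of registers read */
    bool side_effects;  /* stands in for "unmodelled side effects" */
} Insn;

/*
 * Scan backwards from the branch at block[branch_idx] for an instruction
 * that can be moved into its delay slot. Returns the index of that
 * instruction, or -1 if the slot has to be filled with a nop.
 */
int find_delay_slot_filler(const Insn *block, int branch_idx)
{
    assert(branch_idx >= 0);

    /* Registers touched by the branch and by every instruction skipped. */
    unsigned seen_defs = block[branch_idx].defs;
    unsigned seen_uses = block[branch_idx].uses;

    for (int i = branch_idx - 1; i >= 0; --i) {
        const Insn *c = &block[i];

        /* A flagged instruction can neither be moved nor be crossed. */
        if (c->side_effects)
            return -1;

        /* Moving c past the skipped instructions must not break a
         * RAW, WAR or WAW register dependency. */
        bool hazard = (c->defs & (seen_defs | seen_uses)) != 0 ||
                      (c->uses & seen_defs) != 0;
        if (!hazard)
            return i;

        seen_defs |= c->defs;
        seen_uses |= c->uses;
    }
    return -1;
}

/* The purecap sequence from the report, with the flag set or cleared
 * on the two capability instructions before the call. */
int filler_for_report_sequence(bool cap_insns_flagged)
{
    const Insn block[] = {
        { "sll $5, $1, 0",             BIT(R5),  BIT(R1),  false },
        { "cmove $c13, $c19",          BIT(C13), BIT(C19), cap_insns_flagged },
        { "cgetpccsetoffset $c12, $3", BIT(C12), BIT(R3),  cap_insns_flagged },
        { "cjalr $c12, $c17",          BIT(C17), BIT(C12), false },
    };
    return find_delay_slot_filler(block, 3);
}
```

In this model, `filler_for_report_sequence(true)` finds nothing because the search stops at the first flagged instruction, which corresponds to the `nop` in the purecap output. With the flag cleared, the search skips `cgetpccsetoffset` (it defines the `$c12` that `cjalr` reads) and picks the `cmove`.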

Answer 3 · 2017-11-09T12:35:18.000Z

Actually, it looks as if this is set for pretty much all capability instructions, which is probably impeding a lot of potential optimisations.

Answer 4 · 2017-11-09T12:41:22.000Z

And this is required because of the implicit C0 behaviour. The real fix probably involves adding an implicit use of C0 to all of the MIPS loads and stores.
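The idea of replacing the blanket side-effects marking with an explicit register dependency can be sketched in C as follows. This is a toy model under stated assumptions, not back-end code: the `Deps` struct, the register numbering, and the helper names are hypothetical, memory dependencies are not modelled, and `cmove $c0, $c1` is only a made-up example of an instruction that writes C0.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy register numbering for this sketch only. CAP0 is the capability
 * that legacy MIPS loads and stores consult implicitly. */
enum { GPR1, GPR8, CAP0, CAP1, CAP13, CAP19 };
#define MASK(r) (1u << (r))

typedef struct {
    unsigned defs;  /* registers written, including implicit ones */
    unsigned uses;  /* registers read, including implicit ones */
} Deps;

/* Two adjacent instructions may swap when there is no RAW, WAR or WAW
 * register dependency between them (memory is not modelled here). */
bool can_swap(Deps a, Deps b)
{
    return (a.defs & b.uses) == 0 &&
           (a.uses & b.defs) == 0 &&
           (a.defs & b.defs) == 0;
}

/* sw $1, 0($8): reads $1 and $8, plus the implicit use of C0. */
Deps legacy_store(void)
{
    return (Deps){ 0, MASK(GPR1) | MASK(GPR8) | MASK(CAP0) };
}

/* cmove $c13, $c19: does not touch C0. */
Deps cap_move(void)
{
    return (Deps){ MASK(CAP13), MASK(CAP19) };
}

/* Hypothetical instruction that writes C0, e.g. cmove $c0, $c1. */
Deps c0_write(void)
{
    return (Deps){ MASK(CAP0), MASK(CAP1) };
}

/* Usage: only the C0 writer is ordered against the legacy store. */
void check_examples(void)
{
    assert(can_swap(cap_move(), legacy_store()));
    assert(!can_swap(c0_write(), legacy_store()));
}
```

With the dependency expressed this way, only instructions that actually write C0 are ordered against MIPS loads and stores, and everything else is free to move.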

Answer 5 · 2017-11-09T14:10:24.000Z

I wonder how crazy it would be to simply disable all modifications to special capability registers except by dedicated instructions. How much overhead would there be from losing the ability to directly read/write C0 in all capability instructions? By "disable" I mean at the hardware level, so implicit C0 modifications would do nothing or trigger exceptions, etc. We already have similar plans in our document. Of course, this would be at least a flag week...

Answer 6 · 2017-11-09T14:26:00.000Z

It would be nice to have an experimental run and see. I don't think that we generate stores to $c0 from anything other than an intrinsic in the compiler. We do rely on being able to read $c0 for ctoptr, but I don't think we ever insert modifications.

Ideally, I'd like to make $c0 a capability version of $zero, make $ddc a special register, and have special cases for ctoptr that use $ddc and $pcc.

Answer 7 · 2017-11-09T15:18:42.000Z

Modifying the MIPS loads and stores to implicitly use $c0 is complicated by the fact that $c0 is not present on MIPS...

Answer 8 · 2017-11-09T15:32:45.000Z

I guess we should wait until we have CWriteHwr and then only treat that as a hazard?

Answer 9 · 2017-11-09T15:33:36.000Z

Ideally, yes, though I believe it's possible to teach the LLVM back end that MIPS always has C0, but doesn't have any instructions to write to it...

Answer 10 · 2017-11-09T19:26:34.000Z

This is now fixed in the multicapsize branch. It can probably be cherry-picked to master, or we can wait until it's time to merge that branch.