C-compiler-optimizations

Branch Optimization

If optimization

void f (int *p)
{
  if (p) g(1);
  if (p) g(2);
  return;
}

Can be simply replaced by

void f (int *p) 
{
    if (p) {
        g(1);
        g(2);

    }
    return;
}

Value Range Optimization

for(int i = 1; i < 100; i++) {
    if (i)
        g();
}

The if expression can be eliminated since it is already known that i is a positive integer.

for(int i = 1; i < 100; i++) {
    g();
}

Branch elimination

goto L1;
  //do something

L1:
  goto L2 //L1 branch is unnecessary

Unswitching

As opposed to checking if some condition or the other is true inside of a loop, you can take the if out of the loop and then loop.

for (int i = 0; i < N; i++) 
    if (x)
        a[i] = 0;
    else
        b[i] = 0;

if (x)
    for (int i = 0; i < N; i++)
        a[i] = 0
else
    for (int i = 0; i < N; i++)
        b[i] = 0;

Tail Recursion

A tail recursive call can be replaced by a goto statement which avoids keeping track of the stack frame.

Let's take a simple recursive function

int f(int i) {
    if (i > 0) {
        g(i);
        return f(i - 1)
    }
    else
        return 0;
}

Now let's optimize it

int f (int i) {
    entry:

    if (i > 0) {
        g(i);
        i--;
        goto entry;
    }
    else
        return 0;
}

Try/Catch block optimization

Try/Catch blocks that never throw an exception can be optimized

try {
    int i = 1;
} catch (Exception e) {
    //some code
}

Can be turned into int i = 1;

Loop Optimizations

Loop unrolling

When the different iterations of a loop are independent

for (int i = 0; i < 100; i++) {
    g();
}

The loop can be optimized

for (int i = 0; i < 100; i += 2) {
    g();
    g();
}

This can of course be done even more aggressively

Loop Collapsing

int a[100][300];

for (i = 0; i < 300; i++)
  for (j = 0; j < 100; j++)
    a[j][i] = 0;

Nested loops can be collapsed into a single loop where the index iterates over range(0,\product_j index_j)

int a[100][300];
int *p = &a[0][0];

for (i = 0; i < 30000; i++)
  *p++ = 0;

Loop fusion

Two separate loops can be fused together

for (i = 0; i < 300; i++)
  a[i] = a[i] + 3;

for (i = 0; i < 300; i++)
  b[i] = b[i] + 4;

for (i = 0; i < 300; i++) {
  a[i] = a[i] + 3;
  b[i] = b[i] + 4;
}

Forward store

Stores to global variables in loops can be moved out of the loop

int sum;

void f (void)
{
  int i;

  sum = 0;
  for (i = 0; i < 100; i++)
    sum += a[i];
}

int sum;

void f (void)
{
  int i;
  register int t;

  t = 0;
  for (i = 0; i < 100; i++)
    t += a[i];
  sum = t;
}

Access pattern optimization

Volative Optimization

volatile keyword is used to declare objects that may have unintended side effects.

volatile int x,y;
int a[SIZE];

void f (void) {
    int i;
    for (i = 0;  i < SIZE; i++)
        a[i] = x + y;
}

You would think that you could hoist the computation of x+y outside of the loop

volatile int x,y;
int a[SIZE];

void f (void) {
    int i;
    int temp = x + y;
    for (i = 0;  i < SIZE; i++)
        a[i] = temp;
}

However if x and y are volatile then this optimization might in fact be incorrect which is why compilers will not perform it.

Quick Optimization

Accessed objects can be cached into a temporary variable

{
    for(i = 0; i < 10; i++)
          arr[i] = obj.i + volatile_var;
}

Below is the code fragment after Quick Optimization.

{
         t = obj.i;
         for(i = 0; i < 10; i++)
           arr[i] = t + volatile_var;
}

printf Optimization

Calling printf() invokes the external library function printf()

#include <stdio.h>

void f (char *s)
{
  printf ("%s", s);
}

The string can be formatted at compile time using

#include <stdio.h>

void f (char *s)
{
  fputs (s, stdout);
}

Dead code elimination

Unused code is removed

int i = 1;
return //something else

Constant Propagation/Constant folding

int x = 3;
int y = 4 + x; //replaced by y = 7

return (x + y) //replaced by 10

Instruction combining

Below is a simple case of this, loop unrolling can reveal instances where instruction combining is possible

i++;
i++;

i += 2

Narrowing

unsigned short int s;

(s >> 20)      /* all bits of precision have been shifted out, thus 0 */
(s > 0x10000)  /* 16 bit value can't be greater than 17 bit, thus 0 */
(s == -1)      /* can't be negative, thus 0 */

Integer Multiply

This a well known one, given an expression

int f (int i,int n)
{
  return i * n;
}

Multiplication can be replaced by leftwise bitwise shifting

int f (int i)
{
  return i << (n-1);
}

Integer mod optimization

Another known one, integer divide is really expensive on hardware.

int f (int x,int y)
{
  return x % y;
}

Hacker's delight is a wonderful book that's encyclopedic in its treatment of cool bit tricks such as the one below.

int f (int x)
{
  int temp = x & (y-1);
  return (x < 0) ? ((temp == 0) ? 0 : (temp | ~(y-1))) : temp;
}

Block Merging

Suppose you had the following code fragment

int a;
int b;

void f (int x, int y)
{
  goto L1;                   /* basic block 1 */

L2:                          /* basic block 2 */
  b = x + y;
  goto L3;

L1:                          /* basic block 3 */
  a = x + y;
  goto L2;

L3:                          /* basic block 4 */
  return;
}

The different blocks will be optimized into one

int a;
int b;

void f (int x, int y)
{
  a = x + y;                 /* basic block 1 */
  b = x + y;
  return;
}

Common SubExpression

The second code fragment above can further be optimzed into

tmp = x + y
a   = tmp
b   = tmp
return;

Function inlining

A lot of optimizations can be discovered if a function call is replaced by the body of the function

Suppose we wish to implement a substraction function given a working addition function

int add (int x, int y)
{
  return x + y;
}

int sub (int x, int y)
{
  return add (x, -y);
}

Expanding add() at the call site in sub() yields:

int sub (int x, int y)
{
  return x + -y;
}

which can be further optimized to:

int sub (int x, int y)
{
  return x - y;
}

msaroufim/C-compiler-optimizations