NeuromatchAcademy/course-content-dl

Issue on page /tutorials/W1D3_MultiLayerPerceptrons/student/W1D3_Tutorial1.html

William-Gong opened this issue · 1 comment

In Coding Exercise 1: Function approximation with ReLU, the plot only shows the basis ReLU functions and the approximated function. This is confusing because the basis ReLU functions are non-negative and grow without bound, yet the approximated function is a bounded sine curve. I recommend adding a weighted ReLU activations subplot after the subplot of basis ReLU functions.
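For context, the quantity being plotted is the weighted sum computed in the exercise (in the notation of the code below, with `b_i = -x_train[i]` and `w_i` equal to `combination_weights[i]`, i.e. the change in slope between consecutive segments):

y_hat(x) = sum_i w_i * ReLU(x + b_i)

Because each `w_i` is a slope difference, a subplot of the weighted activations makes it visible how the unbounded basis functions combine and cancel to produce the bounded, sine-like approximation.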

The plotting code will be updated to:

# Imports used below (normally provided by the tutorial's setup cells)
import numpy as np
import torch
import matplotlib.pyplot as plt


def plot_function_approximation(x, combination_weights, relu_acts, y_hat):
  """
  Helper function to plot ReLU activations and
  function approximations

  Args:
    x: torch.tensor
      Incoming data (points along the x axis)
    combination_weights: torch.tensor
      Weights applied to each ReLU activation in the weighted sum
    relu_acts: torch.tensor
      Computed ReLU activations for each point along the x axis (x)
    y_hat: torch.tensor
      Estimated labels/class predictions
      Weighted sum of ReLU activations for every point along x axis

  Returns:
    Nothing
  """
  fig, axes = plt.subplots(3, 1)
  fig.set_figheight(12)
  fig.set_figwidth(12)
  # Plot ReLU Activations
  axes[0].plot(x, relu_acts.T);
  axes[0].set(xlabel='x',
              ylabel='Activation',
              title='ReLU Activations - Basis Functions')
  labels = [f"ReLU {i + 1}" for i in range(relu_acts.shape[0])]
  axes[0].legend(labels, ncol = 2)

  weighted_relu = relu_acts * combination_weights[:,None]
  axes[1].plot(x, weighted_relu.T);
  axes[1].set_ylim([-2, 2])
  axes[1].set(xlabel='x',
              ylabel='Activation',
              title='Weighted ReLU Activations')


  # Plot Function Approximation
  axes[2].plot(x, torch.sin(x), label='truth')
  axes[2].plot(x, y_hat, label='estimated')
  axes[2].legend()
  axes[2].set(xlabel='x',
              ylabel='y(x)',
              title='Function Approximation')

  #plt.tight_layout()
  plt.show()

**And the coding exercise will be updated to:**
def approximate_function(x_train, y_train):
  """
  Function to compute and combine ReLU activations

  Args:
    x_train: torch.tensor
      Training data
    y_train: torch.tensor
      Ground truth labels corresponding to training data

  Returns:
    combination_weights: torch.tensor
      Weights applied to each ReLU activation in the weighted sum
    y_hat: torch.tensor
      Estimated labels/class predictions
      Weighted sum of ReLU activations for every point along x axis
    relu_acts: torch.tensor
      Computed ReLU activations for each point along the x axis (x)
    x: torch.tensor
      x-axis points
  """

  # Number of relus
  n_relus = x_train.shape[0] - 1

  # x axis points (more than x train)
  x = torch.linspace(torch.min(x_train), torch.max(x_train), 1000)

  ## COMPUTE RELU ACTIVATIONS

  # First determine what bias terms should be for each of `n_relus` ReLUs
  b = -x_train[:-1]

  # Compute ReLU activations for each point along the x axis (x)
  relu_acts = torch.zeros((n_relus, x.shape[0]))

  for i_relu in range(n_relus):
    relu_acts[i_relu, :] = torch.relu(x + b[i_relu])

  ## COMBINE RELU ACTIVATIONS

  # Set up weights for weighted sum of ReLUs
  combination_weights = torch.zeros((n_relus, ))

  # Figure out weights on each ReLU
  prev_slope = 0
  for i in range(n_relus):
    delta_x = x_train[i+1] - x_train[i]
    slope = (y_train[i+1] - y_train[i]) / delta_x
    combination_weights[i] = slope - prev_slope
    prev_slope = slope

  # Get output of weighted sum of ReLU activations for every point along x axis
  y_hat = combination_weights @ relu_acts

  return combination_weights, y_hat, relu_acts, x

# Add event to airtable
atform.add_event('Coding Exercise 1: Function approximation with ReLU')


# Make training data from sine function
N_train = 10
x_train = torch.linspace(0, 2*np.pi, N_train).view(-1, 1)
y_train = torch.sin(x_train)

# Test your function approximation
combination_weights, y_hat, relu_acts, x = approximate_function(x_train, y_train)

with plt.xkcd():
  plot_function_approximation(x, combination_weights, relu_acts, y_hat)
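
As an optional sanity check (not part of the proposed change), the weighted sum can be compared against the target sine curve; any remaining error should just be the linear-interpolation error between the training points:

# Maximum deviation between the ReLU approximation and the true sine curve
max_err = torch.max(torch.abs(y_hat - torch.sin(x))).item()
print(f"Max absolute error of the approximation: {max_err:.3f}")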

Hi @William-Gong,

Thanks so much for your contribution. We highly appreciate it!

I ran your code and got the following output.
[image: output of the proposed plotting code — ReLU basis functions, weighted ReLU activations, and the function approximation]

It seems to me that the graph could raise additional questions regarding interpretability, for instance: "Why do the weighted ReLUs in the latter part of the range behave differently from those in the former part?" or "Why is the slope of certain weighted ReLUs steeper than that of others?" While these are great questions that delve into the intrinsic behavior of weighted ReLU activations, we currently think this added complexity is infeasible to implement at Neuromatch scale for all (pod) levels.

Having said that, your contribution to improving our content is highly valuable. Thank you!

Kind Regards,
Gagana!