Question: weights & signals using the same clamp function?
Hello,
Thank you for your quick answer the other day.
I have another question regarding the back-propagation through the normalization function (e.g. clamp).
https://analogvnn.readthedocs.io/en/v1.0.0/sample_code.html
In the figure, both weights and signals need the normalization function. However, we see a different behavior during back-propagation (green arrow).
- For the weights, the back-propagation of the normalization is ignored.
- For the signal, back-propagation through the normalization is done: the gradient is 0 on (-inf, -1] and [1, +inf), and 1 otherwise (sketched below).
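
For concreteness, here is a minimal PyTorch sketch of the two behaviors I mean (the class names are mine for illustration, not from the AnalogVNN code):

```python
import torch


class ClampWithGrad(torch.autograd.Function):
    """Clamp to [-1, 1]; backward uses the true clamp gradient:
    1 inside (-1, 1), 0 outside (the "signal" path in the figure)."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x.clamp(-1.0, 1.0)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        mask = (x > -1.0) & (x < 1.0)
        return grad_output * mask.to(grad_output.dtype)


class ClampStraightThrough(torch.autograd.Function):
    """Clamp to [-1, 1]; back-propagation through the clamp is ignored,
    i.e. the incoming gradient passes through unchanged (the "weight" path)."""

    @staticmethod
    def forward(ctx, x):
        return x.clamp(-1.0, 1.0)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output


x = torch.tensor([-2.0, 0.5, 3.0], requires_grad=True)
ClampWithGrad.apply(x).sum().backward()
print(x.grad)  # tensor([0., 1., 0.])
```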
In the text of the paper I don't see any explanation for this. In the code you provided, you use the same clamp function for both signals & weights.
Which is correct? What is the intuition behind not back-propagating through the normalization?
In short: we can ignore back-propagation through the normalization for both weights and signals if the normalization function can be thought of as a linear function; otherwise, sometimes we can ignore it and sometimes we can't (the second paper will focus more on this; see the sketch after the list below for the linear case).
The thought process was:
- The first paper of AnalogVNN is supposed to be as generalized as possible for multiple analog systems, not just photonics. (In the second paper, probably this summer, I will focus on many different architectures for implementing neural networks in photonics, compare them with AnalogVNN, and introduce many new photonic layers.)
- For weights, when I started working on it, I was thinking in terms of weights either implemented in PCM or coming through the laser along with the inputs. In the laser case we know the normalization function, but for PCM it can be more complicated, so I wanted to see what happens if we completely ignore it, to show that you don't have to worry about calculating gradients in every layer all the time.
- For inputs, I was just thinking about the case where the inputs come directly from the laser, so I wanted to maximize the correctness of the gradients; but since exact gradients for the Reduce Precision and Noise layers are not possible, I only implemented the backward function in the normalization layers.
- This was the simulation that I ran for the first paper, and since the first paper is supposed to introduce AnalogVNN:
- A framework for everyone to actually run photonic neural networks.
- A way to do large-scale hyperparameter searches in the analog domain.
- A demonstration to the community not to just directly compare analog systems with digital ones.
- So I didn't try to optimize the gradient flow for photonics that much (that will be in the second paper).
- But since then I have removed the backward function from all the normalization functions except for Clamp (because it was not doing much either way),
- And ran the tests again and found little to no difference.
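
To illustrate the linear case from the summary above, here is a small sketch (illustrative only, not the AnalogVNN implementation): for a linear normalization such as dividing by a fixed constant, ignoring its backward only rescales the gradient by that constant, which the learning rate can absorb.

```python
import torch


def normalize(x, scale=4.0):
    # Hypothetical linear normalization: divide by a fixed constant.
    return x / scale


class IgnoreBackward(torch.autograd.Function):
    # Apply the normalization in forward, but pass gradients through unchanged.
    @staticmethod
    def forward(ctx, x):
        return normalize(x)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output


w = torch.tensor([2.0, -3.0], requires_grad=True)

# Exact gradient through the linear normalization: 1 / scale per element.
normalize(w).sum().backward()
print(w.grad)  # tensor([0.2500, 0.2500])

w.grad = None

# Backward ignored: gradient of 1 per element, i.e. the same direction,
# just rescaled by a constant factor.
IgnoreBackward.apply(w).sum().backward()
print(w.grad)  # tensor([1., 1.])
```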
I am closing the issue. If it doesn't make sense, you can reopen it.