sicara/tf-explain

Normalization issue gives blank heatmaps

palatos opened this issue · 2 comments

I'm using tf-explain to generate Vanilla Gradients, and was getting blank heatmaps (0 for every pixel).
Thinking this was strange, I generated the gradients manually and got non-zero values.
So I stepped through the tf-explain source line by line until I found the cause of the discrepancy:

grayscale_tensor = tf.reduce_sum(tensor, axis=-1)
normalized_tensor = tf.cast(
    255 * tf.image.per_image_standardization(grayscale_tensor), tf.uint8
)

Namely, line 41, where image standardization is applied, takes gradients that are otherwise non-zero and turns them into zeros, producing a fully blank heatmap. I don't believe this is the intended behavior. The idea was just to normalize the already generated gradients to the [0, 1] range, right? The problem is that tf.image.per_image_standardization does not do this; rather, it standardizes the map to zero mean and unit variance, which is not what we want here. (I suspect the all-zero output comes from per_image_standardization dividing by max(stddev, 1/sqrt(num_elements)): when the raw gradients are tiny, the 1/sqrt(num_elements) floor wins, the "standardized" values stay tiny, and multiplying by 255 and casting to uint8 rounds them all down to 0.)

If instead of line 41 we use something like:

normalize01 = tf.keras.layers.Lambda(
    lambda G: (G - tf.reduce_min(G)) / (tf.reduce_max(G) - tf.reduce_min(G))
)
normalized_tensor = normalize01(grayscale_tensor)

We get the desired behavior and a non-zero gradient heatmap.
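As a quick sanity check, here is a minimal, self-contained sketch of the difference (the toy gradient values are my own, not from tf-explain):

import tensorflow as tf

# Toy stand-in for a tiny gradient map, shape (H, W, C) = (2, 2, 1).
grads = tf.constant([[[1e-6], [2e-6]], [[3e-6], [4e-6]]])

# Current behavior: per_image_standardization divides by
# max(stddev, 1/sqrt(N)); for tiny gradients the 1/sqrt(N) floor wins,
# so the values stay tiny and the uint8 cast rounds everything to 0.
standardized = tf.cast(
    255 * tf.image.per_image_standardization(grads), tf.uint8
)
print(standardized.numpy().squeeze())  # all zeros -> blank heatmap

# Proposed behavior: min-max normalize to [0, 1], then scale to [0, 255].
g_min, g_max = tf.reduce_min(grads), tf.reduce_max(grads)
normalized = tf.cast(255 * (grads - g_min) / (g_max - g_min), tf.uint8)
print(normalized.numpy().squeeze())  # [[  0  85] [170 255]]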

Can you confirm whether I've identified this problem correctly? I could make a pull request.
I believe other people might be getting blank heatmaps from this as well. I'm not sure whether it affects only VanillaGradients; it might affect other methods too, if they also call that standardization function before generating the final heatmap.
Also, to my understanding, the original implementation of vanilla gradients took the maximum across channels (which here would correspond to tf.reduce_max()), but this implementation uses tf.reduce_sum(). Is there a particular rationale behind this choice? Could it make much of a difference?
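For concreteness, here is the difference on a hypothetical multi-channel gradient tensor (the original saliency-map paper, Simonyan et al. 2014, takes the channel-wise max of the absolute gradients):

import tensorflow as tf

# Hypothetical per-pixel gradients, shape (H, W, C) = (2, 2, 3).
grads = tf.random.normal((2, 2, 3))

# Current tf-explain reduction: sum over channels (opposite signs can cancel).
saliency_sum = tf.reduce_sum(grads, axis=-1)

# Simonyan et al. (2014) style: max of the absolute gradient over channels.
saliency_max = tf.reduce_max(tf.abs(grads), axis=-1)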

Here's a sample of the difference in results in my case. Everything is identical between the two heatmaps except that the standardization step is replaced by the normalization above. On top is the heatmap normalized to [0, 1] (what I'm proposing); on the bottom is the heatmap standardized to mean 0 and variance 1 (current implementation, giving a blank heatmap even though the gradient itself is not blank).

[image: the two heatmaps, plotted with the 'Blues' colormap]

I ran into the same problem (blank heatmaps) when using SmoothGrad. With the fix proposed by @palatos I get non-empty heatmaps (which also make some sense in my application, I think). It also seems that others suggest implementing Vanilla Gradients with a normalization of the gradient heatmap to [0, 1] instead of a standardization (BTW, in that post they also use reduce_max instead of reduce_sum).

Would you mind actually making that pull request, @palatos? It might get the attention of a package maintainer, and it would also make it easier for me to cite the code I'm using in my research (tf-explain plus your modification). I could open the pull request myself, but I don't want to bypass you.
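For reference, here is a sketch of what the patched helper could look like (I'm guessing at the enclosing function from the snippet quoted above; the epsilon guard against a constant gradient map is my own addition):

import tensorflow as tf

def transform_to_normalized_grayscale(tensor):
    """Sum over channels, then min-max normalize to [0, 255] as uint8."""
    grayscale_tensor = tf.reduce_sum(tensor, axis=-1)

    g_min = tf.reduce_min(grayscale_tensor)
    g_max = tf.reduce_max(grayscale_tensor)
    # Epsilon guard (my addition): a constant gradient map would otherwise
    # divide by zero and produce NaNs instead of a valid (blank) heatmap.
    normalized = (grayscale_tensor - g_min) / (
        g_max - g_min + tf.keras.backend.epsilon()
    )

    return tf.cast(255 * normalized, tf.uint8)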

Sorry for the late reply, @palatos. Indeed, I've run into the problem recently, and a PR is more than welcome! The impact is larger than vanilla gradients, as this transformation is applied to all outputs that have only one channel.