baccuslab/pyret

to norm or not to norm?

Closed this issue · 8 comments

Currently, getsta in filtertools.py has the norm flag set to True by default. I would prefer it to be False by default, since I think the default behavior should be to normalize by the number of spikes. Thoughts?

Does that mean normalization will be left to the caller?

No, I think that by default (when the norm flag is False) we should normalize by dividing by the number of spikes, and if the caller sets norm to True, we should normalize by mean-subtracting and scaling the filter to unit norm. For example:

if norm:
    # Mean-subtract and scale the filter to unit norm
    sta -= _np.mean(sta)
    sta /= _np.linalg.norm(sta)
else:
    # Divide by the total spike count (the spike histogram summed over its non-empty bins)
    sta /= _np.sum(hist[nzhist])

That doesn't sound right to me. The caller should be able to get a completely un-normalized version of the STA, without having to multiply by the number of spikes.

Imagine that I compute an STA in independent segments of the experiment, and then want to average across them. The right thing to do is to weight each STA by the number of spikes that went into it, since STAs computed from more spikes are less noisy and should count for more. I won't be able to do this unless I can get back from filtertools.getsta a truly raw STA.
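For concreteness, here is a minimal sketch of the weighted averaging I have in mind (not pyret's API; combine_stas and its arguments are hypothetical names). It assumes I can get back each segment's per-spike-normalized STA along with its spike count:

import numpy as np

def combine_stas(stas, nspikes):
    """Average per-segment STAs, weighting each by its spike count."""
    stas = np.asarray(stas)                     # shape: (nsegments, ...filter dims)
    weights = np.asarray(nspikes, dtype=float)  # spikes per segment
    # Weighted average: sum_i(nspikes_i * sta_i) / sum_i(nspikes_i),
    # which equals the STA computed over all segments pooled together.
    return np.tensordot(weights, stas, axes=1) / weights.sum()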

I should point out that we can't simply delegate this to filtertools.getste anymore, since we no longer return stimulus slices weighted by the number of spikes.

I mean, the whole point of calling it the spike-triggered average is that it is the average stimulus that elicits a spike. If we don't divide by the number of spikes, then it is the spike-triggered sum.
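To spell that out in equations (my notation, not anything in the code): for spike times $t_i$, $i = 1, \dots, N$, and a stimulus $s$,

$$
\mathrm{STA}(\tau) = \frac{1}{N} \sum_{i=1}^{N} s(t_i - \tau),
\qquad
\mathrm{STS}(\tau) = \sum_{i=1}^{N} s(t_i - \tau) = N \cdot \mathrm{STA}(\tau).
$$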

At the very least we should return the number of spikes along with the spike-triggered sum (STS), so that the user can divide to get the STA, but I still think the more common use case would be for the function to return the STA and the number of spikes, and the user can multiply if they need the STS...

OK, I think you're right on this front. Maybe return the STA and the number of spikes?

I don't understand why returning the number of spikes is useful; isn't that already given by length(spikes)?

I also agree with Niru that by default getsta should return THE STA.

I think this is complicated by the way I personally handle spike times (just one array for the whole experiment) and conditions. Let's just do the simple/right/easy thing: normalize by the number of spikes by default, and return a unit-norm filter if norm=True is passed.
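A rough sketch of that agreed-upon behavior (not the actual pyret implementation; sts and nspikes stand in for whatever getsta computes internally):

import numpy as np

def normalize_sta(sts, nspikes, norm=False):
    """Turn a spike-triggered sum into the STA; optionally unit-normalize it."""
    sta = sts / nspikes                   # default: divide by the spike count
    if norm:
        sta = sta - sta.mean()            # mean-subtract
        sta = sta / np.linalg.norm(sta)   # scale to unit norm
    return sta

Note that since the norm=True branch rescales to unit norm anyway, dividing by the spike count first doesn't change its result.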

Closed in 14a8adf