Normalization of datasets in lazy/eager cases
A few questions about how normalization is done currently:

- In the current `master` branch, for lazy loading of .mrcs files, normalization is estimated from 1000 images, but for non-lazy loading it is computed over all images. This makes sense from a computational standpoint. Is this a distinction we want to keep going forward (if/when we decide to merge the `vb/imagesource` branch)? (Currently I'm determining normalization parameters from 1000 images in that branch, regardless of whether the mode is lazy or eager.)
- In the current `master` branch, for lazy loading, real-space windowing is currently not applied before determining the normalization parameters. Is this just an oversight, and should windowing be done in both the lazy and eager cases?
```python
def estimate_normalization(self, n=1000):
    pp = cp if (self.use_cupy and cp is not None) else np
    n = min(n, self.N)
    imgs = pp.asarray(
        [
            fft.ht2_center(self.particles[i].get())
            for i in range(0, self.N, self.N // n)
        ]
    )
    if self.invert_data:
        imgs *= -1
    imgs = fft.symmetrize_ht(imgs)
    norm = [pp.mean(imgs), pp.std(imgs)]
    norm[0] = 0
    logger.info("Normalizing HT by {} +/- {}".format(*norm))
    return norm
```
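For reference, estimating the statistics from an evenly strided subsample with real-space windowing applied first could look like the sketch below. This is real-space statistics only, omitting the Hartley transform, symmetrization, and data-inversion steps of the method above; `estimate_normalization_windowed`, `particles`, and `window` are hypothetical names, not the project's actual API.

```python
import numpy as np

def estimate_normalization_windowed(particles, n=1000, window=None):
    """Estimate [mean, std] from an evenly strided subsample of `particles`
    (assumed to be an (N, D, D) array), optionally applying a real-space
    window first, so lazy and eager paths would share the same behavior.

    Hypothetical sketch, not the project's actual implementation.
    """
    N = len(particles)
    n = min(n, N)
    # Evenly strided subsample, mirroring range(0, N, N // n) above
    imgs = np.asarray(
        [particles[i] for i in range(0, N, N // n)], dtype=np.float32
    )
    if window is not None:
        imgs = imgs * window  # real-space windowing before any statistics
    # The mean slot is pinned to 0 downstream, so return 0 directly
    return [0.0, float(np.std(imgs))]
```

The windowing here happens before the statistics are taken, which is the behavior the question above suggests should hold in both the lazy and eager cases.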
- In the current `master` branch, I see that for normalization the mean is determined in several places in the codebase, never to be actually used (`0` is used as the mean value instead):

  ```python
  norm = [pp.mean(particles), pp.std(particles)]
  norm[0] = 0
  ```

  In these cases, is it okay to take out the `pp.mean(particles)` computation, or is there something else that needs tweaking?
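Since `norm[0]` is overwritten with `0` immediately after it is set, the `pp.mean(particles)` call is dead work; a minimal demonstration (using NumPy and synthetic data as a stand-in):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic particle stack with a deliberately nonzero mean
particles = rng.normal(loc=5.0, scale=2.0, size=(16, 8, 8)).astype(np.float32)

# Pattern currently in the codebase: compute the mean, then discard it.
norm_a = [np.mean(particles), np.std(particles)]
norm_a[0] = 0

# Equivalent result without the dead computation:
norm_b = [0, np.std(particles)]

assert norm_a == norm_b  # identical normalization parameters
```

So removing the mean computation would not change the returned parameters; the only question is whether any caller elsewhere relies on the mean being computed for side effects or logging.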