Normalization of datasets in lazy/eager cases

Question


Opened this issue 2 years ago · 0 comments

A few questions about how normalization is done currently:

In the current master branch, when .mrcs files are loaded lazily, normalization parameters are estimated from 1000 images, but with non-lazy (eager) loading they are estimated from all images. This makes sense from a computational standpoint. Is this a distinction we want to keep going forward (if/when we decide to merge the vb/imagesource branch)? (In that branch I'm currently estimating normalization parameters from 1000 images, regardless of whether the mode is lazy or eager.)
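If lazy and eager loading should produce the same parameters, both could use the same fixed-size subset. A minimal sketch, assuming NumPy; `sample_indices` is a hypothetical helper, not an existing function in the codebase. It returns exactly `min(n, N)` evenly spaced indices, whereas the `range(0, self.N, self.N // n)` stride used in `estimate_normalization` (quoted below) can select more than n images (though fewer than 2n), e.g. all 1999 images when N = 1999 and n = 1000.

```python
import numpy as np


def sample_indices(n_total: int, n_sample: int = 1000) -> np.ndarray:
    """Return min(n_sample, n_total) evenly spaced indices in [0, n_total).

    Hypothetical helper: both lazy and eager loading could estimate
    normalization from the images at these indices.
    """
    if n_total <= 0:
        return np.empty(0, dtype=np.int64)
    n_sample = min(n_sample, n_total)
    # linspace over [0, n_total - 1] yields exactly n_sample points; their
    # spacing is >= 1, so the truncated integer indices are distinct.
    return np.linspace(0, n_total - 1, n_sample).astype(np.int64)


# Strided selection as in estimate_normalization vs. the fixed-size sample
print(len(range(0, 1999, 1999 // 1000)))  # 1999
print(len(sample_indices(1999, 1000)))    # 1000
print(len(sample_indices(500, 1000)))     # 500
```

With eager loading the selected rows could be taken by fancy indexing (`particles[idx]`); with lazy loading only those images would need to be read.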
In the current master branch, when loading lazily, real-space windowing is not applied before the normalization parameters are determined. Is this just an oversight, i.e. should windowing have been applied in both the lazy and eager cases?

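    def estimate_normalization(self, n=1000):
        # Use CuPy when requested and available, otherwise NumPy
        pp = cp if (self.use_cupy and cp is not None) else np

        n = min(n, self.N)
        # Take every (N // n)-th particle; .get() reads each image from the
        # lazily loaded file. The stride can select more than n images.
        # No real-space windowing is applied before the Hartley transform.
        imgs = pp.asarray(
            [
                fft.ht2_center(self.particles[i].get())
                for i in range(0, self.N, self.N // n)
            ]
        )
        if self.invert_data:
            imgs *= -1
        imgs = fft.symmetrize_ht(imgs)
        # The mean is computed but immediately overwritten with 0
        norm = [pp.mean(imgs), pp.std(imgs)]
        norm[0] = 0
        logger.info("Normalizing HT by {} +/- {}".format(*norm))
        return norm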
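If windowing is meant to apply in both modes, the lazy path would mask each sampled image in real space before the Hartley transform. The sketch below, assuming NumPy only, illustrates why this matters for the estimated scale: zeroing the box corners removes energy, so the Hartley-space standard deviation shrinks. `soft_window` (a linear ramp between radii 0.85 and 0.99) and `ht2_center` are hypothetical stand-ins written for illustration; the codebase's own window profile and radii may differ.

```python
import numpy as np


def soft_window(D: int, in_rad: float = 0.85, out_rad: float = 0.99) -> np.ndarray:
    """Hypothetical D x D real-space window: 1 inside in_rad, 0 beyond out_rad,
    linear ramp in between (radii as fractions of half the box width)."""
    coords = np.linspace(-1, 1, D, endpoint=False)
    x, y = np.meshgrid(coords, coords)
    r = np.sqrt(x**2 + y**2)
    return np.clip((out_rad - r) / (out_rad - in_rad), 0.0, 1.0)


def ht2_center(img: np.ndarray) -> np.ndarray:
    """Stand-in centered 2D Hartley transform: Re(F) - Im(F) of the
    fftshift-centered FFT."""
    f = np.fft.fftshift(np.fft.fft2(np.fft.fftshift(img)))
    return f.real - f.imag


def estimate_ht_std(images: np.ndarray, window=None) -> float:
    """Std of the Hartley transforms, optionally windowing each image first."""
    if window is not None:
        images = images * window  # real-space windowing, broadcast over the stack
    hts = np.stack([ht2_center(img) for img in images])
    return float(np.std(hts))


rng = np.random.default_rng(0)
imgs = rng.normal(size=(8, 64, 64))  # synthetic noise, not real particles
mask = soft_window(64)
print(estimate_ht_std(imgs))        # unwindowed
print(estimate_ht_std(imgs, mask))  # windowed: lower for this noise input
```

Because the Hartley transform preserves total energy up to a constant factor (Parseval), the windowed-to-unwindowed ratio for white noise comes out close to the RMS of the mask, so parameters estimated without the window would not match data that is windowed before normalization.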

In the current master, I see that in several places in the codebase the mean is computed for normalization but never actually used (0 is used as the mean instead):

            norm = [pp.mean(particles), pp.std(particles)]
            norm[0] = 0

In these cases, is it okay to remove the pp.mean(particles) computation, or does something else need to be adjusted as well?
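If fixing the mean at 0 is intentional, the mean computation can simply be dropped. A minimal sketch, assuming NumPy (CuPy provides the same `mean`/`std`/`sqrt` functions); `normalization_params` and `rms_about_zero` are hypothetical names. One thing possibly worth checking before removing the line: `std` is measured about the sample mean even though the shift uses 0. If the intent is to scale about 0, the root-mean-square would be the consistent statistic; the two agree whenever the mean is close to 0.

```python
import numpy as np


def normalization_params(particles: np.ndarray) -> tuple[float, float]:
    """Hypothetical helper: return (0.0, std), i.e. the current behaviour
    without computing a mean that is immediately discarded."""
    # np.std subtracts the sample mean internally, so removing the separate
    # np.mean call leaves the returned scale unchanged.
    return 0.0, float(np.std(particles))


def rms_about_zero(particles: np.ndarray) -> float:
    """Scale measured about 0 rather than about the sample mean:
    rms**2 == std**2 + mean**2."""
    return float(np.sqrt(np.mean(np.square(particles))))


rng = np.random.default_rng(0)
x = rng.normal(loc=0.1, scale=2.0, size=(10, 32, 32))  # synthetic data

old = [np.mean(x), np.std(x)]  # current pattern
old[0] = 0
mean, std = normalization_params(x)
print(old[0] == mean, float(old[1]) == std)  # True True
print(rms_about_zero(x), std)  # differ only through the (here small) mean
```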