NanoComp/meep

Adjoint solver performance bottleneck

CharlesDove opened this issue · 1 comment

Hi folks, thanks for the great package! While trying out the adjoint solver, I noticed what appears to be a significant performance bottleneck. Specifically, this block in `filter_source.py`, starting at line 40,

        self.bf = [
            lambda t, i=i: 0
            if t > self.T
            else (
                self.nuttall(t, self.center_frequencies)
                / (self.dt / np.sqrt(2 * np.pi))
            )[i]
            for i in range(len(self.center_frequencies))
        ]

produces a list of Python lambdas representing the sources. This list is eventually passed into the C++ engine, which calls back into the Python lambdas at each time step. Removing this callback seems to yield a ~10x speed improvement. Perhaps this functionality should be pushed into the C++ engine or otherwise compiled?
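
For reference, here's a rough back-of-the-envelope sketch of the kind of estimate I had in mind (the `nuttall_like` stand-in and the step count are made up, not Meep's actual implementation): time one evaluation of a basis function and multiply by the number of time steps.

    import timeit
    import numpy as np

    # Hypothetical stand-in for one entry of self.bf, just for timing purposes;
    # the real basis functions are Nuttall-windowed complex exponentials.
    def nuttall_like(t, fcen=1.0):
        return np.exp(2j * np.pi * fcen * t) * np.cos(np.pi * t / 10.0) ** 2

    bf = lambda t: 0 if t > 10.0 else nuttall_like(t)

    # Per-call cost of the Python callback times a hypothetical number of steps.
    per_call = timeit.timeit(lambda: bf(3.7), number=10_000) / 10_000
    n_steps = 200_000
    print(f"estimated total callback overhead: {per_call * n_steps:.3f} s")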

Hi @CharlesDove, thanks for reaching out! Handling the adjoint sources is rather tricky (as we discuss in our paper), and I think the behavior you're observing is a red herring. Let me walk through what's going on:

This piece of code simply computes the different basis functions needed to properly compute a broadband source. Evaluating it (the loop in that list comprehension) is relatively cheap and only happens once, up front. After that, all of the sources' spatial profiles are cached in C++, and each time profile is called like any other custom time profile. In other words, this isn't in a "hot loop."
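
To make that concrete, here's a minimal sketch (with made-up values, not the actual adjoint setup) of how one of those basis functions is ultimately consumed: it just becomes the `src_func` of an ordinary `CustomSource`, and the C++ engine evaluates it per time step the same way it would any user-supplied time profile.

    import numpy as np
    import meep as mp

    fcen, T = 1.0, 10.0  # hypothetical center frequency and cutoff time

    def basis_time_profile(t):
        # stand-in for one entry of self.bf; the real one is a Nuttall-windowed tone
        return 0 if t > T else np.exp(2j * np.pi * fcen * t)

    src = mp.Source(
        src=mp.CustomSource(src_func=basis_time_profile, end_time=T),
        component=mp.Ez,
        center=mp.Vector3(),
    )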

I suspect your speedup comes from getting rid of the long run times that come with these narrow basis functions (we talk a lot about this in the paper). But moving this to C++ shouldn't really change anything (unless your forward and adjoint simulations are really fast and the overhead really is dominated by the callback?).
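
Rough intuition for why the basis-function width matters (my gloss with hypothetical numbers, not the paper's exact argument): a source with frequency width df needs roughly 1/df time units to decay, so finer frequency sampling means proportionally longer forward and adjoint runs.

    import numpy as np

    # Hypothetical monitor frequencies; their spacing sets the basis-function width.
    frequencies = np.linspace(0.9, 1.1, 11)
    df = frequencies[1] - frequencies[0]
    print(f"df = {df:.3f}  ->  source footprint ~ {1/df:.0f} time units")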