Profile and Tune Channel Modeling.
capn-freako opened this issue · 5 comments
Ever since shifting to an exclusively scikit-rf-based approach to channel modeling in PyBERT, we've experienced a dramatic slow-down in that portion of the simulation engine, and the performance of the recently beta-released v4.0.0 exemplifies this.
There's just too much utility in this approach to channel modeling to abandon it and return to our old ways: it provides a robust, believable, and commonly understood "platform" for building all sorts of composite channel models compatible with PyBERT simulation.
So, it's time to profile the code and see how we can improve performance.
Describe the desired new or improved feature.
Channel modeling performance that is at least as fast as DFE modeling.
Expected behavior
Channel modeling performance is no worse than DFE modeling performance.
Screenshots
(n/a)
Desktop (please complete the following information):
(all)
Additional context
(n/a)
My first, primitive attempt to get a high-level picture of what's going on failed:
(pybert-tst)
[22-12-29 6:06] dbanas@Davids-Air-M2:~
% python -m cProfile -m pybert
Exception occurred in traits notification handler for object: <chaco.plot_containers.GridPlotContainer object at 0x1654505e0>, trait: shape, old value: (0, 0), new value: (2, 2)
Traceback (most recent call last):
File "/Users/dbanas/prj/PyBERT/.venv/pybert-tst/lib/python3.11/site-packages/traits/trait_notifiers.py", line 342, in __call__
self.handler(*args)
File "/Users/dbanas/prj/PyBERT/.venv/pybert-tst/lib/python3.11/site-packages/chaco/plot_containers.py", line 519, in _shape_changed
self._reflow_layout()
File "/Users/dbanas/prj/PyBERT/.venv/pybert-tst/lib/python3.11/site-packages/chaco/plot_containers.py", line 513, in _reflow_layout
grid.resize(self.shape)
ValueError: cannot resize an array that references or is referenced
by another array in this way.
Use the np.resize function or refcheck=False
{Many more just like the above omitted, for brevity.}
We've actually struggled with this particular error previously.
@jdpatt , can you remind me: why does running under the profiler trigger this error when normal execution does not?
Okay, using the following simple script, I was able to get some data.
(I'm still seeing the error noted above, in my terminal.)
from pybert.pybert import PyBERT

# Instantiating w/ gui=False runs the simulation headlessly.
thePyBERT = PyBERT(gui=False)
print(f"Performance: {60*thePyBERT.total_perf/1e6:4.1f} Msmpls./min.")
% python -m cProfile -s cumulative misc/pybert_prof.py >prof.out
% less prof.out
Performance: 6.8 Msmpls./min.
5181320 function calls (5088984 primitive calls) in 4.072 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
{snip}
1 0.034 0.034 1.006 1.006 pybert.py:1519(calc_chnl_h)
{snip}
2 0.001 0.001 0.700 0.350 network.py:3049(renormalize)
2 0.004 0.002 0.662 0.331 network.py:6655(renormalize_s)
1 0.097 0.097 0.635 0.635 dfe.py:258(run)
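(An aside: the stats can also be dumped in binary form for interactive browsing, which is what I do with SnakeViz further below. A sketch, assuming SnakeViz is installed; "pybert.prof" is just an arbitrary output file name:)

% python -m cProfile -o pybert.prof misc/pybert_prof.py
% snakeviz pybert.prof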
Notes on the above results, keyed to the profiled functions listed:
- calc_chnl_h(): Makes sense, since channel simulation is the new performance bottleneck.
- renormalize(): I was afraid of this. It's the result of a recent "trick" I introduced into the code for more accurately turning channel frequency response and impedance data into a complete 2-port network. (A minimal sketch of the idea follows this list.)
- renormalize_s(): Same note as for renormalize(), above.
- run() (in dfe.py): The DFE has always been a pig. In fact, it used to be the performance bottleneck, before I introduced my "trick".
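For context, here's a minimal sketch of the kind of renormalization involved; the frequency response below is a toy stand-in, not PyBERT's actual channel model:

import numpy as np
import skrf as rf

# Toy channel: an ideal, matched 2-port "through" whose transmission
# is a simple decaying frequency response.
freqs = np.linspace(1e6, 20e9, 2000)        # Hz
s = np.zeros((len(freqs), 2, 2), dtype=complex)
s[:, 0, 1] = s[:, 1, 0] = np.exp(-freqs / 10e9)

# Build the network at the channel's native impedance, then renormalize
# its S-parameters to the 50 Ohm system reference. This renormalize()
# call is the hot spot flagged in the profile above.
ntwk = rf.Network(f=freqs, s=s, z0=100.0, f_unit="Hz")
ntwk.renormalize(50.0)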
@jdpatt , any thoughts on a more performant alternative to my "trick" that is still theoretically defensible?
This SO answer has lots of useful tidbits.
> We've actually struggled with this particular error previously.
> @jdpatt , can you remind me: why does running under the profiler trigger this error when normal execution does not?
It has to do with some under-the-hood NumPy magic that guards against dangling references: NumPy manages the array buffer in C behind the scenes, so resizing an array in place while anything else (a print, the debugger, the profiler, another array) still references it could leave that reference pointing at freed memory. I believe you have been running with refcheck=False successfully. SO related post
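The refusal is easy to reproduce; a minimal sketch (and presumably the profiler holds just this kind of extra reference, which is why the error only appears under it):

import numpy as np

a = np.arange(10)
v = a[:5]           # a view: another array now references a's buffer
try:
    a.resize(20)    # refcheck (on by default) catches this and raises
except ValueError as err:
    print(err)      # "cannot resize an array that references or is
                    #  referenced by another array in this way. ..."

# refcheck=False skips the safety check; the resize succeeds, but a
# surviving view like v may now point at freed memory.
a.resize(20, refcheck=False)

# np.resize() returns a resized copy instead, side-stepping the issue.
a = np.resize(a, 30)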
Using SnakeViz, I was able to dig a little deeper.
My conclusions are:
- S-to-Z-parameter conversion is the main consumer of time in the renormalize() call.
- That conversion spends most of its time guarding against matrix singularities, using the skrf.mathFunctions.nudge_eig() function.
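The same drill-down can be reproduced in text mode by profiling just the renormalization and filtering the report; a sketch, reusing the toy network from the earlier snippet:

import cProfile
import pstats

import numpy as np
import skrf as rf

# Same toy 2-port as in the earlier sketch.
freqs = np.linspace(1e6, 20e9, 2000)
s = np.zeros((len(freqs), 2, 2), dtype=complex)
s[:, 0, 1] = s[:, 1, 0] = np.exp(-freqs / 10e9)
ntwk = rf.Network(f=freqs, s=s, z0=100.0, f_unit="Hz")

# Profile only the renormalization, and restrict the printed report
# to the functions of interest (the strings are regex filters).
with cProfile.Profile() as pr:
    ntwk.renormalize(50.0)
pstats.Stats(pr).sort_stats("cumulative").print_stats("renormalize|s2z|nudge_eig")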
For now, at least: Won't fix.