radiasoft/rshellweg

improved debugging of Hellweg simulations

Closed this issue · 9 comments

Debugging the Python-wrapped Hellweg library on the Jupyter server is making it difficult to move forward quickly.

Perhaps there are ways to improve the workflow.

An executable (i.e. removing Python from the equation) could be helpful.

@ilyapogorelov please give me an example that is failing and that you would like to debug.

The branch is called 125-eoms-for-momentum.
The input files are in the attached zip file. To run: 'python main.py'
debug_example.zip

I'm not sure what I'm supposed to see in the failure. I see lots of output, including:

In TBeamSolver::Solve():  i = 20
In TBeamSolver::DumpBeam(.):  ExportParameters->SpecialFormat  = 3
In TBeamSolver::DumpCST(., ., j), j = 0
px = -nan, py = -nan

In TBeamSolver::DumpCST(., ., j), j = 1
px = -nan, py = -nan

In TBeamSolver::DumpCST(., ., j), j = 2
px = -nan, py = -nan

Are the NaNs a problem?

Yes, these NaNs indicate that there is a problem. Here's some context:

The code tracks a distribution of particles, each described by 6 phase-space coordinates (positions, velocities), through a lattice where in general there are electric and magnetic fields present. These fields interact with the particles (and are changed in the process, too). In the example at hand, there are 3 particles, and it takes 20 steps to go through the lattice from start to end. When this example is run with the trunk version of the code, the phase-space coordinates of the particles change over time but remain finite. In the development branch, they grow unphysically large over the first few steps, eventually resulting in NaNs all over the place. I am trying to pin down the source of the discrepancy.
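One way to narrow down where the blow-up starts is to scan the per-step coordinates for the first non-finite value and to look at how the largest magnitude grows before that. The following is only a minimal sketch: the `history` layout (step, then particle, then coordinate) and the numbers in it are made up for illustration and are not taken from Hellweg or its output.

```python
import math

def first_nonfinite_step(history):
    """Return (step, particle, coordinate) of the first NaN/inf value, or None.

    history[step][particle] is assumed to be a sequence of phase-space
    coordinates. This layout is hypothetical, not the Hellweg data structure.
    """
    for step, particles in enumerate(history):
        for p, coords in enumerate(particles):
            for c, value in enumerate(coords):
                if not math.isfinite(value):
                    return step, p, c
    return None

def max_abs_per_step(history):
    """Largest finite |coordinate| at each step, to spot runaway growth."""
    result = []
    for particles in history:
        finite = [abs(v) for coords in particles for v in coords
                  if math.isfinite(v)]
        result.append(max(finite) if finite else float("nan"))
    return result

# Made-up data: one particle, two coordinates, growing and then turning NaN.
history = [
    [[0.1, 0.2]],
    [[10.0, 0.3]],
    [[1e6, 0.4]],
    [[float("nan"), 0.5]],
]

print(first_nonfinite_step(history))
print(max_abs_per_step(history))
```

The step just before the first non-finite value, together with the growth curve, gives a smaller window of code to inspect than the final NaN output does.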

In the current setup, it seems the only approach available to me is to use lots of print statements, whether the cerr output goes to the screen or to a file with the 2>&1 trick. It would be helpful to have more flexibility in regard to stepping through the code, examining variable contents, etc.
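For the print-statement workflow, a small helper can at least automate the 2>&1 step and pull out the lines of interest. This is a sketch, not part of rshellweg: `run_and_log` and `lines_containing` are hypothetical names, and the child command below is a stand-in that only imitates a line of the cerr output; in practice it would be replaced with something like `[sys.executable, "main.py"]`.

```python
import os
import subprocess
import sys
import tempfile

def run_and_log(cmd, log_path):
    """Run cmd with stderr merged into stdout (like 2>&1) and save the text."""
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # send the child's stderr to the same pipe
        text=True,
    )
    with open(log_path, "w") as f:
        f.write(proc.stdout)
    return proc.stdout

def lines_containing(text, needle):
    """Return (line_number, line) pairs for lines that contain needle."""
    return [(n, line) for n, line in enumerate(text.splitlines(), start=1)
            if needle in line]

# Stand-in child process that writes one line to stderr and one to stdout.
demo_cmd = [
    sys.executable, "-c",
    "import sys; print('px = -nan, py = -nan', file=sys.stderr); print('done')",
]
log_path = os.path.join(tempfile.gettempdir(), "hellweg_debug_demo.log")
output = run_and_log(demo_cmd, log_path)
print(lines_containing(output, "nan"))
```

This does not replace a debugger, but it makes it quicker to find the first place in a long log where a NaN is printed.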

I merged master into 125-eoms-for-momentum so I could be sure all fixes were in. The new branch is https://github.com/radiasoft/rshellweg/tree/125-rjn-eoms-for-momentum. The output of the two 125 branches matches except for minor diffs:

35c35
< Par[Sj].SumSin = 0.702499 Par[Sj].SumCos = 0.271415 gamma0 = 1.23418
---
> Par[Sj].SumSin = 0.7025 Par[Sj].SumCos = 0.271415 gamma0 = 1.23418
114c114
< Par[Sj].SumSin = 0.708735 Par[Sj].SumCos = 0.337481 gamma0 = 1.34164
---
> Par[Sj].SumSin = 0.708735 Par[Sj].SumCos = 0.337482 gamma0 = 1.34164
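The differing values above differ by one unit in the sixth printed digit, which looks like rounding in the printed output. Comparing such lines numerically with a tolerance, rather than as text, avoids flagging them. The sketch below is one way to do that; the helper names are hypothetical, and the pattern assumes values always appear as `name = number`, as in the lines shown.

```python
import re

# Capture the number that follows each "=" sign.
VALUE = re.compile(r"=\s*([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)")

def values(line):
    """Extract the numeric values from a 'name = number ...' log line."""
    return [float(v) for v in VALUE.findall(line)]

def max_relative_difference(line_a, line_b):
    """Largest relative difference between corresponding values in two lines."""
    a, b = values(line_a), values(line_b)
    if len(a) != len(b):
        raise ValueError("lines contain a different number of values")
    return max(
        (abs(x - y) / max(abs(x), abs(y)) for x, y in zip(a, b) if x != y),
        default=0.0,
    )

old = "Par[Sj].SumSin = 0.702499 Par[Sj].SumCos = 0.271415 gamma0 = 1.23418"
new = "Par[Sj].SumSin = 0.7025 Par[Sj].SumCos = 0.271415 gamma0 = 1.23418"

rel = max_relative_difference(old, new)
print(rel < 1e-5)  # True: the lines agree to within a 1e-5 relative tolerance
```

The tolerance of 1e-5 is an arbitrary choice for illustration; an appropriate value depends on how many digits the code prints.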

I think it's important to not get too far off master, especially when debugging.

I'll now try to figure out what's going on with the NaNs, or at least try to get an environment that makes debugging easier.

My 125- branch is not affected by the merge, right?

In this case it's not possible to avoid getting far off master. I'm switching to a different set of phase-space variables for the particles, so there are major changes to the code in multiple places before you can even compile.

What really would be helpful is an environment that makes debugging easier.
The point wasn't to hand off tracking down this particular bug to you; I've already learned my way around the code to some extent, so it may be easier for me to do this. This issue is really about having more flexibility when it comes to debugging.

Your branch is unaffected. To set up debugging, I needed the branch to be on "rshellweg" rather than "rslinac", so that I could modify setup.py to enable debugging on the shared object without having to worry about merging those changes.
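For reference, enabling debugging on a shared object built through setup.py usually comes down to the compiler flags. The sketch below is not the actual rshellweg setup.py, which is not shown in this thread; `debug_extension_kwargs`, the module name, and the source file name are all hypothetical. It assumes a GCC- or Clang-style compiler, where `-g` emits debug symbols and `-O0` turns off optimization.

```python
def debug_extension_kwargs(debug=True):
    """Keyword arguments that could be passed to setuptools.Extension(...).

    Hypothetical helper; assumes GCC/Clang-style compiler flags.
    """
    if debug:
        # -g: include debug symbols; -O0: no optimization, so stepping
        # through the code follows the source line by line.
        return {"extra_compile_args": ["-g", "-O0"]}
    return {"extra_compile_args": ["-O2"]}

# Possible use inside a setup.py (names are placeholders):
#
#   from setuptools import Extension
#   ext = Extension(
#       "hypothetical_module",
#       sources=["hypothetical_source.cpp"],
#       **debug_extension_kwargs(debug=True),
#   )

print(debug_extension_kwargs())
```

With a shared object built this way, the Python process that loads it can be started under a native debugger, for example `gdb --args python main.py`, to set breakpoints and examine variables in the C++ code.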

> there are major changes to the code in multiple places before you can even compile.

To be clear, I've merged all your changes except some format-only ones. I think you need to avoid changing the existing formatting unless it's necessary, or do it in a separate set of changes that touches only whitespace and the like, not the code. The formatting changes made the merge very difficult.

It will save you a lot of time merging in the future if you continue off the new branch, since that work is done already.