ratt-ru/QuartiCal

[Looking to contribute] Where can I find the parameter update loop?

Opened this issue · 10 comments

Describe the problem that the feature should address
I'd like to contribute a small extension to quartical.

What I'd like to do is make a small modification to the parameter update loop, where you compute the gradients. Can you point me there? I'll experiment with adding a regularisation term that enforces smoothness of the phases over the aperture. I'm exploring calibration schemes for DSA2000 and would like to see if QuartiCal could be the test bed, so I'd need a working understanding of how it's designed. @JSKenyon would you be able to assist with this?

Hi @Joshuaalbert! I can certainly try to help you.

The first thing to understand is where the solvers are actually called - each gain type calls a different underlying implementation. That happens here (apologies in advance, the calling function needs polishing):

```python
jhj, info_tup = solver(base_args,
                       term_args,
                       meta_args,
                       kwargs["corr_mode"])
```

If you want to utilize QuartiCal as a test bed for different solvers/approaches, you are more than welcome to implement an entirely new solver which can slot in at that location - the only real requirement is the way in which the arguments enter the function.
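To make the calling convention concrete, a drop-in solver just needs to accept the same four arguments and hand back the `(jhj, info_tup)` pair. A minimal do-nothing skeleton (the contents of the argument tuples and the exact return convention here are assumptions based on the call site above, not QuartiCal's actual interfaces):

```python
from collections import namedtuple

# Stand-in for whatever status info the framework expects back;
# the field names are invented for illustration.
solver_info = namedtuple("solver_info", ["converged", "iterations"])

def my_custom_solver(base_args, term_args, meta_args, corr_mode):
    """Hypothetical skeleton of a solver that slots in at the call above."""
    jhj = None  # would hold the (approximate) J^H J blocks
    # ... perform parameter updates here ...
    info_tup = solver_info(converged=True, iterations=0)
    return jhj, info_tup

# The framework would then call it exactly like the built-in solvers:
jhj, info = my_custom_solver(None, None, None, "full")
```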

Obviously, an entirely new solver may be overkill. If you want to modify one of the existing ones, that code lives in /quartical/gains. Each directory contains all the implementation specifics for a given term. As an example, if you wanted to modify the phase-only solver, you would find the relevant files in quartical/gains/phase. The phase-only solver is here:

```python
def phase_solver(base_args, term_args, meta_args, corr_mode):
```

And the call which computes the gradients (well, sort of - QC doesn't ever explicitly form up the Jacobian or J^H J) is here:

```python
compute_jhj_jhr(base_args,
                term_args,
                meta_args,
                upsampled_imdry,
                extents,
                corr_mode)
```
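For context, the jhj/jhr products drive a Gauss-Newton-style step, so a smoothness regulariser would enter by augmenting both the curvature and the gradient. A toy 1-D sketch of the idea, assuming a quadratic second-difference penalty (all names are illustrative; QC's kernels work per correlation and never form these dense matrices):

```python
import numpy as np

def smooth_penalised_step(phi, jhj, jhr, lam=1.0):
    """One Gauss-Newton step for phases with a quadratic smoothness
    penalty lam * ||L @ phi||^2. `jhj`/`jhr` stand in for the products
    from compute_jhj_jhr; everything here is a conceptual sketch."""
    n = phi.size
    # Second-difference (discrete Laplacian) operator along one axis.
    L = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
    A = np.diag(jhj) + lam * (L.T @ L)   # data curvature + penalty curvature
    b = jhr - lam * (L.T @ (L @ phi))    # data gradient + penalty gradient
    return phi + np.linalg.solve(A, b)

phi = np.zeros(8)
jhj = np.ones(8)
jhr = np.zeros(8)
phi_new = smooth_penalised_step(phi, jhj, jhr)
```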

Hopefully that is enough to get you started! I have neglected to mention anything about the Dask layer and passing additional arguments into the solver, so let me know if those block you.

One final thing to be aware of is #244 - I am in the middle of making a fairly large number of changes that will probably make things a little clearer/easier. Just wanted to give you fair warning as the code may shift under you a little.

A pattern that I'd like to explore is:

  1. solve with any given quartical solver
  2. smooth finally before applying

Do you think this could be seen as a term, so that I could do something like:

```yaml
terms: ["G", "smooth"]
iter_recipe: [1000, 1]

G:
  type: phase
  time_interval: "1"
  freq_interval: "0"

smooth:
  type: smooth
  kwargs: ...
```

@JSKenyon You mention in the documentation that effectively setting an iter_recipe element to 0 acts as convenient interpolation, and this seems like it would be close to that. So far my experiments for DSA show that smoothing after calibration is good enough when the gain noise is low enough. I suspect there is also a constraint on the allowable systematics.

This two-pass approach is a fast approximation of properly regularised optimisation. So it's not ideal, but it's far more efficient than a general approach.
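One pitfall in the smoothing pass is 2π wrapping: smoothing the complex gain exp(iφ) instead of the raw phase sidesteps it. A minimal sketch with a boxcar kernel (purely illustrative; a real implementation would use a proper kernel and weights):

```python
import numpy as np

def smooth_phases(phi, width=5):
    """Toy post-solve smoother: average exp(i*phi) with a boxcar kernel,
    then take the angle. Smoothing the complex gain rather than the raw
    phase avoids artefacts from 2*pi wraps."""
    g = np.exp(1j * phi)
    kernel = np.ones(width) / width
    g_smooth = np.convolve(g, kernel, mode="same")
    return np.angle(g_smooth)

# Noisy but slowly varying phases:
t = np.linspace(0, 1, 100)
phi_true = 0.5 * np.sin(2 * np.pi * t)
phi_noisy = phi_true + 0.1 * np.random.default_rng(1).normal(size=t.size)
phi_hat = smooth_phases(phi_noisy)
```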

Questions:

  1. If I wanted to use this approach, then it seems like I need to implement a solver with type=smooth. Is this simply a matter of copying the gains/phase/ directory, modifying kernel.py, and augmenting some map somewhere?

  2. How do I define specific kwargs that I'd like passed to the function that produces the gain updates?

Hi @Joshuaalbert - apologies for the slow reply. I have been a combination of on leave/ill for the last week or so.

> A pattern that I'd like to explore is:
>
> 1. solve with any given quartical solver
> 2. smooth finally before applying
>
> Do you think this could be seen as a term?

This is something we have considered in the past but it has some problems from a computational perspective, depending on whether you require access to all the gain solutions before smoothing. It is not impossible but I suspect it will end up being unnecessarily complicated.

> @JSKenyon You mention in the documentation that effectively setting an iter_recipe element to 0 acts as convenient interpolation, and this seems like it would be close to that. So far my experiments for DSA show that smoothing after calibration is good enough when the gain noise is low enough. I suspect there is also a constraint on the allowable systematics.
>
> This two-pass approach is a fast approximation of properly regularised optimisation. So it's not ideal, but it's far more efficient than a general approach.

Absolutely, and in fact I am working on improving this functionality to correctly interpolate parameterized terms. I think that this may be the ideal location to incorporate smoothing functionality as you will have access to all the gains. Smoothing is just a more intelligent form of interpolation in this context.

> 1. If I wanted to use [this](https://github.com/ratt-ru/QuartiCal/issues/245#issuecomment-1527813968) approach, then it seems like I need to implement a solver with `type=smooth`. Is this simply a matter of copying the `gains/phase/` directory, and modifying the `kernel.py`, and then augmenting some map somewhere?

You could definitely try doing what you suggest. The only issue I see is that the smooth term would actually be updating an ML gain (if I am following you correctly). The map in question is https://github.com/ratt-ru/QuartiCal/blob/main/quartical/gains/__init__.py.
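For illustration, if that map is a plain dict from the config type string to a term class (an assumption — check quartical/gains/__init__.py for its actual shape), registering the new term would look roughly like:

```python
# Hypothetical registration of a new gain type. QuartiCal keeps a map
# from the `type` string in the config to the term implementation in
# quartical/gains/__init__.py; the exact structure below (a plain dict
# of classes) is invented for illustration.

class PhaseTerm:                 # stand-in for the existing phase term
    name = "phase"

class SmoothTerm(PhaseTerm):     # new term, reusing the phase machinery
    name = "smooth"

TERM_TYPES = {
    "phase": PhaseTerm,
    "smooth": SmoothTerm,        # <- the extra entry a contributor adds
}

term_cls = TERM_TYPES["smooth"]
```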

> 2. How do I define specific kwargs that I'd like passed to the function that produces the gain updates?

It depends - each gain term has a base_args, term_args and meta_args named tuple associated with it. Anything term-specific would need to be added to:

```python
phase_args = namedtuple(
```

Note that this relies on the argument in question having entered the dask layer with the same name. Unfortunately, this is where things get complicated as the dask code is involved. It may be worth setting up a Zoom to discuss it at some point.
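Concretely, adding a term-specific kwarg would mean extending that tuple with a new field. A sketch (the field names below are invented for illustration; they are not QuartiCal's actual fields):

```python
from collections import namedtuple

# Illustrative version of a term's argument bundle. Adding a new field
# (here `smoothing_scale`) is how a term-specific kwarg would travel
# into the solver. As noted above, the dask layer that constructs this
# tuple must populate the field under the same name.
phase_args = namedtuple(
    "phase_args",
    ("params", "param_flags", "smoothing_scale")  # last field is new
)

args = phase_args(params=[0.0], param_flags=[0], smoothing_scale=3.0)
```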

Finally, I think that the best way of doing this would not involve a smoothing term in the chain - chains are currently quite uniform i.e. one term behaves like another. I worry that a smoothing term in the chain may become complicated. Instead, it may make more sense to aim for a more general smoothing solution such that any gain could be loaded with something like interp_mode=smooth. This would always require two passes through the data - one to form up the ML gains and a second to apply the smooth solutions. This is probably unavoidable in any case as we typically cannot hang onto all the visibility data while waiting for the gain solutions.
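As a rough picture of the two-pass flow described above, pass one solves per-chunk ML gains and writes them out, while pass two reloads the full gain table, smooths it, and applies it. Every name in this sketch is hypothetical; QuartiCal's actual interfaces differ:

```python
import numpy as np

# Hypothetical two-pass flow. `gain_table` stands in for the on-disk
# gain outputs; the "solve" is a trivial mean phase per chunk.
gain_table = {}

def pass_one(chunks):
    """Solve each visibility chunk independently; store the ML gain."""
    for idx, vis in enumerate(chunks):
        gain_table[idx] = np.angle(np.mean(vis))  # stand-in ML phase

def pass_two(chunks, width=3):
    """Smooth across *all* stored gains, then apply per chunk."""
    phi = np.array([gain_table[i] for i in range(len(chunks))])
    g = np.convolve(np.exp(1j * phi), np.ones(width) / width, mode="same")
    phi_s = np.angle(g)
    return [vis * np.exp(-1j * p) for vis, p in zip(chunks, phi_s)]

# All chunks share a constant true phase of 0.1 rad:
chunks = [np.exp(1j * 0.1) * np.ones(4) for _ in range(6)]
pass_one(chunks)
corrected = pass_two(chunks)
```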

Let me know if you think it would be easier to discuss this in a meeting.

@Joshuaalbert I am very curious to see what your experiments reveal. It would be nice to contrast this with the visibility-space approach that @rcyndie is working on. I smooth bandpass solutions after the fact and this seems to work pretty well (this is actually why the JHJ terms are written out as one of the gain output products, soon also for parameterised terms). For bandpass it is a bit difficult to do this inside the solver layer because neighbouring frequency chunks don't speak to each other, but you don't have that problem for the direction axis.

One thing that might be missing at the moment is the direction coordinates, which would be needed if you want to smooth using GPR with some form of stationary smoothing kernel. I know @JSKenyon is working on that. It may be beneficial to have a meeting to discuss this as we have been thinking along similar lines. If you do get around to setting up a meeting, please can I be included?

When I do smoothing in a densely filled aperture with a stationary kernel (not the fancy tomographic kernel), the gains do get closer to ground truth. However, in https://www.aanda.org/articles/aa/full_html/2020/03/aa37424-19/aa37424-19.html you will see that we actually proved that using the tomographic kernel beats using a simple kernel (in both dense and sparse aperture sampling).

At LOFAR's low frequencies we got outliers in the gains that threw things for a bit of a loop. Even when outliers are identified and flagged, there are still phase wraps to deal with. Any phase wrap will cause problems, so this solution really only works at higher frequencies - certainly L-band.
Anyway, this is what it looks like with DSA data:
Screenshot from 2023-05-09 17-07-26

@landmanbester Update on this. Smoothing in the presence of tiny systematics leads to considerable deterioration of the solutions. And decoherence due to the changing ionosphere is enough at 700 MHz to render smoothing useless unless you know for certain that your temporal solution intervals are short enough.

I'm currently looking at alternative approaches to smoothing on solutions to achieve science goals.

Decoherence due to varying ionosphere

These are from simulated visibilities with frozen flow, with nominal ionosphere properties. At 10 s there is decoherence happening, and dropping to 6 s removes it. Smoothing in the presence of decoherence leads to large errors.

Screenshot from 2023-06-21 02-48-59

Just to make sure I am following (I missed your previous post somehow, sorry): is each of the dots a direction that has been solved for, and do you have homogeneous SNR in each direction? If not, you can consider using JHJ as a proxy for the weights when doing the smoothing. I have found that this works well, but the overall scale might be off, so using JHJ/sigma**2, where you treat sigma as a hyperparameter, might be necessary. If there are considerable systematics in the data (like residual RFI), I find that using a Student's t likelihood during smoothing helps, but it runs the risk of suppressing real gain variations if the parameters are initialised too far off their nominal values. Definitely initialising with JHJ as weights seems to work better than starting from unity.
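The JHJ-as-weights suggestion amounts to a weighted smoothing with w = JHJ/sigma**2. A minimal boxcar sketch of the idea (illustrative only; a GPR smoother would fold these weights into the noise covariance instead):

```python
import numpy as np

def weighted_phase_smooth(phi, jhj, sigma=1.0, width=5):
    """Weighted boxcar smoothing of phases with weights w = jhj/sigma**2,
    as suggested above. Smooths w * exp(i*phi) and renormalises by the
    smoothed weights, so low-SNR solutions contribute less."""
    w = jhj / sigma**2
    kernel = np.ones(width)
    num = np.convolve(w * np.exp(1j * phi), kernel, mode="same")
    den = np.convolve(w, kernel, mode="same")
    return np.angle(num / den)

phi = np.zeros(20)     # flat true phase
jhj = np.ones(20)      # uniform SNR proxy
phi_s = weighted_phase_smooth(phi, jhj)
```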

Have you considered smoothing in time as well to deal with a varying ionosphere? I mean by making the time solution intervals quite short and then smoothing after the fact.

> Any phase wrap will cause problems.

We have been thinking about ways to deal with this but it is hard, especially for noisy data. I think we do not have sufficient experience with real data to know if our current solution will work in realistic cases, or when it would work. Do you see phase wraps in the spatial axes or is it sufficient to have a zeroth order term to capture them? I mean is it usually possible to write the phase as phi(t, nu, l, m) = phi0(t, nu) + delta(l, m) where all the wrapping can be captured by the DI term phi0(t, nu)?
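The decomposition in the question can be probed numerically: remove a per-(t, nu) reference phase (e.g. the phase of the direction-averaged gain) and check whether the residual delta(l, m) stays inside (-pi, pi). A toy check, assuming small direction-dependent offsets (axes and scales are invented):

```python
import numpy as np

# Toy check of phi(t, nu, l, m) = phi0(t, nu) + delta(l, m): remove a
# direction-independent reference phase and see whether the residual
# direction-dependent part is wrap-free.
rng = np.random.default_rng(2)
n_time, n_dir = 50, 10

phi0 = rng.uniform(-10.0, 10.0, (n_time, 1))   # DI term, wraps freely
delta = rng.uniform(-0.5, 0.5, (1, n_dir))     # small DD offsets
g = np.exp(1j * (phi0 + delta))

ref = g.mean(axis=1, keepdims=True)            # direction-averaged gain
residual = np.angle(g * ref.conj())            # ~delta, up to a constant
```

If the residual stays well inside (-pi, pi), all the wrapping has indeed been absorbed by the DI term.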

Sorry @JSKenyon, we have definitely strayed from the original issue. These are all just things I have been thinking about. Might be good to touch base at some point.

@landmanbester each dot is actually an antenna. It just looks like a FoV because it's the DSA2k and it's a well-filled aperture (a little more than 1 square km, actually, within an 18 km x 10 km oval).

We're doing some highly controlled experiments with simulated DSA2k data and are learning a fair amount about smoothing, as it was initially a targeted approach for speeding up calibration while meeting science goals. I'd be happy to have a call to share what is being learned and get you involved.