RadioAstronomySoftwareGroup/pyuvsim

simsetup.SkyModelData strips reference frequency from Healpix maps

rlbyrne opened this issue · 17 comments

I am attempting to run a pyuvsim simulation using run_uvdata_uvsim function. I am using simsetup.SkyModelData to format a diffuse Healpix map that I can pass to the simulation function, but I've found that the formatting strips the reference_frequency attribute from the map, which then causes the simulation to fail.

Here is some code that identifies the issue. The diffuse map used is available for download here:

healpix_map_path = "/safepool/rbyrne/diffuse_map.skyh5"
diffuse_map = pyradiosky.SkyModel()
diffuse_map.read_skyh5(healpix_map_path)

# Reformat the map with a spectral index
diffuse_map.spectral_type = "spectral_index"
diffuse_map.spectral_index = np.full(diffuse_map.Ncomponents, -0.8)
diffuse_map.reference_frequency = Quantity(
    np.full(diffuse_map.Ncomponents, diffuse_map.freq_array[0].value), "Hz"
)
diffuse_map.freq_array = None
if not diffuse_map.check():
    print("Error: Diffuse map fails check.")

diffuse_map_pyuvsim_formatted = pyuvsim.simsetup.SkyModelData(diffuse_map)
print(diffuse_map_pyuvsim_formatted.reference_frequency)

Inputting this into the simulation produces the following error:

Traceback (most recent call last):
  File "/home/rbyrne/rlb_LWA/pyuvsim_testing.py", line 29, in <module>
    diffuse_sim_uv = pyuvsim.uvsim.run_uvdata_uvsim(
  File "/usr/local/miniconda3/envs/pyuvsim/lib/python3.10/site-packages/pyuvsim/uvsim.py", line 745, in run_uvdata_uvsim
    for task in local_task_iter:
  File "/usr/local/miniconda3/envs/pyuvsim/lib/python3.10/site-packages/pyuvsim/uvsim.py", line 531, in uvdata_to_task_iter
    sky = catalog.get_skymodel(src_i)
  File "/usr/local/miniconda3/envs/pyuvsim/lib/python3.10/site-packages/pyuvsim/simsetup.py", line 754, in get_skymodel
    return obj.get_skymodel()
  File "/usr/local/miniconda3/envs/pyuvsim/lib/python3.10/site-packages/pyuvsim/simsetup.py", line 774, in get_skymodel
    other['reference_frequency'] = self.reference_frequency * units.Hz
TypeError: unsupported operand type(s) for *: 'NoneType' and 'Unit'

I found a workaround to this error by adding this line:

diffuse_map_pyuvsim_formatted.reference_frequency = diffuse_map.reference_frequency.value

Hi @rlbyrne -- I've tried running your example, but I'm not seeing the issue. Is the first block of code supposed to error?

Sorry I wasn't clear enough. It doesn't error, but the reference_frequency parameter is None, which then errors in the simulation step. My simulation call is this:

diffuse_sim_uv = pyuvsim.uvsim.run_uvdata_uvsim(
    input_uv=uv,
    beam_list=BeamList(beam_list=[airy_beam]),
    beam_dict=None,  # Same beam for all ants
    catalog=diffuse_map_pyuvsim_formatted,
)

where I'm using a standard uv object from reading an MWA uvfits and I've generated an airy beam with this:

airy_beam = pyuvsim.AnalyticBeam("airy", diameter=14.)
airy_beam.peak_normalize()

@rlbyrne I ran the first block of code, and it printed out:

No frame available in this file, assuming 'icrs'. Consider re-writing this file to ensure future compatility.
[1.82e+08 1.82e+08 1.82e+08 ... 1.82e+08 1.82e+08 1.82e+08]

Should it have returned None? Or does the reference_frequency only go to None when passed to the run_uvdata_uvsim funtion?

(Incidentally, we should fix the typo in that warning message)

@aelanman That's strange, on my machine it's returning None. I've installed the latest development versions of pyradiosky and pyuvsim.

@rlbyrne Can you check if diffuse_map_pyuvsim_formatted.stokes_I is also None? It could be an issue with the mpi shared memory broadcasting.

@aelanman diffuse_map_pyuvsim_formatted.stokes_I is populated, but reference_frequency is not.

@rlbyrne Okay, so I wasn't able to reproduce the error using the code you sent, but when I tried running it with MPI using two processes I found a couple things that might be related:

  1. When initializing a SkyModelData object from the SkyModel made from your file, the reference_frequency UVParameter is not marked as "required". In the init, SkyModelData will only save required parameters from the SkyModel input. Running diffuse_map.set_spectral_type_params("spectral_index") instead of just setting the spectral_type fixes this. I'm a little surprised that I was seeing the reference_frequency parameter set at all.
  2. You also need to run the function diffuse_map_pyuvsim_formatted.share() before run_uvdata_uvsim, assuming you're working in MPI, in order for the data to be properly spread across processes.

@aelanman Good to know, I'll make those changes. What command are you using to run with MPI?

Here's the minimum working example (mwe.py) that I made from your example code:

import numpy as np
from astropy.units import Quantity
import pyradiosky
import pyuvsim
from pyuvsim import mpi

mpi.start_mpi(block_nonroot_stdout=False)
diffuse_map = None

if mpi.rank == 0:

    healpix_map_path = "diffuse_map.skyh5"
    diffuse_map = pyradiosky.SkyModel()
    diffuse_map.read_skyh5(healpix_map_path)
    # Reformat the map with a spectral index
    diffuse_map.spectral_type = "spectral_index"
    diffuse_map.spectral_index = np.full(diffuse_map.Ncomponents, -0.8)
    diffuse_map.reference_frequency = Quantity(
        np.full(diffuse_map.Ncomponents, diffuse_map.freq_array[0].value), "Hz"
    )
    diffuse_map._reference_frequency.required = True
    diffuse_map.freq_array = None
    if not diffuse_map.check():
        print("Error: Diffuse map fails check.")
dm_smd = pyuvsim.simsetup.SkyModelData(diffuse_map)
dm_smd.share()

print(mpi.rank, dm_smd.reference_frequency)

To run with mpi and two processes, I did mpirun -n 2 python mwe.py.

I forgot to mention that SkyModelData also didn't like the units of Stokes I (Jy/sr), though that should be acceptable for surface brightness. We'll need to fix the conversion step here:

pyuvsim/pyuvsim/simsetup.py

Lines 616 to 622 in 34ffb2d

if isinstance(stokes_in, units.Quantity):
if stokes_in.unit.is_equivalent("Jy"):
stokes_in = stokes_in.to_value("Jy")
self.flux_unit = "Jy"
elif stokes_in.unit.is_equivalent("K"):
stokes_in = stokes_in.to_value("K")
self.flux_unit = "K"

@aelanman Yes, it looks like Jy/sr is definitely not treated properly. What should I expect from the simulation I ran last week with units of Jy/sr? What normalization did it use?

@rlbyrne This looks to be a bigger issue. Both simsetup.py and pyradiosky use the astropy.units.unit.is_equivalent function to check whether skymodel units are equivalent to K or Jy, but astropy doesn't recognize Jy/sr as equivalent to K (K sr to Jy) except when the equivalencies is set.

pyuvsim uses the SkyModelData class as a way to coerce the SkyModel into a format that MPI can handle, which means turning astropy Quantities into plain numpy arrays. Once it's been shared successfully with MPI, it is then converted back to a SkyModel using SkyModelData.get_skymodel(). If it's a healpix map, it's then converted to point sources using SkyModel.healpix_to_point.

By my reading, if the unit equivalence checks fails, then it will leave the stokes array as an astropy unit. This breaks the MPI share, but if you're not using MPI then it should run okay and leave the units alone. I'm a little confused as to why pyradiosky can then successfully convert it to Jy, since it should fail the unit equivalence test there... @bhazelton ?

Anyway, I think we can take the following course of action here:

  1. Replace the lines I referenced before with:
if isinstance(stokes_in, units.Quantity):
    self.flux_unit = stokes_in.unit.to_string()
    stokes_in = stokes_in.value

(We don't really need to check for valid units here... just ensure that the units are preserved)
2. Ensure that the is_equivalent tests in pyradiosky recognize that Jy/sr should be equivalent to K.
3. For the original issue, we can have SkyModelData warn when non-required parameters are set.

@aelanman I'm fairly new to using MPI. Can you explain the if mpi.rank == 0 in your MWE? I want to make sure I'm implementing this correctly.

When run in MPI, each process runs the script as a whole. By putting that block in if mpi.rank ==0, I'm making sure that the SkyModel object is only initialized on the zeroth process (on other processes, the diffuse_map object is just None). The SkyModelData object needs to be initialized on all processes simultaneously, and then the share() function copies over the data from the zeroth process through a mix of copying and referencing shared memory.

Ok, so to run the simulation I would call run_uvdata_uvsim at the end of your MWE?

Also, how would I make sure the input uvdata object and the beam are shared appropriately?