missing imaging photometry in metadata table
Closed this issue · 8 comments
It looks like in some cases the output metadata table is not populated with the input (broadband) imaging photometry.
I thought this issue had to do with not being able to measure the aperture correction, but it appears to be something else.
Unfortunately, the bug affects the Iron/v1.0
, Fuji/v2.0
, and Guadalupe/v2.0
catalogs; fortunately, it affects only a relatively small fraction of objects.
Iron/v1.0
: 43,607 (0.242%)Fuji/v2.0
: 12,942 (0.926%)Guadalupe/v2.0
: 580 (0.026%)
E.g., for Iron:
cd $DESI_ROOT/spectro/fastspecfit
prod='iron'
ver='v1.0'
iim = Table(fitsio.read(f'{prod}/{ver}/catalogs/fastspec-{prod}.fits', 'METADATA', \
columns=['TARGETID', 'SURVEY', 'PROGRAM', 'HEALPIX', 'FLUX_G', 'FLUX_R', 'FLUX_Z', 'FLUX_W1']))
iif = Table(fitsio.read(f'{prod}/{ver}/catalogs/fastspec-{prod}.fits', 'FASTSPEC', columns=['LOGMSTAR']))
I = ((iim['FLUX_G'] == 0)*iim['FLUX_R'] == 0)*(iim['FLUX_Z'] == 0)*(iim['FLUX_W1'] == 0)*(iif['LOGMSTAR'] > 0)
print(np.sum(I), np.sum(I)/len(I))
<Table length=43607>
TARGETID SURVEY PROGRAM HEALPIX FLUX_G FLUX_R FLUX_Z FLUX_W1
int64 str7 str6 int32 float32 float32 float32 float32
------------------- ------ ------- ------- ------- ------- ------- -------
6448025174016 sv1 dark 28478 0.0 0.0 0.0 0.0
6515536691200 sv1 dark 4958 0.0 0.0 0.0 0.0
6521555517440 sv1 dark 10436 0.0 0.0 0.0 0.0
6536638234624 sv1 dark 28534 0.0 0.0 0.0 0.0
6546033475584 sv1 dark 5066 0.0 0.0 0.0 0.0
28684312379396 sv1 other 2707 0.0 0.0 0.0 0.0
28684316573697 sv1 other 2708 0.0 0.0 0.0 0.0
28688884170754 sv1 other 2710 0.0 0.0 0.0 0.0
28697990004751 sv1 other 2709 0.0 0.0 0.0 0.0
28702498881536 sv1 other 2710 0.0 0.0 0.0 0.0
28702503075844 sv1 other 2711 0.0 0.0 0.0 0.0
28702503075846 sv1 other 2711 0.0 0.0 0.0 0.0
28707011952640 sv1 other 2711 0.0 0.0 0.0 0.0
... ... ... ... ... ... ... ...
2305843055619504455 main backup 44257 0.0 0.0 0.0 0.0
2305843055619508651 main backup 44257 0.0 0.0 0.0 0.0
2305843055619511632 main backup 44256 0.0 0.0 0.0 0.0
2305843055619511838 main backup 44256 0.0 0.0 0.0 0.0
2305843055619516110 main backup 44256 0.0 0.0 0.0 0.0
2305843055619522825 main backup 44257 0.0 0.0 0.0 0.0
2305843055623676124 main backup 44262 0.0 0.0 0.0 0.0
2305843055623690778 main backup 44260 0.0 0.0 0.0 0.0
2305843055623691405 main backup 44260 0.0 0.0 0.0 0.0
2305843055623693312 main backup 44260 0.0 0.0 0.0 0.0
2305843055623694644 main backup 44260 0.0 0.0 0.0 0.0
2305843055623695539 main backup 44260 0.0 0.0 0.0 0.0
2305843055623695956 main backup 44260 0.0 0.0 0.0 0.0
2305843056445765854 main backup 45047 0.0 0.0 0.0 0.0
I'm not sure what the resolution is, but need to make sure to create a unit test to prevent this issue from future productions.
I began looking into this issue and determined that it's not a bug. The vast majority of these sources genuinely lack DR9 photometry and the stellar mass estimate comes from the spectrum alone.
However, there are a handful of cases where desispec.io.photo.gather_tractorphot
reports no photometry which actually exists. The issue is in this bit of code, which I am reproducing at the very bottom of this note:
https://github.com/desihub/desispec/blob/main/py/desispec/io/photo.py#L1029-L1042
In essence, the code checks if there exists a north
Tractor catalog, a south
catalog, or both. If both catalogs exist then if the median declination of the sources in the input catalog is less than the north-south boundary, +32.375 deg
, then the south
catalog is chosen.
However, consider this (secondary) target:
<Table length=1>
TARGETID SURVEY PROGRAM HEALPIX RA DEC FLUX_G FLUX_R FLUX_Z FLUX_W1 RELEASE BRICKID BRICKNAME BRICK_OBJID
int64 str7 str6 int32 float64 float64 str1 float32 float32 float32 int16 int32 str8 int32
-------------- ------ ------- ------- ---------------- ---------------- ------- ------- ------- ------- ------- ------- --------- -----------
50532307697666 sv1 dark 28535 186.497709668916 33.6029271154256 0.0 0.0 0.0 0.0 0 0 1864p335 0
There is a source in the south
Tractor catalog within ~0.4 arcsec:
But in the north
Tractor catalog the nearest source is ~1.5 arcsec away:
(Recall that the matching radius is 1 arcsec.) However, since the declination (+33.603 deg
) is above our north-south declination cut, +32.375 deg
, we end up with no matching source and therefore no photometry.
So the options are to:
- Do nothing; or
- Be smarter about the case where both
north
andsouth
Tractor catalogs exist, namely:- Check both catalogs and select the set of photometry (on an object-by-object basis) which has the most amount of information.
- And if there are fluxes from both the
north
and thesouth
, choosesouth
.
Below is the code currently in main
:
https://github.com/desihub/desispec/blob/main/py/desispec/io/photo.py#L1029-L1042
tractorfile_north = os.path.join(legacysurveydir, 'north', 'tractor', brick[:3], 'tractor-{}.fits'.format(brick))
tractorfile_south = os.path.join(legacysurveydir, 'south', 'tractor', brick[:3], 'tractor-{}.fits'.format(brick))
if os.path.isfile(tractorfile_north) and not os.path.isfile(tractorfile_south):
tractorfile = tractorfile_north
elif not os.path.isfile(tractorfile_north) and os.path.isfile(tractorfile_south):
tractorfile = tractorfile_south
elif os.path.isfile(tractorfile_north) and os.path.isfile(tractorfile_south):
if np.median(input_cat[deccolumn][ipos]) < desitarget_resolve_dec():
tractorfile = tractorfile_south
else:
tractorfile = tractorfile_north
elif not os.path.isfile(tractorfile_north) and not os.path.isfile(tractorfile_south):
return out
@weaverba137 @stephjuneau @sbailey this issue would need be addressed in desispec.io.photo
and would also impact the https://github.com/moustakas/desi-photometry LS/DR9 VACs, so please let me know if you have any thoughts or opinions. Any changes would allow us to extract photometry for a tiny number of secondary targets in the overlap region between DECaLS and BASS+MzLS, so it would be perfectly acceptable to do nothing.
And note that whatever we decide I do not think we should regenerate the Fuji LS/DR9 VAC, so this would only impact the DR1 (Guadalupe + Iron) VACs and database loads.
Hi @moustakas et al., the example source indeed lacks a Tractor entry within 1" when querying against the combined tractor table. Not sure how it was selected as a secondary but it seems to be a small HII region in a spiral or tidal arm (and in good company given all the sources outside the SGA ellipse there - wow!). I agree with "do nothing" for Fuji. For others tables, I think what you proposed in terms of searching in both N and S and then picking the best one could be fine. The caveats are (1) we create inconsistencies with the formal "resolve" for joining the S+N footprints; (2) we might risk creating a cross-match to the wrong targets depending on which secondary program it is (on this topic, we might be able to make decisions on a program per program basis rather than an object per object basis -- for example the PV secondary program should be handled with care as cross-matching positions meant to be along the major or minor axis of a large galaxy to assign them their own photometry is questionable).
Expanding on Stephanie's caveat "we create inconsistencies with the formal "resolve" for joining the S+N footprints":
IIUC, elsewhere we treat the +32.375 deg boundary as the strict and only definition of whether sources should be taken from the N or the S footprints, i.e. as if there was no overlap, with the actual overlap being a expert-level debugging feature for data cross checks. In that spirit, I think you should "do nothing" and not do anything clever with checking both footprints in the overlap region. Strictly following the boundary as if the overlap didn't exist would be more consistent with how final target selection resolve catalogs are generated and the benefits to consistency outweigh the recovery of photometry for a small number of targets, especially given that a match in one but not the other is already a sign of something special/problematic about that target.
Mentioning @geordie666 in case there are any caveats to how strict the +32.375 boundary is.
To clarify: primary targets will always get the correct photometry because TARGETID
encodes the appropriate north/south resolve, brickid, objid, etc.
I'm only talking about secondary targets here for which desispec.io.photo.gather_tractorphot
uses positional matching to get the photometry. For some classes of targets this is absolutely the wrong thing to do (e.g., an off-center position of a large galaxy targeted as part of the PV program); but for other programs (e.g., a faint dwarf galaxy targeted using non-DR9 imaging), what the code is doing is defensible.
I think it's not crazy to try to check both north and south catalogs in the very rare case of an object like 50532307697666
. But maybe I'm splitting hairs.
Even for secondary targets, I advocate that you should use the same resolve algorithm as primary target selection, i.e. the +32.375 N/S boundary, without further refinements/complications.
i.e. we do not do anything tricky like "if a dec>32.375 target fails target selection in the N but passes in the S, keep it", so by extension I think you also should not do anything like "if an ra,dec>32.375 location doesn't have a positional match in the N but does in the S, keep it".
Perhaps the tie breaker for @geordie666 to comment on: target selection has the option for secondary targets to be position-matched to primary targets and inherit the primary TARGETID if there is a match. If a dec>32.375 secondary does not have a N match, does the code check if there is a match in the S, or is the S data ignored and this is treated as a non-match full stop?
i.e. I think desispec.io.photo should resolve to the same N/S positional matching algorithm that target selection would have, if matching had been requested at the time of target selection. But perhaps that ship has already sailed and you are already "trying harder" to find a match in other ways...
Regarding the tiebreaker:
The target referenced above by @moustakas is also in the Main Survey, during which desitarget
was matching (with a 1" radius) to recover fluxes from imaging:
a = fitsio.read("/global/cfs/cdirs/desi/public/ets/target/catalogs/dr9/1.1.1/targets/main/secondary/dark/targets-dark-secondary.fits")
a[80084][["RA", "DEC", "OVERRIDE", "FLUX_G", "FLUX_R", "FLUX_Z", "TARGETID", "SCND_TARGET"]]
Out[]: (186.49770967, 33.60292712, False, 0., 0., 0., 2249555563249666, 2048)
Note that this is an OVERRIDE==False
target, so would have been allowed to be merged with a primary. But, no fluxes were recovered and the TARGETID
corresponds to a secondary; so, this target never found a primary with which to merge.
Philosophically, I'm not opposed to breaching the N/S boundary when matching secondaries to primaries. But, if the question is asked purely on the basis of "what would desitarget
do" then we shouldn't cross the N/S boundary, I think.
Note that I haven't checked all possible corner cases in the code, so I can't promise desitarget
never crosses the boundary when matching primaries and secondaries. But, it's fairly convincing that desitarget
didn't cross the boundary for the quoted example (TARGETID==50532307697666
in sv1
which is TARGETID==2249555563249666
in main
).
One note on the N/S resolve: The boundary isn't purely at dec=32.375o. Sources are resolved based on Galactic latitude, too. Sources have to be both north of the Galactic plane (in Galactic coordinates) and north of Dec ≥ 32.375o
(in equatorial coordinates) to be considered an "N" source.
From @sbailey:
Even for secondary targets, I advocate that you should use the same resolve algorithm as primary target selection, i.e. the +32.375 N/S boundary, without further refinements/complications.
i.e. we do not do anything tricky like "if a dec>32.375 target fails target selection in the N but passes in the S, keep it", so by extension I think you also should not do anything like "if an ra,dec>32.375 location doesn't have a positional match in the N but does in the S, keep it".
I strongly disagree and I'll note that this is not what the code currently does. This algorithm also doesn't extend to other imaging datasets like DR10 where there is no north-south split and you'd be throwing away data if you did a resolve.
From @geordie666:
Philosophically, I'm not opposed to breaching the N/S boundary when matching secondaries to primaries. But, if the question is asked purely on the basis of "what would desitarget do" then we shouldn't cross the N/S boundary, I think.
For secondary targets, desispec.io.photo.gather_tractorphot
intentionally doesn't do what desitarget
does. In fact, it was developed in large part because there were secondary targets with perfectly reasonable DR9 photometry that was (intentionally) not propagated into the pipeline. For non-DR9-selected secondary targets, I think it's totally defensible to (a) use positional matching; and (b) cross the north-south resolve boundary in order to prefer DECam imaging (which is what the code does).
In the end, after looking at many other of these targets, I don't think we should do anything different than what we're already doing. I'm going to close this ticket since it's on the critical path for the next set of VACs but if anyone else has thoughts or comments please feel free to add them.