TLorentzVectorArray yields different values depending on masking order
mverzett opened this issue · 4 comments
Hi again! Apologies for bothering you way too often. This time I am quite positive that something strange is going on, but bear with me if it is indeed the intended behaviour:
Both prompt_mu
and second_mu
are coffea's JaggedCandidateArrays
with only one element in each event, therefore flattening makes sense. The issue appears rarely (< 0.1%) and does not seem linked to coffea's classes, I will try to make a simple set-up to reproduce the issue.
(Pdb) (skim.prompt_mu.p4[msk] + skim.second_mu.p4[msk])[0,0].mass
0.1089482379549958
(Pdb) (skim.prompt_mu.p4[msk] + skim.second_mu.p4[msk]).mass
<JaggedArray [[nan]] at 0x7f7724f16f10>
(Pdb) (skim.prompt_mu.p4 + skim.second_mu.p4).mass[msk]
<JaggedArray [[nan]] at 0x7f7724d6c390>
(Pdb) (skim.prompt_mu.p4.flatten() + skim.second_mu.p4.flatten()).mass[msk]
array([nan])
(Pdb) (skim.prompt_mu.p4.flatten()[msk] + skim.second_mu.p4.flatten()[msk]).mass
array([nan])
(Pdb) (skim.prompt_mu.p4.flatten() + skim.second_mu.p4.flatten())[msk].mass
array([nan])
(Pdb) (skim.prompt_mu.p4.flatten() + skim.second_mu.p4.flatten())[msk][0].mass
0.1089482379549958
(Pdb) (skim.prompt_mu.p4[msk] + skim.second_mu.p4[msk])
<JaggedArrayMethods [[TLorentzVector(x=-99.746, y=118.34, z=-767.31, t=782.76)]] at 0x7f7724d660d0>
(Pdb) (skim.prompt_mu.p4.flatten() + skim.second_mu.p4.flatten())[msk]
<TLorentzVectorArray [TLorentzVector(x=-99.746, y=118.34, z=-767.31, t=782.76)] at 0x7f7724d6c650>
(Pdb) (skim.prompt_mu.p4.flatten() + skim.second_mu.p4.flatten()).mass[msk][0]
nan
Set-up:
- awkward: 0.12.20
- uproot: 3.11.2
- uproot-methods: 0.7.3
It's not surprising that some mass calculations return nan
, especially if they're rare. The mass is a square root of a subtraction of two potentially large numbers, so a small fractional error in energy and momentum can make the argument of the square root slightly negative, and then you get nan
.
Is the odd thing that whether you get nan
or not depends on what order you apply the [0]
, [msk]
, and .mass
? That does sound wrong: numerical round-off is one thing, but the same numbers should be applied to each other in every case—you should either always get nan
or never for different ways of expressing the same quantity in different ways. If there's some sort of off-by-one error in which energy elements are applied to which momentum elements, that's a big issue and needs to be addressed.
Is it possible to make a reproducible example that doesn't require the original data files? Like, can you create Lorentz values that show this issue using fromiter
or by explicitly building JaggedArrays? It looks like you've isolated the particular vectors that show the issue.
Even when the mass isn't nan
, it's suspiciously close to 1× the muon mass. That's weird. Even if the muons are nearly collinear, you'd get 2× the muon mass...
Hi @jpivarski , thanks for looking into it. I tried to export back to ROOT (both float32 and 64), but that washes away the issue. Indeed it seems is a numerical rounding plus something more.
I isolated the problematic vectors here. There are two vectors, v1
and v2
, each of three elements. If summed they yield the issue in the second element. I left it like this in case the issue actually starts in the sum, and kept other two values just for reference.
Thanks again!
import awkward
ff = awkward.load('test.awkd')
(ff['v1'] + ff['v2']).mass
# array([87.1220124 , nan, 65.57995346])
(ff['v1'] + ff['v2'])[1].mass
# 0.1089482379549958
Okay, first off: it's not a scary off-by-one error. Awkward/Uproot-Methods is not adding the wrong numbers together or anything like that.
It is a round-off error, and the reason you see a difference is because f12[1].mass
creates a Python object with Python float
types as x/y/z/t attributes and f12.mass[1]
performs a operation on the NumPy arrays before extracting value [1]
. Python float
types are 64-bit. Your NumPy arrays happen to be 32-bit. NumPy preserves the 32-bit precision through the calculation, and this makes the difference between a small mass, resulting from the subtraction of large energy and momentum, and an imaginary mass, which np.sqrt
returns as nan
.
See below—I'm removing the np.sqrt
part of the calculation for clarity:
>>> v12 = (ff['v1'] + ff['v2'])
>>> v12.t[1]**2 - v12.x[1]**2 - v12.y[1]**2 - v12.z[1]**2
0.011869718553498387
>>> (v12.t**2 - v12.x**2 - v12.y**2 - v12.z**2)[1]
-0.008600915200076997
When the np.sqrt
is applied, the first case returns a small value and the second returns nan
.
@henryiii This is a lesson for the Vector library going forward: single-Lorentz vector objects should have the same storage as arrays of Lorentz vectors. Their x/y/z/t values should probably be NumPy scalar views of the arrays they came from, and hopefully that will ensure that the same precision of operations is performed on single vectors as arrays of vectors.
Maybe Awkward1 needs to change: currently, it returns Python values if you select one element of an array. I kindof hate that feature of NumPy, though, because its scalars are not JSON-encodable, and you end up staring at the screen wondering why it can't turn the number 12
into JSON (because NumPy scalars get printed to the screen the same way as Python numbers).
Alternatively, we can have a policy of doing all our math in 64-bit precision. It's 2020 and that's not slower, even on most GPUs. 32-bit floats are important for data storage/packing on disk, but errors accumulate in a calculation. If you want to move this issue over to vector as a question of policy, that'd be good.
@jpivarski thanks for digging into that! Indeed I was mostly scared by a possible off-by-one or hidden function state failure, that's reassuring. My two cents would be indeed to move to 64bits for calculation.
Thanks again!