[FEATURE] Add a function to integrate axes (including over partial ranges)
Dominic-Stafford opened this issue · 1 comments
Dominic-Stafford commented
It would be useful to have an integrate function, which could be used to do the following:
- Remove a single axis from a histogram, reducing its dimension by 1:
h.integrate("y")
- Integrate over a range of an axis:
h.integrate("y", i, j)
- Sum certain entries from a category axis:
h.integrate("y", ["cats", "dogs"])
Currently it is possible to do all of these things, but the syntax is unclear and there are a number of pitfalls:
- Removing an axis can be achieved fairly easily with h[{"y": sum}] or h[{"y": slice(None, None, sum)}], though it would be nice to add for completeness.
- Integrating over a range can be achieved with h[{"y": slice(i, j, sum)}]. However, the more obvious h[:, i:j][{"y": sum}] will give the wrong result, since sum includes the overflow, as noted in scikit-hep/boost-histogram#621.
- For summing categories, the corresponding h[{"y": ["cats", "dogs"]}][{"y": sum}] almost works, since with this slice the other categories don't seem to be added to the overflow. However, if the overflow already contains entries, these will be added to the sum, so seemingly the only way to get the correct result is to do the sum by hand, h[{"y": "cats"}] + h[{"y": "dogs"}], which could quickly become laborious. (It could be done as h[{"y": ["cats", "dogs"]}][{"y": slice(0, len, sum)}].)
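The range pitfall can be illustrated without the library. The following is only a toy model in plain Python, assuming a one-dimensional list of counts with the underflow stored first and the overflow last; the helper names (slice_keep_flow, sum_with_flow, integrate_range) are made up for this sketch and are not boost-histogram or hist functions. It mimics the behaviour described above: slicing folds the removed bins into the flow bins, so a following sum that includes the flow bins returns the full total instead of the range.

```python
# Toy model of one axis: counts[0] is the underflow, counts[-1] is the overflow,
# and counts[1:-1] are the regular bins. This is NOT boost-histogram code.

def slice_keep_flow(counts, i, j):
    """Keep regular bins i..j-1 and fold everything cut away into the flow bins."""
    regular = counts[1:-1]
    underflow = counts[0] + sum(regular[:i])
    overflow = counts[-1] + sum(regular[j:])
    return [underflow] + regular[i:j] + [overflow]

def sum_with_flow(counts):
    """Sum every bin, flow bins included (the bare `sum` case in this model)."""
    return sum(counts)

def integrate_range(counts, i, j):
    """Sum only regular bins i..j-1 and ignore the flow bins."""
    return sum(counts[1:-1][i:j])

counts = [10, 1, 2, 3, 4, 20]  # underflow, four regular bins, overflow

two_step = sum_with_flow(slice_keep_flow(counts, 1, 3))  # slice first, then sum
one_step = integrate_range(counts, 1, 3)                 # sum over the range directly

print(two_step)  # 40, the total of all bins including flow
print(one_step)  # 5, only the two selected bins (2 + 3)
```

In this model the two-step form returns the same number regardless of i and j, which is why a single operation that ignores the flow bins is the safer spelling.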
Linked to this issue, it would be helpful if one could specify whether to include the overflows when projecting out axes using the project method. If adding a new function is not desired, this would at least make some of the other workarounds easier.
henryiii commented
@fabriceMUKARAGE, here is a rough draft of what the method of BaseHist would look like.
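# Loc is int | str | ...
def integrate(self, name: int | str, i_or_list: Loc | list[str | int] | None = None, j: Loc | None = None) -> Self:
    # A list selects categories; summing from 0 to len leaves out the flow bins.
    if isinstance(i_or_list, list):
        return self[{name: i_or_list}][{name: slice(0, len, sum)}]
    # Otherwise sum over the slice from i_or_list to j (the whole axis if both are None).
    return self[{name: slice(i_or_list, j, sum)}]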
Rough draft of tests:
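import hist

def test_integrate_simple_cat():
    h = hist.new.IntCat([4, 1, 2], name="x").StrCat(["AB", "BCC", "BC"], name="y").Int64()
    # The third value in each fill is an integer weight, not a coordinate.
    h.fill(4, "AB", weight=1)
    h.fill(4, "BCC", weight=2)
    h.fill(4, "BC", weight=4)
    h.fill(4, "X", weight=8)  # "X" is not a listed category, so it lands in the overflow
    h1 = h.integrate("y", ["AB", "BC"])
    assert h1[4j] == 5  # 1 + 4; "BCC" and the overflow are excluded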
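The expected value of 5 can be checked by hand. Below is a toy model in plain Python, assuming the category counts live in a dict with the overflow kept as a separate number; select_categories is a hypothetical helper written for this sketch, not part of hist. It follows the behaviour reported in the issue, where picking categories drops the others but carries any existing overflow along.

```python
# Toy model of the "y" axis in the test: named categories plus one overflow count.
# This is NOT hist code; it only mirrors the behaviour described in the issue.

def select_categories(counts, overflow, keep):
    """Keep only the listed categories; the existing overflow is carried along unchanged."""
    kept = {name: counts[name] for name in keep}
    return kept, overflow

counts = {"AB": 1, "BCC": 2, "BC": 4}
overflow = 8  # the fill with "X", which is not a listed category

kept, flow = select_categories(counts, overflow, ["AB", "BC"])

naive = sum(kept.values()) + flow  # a sum that includes the overflow
correct = sum(kept.values())       # a sum over the selected categories only

print(naive)    # 13
print(correct)  # 5
```

The gap between 13 and 5 is exactly the pre-existing overflow, which is the case the slice(0, len, sum) form in the draft is meant to exclude.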