scikit-hep/root_numpy

add_branch, delete_branch

arogozhnikov opened this issue · 11 comments

People keep asking me (~10 did already) how to add a column to ROOT with root_numpy.

I think this is worth writing up in the documentation or, even better, adding as explicit functions that also contain the necessary checks.

So far I have been pointing people to issue #224.

I also have the following code:

https://github.com/US-ATLAS-HFSF/HFSF2015/blob/master/macros/apply_weights.py

which actually adds a new branch based on an existing branch.

> I also have the following code

When I opened the previous issue, for-looping in rootpy was exactly what I wanted to get rid of.

@ndawe proposed a better way (see linked issue).

@arogozhnikov - "better" is really, really subjective. @ndawe will point this out but the code he proposes won't work on incredibly large ROOT TTrees because it requires holding the entire numpy array in memory for the new branch.

The other part is that it's a very rare case that you would want to just add a brand new branch to a tree. If you do, it's somehow based on the value of another branch. You end up running into the same issue I pointed out: you have to load the entire branch (or multiple branches) into memory, perform some calculation, and then write the result back.

> won't work on incredibly large ROOT TTrees because it requires holding the entire numpy array in memory for the new branch.

I don't work with incredibly large ROOT TTrees.

And I wouldn't recommend for-looping in Python as a solution if you have other options.

> The other part is that it's a very rare case that you would want to just add a brand new branch to a tree..

That is exactly the case I am getting asked about.

The question is precisely this: how to add a numpy column to a ROOT file (so for these people, keeping the column in memory is not a problem).

I am fine with using root2array to get the data and array2root to save it, so why should I NOT have the functions proposed in this topic?

ndawe commented

@arogozhnikov sure, I can add an example. To add columns to a recarray I use rec_append_fields from numpy.lib.recfunctions. To add a branch to an existing tree I use array2tree and also pass along the tree I want the branch added to:

http://rootpy.github.io/root_numpy/reference/generated/root_numpy.array2tree.html#root_numpy.array2tree

See the docs for the tree argument.
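A minimal sketch of the two steps described above, under stated assumptions: the helper names `append_column` and `add_branches` and the sample data are hypothetical and are not part of root_numpy. The first helper uses `rec_append_fields` from `numpy.lib.recfunctions`; the second forwards to `array2tree` with its `tree` argument and needs ROOT and root_numpy installed, so it is imported lazily and not called here.

```python
import numpy as np
from numpy.lib.recfunctions import rec_append_fields


def append_column(arr, name, values):
    """Return a copy of the record array `arr` with one extra field (hypothetical helper)."""
    values = np.asarray(values)
    if len(values) != len(arr):
        raise ValueError("new column must have the same length as the array")
    return rec_append_fields(arr, name, values)


def add_branches(tree, new_columns):
    """Add the fields of the structured array `new_columns` as branches of `tree`.

    Hypothetical wrapper: root_numpy is imported lazily so that the
    NumPy-only helper above can be used without ROOT installed.
    """
    from root_numpy import array2tree
    return array2tree(new_columns, tree=tree)


# Illustrative data only.
events = np.array([(1, 2.5), (2, 3.5), (3, 4.5)],
                  dtype=[("n", np.int32), ("pt", np.float64)])
extended = append_column(events, "weight", [0.5, 1.0, 1.5])
print(extended.dtype.names)
```

With an existing `TTree` in hand, `add_branches(tree, extended[["weight"]])` would pass only the new field along, so the branches already in the tree are left as they are.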

ndawe commented

Note that adding branches to existing trees is easy, but to delete a branch you should read an array from it and delete the column before converting that into a new tree.
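The array-based deletion described above could look roughly like the sketch below. The helper names `drop_columns` and `tree_without_branches` are hypothetical; `drop_fields` comes from `numpy.lib.recfunctions`, and `tree2array` / `array2tree` are the root_numpy functions already mentioned in this thread. The whole tree is read into memory, so this only suits trees that fit there.

```python
import numpy as np
from numpy.lib.recfunctions import drop_fields


def drop_columns(arr, names):
    """Return a copy of the structured array `arr` without the fields in `names`."""
    # usemask=False returns a plain ndarray instead of a masked array.
    return drop_fields(arr, names, usemask=False)


def tree_without_branches(tree, names, new_name="tree"):
    """Build a new tree from `tree` with the listed branches left out.

    Hypothetical helper: requires ROOT and root_numpy, imported lazily.
    """
    from root_numpy import tree2array, array2tree
    arr = tree2array(tree)  # reads every branch into memory
    return array2tree(drop_columns(arr, names), name=new_name)


# Illustrative data only.
events = np.array([(1, 2.5, 0.5), (2, 3.5, 1.0)],
                  dtype=[("n", np.int32), ("pt", np.float64), ("weight", np.float64)])
trimmed = drop_columns(events, ["weight"])
print(trimmed.dtype.names)
```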

+1 to have this functionality.

@ndawe Can you add a check to array2tree that the numpy array has the same length as the tree, to avoid a corrupted ROOT file after adding the branch?

ndawe commented

Creating branches of different lengths is technically possible and could have a use somewhere. I don't want to prevent that. At the ROOT level, this might only result in a warning when using SetEntries(). Technically, the ROOT file isn't corrupted. I can add a warning though on the root_numpy side.
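Until such a warning exists on the root_numpy side, a user-side check is easy to sketch. The function `lengths_match` and the stand-in class `FakeTree` below are hypothetical and only illustrate the idea; the one ROOT call assumed is `TTree::GetEntries()`. Following the comment above, a mismatch produces a warning rather than an error.

```python
import warnings


def lengths_match(arr, tree):
    """Warn if `arr` and `tree` have different numbers of entries.

    Returns True when the lengths agree, False otherwise. Hypothetical helper;
    `tree` only needs a GetEntries() method, as a ROOT TTree has.
    """
    n_arr = len(arr)
    n_tree = int(tree.GetEntries())
    if n_arr != n_tree:
        warnings.warn(
            "array has %d entries but tree has %d" % (n_arr, n_tree),
            RuntimeWarning,
            stacklevel=2,
        )
        return False
    return True


class FakeTree:
    """Stand-in for a TTree so the sketch runs without ROOT installed."""

    def __init__(self, entries):
        self._entries = entries

    def GetEntries(self):
        return self._entries


print(lengths_match([0.1, 0.2, 0.3], FakeTree(3)))
```

Calling this right before `array2tree(arr, tree=tree)` keeps the different-length use case possible while still flagging accidental mismatches.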

I'm adding an example of appending new branches to an existing tree in the docs now.

ndawe commented

Regarding the deletion of a branch, it's best to deactivate the branches you want to remove (see TTree::SetBranchStatus()) before calling TTree::CloneTree() (only the active branches are copied). root_numpy is not needed there, and for large trees (like @kratsg says) this would be the preferred method as you don't need to read it all into memory as an array.
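The ROOT-only approach described above can be sketched as follows. The function name `clone_without_branches` is hypothetical; the only calls assumed are `TTree::SetBranchStatus()` and `TTree::CloneTree()`, used through PyROOT. The function takes an already opened tree and does not import ROOT itself.

```python
def clone_without_branches(tree, drop):
    """Return a clone of `tree` that omits the branches named in `drop`.

    Hypothetical helper. Only active branches are copied by CloneTree(),
    so the unwanted ones are deactivated first.
    """
    tree.SetBranchStatus("*", 1)       # start with every branch active
    for name in drop:
        tree.SetBranchStatus(name, 0)  # deactivate branches to be removed
    clone = tree.CloneTree()           # copies all entries of the active branches
    tree.SetBranchStatus("*", 1)       # restore the input tree's branch status
    return clone


# Typical use with PyROOT (not executed here; file and tree names are illustrative):
#   import ROOT
#   fin = ROOT.TFile.Open("input.root")
#   tree = fin.Get("tree")
#   fout = ROOT.TFile.Open("output.root", "recreate")  # clone lands in the current directory
#   new_tree = clone_without_branches(tree, ["weight"])
#   new_tree.Write()
#   fout.Close()
#   fin.Close()
```

Opening the output file before cloning matters in this sketch, because the cloned tree is attached to whatever ROOT directory is current at that point.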

ndawe commented

The docs now have a FAQ page addressing these questions:

http://rootpy.github.io/root_numpy/faq.html

and I've included an example of adding new branches in the docs of array2tree:

http://rootpy.github.io/root_numpy/reference/generated/root_numpy.array2tree.html#root_numpy.array2tree

In short, I don't believe the functions add_branch and delete_branch belong in root_numpy.

ndawe commented

Closing for now.