Use hadd with podio-merge-files #952

Open
jmcarcell wants to merge 1 commit into master from use-hadd

Member

@jmcarcell jmcarcell commented Apr 2, 2026

Currently podio-merge-files reads every Frame and writes it back. This is between 10x and 100x slower than using hadd, which I would say makes it unusable. For TTrees and RNTuples, our files can be merged with hadd, producing readable files, with the only caveat that some categories, like metadata, may be repeated a few times. The new implementation is equivalent to what we had before, but much faster.

BEGINRELEASENOTES

  • Use hadd with ROOT files in podio-merge-files to speed it up and add code to make sure the metadata is set correctly. Keep the old behaviour for the other backends (currently only SIO).

ENDRELEASENOTES
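The backend split described above can be sketched roughly as follows. This is only an illustration, not the code in this PR: `is_root_file`, `build_hadd_command`, `merge_files` and the `fallback` callable are hypothetical names, recognising ROOT files by a `.root` suffix is an assumption, and the metadata fix-up step is left out entirely. The only external fact relied on is the usual `hadd [-f] target source1 source2 ...` command line.

```python
import os
import shutil
import subprocess


def is_root_file(path):
    """Assumption: ROOT-based files are recognised by a .root suffix."""
    return os.path.splitext(path)[1] == ".root"


def build_hadd_command(output_file, input_files, force=False):
    """Build the command line: hadd [-f] target source1 source2 ..."""
    cmd = ["hadd"]
    if force:
        cmd.append("-f")  # overwrite an existing target file
    cmd.append(output_file)
    cmd.extend(input_files)
    return cmd


def merge_files(output_file, input_files, fallback):
    """Use hadd when all inputs are ROOT files, else the hypothetical fallback.

    `fallback` stands in for the old Frame-by-Frame merge and is called as
    fallback(output_file, input_files). Returns the strategy that was used.
    """
    if all(is_root_file(f) for f in input_files) and shutil.which("hadd"):
        subprocess.run(build_hadd_command(output_file, input_files), check=True)
        return "hadd"
    fallback(output_file, input_files)
    return "frames"
```

In this sketch a missing `hadd` executable silently falls back to the slow path; raising an error instead would be an equally reasonable choice.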

@meleneemil

m.SetFastMethod(ROOT.kTRUE)
m.AddObjectNames("metadata")
if not m.PartialMerge(ROOT.TFileMerger.kAll | ROOT.TFileMerger.kOnlyListed | ROOT.TFileMerger.kIncremental):
    raise RuntimeError(f"TFileMerger failed adding metadata {fmt_name} to {output_file}")
Collaborator

Suggested change
- raise RuntimeError(f"TFileMerger failed adding metadata {fmt_name} to {output_file}")
+ raise RuntimeError(f"TFileMerger failed adding metadata category to {output_file}")

This would make it possible to drop fmt_name from the arguments while conveying essentially the same information.

Collaborator

@tmadlener tmadlener left a comment

Looks good to me (minor comments below). I suppose we are not yet at a stage where we would have less code if we used the ROOT file merger (which hadd uses internally) directly, instead of going through hadd and then "fixing things" afterwards?

Comment on lines +91 to +92
tmp_fd, tmp_path = tempfile.mkstemp(suffix=".root")
os.close(tmp_fd)
Collaborator

with tempfile.NamedTemporaryFile(suffix=".root", delete=True) as f:
    tmp_path = f.name

That way the cleanup below would be done automatically. Or do we need the file handle closed for ROOT?
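One possible middle ground, sketched here under assumptions rather than taken from the PR: keep `mkstemp` plus `os.close` so that no handle is held open while another library writes to the path, but wrap it in a small context manager so the cleanup is still automatic. The name `temporary_path` is hypothetical. Whether ROOT actually needs the handle closed is not settled by this sketch; the general concern is that on some platforms (Windows in particular) a file that is still open via `NamedTemporaryFile` may not be openable a second time by name.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def temporary_path(suffix=".root"):
    """Yield the path of a closed temporary file and remove it on exit."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # release our handle so another library can reopen the path
    try:
        yield path
    finally:
        # The file may already be gone if the consumer moved or deleted it.
        with contextlib.suppress(FileNotFoundError):
            os.remove(path)
```

Usage would then look like `with temporary_path() as tmp_path: ...`, with the file removed when the block exits, including when an exception is raised inside it.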
