fix: flamegraph regressions#13
Closed
not-matthias wants to merge 6 commits into
Closed
Conversation
Member
not-matthias
commented
May 28, 2026
- fix(callgrind): drop BBCCs whose top context fn is skip-flagged
- test(callgrind): add C reproducer for runtime obj-skip cxt==0 leak
- chore(callgrind): log underflow resets and instrument_state transitions
- fix(callgrind): discard cost and drop BBCCs from underflow events
When the (cxt == 0) clause in setup_bbcc force-pushes a skipped fn — e.g. the first BB after CALLGRIND_START_INSTRUMENTATION lives in a skipped object — the BBCC ends up with cxt->fn[0]->skip == True. Without filtering, those BBCCs emit a top-level fn= block and the skipped fn leaks into the dump. Filter them out at dump time in print_bbccs_of_thread, right before print_fn_pos would emit the ob=/fl=/fn= header. The call edges from non-skipped callers into skipped fns (cfn=) are unaffected because they're emitted from the caller's BBCC, whose context is not skipped. Also broadens the runtime_obj_skip_c post-check to grep for any fn=skipme*, since the actual leaked fn in the repro is skipme_run (the one calling START_INSTRUMENTATION), not skipme_func.
Minimal C test that triggers the leak path where a function in a skipped object becomes a top-level fn= block in the dump. The trigger: the lib calls CALLGRIND_START_INSTRUMENTATION from inside one of its own functions, so the first BB callgrind sees post-start lives in the skipped object. With cxt == 0 at that point, setup_bbcc force-pushes the skipped fn as the new top context and it leaks into the output. Currently RED: post-check fails because fn=skipme_run appears in the output despite skipme_run's containing .so being on the skip list.
Two diagnostic logs to attribute every (cxt==0) push_cxt event to its root cause: 1. handleUnderflow: prints the BB address, fn name, obj, and skip flag for the fn that's about to be force-pushed after a call-stack underflow. Tells us whether leaks are driven by signals, longjmp, or shadow-stack quality. 2. set_instrument_state: logs every ON/OFF transition with the cxt and fn-stack depth at that moment. Distinguishes 'fresh start with no cxt' from 'stop/start cycle that preserved cxt'. Together with the existing 'push_cxt FORCED' log, these let us bucket the leaked frames in real Python runs by their root cause.
When a RET happens with no matching CALL on Callgrind's shadow stack
(typical when instrumentation starts mid-run), handleUnderflow
synthesizes a phantom caller frame at return-addr - 1, attributes
all cost since the last dump to it via copy_cost(lastdump_cost), and
pushes a fake call_entry. The resulting frame is fabricated:
- the caller fn is a guess based on an address that may not even
be the start of a real BB,
- the attributed cost is whatever happened to accumulate since the
last dump, not anything actually spent in that fn,
- the fn may resolve to an --obj-skip target (e.g. CPython's eval
frame) which we explicitly do not want in the output.
Discard cost and drop the phantom from the dump:
- flag the synthesized source_bbcc with from_underflow=True,
- zero the call_entry enter_cost instead of copying lastdump_cost,
- leave source_bbcc->ecounter_sum at 0 (no execution counted),
- extend the print_bbccs_of_thread filter (added in the
skip-fn-drop fix) to also skip from_underflow BBCCs.
The structural side-effects of handleUnderflow (cxt push, call
stack push, active-count adjustment) are kept intact so subsequent
RETs continue to balance correctly.
Merging this PR will improve performance by 14.39%
Performance Changes
Tip Curious why this is faster? Comment Comparing |
The --obj-skip -> fn->skip propagation was only happening in setup_bbcc on a jk_Call (when callgrind observes a CALL into the fn). When instrumentation turns on mid-function (e.g. CALLGRIND_START_INSTRUMENTATION fired from inside a skipped object, or an underflow synthesizes a new top context), setup_bbcc takes the cxt==0 push path and force-pushes the fn without ever consulting --obj-skip — so fn->skip stays False and the skipped object leaks into the dump. Extract the obj-skip resolution into resolve_obj_skip() and call it on both paths. obj_skip_checked still short-circuits the lookup so the hot path cost is unchanged. This makes the BBCC drop filter from "drop BBCCs whose top context fn is skip-flagged" actually catch the runtime_obj_skip_c repro: skipme_* now gets fn->skip set on the force-push and is filtered at dump time.
dladdr((void*)&skipme_run) from the main binary returns the main binary's path, not the lib's: &skipme_run there is the PLT stub address in the main binary's text. So CALLGRIND_ADD_OBJ_SKIP was registering the wrong object and the test never exercised the runtime obj-skip leak it was supposed to repro. Move the path resolution into the lib via skipme_self_path(), which calls dladdr on its own function pointer. Add -ldl to the lib's link line.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.