Skip to content

fix(callgrind): initialize stack on instrumentation start#15

Open
not-matthias wants to merge 9 commits into
masterfrom
cod-2714-investigate-pytest-flamegraph-regression-final
Open

fix(callgrind): initialize stack on instrumentation start#15
not-matthias wants to merge 9 commits into
masterfrom
cod-2714-investigate-pytest-flamegraph-regression-final

Conversation

@not-matthias
Copy link
Copy Markdown
Member

  • fix(callgrind): seed shadow call stack at instrumentation start
  • test(callgrind): add regression tests for runtime obj-skip

@not-matthias not-matthias force-pushed the cod-2714-investigate-pytest-flamegraph-regression-final branch 2 times, most recently from 35de9c7 to 651fea8 Compare May 29, 2026 15:15
@not-matthias not-matthias marked this pull request as ready for review May 29, 2026 15:17
CodSpeed runs benchmarks with --instr-atstart=no and fires
CALLGRIND_START_INSTRUMENTATION several frames deep (inside libpython, or
behind a V8/JIT trampoline). The shadow stack starts at 0 while real frames
exist, so every later return underflows and the inclusive cost collapses onto
a phantom root.

- Reconstruct the shadow stack from the native stack at the OFF->ON transition
  (CLG_(reconstruct_call_stack_from_native)), seeding each frame's entry SP so
  it pops correctly instead of underflowing.
- Name anonymous JIT frames by address in get_fn_node_for_addr (mirroring the
  BB path) instead of "???", so the seeded root frame (e.g.
  __codspeed_root_frame__) is preserved and stays backend-symbolicatable via
  perf-<pid>.map.
- Add --obj-skip and CALLGRIND_ADD_OBJ_SKIP to exclude whole objects (the node
  binary, libpython) from the call graph.
- runtime_obj_skip_c: a fn from a skipped object must not leak into the
  dump as a top-level fn= block when it is the first BB after START
  (the cxt==0 force-push path)
- runtime_obj_skip_underflow: a RET past an empty call stack
  (handleUnderflow) must not re-leak the skipped fn -- the Python 3.14
  deep recursive interpreter-dispatch shape

Both exercise --obj-skip / CALLGRIND_ADD_OBJ_SKIP from a separately-linked
.so. Also filters callgrind diagnostic logs from test stderr and gitignores
the test build artifacts (binaries, .so, logs).
@not-matthias not-matthias force-pushed the cod-2714-investigate-pytest-flamegraph-regression-final branch from 651fea8 to c6fcb3b Compare May 29, 2026 15:17
@not-matthias not-matthias requested a review from Copilot May 29, 2026 15:18
@not-matthias
Copy link
Copy Markdown
Member Author

@greptile review

@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented May 29, 2026

Merging this PR will degrade performance by 56.98%

⚡ 2 improved benchmarks
❌ 1 regressed benchmark
✅ 45 untouched benchmarks
⏩ 12 skipped benchmarks1

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

Benchmark BASE HEAD Efficiency
test_valgrind[valgrind.codspeed, stress-ng --cpu 1 --cpu-ops 10, inline] 5.3 s 4.5 s +17.43%
test_valgrind[valgrind-3.25.1, echo Hello, World!, full-no-inline] 578.2 ms 9,412.3 ms -93.86%
test_valgrind[valgrind-3.25.1, python3 testdata/test.py, full-no-inline] 6 s 5.5 s +10.33%

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.


Comparing cod-2714-investigate-pytest-flamegraph-regression-final (21ab8bb) with master (4a64bb2)

Open in CodSpeed

Footnotes

  1. 12 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a runtime obj-skip leak in Callgrind where, when CALLGRIND_START_INSTRUMENTATION is invoked several frames deep, subsequent returns underflow the (empty) shadow call stack and force-push the returned-into function — leaking functions from skipped objects as top-level fn= blocks. The fix seeds the shadow call stack from the native stack at the OFF→ON transition, and adds matching pop-time context restoration plus regression tests.

Changes:

  • Add CLG_(reconstruct_call_stack_from_native) that walks VG_(get_StackTrace) and seeds one jcc=0 call entry per native frame (with push_cxt for non-skipped frames); wire it into VG_USERREQ__START_INSTRUMENTATION. Add a new pop_call_stack branch to restore cxt/fn_sp for these seeded entries, and a new CLG_(get_fn_node_for_addr) helper plus exposed CLG_(new_recursion) / CLG_(insert_bbcc_into_hash).
  • Add two C-based regression tests (runtime_obj_skip_c, runtime_obj_skip_underflow) with companion shared libraries, vgtest configs, expected outputs, and Makefile/build wiring; extend filter_stderr to strip new diagnostic log lines.
  • Minor .gitignore additions and a new CLG_DEBUG trace in handleUnderflow.

Reviewed changes

Copilot reviewed 17 out of 18 changed files in this pull request and generated no comments.

Show a summary per file
File Description
callgrind/main.c Call new reconstruction helper from START_INSTRUMENTATION
callgrind/callstack.c New stack reconstruction helper; restore cxt for seeded entries in pop_call_stack
callgrind/fn.c New CLG_(get_fn_node_for_addr) resolving raw IPs to fn_node
callgrind/bbcc.c Expose new_recursion/insert_bbcc_into_hash via CLG_(); add underflow debug log
callgrind/global.h Declarations for new exported helpers
callgrind/tests/runtime_obj_skip_c.{c,vgtest,…} Regression: first BB after START in a skipped object
callgrind/tests/runtime_obj_skip_c_lib.c Skipped library that flips instrumentation on
callgrind/tests/runtime_obj_skip_underflow.{c,vgtest,…} Regression: START deep in a recursive skipped lib
callgrind/tests/runtime_obj_skip_underflow_lib.c Recursive skipped library to exercise underflow channel
callgrind/tests/Makefile.am Build the new tests and shared libraries
callgrind/tests/filter_stderr Drop new diagnostic log prefixes from stderr
.gitignore Ignore new build artifacts and dump files

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented May 29, 2026

Greptile Summary

This PR seeds callgrind's shadow call stack from the client's native stack on CALLGRIND_START_INSTRUMENTATION so that subsequent ret instructions past un-instrumented frames don't trigger handleUnderflow and leak skipped functions as top-level fn= blocks in the output. Two C-based regression tests cover the two leak channels (direct start inside a skipped object, and underflow from a recursive skipped caller).

  • Core fix (callstack.c): adds CLG_(reconstruct_call_stack_from_native) which walks the native stack bottom-up and seeds shadow call_entry slots, calling push_cxt for non-skipped frames so the context chain is correct from the first instrumented BB; pop_call_stack gains a new else if (lower_entry->cxt != 0) branch to restore context when these seeded entries are eventually unwound.
  • Supporting refactors (bbcc.c, fn.c, global.h): new_recursion and insert_bbcc_into_hash are promoted from static to public CLG_() symbols; CLG_(get_fn_node_for_addr) is added to resolve raw IPs to fn_node* without needing a BB*.
  • New tests (Makefile.am + 8 new files): two .vgtest suites, each with a shared-library driver and a post: script that guards against a missing output file before grepping for leaked fn=skipme_ blocks.

Confidence Score: 5/5

Safe to merge; the reconstruction logic is sound, invariants are maintained throughout the loop, and the two regression tests cover both identified leak channels.

The core seeding logic correctly orders ensure_stack_size before push_cxt, pre-zeros each entry's cxt field before the next iteration's assertion, saves and restores fn_sp via the existing push_cxt mechanism, and never reads uninitialized enter_cost for seeded entries (since jcc=0). The pop_call_stack else-if branch correctly discriminates seeded non-skipped frames from genuine skip-style entries via the cxt != 0 sentinel. No correctness issues were identified in the changed paths.

No files require special attention; the two observations on callgrind/global.h (unnecessary public declarations) and callgrind/callstack.c (single-START assumption) are non-blocking.

Important Files Changed

Filename Overview
callgrind/callstack.c Adds reconstruct_call_stack_from_native and a new else if branch in pop_call_stack; ordering of ensure_stack_size/push_cxt is correct, fn_sp/cxt save-restore is sound, and uninitialized fields on seeded entries are never read.
callgrind/bbcc.c Promotes two previously static helpers to public CLG_() symbols; both are still only called from bbcc.c itself, making the public declaration unnecessary API surface.
callgrind/fn.c Adds CLG_(get_fn_node_for_addr) to resolve raw IPs; correctly mirrors the BB-path fallback for anonymous frames and delegates to existing get_fn_node_inseg.
callgrind/global.h Declares CLG_(new_recursion) and CLG_(insert_bbcc_into_hash) as public functions, though neither is called outside bbcc.c.
callgrind/main.c One-line addition: calls CLG_(reconstruct_call_stack_from_native)(tid) immediately after set_instrument_state(..., True) in the START_INSTRUMENTATION handler.
callgrind/tests/Makefile.am Wires in two new test binaries and shared libraries with correct LDADD/LDFLAGS/DEPENDENCIES/CLEANFILES entries.
callgrind/tests/runtime_obj_skip_c.vgtest Test correctly guards against missing output file before grepping, resolving the false-pass scenario noted in prior review.
callgrind/tests/filter_stderr Adds a broad ERE pattern to strip callgrind diagnostic log lines from stderr before comparison; anchored to line start so it won't accidentally drop unrelated lines.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[CALLGRIND_ADD_OBJ_SKIP] --> B[objs_to_skip list updated]
    B --> C[CALLGRIND_START_INSTRUMENTATION called]
    C --> D[set_instrument_state True]
    D --> E[reconstruct_call_stack_from_native]
    E --> F{cs->sp != 0?}
    F -- yes --> G[return early, stack already seeded]
    F -- no --> H[VG_get_StackTrace -> n frames]
    H --> I[loop frame n-1 downto 0]
    I --> J[get_fn_node_for_addr]
    J --> K{fn->obj_skip_checked?}
    K -- no --> L[compare vs objs_to_skip list]
    L --> M[fn->obj_skip_checked = True]
    K -- yes --> N{fn->skip?}
    M --> N
    N -- no --> O[push_cxt: saves cxt + fn_sp, advances context chain]
    N -- yes --> P[skip cxt seeding]
    O --> Q[write call_entry: jcc=0, sp=sps+1, ret_addr=ips+1]
    P --> Q
    Q --> R{more frames?}
    R -- yes --> I
    R -- no --> S[shadow stack seeded, current_state.cxt != 0]
    S --> T[instrumented BBs execute normally]
    T --> U{pop_call_stack called on ret}
    U --> V{jcc != 0?}
    V -- yes --> W[normal JCC cost accounting + restore cxt]
    V -- no --> X{lower_entry->cxt != 0?}
    X -- yes --> Y[restore cxt + fn_stack.top from seeded entry]
    X -- no --> Z[skip-style pop, no cxt change]
Loading

Reviews (3): Last reviewed commit: "fixup: asdf" | Re-trigger Greptile

Comment thread callgrind/callstack.c Outdated
Comment thread callgrind/callstack.c Outdated
Comment thread callgrind/fn.c Outdated
@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented May 29, 2026

Greptile Summary

This PR fixes callgrind's shadow call stack initialization when CALLGRIND_START_INSTRUMENTATION is invoked from deep inside a native call stack. Without seeding, each ret past an unseen frame trips handleUnderflow and leaks skipped-object functions as top-level fn= blocks in the output.

  • callstack.c: Adds CLG_(reconstruct_call_stack_from_native) that walks VG_(get_StackTrace) bottom-up and pushes a jcc=0 call entry per frame; non-skipped frames also get a push_cxt call so the context chain is seeded. A matching else if (lower_entry->cxt != 0) branch in pop_call_stack restores context correctly when these synthetic entries are unwound.
  • fn.c / bbcc.c: Adds CLG_(get_fn_node_for_addr) for IP-to-fn_node resolution without a BB, and exports new_recursion/insert_bbcc_into_hash so they are accessible across translation units.
  • Tests: Adds two C-level regression tests (runtime_obj_skip_c, runtime_obj_skip_underflow) that build a separate shared library, register it for obj-skip, start instrumentation from inside the skipped lib, and assert no fn=skipme_* blocks leak into the dump.

Confidence Score: 4/5

Safe to merge; the core fix is well-reasoned and the two regression tests directly cover the reported leak channels.

The seeded-entry logic in pop_call_stack correctly distinguishes normal skip entries (cxt=0, no restore) from seeded non-skipped entries (cxt≠0, restore via new branch) and avoids interfering with the normal jcc path. The reconstruction loop ordering (push_cxt before ensure_stack_size) is safe under current constants but creates a silent coupling between CLG_RECON_MAX_FRAMES and N_CALL_STACK_INITIAL_ENTRIES. The ce->nonskipped field is left uninitialized in seeded entries and the static buffer in get_fn_node_for_addr relies on an implicit strdup contract — both minor, non-blocking concerns. Test post-scripts give a false-OK if the callgrind output file is absent.

callgrind/callstack.c (the new reconstruct_call_stack_from_native function and the pop_call_stack else-if branch) and callgrind/fn.c (the static buffer in CLG_(get_fn_node_for_addr)).

Important Files Changed

Filename Overview
callgrind/callstack.c Core change: adds reconstruct_call_stack_from_native to seed the shadow call stack on instrumentation start, and extends pop_call_stack with a new else if (cxt != 0) branch to restore context when seeded entries unwind. Logic is sound but has two latent issues: ce->nonskipped is not explicitly zeroed, and push_cxt is called before ensure_stack_size.
callgrind/fn.c Adds CLG_(get_fn_node_for_addr) to resolve raw IPs to fn_nodes for use by the stack reconstruction path. Logic is correct; the static buffer for anonymous-address formatting is safe under the current strdup contract but fragile by design.
callgrind/bbcc.c Promotes new_recursion and insert_bbcc_into_hash from static to exported CLG_-prefixed functions; adds a debug-log line in handleUnderflow. All call sites updated consistently.
callgrind/main.c Calls CLG_(reconstruct_call_stack_from_native)(tid) immediately after CLG_(set_instrument_state) on VG_USERREQ__START_INSTRUMENTATION. Ordering is correct and the cs->sp != 0 guard prevents double-seeding.
callgrind/global.h Declares the new exported symbols. CLG_(new_recursion) and CLG_(insert_bbcc_into_hash) are only used within bbcc.c in this PR; their export appears forward-looking.
callgrind/tests/Makefile.am Adds build rules for two new shared libraries and their test binaries; links with -l:*.so -ldl and RPATH=$ORIGIN so tests find the libs at runtime without install.
callgrind/tests/runtime_obj_skip_c.vgtest Regression test for the obj-skip leak via the cxt==0 push path. Post-check grep can produce a false-OK if the callgrind output file is missing.
callgrind/tests/runtime_obj_skip_underflow.vgtest Regression test for the underflow-channel obj-skip leak. Same false-OK risk as the sister test when the output file is absent.

Sequence Diagram

sequenceDiagram
    participant Client as Client code
    participant VG as Valgrind core
    participant CLG as Callgrind

    Client->>VG: CALLGRIND_START_INSTRUMENTATION
    VG->>CLG: handle_client_request(START)
    CLG->>CLG: set_instrument_state(ON)
    CLG->>VG: VG_(get_StackTrace)(tid, ips[], sps[])
    VG-->>CLG: n frames [bottom to top]
    loop "frame = n-1 downto 0 (oldest first)"
        CLG->>CLG: get_fn_node_for_addr(ips[frame])
        CLG->>CLG: "check obj_skip list -> fn->skip"
        alt fn not skipped
            CLG->>CLG: "push_cxt(fn) -> seed cxt chain"
        end
        CLG->>CLG: "push call_entry(jcc=0, sp=sps[frame+1])"
    end
    Note over CLG: shadow call stack seeded
    Client->>Client: execute benchmark (instrumented)
    loop each RET in instrumented code
        CLG->>CLG: unwind_call_stack(sp)
        CLG->>CLG: pop_call_stack()
        alt "jcc != 0"
            CLG->>CLG: restore cxt + fn_sp (normal path)
        else "cxt != 0 (seeded entry)"
            CLG->>CLG: restore cxt + fn_sp (new path)
        else "cxt == 0 (skipped entry)"
            CLG->>CLG: no cxt restore
        end
    end
Loading

Reviews (2): Last reviewed commit: "test(callgrind): add regression tests fo..." | Re-trigger Greptile

Comment thread callgrind/callstack.c Outdated
Comment thread callgrind/callstack.c
Comment thread callgrind/fn.c
Comment thread callgrind/tests/runtime_obj_skip_c.vgtest Outdated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants