wait_for_text: adaptive backoff to reduce subprocess pressure on parallel waits #52

@tony

Type: performance · Tier: deferred · Tool: wait_for_text

What's happening

Default interval=0.05 means ~20 polls/sec. Each useful poll runs two tmux subprocess-backed operations:

  • display-message (one tmux command via libtmux Pane.display_message)
  • capture-pane (one tmux command via libtmux Pane.capture_pane)

That is roughly 40 tmux subprocess calls per second per active wait. Ten parallel wait_for_text calls across agent instances produce ~400 tmux subprocesses per second hammering the same tmux server.
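The arithmetic above can be checked with a small sketch. The function name and parameters are hypothetical, and the result is an upper bound: it assumes each poll costs only the sleep interval and ignores the time spent inside the tmux commands themselves.

```python
def tmux_calls_per_second(
    interval: float,
    calls_per_poll: int = 2,
    parallel_waits: int = 1,
) -> float:
    """Upper-bound estimate of tmux subprocess calls per second.

    Assumes a poll takes exactly `interval` seconds, i.e. the
    display-message and capture-pane calls themselves are free.
    """
    polls_per_second = 1.0 / interval
    return polls_per_second * calls_per_poll * parallel_waits


if __name__ == "__main__":
    # One wait at the default interval, then ten parallel waits.
    print(round(tmux_calls_per_second(0.05)))
    print(round(tmux_calls_per_second(0.05, parallel_waits=10)))
```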

Not a correctness issue — but a real cost in parallel-agent flows, and a real load on the tmux event loop that other clients share.

Why this isn't urgent

The deterministic alternative for command completion is already shipped: wait_for_channel, which blocks server-side via cmd-wait-for.c, with zero subprocesses per second while waiting. The send_keys docstring and the server system instructions both now name wait_for_channel first, so the agent's default mental path leads away from the polling scraper at the moment the choice is being made.

wait_for_text is the right primitive when the agent does not author the output (third-party process logs, daemon prompts, interactive supervisors). That's a smaller set of calls and a smaller subprocess footprint.

Two viable directions

1. Adaptive backoff inside the poll loop

Use the same exponential-backoff pattern the project already implements in ReadonlyRetryMiddleware:

base_delay = 0.1
max_delay = 1.0
backoff_multiplier = 2.0

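Apply backoff only when a tick finds no match. The first tick sleeps for interval; each further no-match tick multiplies the sleep by backoff_multiplier, capped at max_delay. No reset on match is needed, since the loop exits on match. A caller-supplied interval higher than the ceiling should still be honored rather than clamped down.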
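A minimal sketch of such a loop, not the actual wait_for_text implementation. The `check` callable stands in for the display-message plus capture-pane work, and `wait_with_backoff`, `sleep`, and `clock` are hypothetical names; the last two are injected only so the loop can be tested without real waiting. Treating a caller interval above max_delay as the effective ceiling is one reading of the paragraph above, and is an assumption.

```python
import time
from typing import Callable


def wait_with_backoff(
    check: Callable[[], bool],
    timeout: float = 8.0,
    interval: float = 0.05,
    max_delay: float = 1.0,
    backoff_multiplier: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `check` until it returns True or `timeout` elapses.

    The first sleep is `interval`; every no-match tick multiplies the
    sleep by `backoff_multiplier`, capped at the ceiling.
    """
    # Assumption: a caller interval above max_delay raises the ceiling.
    ceiling = max(max_delay, interval)
    deadline = clock() + timeout
    delay = interval
    while True:
        if check():
            return True  # exit on match, so no reset is needed
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(delay, remaining))
        delay = min(delay * backoff_multiplier, ceiling)
```

A fast match still returns after the first check, while a long wait settles at one poll per max_delay.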

Pros:

  • Cheap implementation.
  • Precedent exists in the same codebase.
  • Agents that hit a fast match get fast latency; agents that wait long get cheap polling.

Cons:

  • Requires either a newly exposed knob (max_interval?) or a hardcoded ceiling that the caller can't override.

2. Raise the default interval

interval: float = 0.2  # was 0.05

Pros:

  • Simpler, more honest about the cost.
  • Callers who need 50 ms polling can still pass it.

Cons:

  • Default-behavior change visible to existing agents (perceived as "slower").
  • Doesn't help long waits: at 0.2 a wait still polls 5 times per second (~10 tmux calls per second) for its whole duration.
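To compare the two directions on a long wait, here is an idealized simulation that counts polls when no match ever appears. `polls_during_wait` and its millisecond parameters are hypothetical, and the model ignores time spent in tmux itself, so the numbers are estimates rather than measurements.

```python
from typing import Optional


def polls_during_wait(
    wait_ms: int,
    interval_ms: int,
    max_delay_ms: Optional[int] = None,
    multiplier: int = 2,
) -> int:
    """Count polls issued over `wait_ms` when no match is ever found.

    With `max_delay_ms=None` the interval is fixed; otherwise the delay
    grows by `multiplier` after each poll, capped at the ceiling.
    Integer milliseconds avoid floating-point drift in the tally.
    """
    elapsed = 0
    delay = interval_ms
    polls = 0
    while elapsed < wait_ms:
        polls += 1
        elapsed += delay
        if max_delay_ms is not None:
            delay = min(delay * multiplier, max(max_delay_ms, interval_ms))
    return polls


if __name__ == "__main__":
    print(polls_during_wait(30_000, 50))                     # fixed 50 ms
    print(polls_during_wait(30_000, 200))                    # fixed 200 ms
    print(polls_during_wait(30_000, 50, max_delay_ms=1000))  # backoff to 1 s
```

Under these assumptions a 30 s no-match wait issues 600, 150, and 34 polls respectively, each costing two tmux calls.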

Recommendation

Adaptive backoff (option 1): the precedent already exists, and the agent-facing API stays unchanged. Knob exposure can wait until someone needs it. If a stress-test fixture lands that measures subprocesses per second under N parallel waits, this change should land alongside it.
