Skip to content

perf(gui): implement batched UI rendering#2514

Open
githubawn wants to merge 18 commits intoTheSuperHackers:mainfrom
githubawn:perf/gui-batch
Open

perf(gui): implement batched UI rendering#2514
githubawn wants to merge 18 commits intoTheSuperHackers:mainfrom
githubawn:perf/gui-batch

Conversation

@githubawn
Copy link
Copy Markdown

@githubawn githubawn commented Mar 31, 2026

Reduces draw calls by batching 2D elements by texture state in setup2DRenderState.

Environment Scenario (-quickstart) Before After
Main Machine Skirmish Screen 477 FPS 918 FPS
M1 Macbook (Wine) Skirmish Screen 16 FPS 61 FPS
M1 Macbook (Wine) In-Game (Match Start) 22 FPS 33 FPS

@greptile-apps
Copy link
Copy Markdown

greptile-apps bot commented Mar 31, 2026

Greptile Summary

This PR introduces a 2D UI render batching system that accumulates draw primitives sharing the same texture/mode/grayscale state and only submits a Render() call when the state must change or the batch ends. The measured results are significant (up to ~4× FPS improvement on the Skirmish screen).

Key design points:

  • Display gains a thin beginBatch/endBatch/flush public API with an m_isBatching flag and protected virtual hooks that W3DDisplay overrides.
  • W3DDisplay::setup2DRenderState centralises all render-state setup and acts as the batch change-detector; it flushes m_2DRender before switching state.
  • Texture reference counting is handled correctly in the batching path: setup2DRenderState calls Add_Ref on the incoming texture; onEndBatch and the next state switch both call Release_Ref.
  • W3DDisplayString::draw explicitly flushes the 2D batch before rendering text, preserving correct draw order.
  • An early-out for completely off-screen images in drawImage is pulled before the texture fetch, saving unnecessary asset manager lookups.

Findings:

  • P2 – blend state not restored after onFlush: onFlush should restore Enable_Grayscale/Enable_Alpha/Enable_Additive to defaults to avoid latent issues when flush() is called outside a batch.
  • P2 – non-reentrant batch semantics: endBatch() ends any active batch unconditionally. If W3DInGameUI::draw and W3DView::draw are ever called in a nested fashion the inner endBatch will terminate the outer batch silently.
  • P2 – inner clipping condition (confirmed correct): The new inner check correctly skips clip computation for fully-interior images; the outer early-return guard is now complete with the added startX >= hi.x / startY >= hi.y cases.

Confidence Score: 4/5

Safe to merge with minor follow-up work; no data loss or crashes expected from current call patterns.

All three findings are P2. Ref-counting is correct, clipping logic is correct, and both prior P1 concerns (m_batchNeedsInit initialisation and Release_Ref ordering) are addressed. Score is 4 rather than 5 due to the non-reentrant batch semantics being a real architectural fragility.

W3DDisplay.cpp (onFlush blend state reset) and W3DInGameUI.cpp (endBatch nesting)

Important Files Changed

Filename Overview
GeneralsMD/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp Core of the batching implementation: adds setup2DRenderState, onBeginBatch/onEndBatch/onFlush, and reworks all draw primitives to defer Render() calls. Ref-counting is correct; P2 concern about blend-state restoration after onFlush.
GeneralsMD/Code/GameEngine/Include/GameClient/Display.h Adds public beginBatch/endBatch/flush/isBatching interface and protected virtual hooks; m_isBatching properly initialised in Display.cpp constructor.
GeneralsMD/Code/GameEngineDevice/Include/W3DDevice/GameClient/W3DDisplay.h Adds TextureClass forward declaration and four new batch-state members; all members initialised in the W3DDisplay constructor.
GeneralsMD/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplayString.cpp Flushes the 2D batch before rendering text to preserve correct draw order; hotkey renderer render moved outside the needNewPolys block to ensure it always fires.
GeneralsMD/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DInGameUI.cpp Wraps the entire InGameUI draw pass in beginBatch/endBatch; P2 concern about non-reentrant batch semantics if ever called within W3DView's own batch region.
Core/GameEngineDevice/Source/W3DDevice/GameClient/W3DView.cpp Minimal change: wraps iterateDrawablesInRegion post-draw callbacks in beginBatch/endBatch to batch drawable 2D overlay rendering.
GeneralsMD/Code/GameEngine/Source/GameClient/Display.cpp Implements beginBatch/endBatch/flush with guard against nested calls; m_isBatching initialised to FALSE in constructor.

Sequence Diagram

sequenceDiagram
    participant Caller as W3DInGameUI / W3DView
    participant D as Display::beginBatch/endBatch/flush
    participant W3D as W3DDisplay (onBeginBatch/onFlush/onEndBatch)
    participant R2D as Render2DClass (m_2DRender)
    participant TS as W3DDisplayString

    Caller->>D: beginBatch()
    D->>W3D: onBeginBatch()
    W3D->>R2D: Reset()
    Note over W3D: m_batchNeedsInit=TRUE, m_batchTexture=nullptr

    loop Per draw call (drawImage / drawLine / drawFillRect)
        Caller->>W3D: drawXxx()
        W3D->>W3D: setup2DRenderState(tex, mode, grayscale)
        alt State unchanged
            W3D-->>W3D: early return (no flush)
        else State changed
            W3D->>W3D: onFlush() → Render2D.Render() + Reset()
            W3D->>W3D: Add_Ref on new tex, Release_Ref on old tex
        end
        W3D->>R2D: Add_Quad / Add_Tri / Add_Line
        Note over W3D: m_isBatching=TRUE → skip immediate Render()
    end

    opt Text rendering (W3DDisplayString::draw)
        TS->>D: isBatching() → TRUE
        TS->>D: flush()
        D->>W3D: onFlush() → Render() pending batch
        TS->>TS: m_textRenderer.Render()
    end

    Caller->>D: endBatch()
    D->>W3D: onEndBatch()
    W3D->>W3D: onFlush() → final Render() + Reset()
    W3D->>W3D: Release_Ref(m_batchTexture)
    Note over W3D: m_isBatching=FALSE
Loading
Prompt To Fix All With AI
This is a comment left during a code review.
Path: GeneralsMD/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp
Line: 2749

Comment:
**Clipping inner-check fires for images fully inside clip region**

After the outer early-return guard was correctly expanded (`startX >= hi.x || startY >= hi.y` is new), the inner condition now triggers for any image that merely *touches* a boundary — including images that are fully inside the clip region:

```cpp
if (screen_rect.Left < m_clipRegion.lo.x ||   // left edge is to the left  of clip (outside)
    screen_rect.Right > m_clipRegion.hi.x ||   // right edge is to the right of clip (outside)
    screen_rect.Top < m_clipRegion.lo.y ||     // top edge is above clip (outside)
    screen_rect.Bottom > m_clipRegion.hi.y)    // bottom edge is below clip (outside)
```

Because `m_clipRegion.lo`/`hi` define the *inside* boundary, an image that sits fully within `[lo, hi]` satisfies none of those conditions, so the block is skipped and `screen_rect`/`uv_rect` are passed unchanged to `Add_Quad`/`Add_Tri`. This is the correct fast-path — no clipping computation is performed.

The concern is the opposite case: an image partially outside on its **right or bottom edge only** (e.g., `screen_rect.Right > m_clipRegion.hi.x`). In the original code this branch was also reached (via the `else` after the old early-return), so the effective behaviour is unchanged. The fix is correct; this is just a note for reviewers.

No change needed — confirmed correct.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: GeneralsMD/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp
Line: 613-621

Comment:
**`onFlush` does not restore blend states after render**

`onFlush()` calls `m_2DRender->Render()` then `m_2DRender->Reset()`, but it does not reset `Enable_Grayscale`, `Enable_Alpha`, or `Enable_Additive` to their defaults. The *next* `setup2DRenderState` call will explicitly re-apply these states anyway, so there is no observable bug in the current batched callers.

The latent risk appears when `flush()` is called *outside* a batch (the method is `public virtual` and `m_batchNeedsInit` starts as `FALSE`, so the guard `!m_batchNeedsInit` passes). The subsequent non-batched draw call would inherit whatever blend state was set by the last batch item rather than starting from a clean slate.

Consider adding a state reset at the end of `onFlush` to make the invariant explicit:

```cpp
void W3DDisplay::onFlush()
{
    if (m_2DRender && !m_batchNeedsInit)
    {
        m_2DRender->Render();
        m_2DRender->Reset();
        m_2DRender->Enable_Grayscale(FALSE);
        m_2DRender->Enable_Alpha(TRUE);
        m_2DRender->Enable_Additive(FALSE);
        m_batchNeedsInit = TRUE;
    }
}
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: GeneralsMD/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DInGameUI.cpp
Line: 390-443

Comment:
**`endBatch` inside W3DInGameUI will prematurely terminate an outer batch**

`Display::endBatch` unconditionally ends the batch whenever `m_isBatching == TRUE`. There is no nesting depth counter — only the first `endBatch` call matters.

`W3DView::draw()` now also calls `beginBatch()`/`endBatch()` around `iterateDrawablesInRegion`. If `W3DInGameUI::draw()` is ever invoked from within that region iteration (e.g., via a post-draw callback that triggers UI repainting), the inner `endBatch()` here would flush and close the outer batch prematurely. Any subsequent 2D draws in `W3DView` would then fall through to the non-batched path, silently losing the performance benefit and potentially mis-ordering rendered geometry.

Today the two sites appear to be separate render passes and this path is not taken, but the current implementation has no guard against future refactors that introduce such nesting. A reference-counted `beginBatch`/`endBatch` (or a push/pop stack) would make the semantics explicit and safe.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (13): Last reviewed commit: "Move HUD batching from GameWindowManager..." | Re-trigger Greptile

@githubawn
Copy link
Copy Markdown
Author

githubawn commented Mar 31, 2026

Significantly improves #109
Fully fixing this requires significant refactoring to Render2DSentenceClass for fonts

optimize clipping check
return statement on seperate line
centralized flush
@githubawn
Copy link
Copy Markdown
Author

@greptileai

"One P1 render-state initialization bug in onBeginBatch() should be fixed before merging to avoid potential cross-frame visual corruption."

The P1 finding could be a false positive based on a local variable analysis that omits the downstream hardware synchronization in Render2DClass::Render() (render2d.cpp:601).

Render2DClass is a self-synchronizing state machine. Its Render() call (line 628 for textures, line 694 for shaders) explicitly pushes its internal members to the DX8Wrapper on every execution. The tracking variables in W3DDisplay (m_batchTexture, etc.) are mirror-images of Render2DClass's internal members.

happy?

@greptile-apps
Copy link
Copy Markdown

greptile-apps bot commented Mar 31, 2026

No, the P1 finding stands.

Your argument about Render2DClass being self-synchronizing addresses hardware state consistency, but not the C++ undefined behavior in W3DDisplay.

The issue:

  • onFlush() reads !m_batchNeedsInit (line 617 of your diff)
  • setup2DRenderState() reads !m_batchNeedsInit (line 627)
  • If uninitialized, these branches are unpredictable

Even if Render2DClass correctly synchronizes hardware state on every Render() call, W3DDisplay's control flow still depends on m_batchNeedsInit to decide:

  1. Whether to call Render() in onFlush()
  2. Whether to skip redundant state setup in setup2DRenderState()

An uninitialized bool means these decisions are based on garbage. The fact that the hardware might recover downstream doesn't make reading uninitialized memory defined behavior.

The fix is one line. Keep it.

@Mauller
Copy link
Copy Markdown

Mauller commented Apr 1, 2026

I see greptile woke up on the sassy side of the motherboard.

@Mauller
Copy link
Copy Markdown

Mauller commented Apr 1, 2026

Reduces draw calls by batching 2D elements by texture state in setup2DRenderState.

Environment Scenario (-quickstart) Before After
Main Machine Skirmish Screen 477 FPS 918 FPS
M1 Macbook (Wine) Skirmish Screen 16 FPS 61 FPS
M1 Macbook (Wine) In-Game (Match Start) 22 FPS 33 FPS

Another test would be to get a lot of units on screen then select them all so their unit info is being drawn.

The health bars make use of the line drawing classes which i believe are inherently affected by this change.

@githubawn
Copy link
Copy Markdown
Author

I was initially going to say that health bars render in drawViews() (outside this batch), but your comment made me realize it's only a 6-line change to wrap that too.

Jumped from 45 FPS to 56 FPS.

image

@Mauller
Copy link
Copy Markdown

Mauller commented Apr 1, 2026

I was initially going to say that health bars render in drawViews() (outside this batch), but your comment made me realize it's only a 6-line change to wrap that too.

Jumped from 45 FPS to 56 FPS.
image

Nice ~30% improvement there

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants