PR of Maira's pmdomain/downstream/timeouts branch #7400

Open
pelwell wants to merge 3 commits into raspberrypi:rpi-6.18.y from mairacanal:pmdomain/downstream/timeouts
pelwell commented May 25, 2026

Turn Maira's branch into a PR to get the build artefacts.

Commit 18605b1 ("pmdomain: bcm: bcm2835-power: Increase ASB control
timeout") raised the ASB handshake polling budget from 1us to 5us.
Surveying the pmdomain subsystem, 5us is still the smallest polling budget
by a wide margin - comparable handshakes in other drivers use:

  - 100us : starfive jh71xx-pmu, apple pmgr-pwrstate
  - 1ms   : renesas rcar-sysc, rmobile-sysc (power-on)
  - 10ms  : renesas rcar-gen4-sysc, sunxi sun55i-pck600
  - 1s    : mediatek mtk-pm-domains, mtk-scpsys

Raise the bcm2835 timeout to 500us, in line with analogous drivers. 500us
is still negligible relative to a power-domain transition and gives the
V3D master ASB substantially more headroom to drain under heavy workloads,
where 5us has been observed to be insufficient in practice.

Cc: stable@vger.kernel.org
Fixes: b826d2c ("pmdomain: bcm: bcm2835-power: Increase ASB control timeout")
Signed-off-by: Maíra Canal <mcanal@igalia.com>
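
For readers less familiar with polling budgets, here is a minimal user-space sketch of the idea in the commit message above. It is not the bcm2835-power code: `struct fake_asb`, `fake_asb_acked()` and `poll_ack()` are hypothetical names, the clock is simulated, and the 50us drain time used in the usage note is an arbitrary illustration, not a measured value.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a hardware handshake: the "ack" becomes
 * visible only once the simulated clock reaches ack_after_us.
 */
struct fake_asb {
	uint64_t now_us;       /* simulated clock, in microseconds */
	uint64_t ack_after_us; /* time at which the ack becomes visible */
};

static bool fake_asb_acked(const struct fake_asb *asb)
{
	return asb->now_us >= asb->ack_after_us;
}

/*
 * Poll for the ack until the budget runs out.
 * Returns 0 if the ack was seen, -1 if the budget expired first.
 */
static int poll_ack(struct fake_asb *asb, uint64_t timeout_us)
{
	uint64_t deadline = asb->now_us + timeout_us;

	while (!fake_asb_acked(asb)) {
		if (asb->now_us >= deadline)
			return -1;
		asb->now_us++; /* stand-in for a 1us delay between reads */
	}
	return 0;
}
```

With an assumed drain time of 50us, `poll_ack()` gives up when called with a 5us budget and succeeds with a 500us budget. In the success case the loop exits as soon as the ack appears, so a larger budget costs nothing extra when the handshake completes quickly.
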
v3d_mmu_set_page_table() ends by calling v3d_mmu_flush_all() to flush the
MMU cache and clear the TLB after reprogramming V3D_MMU_PT_PA_BASE.
v3d_mmu_flush_all() is gated by pm_runtime_get_if_active(), which returns
0 unless runtime_status == RPM_ACTIVE.

v3d_mmu_set_page_table() is called from two paths that *know* V3D is
reachable, but where the runtime PM status is wrong:

  1. v3d_power_resume(): the runtime resume callback itself, where
     runtime_status is RPM_RESUMING.

  2. v3d_reset(): called from the DRM scheduler timeout handler with the
     hung job's pm_runtime reference held, so RPM_ACTIVE, but here we
     don't need to take an extra reference for the duration of the flush
     either.

In both cases pm_runtime_get_if_active() returns 0, the flush is silently
skipped, and V3D resumes executing with whatever MMUC/TLB state happened
to survive the last reset. On BCM2711, this leaves stale translations live
across runtime PM cycles, manifesting as random GPU hangs.

Split the actual flush sequence into a helper that does the writes
unconditionally, and have v3d_mmu_set_page_table() call it directly.

Fixes: 17af1d14deaf ("drm/v3d: Introduce Runtime Power Management")
Cc: stable@vger.kernel.org
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
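
The split described in the v3d commit message can be sketched roughly as below. This is a simplified user-space model, not the driver code: every `fake_*` name is hypothetical, `enum rpm_status` only mimics the kernel's runtime PM states, and `fake_get_if_active()` models just the "take a reference only when active" behaviour the commit message attributes to pm_runtime_get_if_active().

```c
#include <stdbool.h>

/* Simplified mirror of runtime PM states; not the kernel's own enum. */
enum rpm_status { RPM_SUSPENDED, RPM_RESUMING, RPM_ACTIVE };

struct fake_v3d {
	enum rpm_status status;
	int usage_count; /* modelled runtime PM references */
	int flush_count; /* number of flush sequences actually issued */
};

/* Models the gate: take a reference only if the device is active. */
static bool fake_get_if_active(struct fake_v3d *v3d)
{
	if (v3d->status != RPM_ACTIVE)
		return false;
	v3d->usage_count++;
	return true;
}

static void fake_put(struct fake_v3d *v3d)
{
	v3d->usage_count--;
}

/* Helper: issues the flush unconditionally; caller guarantees access. */
static void fake_mmu_flush_all_hw(struct fake_v3d *v3d)
{
	v3d->flush_count++; /* stand-in for the MMUC/TLB register writes */
}

/* Gated entry point for callers that cannot assume the device is up. */
static void fake_mmu_flush_all(struct fake_v3d *v3d)
{
	if (!fake_get_if_active(v3d))
		return; /* silently skipped when not active */
	fake_mmu_flush_all_hw(v3d);
	fake_put(v3d);
}

/* Page table setup calls the helper directly, bypassing the gate. */
static void fake_mmu_set_page_table(struct fake_v3d *v3d)
{
	/* the real driver reprograms the page table base before flushing */
	fake_mmu_flush_all_hw(v3d);
}
```

In this model, with `status` set to `RPM_RESUMING`, `fake_mmu_flush_all()` issues no flush, while `fake_mmu_set_page_table()` always does, which is the behaviour the commit message is after for the resume path.
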