feat: add module backend assignment support #1500
Merged
fszontagh added a commit to fszontagh/stable-diffusion.cpp that referenced this pull request May 22, 2026
13 new upstream commits since previous sync at 0b82969. The big one is leejet#1500 (module backend assignment): ~1.5k LOC churn that splits backend code into a new ggml_extend_backend.{h,cpp} pair and replaces every runner's (backend_t backend, bool offload_params_to_cpu) constructor arg with (backend_t runtime, backend_t params). New CLI flags --backend te=cpu,vae=cuda0,... and --params-backend te=cpu,vae=cpu,...

Other notable upstream changes folded in:
3633072 module backend assignment (leejet#1500)
38b14ad --max-vram -1 auto-detect (leejet#1498)
67dda3f LTX 2.3 architecture (leejet#1463)
06accf2 LTXAV latent2rgb projection
9d68341 Euler/DDIM unification (leejet#1474)
cde20d5 stereo handling in sd_audio
d7ecbe1 T5 EOS dedup in Anima
bd17f53 / 0c1ca17 / 839f6a9 / 3b4d26f ROCm/docs/CI
db08b84 GCC 16 build fix
686856e fake-VAE log demotion
0b82969 / 381e0df PR template + CONTRIBUTING.md

Conflicts:
- examples/common/common.cpp, include/stable-diffusion.h: kept our offload_config alongside upstream's new backend/params_backend strings. sd_ctx_params_t now carries both axes.
- src/lora.hpp: dropped our enable_offload bool. The new params_backend argument expresses the same intent (CPU = offload).
- src/hidream_o1.hpp: kept params_prefix member, switched constructor to upstream's (backend, params_backend) signature.
- src/stable-diffusion.cpp: every runner-construction site took upstream's backend_for(MODULE) / params_backend_for(MODULE) lookups. Removed the dead cond_stage/diffusion/vae_offload_to_cpu local-bool derivation; replaced with calls to a new SDBackendManager::force_module_params_backend(MODULE, "cpu") helper that mutates params_assignment_ after init_backend() runs. The offload_config-driven escalations now land in the same data structure upstream's --params-backend writes to.

Post-merge fixups surfaced by retesting HiDream O1 streaming:
- src/llm.hpp: TextModel.forward_final_norm now casts to LLMRMSNorm, not RMSNorm. Upstream changed the "norm" block's concrete type; our pre-merge cast returned nullptr and crashed on first forward().
- src/hidream_o1.hpp: Stage 1 of compute_streaming_true scales inputs_embeds by sqrt(hidden_size) when params.llm.normalize_input, matching what forward_embeds does. No-op for HiDream O1 today but keeps the streaming path drift-free if a future arch flips it.

Smoke-tested on 12 GB GPU:
Z-Image-Turbo Q8 layer_streaming -> 4.32 s
HiDream O1 bf16 dev layer_streaming -> 17.44 s (4 steps, 1024x1024)
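The relationship between the old offload_params_to_cpu bool and the new (runtime, params) pair can be illustrated with a minimal sketch. This is not the actual SDBackendManager code: the struct name, the default backend name "cuda0", and the fallback of the params backend to the runtime backend are all assumptions made for illustration. Only the method names backend_for, params_backend_for and force_module_params_backend are taken from the merge notes above, and their signatures here are guessed.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the per-module tables described in the merge notes.
// Not the real SDBackendManager; field names and fallbacks are assumptions.
struct BackendAssignmentSketch {
    std::string default_backend = "cuda0";                  // assumed fallback device name
    std::map<std::string, std::string> runtime_assignment;  // module -> compute backend
    std::map<std::string, std::string> params_assignment;   // module -> weight storage backend

    // Runtime backend for a module, falling back to the default.
    std::string backend_for(const std::string& module) const {
        auto it = runtime_assignment.find(module);
        return it != runtime_assignment.end() ? it->second : default_backend;
    }

    // Params backend for a module. Assumption: with no explicit entry,
    // weights live on the same backend that runs the module.
    std::string params_backend_for(const std::string& module) const {
        auto it = params_assignment.find(module);
        return it != params_assignment.end() ? it->second : backend_for(module);
    }

    // Override applied after initialization, e.g. by an offload policy.
    void force_module_params_backend(const std::string& module, const std::string& name) {
        assert(!name.empty());
        params_assignment[module] = name;
    }

    // The old bool corresponds to "weights on cpu, compute elsewhere".
    bool is_offloaded(const std::string& module) const {
        return params_backend_for(module) == "cpu" && backend_for(module) != "cpu";
    }
};
```

Under these assumptions, calling force_module_params_backend("te", "cpu") on a default-constructed sketch makes is_offloaded("te") true, which mirrors the "CPU = offload" reading in the conflict notes.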
Summary
- Add --backend and --params-backend module assignment support.
- […] te=cpu,vae=cuda0,diffusion=vulkan0.
- […] SDBackendManager to own and share backend instances by resolved backend name.
- […] GGMLRunner externally.
- Split ggml_extend_backend.hpp into ggml_extend_backend.h/.cpp.
- […] docs/backend.md.

Related Issue / Discussion
Since #1184, there have been multiple PRs containing backend selection related changes. However, none of them achieved the behavior I expected, and the implementations were intertwined with other unrelated changes. I decided to write a standalone implementation myself that better matches my design goals, while also adding support for parameter-based backend selection.
This PR was inspired by the work in stduhpf’s PR. When merging this PR, I will add @stduhpf as a co-author.
Additional Information
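As a rough illustration of the Summary items on module assignment strings and shared backend instances, the sketch below parses a spec such as te=cpu,vae=cuda0,diffusion=vulkan0 and hands out one shared instance per backend name. It is not the PR's implementation: parse_assignment_spec, FakeBackend and BackendCacheSketch are hypothetical names, the error handling is an assumption, and the cache keys on the literal name, whereas the PR describes keying on a resolved backend name.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <sstream>
#include <stdexcept>
#include <string>

// Parse "te=cpu,vae=cuda0" into {module -> backend name}.
// Throws std::invalid_argument for an entry that is not of the form key=value.
std::map<std::string, std::string> parse_assignment_spec(const std::string& spec) {
    std::map<std::string, std::string> out;
    std::istringstream stream(spec);
    std::string entry;
    while (std::getline(stream, entry, ',')) {
        if (entry.empty()) continue;  // tolerate trailing or doubled commas
        const auto eq = entry.find('=');
        if (eq == std::string::npos || eq == 0 || eq + 1 == entry.size()) {
            throw std::invalid_argument("bad assignment: " + entry);
        }
        out[entry.substr(0, eq)] = entry.substr(eq + 1);
    }
    return out;
}

// Hypothetical placeholder for a real backend handle.
struct FakeBackend {
    std::string name;
};

// Keeps one instance per backend name so that modules assigned to the
// same device share it instead of each creating their own.
class BackendCacheSketch {
public:
    std::shared_ptr<FakeBackend> get(const std::string& name) {
        assert(!name.empty());
        auto it = cache_.find(name);
        if (it != cache_.end()) return it->second;
        auto backend = std::make_shared<FakeBackend>(FakeBackend{name});
        cache_[name] = backend;
        return backend;
    }

    std::size_t size() const { return cache_.size(); }

private:
    std::map<std::string, std::shared_ptr<FakeBackend>> cache_;
};
```

With this sketch, two modules assigned to cuda0 receive the same shared_ptr from get, so ownership stays in one place and the instance is released when the cache and all users drop it.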
Checklist