[Feature] SEGA: Spectral-Energy Guided Attention for Resolution Extrapolation #1553

@GreenShadows

Feature Summary

SEGA dynamically rescales attention across RoPE components based on the latent's spatial-frequency content, enabling stable high-resolution generation without retraining.

Detailed Description

Source: https://rajabi2001.github.io/sega/
Paper: https://arxiv.org/html/2605.22668v1

SEGA resolves the trade-off between preserving global structure and preserving fine detail. The project page reports coherent synthesis at ultra-high resolutions, up to 36 megapixels, across multiple models and target resolutions.

Abstract

Diffusion transformers (DiTs) have emerged as a dominant architecture for text-to-image generation, yet their performance drops when generating at resolutions beyond their training range. Existing training-free approaches mitigate this by modifying inference-time attention behavior, often through Rotary Position Embeddings (RoPE) extrapolation combined with attention scaling. However, these strategies apply a uniform and content-agnostic scaling across RoPE components with distinct frequency characteristics, inducing a trade-off between preserving global structure and recovering fine detail. We introduce SEGA, a training-free method that dynamically scales attention across RoPE components according to the latent's spatial-frequency structure at each denoising step. This adaptive scaling improves both structural coherence and fine-detail fidelity. Experiments show that SEGA consistently improves high-resolution synthesis across multiple target resolutions, outperforming state-of-the-art training-free baselines.
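
For context, the "RoPE components" mentioned in the abstract can be made concrete with the standard 1D RoPE identity. This is general background, not a formula taken from the SEGA paper; the symbols $\lambda$ and $\lambda_i$ are notation assumed here for illustration, and image models typically use a 2D (axial) variant of the same idea.

$$
\langle R_m q,\; R_n k \rangle \;=\; \sum_{i=0}^{d/2-1} \operatorname{Re}\!\left[\tilde q_i\, \overline{\tilde k_i}\; e^{\,\mathrm{i}(m-n)\theta_i}\right],
\qquad \tilde q_i = q_{2i} + \mathrm{i}\, q_{2i+1},\quad \theta_i = b^{-2i/d}
$$

Uniform attention scaling multiplies the whole sum by a single factor $\lambda$. A per-component scheme, as described in the abstract, would instead weight each term separately:

$$
\text{uniform: } \lambda \sum_i c_i(m-n)
\qquad\text{vs.}\qquad
\text{per-component: } \sum_i \lambda_i\, c_i(m-n),
\qquad c_i(m-n) = \operatorname{Re}\!\left[\tilde q_i\, \overline{\tilde k_i}\; e^{\,\mathrm{i}(m-n)\theta_i}\right]
$$

Components with large $\theta_i$ vary quickly with the offset $m-n$ and mostly carry local detail, while components with small $\theta_i$ vary slowly and mostly carry global layout, which is why a single $\lambda$ forces a compromise between the two.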

Method Overview

  • How SEGA Works
    SEGA turns fixed attention scaling into dynamic, content-aware scaling by looking at the latent's frequency content during denoising.
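
To make the idea concrete for whoever picks this up, here is a minimal sketch of content-aware per-band scaling. It is an illustration under stated assumptions, not the SEGA algorithm: the function names (`band_energies`, `component_scales`, `scaled_logit`), the `strength` parameter, the 1D signal, and especially the rule that maps energy to scale are all hypothetical. The actual mapping should be taken from the paper.

```python
import cmath
import math


def band_energies(signal, num_bands):
    """Share of spectral energy per frequency band (low to high) of a 1D signal.

    Uses a plain DFT with the mean (DC term) removed. A real implementation
    would presumably work on the 2D latent; 1D keeps the sketch short.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    half = n // 2
    energies = [0.0] * num_bands
    for k in range(1, half + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        band = (k - 1) * num_bands // half  # always < num_bands
        energies[band] += abs(coeff) ** 2
    total = sum(energies)
    if total == 0.0:
        # Constant signal: no frequency content, fall back to equal shares.
        return [1.0 / num_bands] * num_bands
    return [e / total for e in energies]


def component_scales(energies, base_scale, strength=0.5):
    """HYPOTHETICAL rule (not from the paper): shift each band's scale by how
    far its energy share is from the uniform share. The mean of the returned
    scales equals base_scale, so uniform scaling is recovered at strength=0.
    """
    uniform = 1.0 / len(energies)
    return [base_scale * (1.0 + strength * (e - uniform)) for e in energies]


def scaled_logit(q, k, scales):
    """Attention logit with one scale per contiguous band of channels.

    q and k are assumed to be already RoPE-rotated, and len(q) is assumed to
    be divisible by len(scales). The usual 1/sqrt(d) factor is omitted.
    """
    band_size = len(q) // len(scales)
    return sum(
        scales[i // band_size] * qi * ki
        for i, (qi, ki) in enumerate(zip(q, k))
    )


if __name__ == "__main__":
    # One smooth cycle: almost all energy lands in the lowest band.
    row = [math.sin(2 * math.pi * t / 16) for t in range(16)]
    shares = band_energies(row, num_bands=4)
    print(component_scales(shares, base_scale=1.2))
```

With equal entries in `scales`, `scaled_logit` reduces to ordinary uniform attention scaling, so the sketch shows exactly where a dynamic method would differ: the scales are recomputed from the latent at each denoising step rather than fixed in advance.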

Alternatives you considered

No response

Additional context

No response

Metadata

Labels: enhancement (New feature or request)
