
performance: further performance optimizations for large documents #637

Merged
JackByrne merged 9 commits into elapouya:dev from start-software:develop on May 18, 2026


@JackByrne
Collaborator

Summary

This pull request refactors and optimizes the docxtpl/template.py module with a strong focus on rendering performance.

The primary motivation for these changes was improving rendering performance for large documents containing complex tables. In a real-world case, a document containing a table with 2,164 rows previously required approximately one hour to render. With these optimizations applied, rendering time was reduced to under 10 seconds.


Key Improvements

Performance Optimizations

  • Pre-compiled regular expressions

    • Regular expressions used for XML tag stripping and Jinja syntax detection are now pre-compiled and reused.
    • This removes repeated pattern construction and compilation lookups from the rendering hot path.
  • Early exit in resolve_listing()

    • Added a fast-path return when none of the Listing special characters (tab, newline, bell, form feed) are present.
    • In that common case, the heavier resolution logic is skipped entirely.
  • Optimized XML tree manipulation

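    • Refactored map_tree() to swap the entire <w:body> element with a single root.remove() + root.insert() on the document root instead of O(n) per-child operations, so the cost is effectively O(1) with respect to document size.
    • Falls back to the previous child-by-child copy if <w:body> is not a direct child of the root or if the remove/insert fails.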
    • This significantly improves performance for large documents and large table structures.
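
A minimal sketch of the resolve_listing() fast path, assuming the character set named in the commit notes (tab, newline, bell, form feed). The slow path is passed in as a hypothetical stand-in callable, since docxtpl's real resolution logic is not reproduced here.

```python
# Hypothetical sketch of the early exit; the slow path is a stand-in callable.
LISTING_CHARS = ("\t", "\n", "\a", "\f")


def resolve_listing(xml, slow_path):
    # Each `in` check is a C-level substring scan; in the common case none of
    # the characters occur and the XML is returned untouched.
    if not any(ch in xml for ch in LISTING_CHARS):
        return xml
    return slow_path(xml)


calls = []


def counting_slow_path(xml):
    """Stand-in slow path that only records that it was called."""
    calls.append(xml)
    return xml


resolve_listing("<w:t>plain text</w:t>", counting_slow_path)
resolve_listing("<w:t>col1\tcol2</w:t>", counting_slow_path)
print(len(calls))  # prints 1: only the input containing a tab took the slow path
```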
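
A minimal sketch of the body swap in map_tree(), written against the standard library's xml.etree.ElementTree instead of the lxml tree docxtpl actually works on (both provide remove() and insert() on elements). replace_body and its fallback are hypothetical and follow the behavior described in this PR rather than its exact code.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
BODY = f"{{{W_NS}}}body"


def replace_body(root, new_body):
    """Swap the document's <w:body> for a rendered one (hypothetical helper)."""
    old_body = root.find(BODY)  # direct children only: cheap on a document root
    if old_body is not None:
        # Fast path: one remove + one insert on the root. Only the root's few
        # children are scanned, so the cost does not grow with the body size.
        index = list(root).index(old_body)
        root.remove(old_body)
        root.insert(index, new_body)  # same index keeps sibling order intact
        return

    # Fallback: <w:body> is nested somewhere unexpected, so clear it and
    # append the rendered children one by one.
    nested = next(root.iter(BODY), None)
    if nested is None:
        raise ValueError("no <w:body> element found")
    for child in list(nested):
        nested.remove(child)
    for child in list(new_body):
        nested.append(child)
```

Reinserting at the original index keeps the body in its original position among the root's children, so the swap never reorders the document.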

Conditional Header/Footer Rendering

The rendering pipeline now skips header and footer processing when no Jinja tags are detected.

Changes include:

  • Pre-compiled Jinja detection patterns that match both intact tags and opening-tag fragments, so tags split across Word XML runs are still detected.
  • Conditional rendering logic to avoid unnecessary processing of static headers and footers.
  • Safe fallback to unconditional header/footer rendering if the detection check fails on an unexpected part structure (e.g. a part whose blob is None).

This reduces overhead for documents where headers and footers are static.
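
A minimal sketch of the detection step, assuming header and footer XML is available as strings. The pattern below is a hypothetical single-regex alternative to the PR's two checks: it treats any "{" followed by "{", "%" or "#", optionally with XML tags in between, as a possible Jinja tag, so tags that Word split across runs are not missed. If a part has unexpected structure (here, XML that is None), the sketch falls back to rendering.

```python
import re
from typing import Iterable, Optional

# Hypothetical pattern (not the PR's _JINJA_PATTERN/_RE_JINJA_OPEN): an opening
# "{" followed by "{", "%" or "#", allowing XML tags in between, because Word
# may split "{{" across two runs, e.g. "{</w:t></w:r><w:r><w:t>{".
_JINJA_OPENER = re.compile(r"\{(?:<[^>]*>)*[{%#]")


def parts_need_rendering(part_xmls: Iterable[Optional[str]]) -> bool:
    """Return True if any header/footer XML string may contain a Jinja tag."""
    try:
        # any() stops at the first match and iterates the parts only once,
        # so a generator of XML strings works as input.
        return any(_JINJA_OPENER.search(xml) for xml in part_xmls)
    except TypeError:
        # Unexpected part structure (e.g. XML is None): render unconditionally.
        return True


print(parts_need_rendering(["<w:t>Static footer</w:t>"]))  # False
print(parts_need_rendering(["<w:t>{</w:t></w:r><w:r><w:t>{ page }}</w:t>"]))  # True
```

A false positive (for example a literal "{%" in static text) only costs an unnecessary render, whereas a false negative would leave a tag unrendered, so the check errs toward True.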


Real-World Performance Impact

| Scenario | Before | After |
| --- | --- | --- |
| Large document with a 2,164-row table | ~1 hour | < 10 seconds |

These changes provide substantial improvements for large and complex templates while maintaining compatibility with existing rendering behavior.

bonggo-pras and others added 9 commits May 12, 2026 16:15
Delete the module-level logger and several logger.warning calls in docxtpl/template.py. These were added while debugging and are no longer needed.
Improve documentation in map_tree to explain the optimization: the code swaps the entire <w:body> via root.remove() + root.insert() to avoid O(n) per-child lxml operations, which is effectively O(1) on the document root. Clarify that the body's index is preserved so element order (body before sectPr) remains intact, and spell out the fallback behavior (child-by-child copy) if the body isn't a direct child or if remove/insert fails. Add additional safety and explanatory comments.
Enhance header/footer processing by detecting Jinja tags split across Word XML runs: check both intact tags (_JINJA_PATTERN) and open-tag fragments (_RE_JINJA_OPEN) when scanning part XML. Use a generator to iterate part XML strings once, and keep the existing exception fallback to unconditionally render headers/footers if the fast-path check fails (e.g. malformed XML). Also add clarifying comments about properties and footnotes skipping behaviour and make minor comment style fixes.
Add a fast-path to DocxTemplate.resolve_listing that returns the input XML unchanged when no Listing special characters are present. The check looks for tab, newline, bell and form-feed ("\t", "\n", "\a", "\f") and avoids running the heavier resolution logic in the common case, improving performance without changing behavior.
Introduce pre-compiled regex patterns (_RE_TAG_STRIP and _RE_COMMENT_STRIP) to strip surrounding <w:y> tags from template tags like {%y ...%}, {{y ...}} and comments {#y ...#}. Replace repeated re.sub loops with iteration over these patterns to avoid recompiling the same regexes on every call, reduce code duplication, and improve performance/maintainability.
Clean up docxtpl/template.py by removing unused imports: functools, logging, and Template from jinja2. Keeps Environment and meta from jinja2 and does not change runtime behavior; this reduces linter warnings and unnecessary dependencies.
Update comment in docxtpl/template.py to clarify the fallback behavior when processing headers and footers. The comment now explains the fallback guards against unexpected part structure (e.g. blob is None or missing attributes) rather than implying it handles malformed XML; malformed XML would still fail in build_headers_footers_xml. No functional change.
performance: further performance optimizations for large documents
@JackByrne JackByrne self-assigned this May 18, 2026
@JackByrne JackByrne merged commit 10079b1 into elapouya:dev May 18, 2026
0 of 5 checks passed
@elapouya
Owner

Hello Jack, thank you for the merge. Be careful: the code style checks failed.



3 participants