Skip to content

fix: guardrail redact targets last user message, not trailing LTM context#1884

Closed
giulio-leone wants to merge 1 commit intostrands-agents:mainfrom
giulio-leone:fix/guardrail-redact-last-user-message
Closed

fix: guardrail redact targets last user message, not trailing LTM context#1884
giulio-leone wants to merge 1 commit intostrands-agents:mainfrom
giulio-leone:fix/guardrail-redact-last-user-message

Conversation

@giulio-leone
Copy link
Copy Markdown
Contributor

Issue

Closes #1639

Problem

When guardrail redaction is enabled (guardrail_redact_input=True) together with a long-term memory (LTM) session manager like AgentCoreMemorySessionManager, the redact logic incorrectly modifies the LTM context message instead of the user's input.

The LTM session manager appends an assistant message after the user turn:

messages[0]: {role: 'user',      content: [{text: 'Tell me something bad'}]}      ← should be redacted
messages[1]: {role: 'assistant', content: [{text: '<user_context>...</user_context>'}]}  ← was being redacted

The redact handler used self.messages[-1], which blindly picked the last message regardless of role.

Root Cause

In agent.py, the guardrail redaction code assumed self.messages[-1] is always the user's input:

self.messages[-1]['content'] = self._redact_user_content(
    self.messages[-1]['content'], ...
)

With LTM enabled, messages[-1] is the assistant's context message, not the user's input.

Solution

Replaced self.messages[-1] with a reverse search for the last message with role == 'user':

last_user_msg = next(
    (m for m in reversed(self.messages) if m['role'] == 'user'),
    None,
)

This matches the pattern already used by _find_last_user_text_message_index() in the Bedrock model for guardrail_latest_message wrapping.

Testing

  • Added test_agent_redacts_user_message_not_ltm_context: Simulates the LTM scenario with a trailing assistant context message, verifies the user message is redacted and the LTM context is preserved
  • All 8 guardrail-related tests pass
  • All 113 agent tests pass

Changes

  • src/strands/agent/agent.py: Changed guardrail redact handler to find last user-role message
  • tests/strands/agent/test_agent.py: Added test for LTM + guardrail interaction

@giulio-leone
Copy link
Copy Markdown
Contributor Author

Friendly ping — fixes guardrail redaction to target the actual last user message instead of trailing long-term memory context, which was causing false positive redactions.

When long-term memory (LTM) session managers like
AgentCoreMemorySessionManager append an assistant message containing
user context after the user turn, the guardrail redaction logic
incorrectly redacted the LTM context instead of the actual user input.

Root cause: the redact handler used `self.messages[-1]` which assumes
the last message is the user's input.  With LTM enabled, the message
list looks like:

  [0] user: 'Tell me something bad'       ← should be redacted
  [1] assistant: '<user_context>...</user_context>'  ← was being redacted

The fix replaces `self.messages[-1]` with a reverse search for the
last message with `role == 'user'`, matching the pattern already used
by `_find_last_user_text_message_index()` in the Bedrock model for
guardrail_latest_message wrapping.

Closes #1639
@giulio-leone giulio-leone force-pushed the fix/guardrail-redact-last-user-message branch from ce2e12f to 1fb7549 Compare March 23, 2026 06:06
@github-actions github-actions bot added size/s and removed size/s labels Mar 23, 2026
@giulio-leone
Copy link
Copy Markdown
Contributor Author

Refreshed onto main @ fd8168a (v1.32.0+2) — 2026-03-23

Root cause confirmed still live: The guardrail redaction path does self.messages[-1] to find the message to redact. When a session manager (e.g. AgentCoreMemorySessionManager) appends long-term memory context as an additional message after the user turn, messages[-1] points to that LTM context message rather than the actual user input — causing the guardrail to redact the wrong content and leaving the real user message untouched.

Fix: Replace self.messages[-1] with a next(m for m in reversed(self.messages) if m["role"] == "user", None) scan that finds the last user-role message regardless of trailing non-user messages appended by session managers.

Runtime proof on rebased branch 1fb7549:

  • test_agent_redacts_user_message_not_ltm_context: PASSED (the critical regression case)
  • All 8 guardrail/redact tests: 8/8 PASSED
  • Full agent test suite: 113/113 PASSED

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] guardrail_redact_input override ltm_msg instead of the last user message

1 participant