fix: guardrail redact targets last user message, not trailing LTM context#1884
fix: guardrail redact targets last user message, not trailing LTM context#1884giulio-leone wants to merge 1 commit intostrands-agents:mainfrom
Conversation
44b6bb3 to
ce2e12f
Compare
|
Friendly ping — fixes guardrail redaction to target the actual last user message instead of trailing long-term memory context, which was causing false positive redactions. |
When long-term memory (LTM) session managers like AgentCoreMemorySessionManager append an assistant message containing user context after the user turn, the guardrail redaction logic incorrectly redacted the LTM context instead of the actual user input. Root cause: the redact handler used `self.messages[-1]` which assumes the last message is the user's input. With LTM enabled, the message list looks like: [0] user: 'Tell me something bad' ← should be redacted [1] assistant: '<user_context>...</user_context>' ← was being redacted The fix replaces `self.messages[-1]` with a reverse search for the last message with `role == 'user'`, matching the pattern already used by `_find_last_user_text_message_index()` in the Bedrock model for guardrail_latest_message wrapping. Closes #1639
ce2e12f to
1fb7549
Compare
|
Refreshed onto Root cause confirmed still live: The guardrail redaction path does Fix: Replace Runtime proof on rebased branch
|
Issue
Closes #1639
Problem
When guardrail redaction is enabled (
guardrail_redact_input=True) together with a long-term memory (LTM) session manager likeAgentCoreMemorySessionManager, the redact logic incorrectly modifies the LTM context message instead of the user's input.The LTM session manager appends an assistant message after the user turn:
The redact handler used
self.messages[-1], which blindly picked the last message regardless of role.Root Cause
In
agent.py, the guardrail redaction code assumedself.messages[-1]is always the user's input:With LTM enabled,
messages[-1]is the assistant's context message, not the user's input.Solution
Replaced
self.messages[-1]with a reverse search for the last message withrole == 'user':This matches the pattern already used by
_find_last_user_text_message_index()in the Bedrock model forguardrail_latest_messagewrapping.Testing
test_agent_redacts_user_message_not_ltm_context: Simulates the LTM scenario with a trailing assistant context message, verifies the user message is redacted and the LTM context is preservedChanges
src/strands/agent/agent.py: Changed guardrail redact handler to find last user-role messagetests/strands/agent/test_agent.py: Added test for LTM + guardrail interaction