
docs: streaming LLM responses implementation plan (issue #71) #83

Closed
TumCucTom wants to merge 1 commit into MiniMax-AI:main from TumCucTom:plan/streaming-llm-responses

Conversation

@TumCucTom

Summary

Implementation plan for issue #71 — streaming output for LLM responses.

What This Is

This is a design document only — no code changes. The PR contains docs/STREAMING_PLAN.md which covers:

  1. API design — new StreamChunk schema, generate_stream() abstract method
  2. Implementation — OpenAI and Anthropic streaming clients, partial tool call buffering
  3. Agent loop changes — streaming-aware run(), cancellation support
  4. Rendering design — thinking vs content rendering rules, ANSI terminal output
  5. Compatibility — streaming is opt-out via --no-stream flag
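To make points 1 and 2 concrete, here is a minimal sketch of what the StreamChunk schema, the generate_stream() abstract method, and partial tool call buffering might look like. All names and fields are illustrative assumptions, not the actual API defined in docs/STREAMING_PLAN.md:

```python
# Hypothetical sketch only -- field names and signatures are assumptions,
# not the schema from docs/STREAMING_PLAN.md.
import json
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class StreamChunk:
    kind: str                  # "thinking" | "content" | "tool_call_delta"
    text: str = ""             # incremental text for thinking/content chunks
    tool_args_delta: str = ""  # partial JSON fragment of tool-call arguments


class LLMClient(ABC):
    @abstractmethod
    def generate_stream(self, prompt: str) -> Iterator[StreamChunk]:
        """Yield StreamChunk objects as the provider emits them."""


def buffer_tool_args(chunks: Iterable[StreamChunk]) -> dict:
    """Accumulate partial tool-call JSON across chunks, then parse it whole."""
    buf = "".join(c.tool_args_delta for c in chunks if c.kind == "tool_call_delta")
    return json.loads(buf)


# Providers split tool-call arguments mid-token; buffering reassembles them.
deltas = ['{"path": "do', 'cs/STREAMING', '_PLAN.md"}']
chunks = [StreamChunk(kind="tool_call_delta", tool_args_delta=d) for d in deltas]
print(buffer_tool_args(chunks))  # {'path': 'docs/STREAMING_PLAN.md'}
```

The point of the buffering step is that individual argument deltas are not valid JSON on their own, so they must be concatenated until the tool call is complete before parsing.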

Key Design Decision

Streaming is the default UX. Non-streaming is available via --no-stream for scripted/power-user cases.
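One way this default-on, opt-out behavior could be wired up (the flag name comes from the plan; the argparse wiring is a hypothetical sketch):

```python
# Illustrative sketch: --no-stream as an opt-out flag, streaming on by default.
# The flag name is from the plan; everything else here is an assumption.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--no-stream",
    dest="stream",
    action="store_false",
    help="disable streaming output (streaming is on by default)",
)
parser.set_defaults(stream=True)

print(parser.parse_args([]).stream)               # True: streaming default
print(parser.parse_args(["--no-stream"]).stream)  # False: scripted use
```

Using `action="store_false"` with a `stream=True` default keeps the common interactive path flag-free while still giving scripts a deterministic non-streaming mode.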

Test Plan

  • Review plan for correctness and completeness
  • Approve so implementation can proceed on a feature branch

Related

🤖 Generated with Claude Code

Comprehensive plan for issue MiniMax-AI#71 — streaming output for LLM responses.
Covers API design, schema changes, OpenAI/Anthropic streaming implementations,
buffering partial tool calls, rendering design, and test strategy.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@TumCucTom TumCucTom marked this pull request as draft April 5, 2026 16:52
@TumCucTom TumCucTom closed this Apr 5, 2026
@TumCucTom TumCucTom deleted the plan/streaming-llm-responses branch April 5, 2026 16:53