Summary
Calling `chat({ stream: false })` does not actually send `stream: false` to the provider. Internally it calls `runStreamingText` and concatenates the SSE stream into a string via `streamToText`. The wire request still has `Accept: text/event-stream` and `"stream": true` in the body. The only code path that sends a wire-level non-streaming request is `chat({ outputSchema })` → `runAgenticStructuredOutput` → `adapter.structuredOutput`.
Reproduction
```ts
import { chat } from '@tanstack/ai'
import { openRouterText } from '@tanstack/ai-openrouter'

const adapter = openRouterText('x-ai/grok-4.3', { /* … */ })

const text = await chat({
  adapter,
  messages: [{ role: 'user', content: 'hello' }],
  stream: false,
  modelOptions: {
    responseFormat: { type: 'json_schema', jsonSchema: { /* … */ } },
  },
})
```
Wire capture: `Accept: text/event-stream`, body `"stream": true`. OpenRouter responds with SSE and the SDK concatenates the chunks client-side.
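For anyone wanting to verify this locally, one way to capture the wire request is to wrap global `fetch` before calling `chat()`. This is a hedged sketch, not SDK code: `withCapture` and the `Capture` shape are hypothetical names, and it only inspects string JSON bodies.

```ts
// Hypothetical helper: records each outgoing request's Accept header and the
// JSON body's "stream" field, then forwards the request to the inner fetch.
type Capture = { accept: string | null; stream: unknown }

function withCapture(inner: typeof fetch, log: Capture[]): typeof fetch {
  return (async (
    input: Parameters<typeof fetch>[0],
    init?: Parameters<typeof fetch>[1],
  ) => {
    const headers = new Headers(init?.headers)
    let stream: unknown
    if (typeof init?.body === 'string') {
      try {
        stream = JSON.parse(init.body).stream
      } catch {
        // non-JSON body: leave stream undefined
      }
    }
    log.push({ accept: headers.get('accept'), stream })
    return inner(input, init)
  }) as typeof fetch
}
```

Installing it with `globalThis.fetch = withCapture(globalThis.fetch, log)` before the repro call above should show the captured entry with `accept: 'text/event-stream'` and `stream: true`.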
Root cause (current main)
`packages/typescript/ai/src/activities/chat/index.ts:1609-1631` (dispatch):

```ts
if (outputSchema) return runAgenticStructuredOutput(options)
if (stream === false) return runNonStreamingText(options)
return runStreamingText(options)
```

`packages/typescript/ai/src/activities/chat/index.ts:1666-1675` (the offender):

```ts
function runNonStreamingText(options): Promise<string> {
  const stream = runStreamingText(options)
  return streamToText(stream)
}
```
`runStreamingText` → `TextEngine.streamModelResponse` → `adapter.chatStream(...)`. The OpenRouter adapter's `chatStream` hardcodes `stream: true` (`packages/typescript/ai-openrouter/src/adapters/text.ts:131`). Its `structuredOutput` is the only place that sends `stream: false` (`packages/typescript/ai-openrouter/src/adapters/text.ts:214`).

The adapter interface (`packages/typescript/ai/src/activities/chat/adapter.ts:59-120`) only defines `chatStream`, `structuredOutput`, and optional `structuredOutputStream` — there is no non-streaming `chat()` method on adapters today.
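For context on why the concatenation path is wire-transparent, here is an illustrative sketch (not the SDK source) of what a `streamToText`-style helper does: it just drains the chunk stream and joins the text deltas, so the caller gets a string while the transport underneath is still SSE. The `delta` chunk shape is an assumption for illustration.

```ts
// Sketch: join text deltas from an async chunk stream into one string.
// Nothing here changes what was sent on the wire.
async function streamToTextSketch(
  chunks: AsyncIterable<{ delta?: string }>,
): Promise<string> {
  let out = ''
  for await (const chunk of chunks) {
    out += chunk.delta ?? ''
  }
  return out
}
```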
Why it matters
Reasoning models under concurrent load (Grok 4.3 via OpenRouter, in our case) can spend 30s+ in pure reasoning before emitting any content. Observed wire behavior with the same prompt × 6 parallel calls:
- streaming, no proxy: 25–52s wall-clock each, all clean
- non-streaming (true wire-level), no proxy: 22–41s wall-clock each, all clean
- streaming through a proxy with a 30s socket idle timeout: one of the six truncates mid-stream because OpenRouter sends nothing for >30s during the reasoning phase
If `chat({ stream: false })` actually sent `stream: false`, that proxy-idle-timeout class of bug would not apply, and fixture/replay paths would be a single JSON body.
Proposed fix
- Add a non-streaming method to the adapter interface alongside `chatStream` / `structuredOutput`:

  ```ts
  chat(options: ChatStreamOptions): Promise<{
    content: string
    reasoning?: string
    toolCalls?: …
    usage?: …
  }>
  ```

- OpenRouter implements it as a single `this.client.chat.send({ chatRequest: { …, stream: false } })` returning `result.choices[0].message.content` — mirroring `structuredOutput` minus the schema enforcement.
- Rewire `runNonStreamingText` to call `adapter.chat(...)` directly instead of `runStreamingText` + `streamToText`.
- For adapters that don't implement `chat()`, fall back to current behavior and emit a one-time warning so users know they're not getting wire-level non-streaming.
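A minimal sketch of that rewiring, assuming a hypothetical optional `chat()` on the adapter. The `Adapter` interface, the streaming/concat stubs, and the `warned` set are stand-ins for the real engine internals, not the actual TanStack source:

```ts
interface NonStreamingResult {
  content: string
  reasoning?: string
}

interface Adapter {
  name?: string
  // Hypothetical optional non-streaming method proposed above.
  chat?: (options: unknown) => Promise<NonStreamingResult>
}

// One-time warning per adapter instance.
const warned = new WeakSet<Adapter>()

// Stand-ins for the real runStreamingText / streamToText internals.
async function* runStreamingTextStub(_options: unknown) {
  yield { delta: '(streamed fallback)' }
}

async function streamToTextStub(s: AsyncIterable<{ delta?: string }>) {
  let out = ''
  for await (const c of s) out += c.delta ?? ''
  return out
}

async function runNonStreamingText(
  adapter: Adapter,
  options: unknown,
): Promise<string> {
  if (adapter.chat) {
    // True wire-level non-streaming request.
    const result = await adapter.chat(options)
    return result.content
  }
  if (!warned.has(adapter)) {
    warned.add(adapter)
    console.warn(
      `[ai] ${adapter.name ?? 'adapter'} has no chat(); falling back to streaming + concatenation`,
    )
  }
  return streamToTextStub(runStreamingTextStub(options))
}
```

Adapters that implement `chat()` get the direct path; everyone else keeps today's behavior plus the warning.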
Happy to send a PR if the shape sounds right.
Environment
- @tanstack/ai: 0.14.0
- @tanstack/ai-openrouter: 0.8.2
- Node 24.x (Bun 1.x)
- Provider: x-ai/grok-4.3 via OpenRouter