stream: true on a chat completion request and the response arrives as server-sent events (SSE) — the same wire format the OpenAI SDKs already understand.
Streaming makes interfaces feel instant: users see the first words in a few hundred milliseconds instead of waiting for the whole answer.
Make a streaming request
-N flag on cURL disables buffering so events print as they arrive.
What comes over the wire
Each event is adata: line containing a chat completion chunk. Incremental text lives in choices[0].delta.content; the stream ends with a literal data: [DONE] sentinel:
[DONE] sentinel for you — the raw format only matters if you consume the stream with your own HTTP client.
Tips
- Check the model page first. Each entry in the catalog shows whether the model streams. Most chat models do; image, video, and audio models return results synchronously or via async predictions instead.
- Flush as you render. Append each delta to your UI as it arrives rather than accumulating the whole message.
- Handle mid-stream errors. If the connection drops before
[DONE], treat the response as incomplete and retry the request.