Chat models on the Huzz API can stream their output token by token instead of returning a single final payload. Set "stream": true on a chat completion request and the response arrives as server-sent events (SSE), the same wire format the OpenAI SDKs already understand. Streaming makes interfaces feel responsive: users see the first words within a few hundred milliseconds instead of waiting for the whole answer.

Make a streaming request

curl -N https://api.huzz.ai/v1/chat/completions \
  -H "Authorization: Bearer $HUZZ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "stream": true,
    "messages": [{"role": "user", "content": "Write a haiku about shipping fast."}]
  }'
The -N (--no-buffer) flag disables cURL's output buffering, so events print as they arrive.
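For readers not working from a shell, the same request can be built with Python's standard library. This is a minimal sketch rather than an official client: build_stream_request and print_stream are hypothetical helper names, and the endpoint, headers, and body are copied from the cURL example above.

```python
import json
import urllib.request

# Endpoint taken from the cURL example above.
HUZZ_CHAT_URL = "https://api.huzz.ai/v1/chat/completions"


def build_stream_request(api_key: str, prompt: str, model: str = "gpt-4o") -> urllib.request.Request:
    """Build (but do not send) a streaming chat completion request."""
    body = {
        "model": model,
        "stream": True,  # ask for server-sent events instead of one payload
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        HUZZ_CHAT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def print_stream(request: urllib.request.Request) -> None:
    """Send the request and print raw SSE lines as they arrive."""
    with urllib.request.urlopen(request) as response:
        for raw_line in response:  # the response object iterates line by line
            print(raw_line.decode("utf-8"), end="", flush=True)
```

Passing the result of build_stream_request(api_key, "Write a haiku about shipping fast.") to print_stream should print the same data: lines that the cURL command shows, assuming the key is valid.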

What comes over the wire

Each event is a data: line carrying a JSON chat completion chunk, and events are separated by a blank line. Incremental text lives in choices[0].delta.content; the stream ends with a literal data: [DONE] sentinel:
data: {"id":"chatcmpl-...","choices":[{"delta":{"content":"Ship"}}]}

data: {"id":"chatcmpl-...","choices":[{"delta":{"content":" it"}}]}

data: [DONE]
The OpenAI SDKs handle parsing, reconnect-safe reads, and the [DONE] sentinel for you, so the raw format only matters if you consume the stream with your own HTTP client.
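If you do consume the stream with your own client, the parsing step might look like the sketch below. It assumes the lines have already been decoded to text, and iter_deltas is a hypothetical helper name, not part of any SDK. The sample input reuses the events shown above.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text fragments from an iterable of raw SSE lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and any non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # sentinel: the stream finished normally
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        text = (choices[0].get("delta") or {}).get("content")
        if text:
            yield text


sample = [
    'data: {"id":"chatcmpl-...","choices":[{"delta":{"content":"Ship"}}]}',
    "",
    'data: {"id":"chatcmpl-...","choices":[{"delta":{"content":" it"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_deltas(sample)))  # prints "Ship it"
```

Writing the parser as a generator lets the caller render each fragment as soon as it is yielded instead of waiting for the full message.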

Tips

  • Check the model page first. Each entry in the catalog shows whether the model streams. Most chat models do; image, video, and audio models return results synchronously or via async predictions instead.
  • Flush as you render. Append each delta to your UI as it arrives rather than waiting for the complete message.
  • Handle mid-stream errors. If the connection drops before [DONE], treat the response as incomplete and retry the request.
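The last tip can be sketched as a small retry loop. Everything named here is hypothetical: open_stream stands in for whatever function opens a new streaming request and returns its decoded lines, on_delta renders a fragment, and on_reset clears partially rendered text before a retry. Clearing the partial text and treating OSError as a dropped connection are assumptions of this sketch, not documented Huzz behavior.

```python
import json
from typing import Callable, Iterable


class IncompleteStream(Exception):
    """Raised when a stream ends without the [DONE] sentinel."""


def read_complete(lines: Iterable[str], on_delta: Callable[[str], None]) -> None:
    """Forward each delta to on_delta; raise if [DONE] never arrives."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        choices = json.loads(payload).get("choices") or []
        text = (choices[0].get("delta") or {}).get("content") if choices else None
        if text:
            on_delta(text)
    raise IncompleteStream("stream ended before [DONE]")


def stream_with_retry(
    open_stream: Callable[[], Iterable[str]],
    on_delta: Callable[[str], None],
    on_reset: Callable[[], None],
    max_attempts: int = 3,
) -> int:
    """Retry the whole request until a stream completes; return the attempt number."""
    for attempt in range(1, max_attempts + 1):
        try:
            read_complete(open_stream(), on_delta)
            return attempt
        except (IncompleteStream, OSError):
            on_reset()  # assumption: discard partial text before retrying
    raise IncompleteStream(f"no complete response after {max_attempts} attempts")
```

Because a retried request generates a fresh answer, the sketch discards the partial output rather than trying to splice two responses together.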