Streaming - Huzz API

Chat models on the Huzz API can stream their output token by token instead of returning one final payload. Set stream: true on a chat completion request and the response arrives as server-sent events (SSE), the same wire format the OpenAI SDKs already understand. Streaming makes interfaces feel responsive: users see the first words in a few hundred milliseconds instead of waiting for the whole answer.

Make a streaming request

curl -N https://api.huzz.ai/v1/chat/completions \
  -H "Authorization: Bearer $HUZZ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "stream": true,
    "messages": [{"role": "user", "content": "Write a haiku about shipping fast."}]
  }'

The -N (--no-buffer) flag turns off cURL's output buffering so events print as they arrive.
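
Because the response uses the wire format the OpenAI SDKs understand, the same request can be sketched with the OpenAI Python SDK. This is an illustration rather than official Huzz sample code. It assumes the openai package (version 1 or later) and assumes the SDK's base_url can be pointed at https://api.huzz.ai/v1, which is inferred from the cURL URL above. stream_reply is a hypothetical helper name.

```python
from typing import Any, Iterator


def stream_reply(client: Any, prompt: str, model: str = "gpt-4o") -> Iterator[str]:
    """Yield text fragments from a streaming chat completion.

    `client` is expected to be an OpenAI-SDK-style client (assumption:
    openai>=1.0 configured with the Huzz base URL).
    """
    stream = client.chat.completions.create(
        model=model,
        stream=True,
        messages=[{"role": "user", "content": prompt}],
    )
    for chunk in stream:
        # Some chunks may carry no choices or no text (for example a
        # role-only first chunk or a final chunk with a finish reason).
        if not chunk.choices:
            continue
        text = chunk.choices[0].delta.content
        if text:
            yield text


# Usage (assumed setup; needs network access and a valid key):
#   import os
#   from openai import OpenAI
#   client = OpenAI(base_url="https://api.huzz.ai/v1",
#                   api_key=os.environ["HUZZ_API_KEY"])
#   for piece in stream_reply(client, "Write a haiku about shipping fast."):
#       print(piece, end="", flush=True)
```

Passing the client in as an argument keeps the helper independent of how the key and base URL are configured.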

What comes over the wire

Each event is a data: line containing a chat completion chunk encoded as JSON, followed by a blank line. Incremental text lives in choices[0].delta.content; the stream ends with a literal data: [DONE] sentinel:

data: {"id":"chatcmpl-...","choices":[{"delta":{"content":"Ship"}}]}

data: {"id":"chatcmpl-...","choices":[{"delta":{"content":" it"}}]}

data: [DONE]

The OpenAI SDKs handle parsing, reconnect-safe reads, and the [DONE] sentinel for you — the raw format only matters if you consume the stream with your own HTTP client.
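
If you do read the stream with your own HTTP client, the parsing step might look like the sketch below. It takes the decoded lines of the response body, however your client exposes them, and yields the text fragments. iter_deltas is a hypothetical name, the chunk id in the sample is a placeholder, and the code assumes each event fits on a single data: line, as in the sample above.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from raw SSE lines until the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments, or other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end of stream
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        text = choices[0].get("delta", {}).get("content")
        if text:
            yield text


sample = [
    'data: {"id":"chatcmpl-1","choices":[{"delta":{"content":"Ship"}}]}',
    "",
    'data: {"id":"chatcmpl-1","choices":[{"delta":{"content":" it"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_deltas(sample)))  # prints: Ship it
```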

Tips

Check the model page first. Each entry in the catalog shows whether the model streams. Most chat models do; image, video, and audio models return results synchronously or via async predictions instead.
Flush as you render. Append each delta to your UI as it arrives rather than accumulating the whole message.
Handle mid-stream errors. If the connection drops before [DONE], treat the response as incomplete and retry the request.
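
The last two tips can be combined in one loop: forward each delta as soon as it arrives, and treat a stream that ends without [DONE] as incomplete. The sketch below is one possible shape, not a prescribed implementation. read_complete, with_retry, IncompleteStream, and the callbacks are hypothetical names; open_stream stands for whatever function re-sends the request and returns the response lines. Real HTTP clients raise their own exception types, so the caught exceptions are an assumption to adapt.

```python
import json
from typing import Callable, Iterable


class IncompleteStream(Exception):
    """Raised when a stream ends without the [DONE] sentinel."""


def read_complete(lines: Iterable[str], on_delta: Callable[[str], None]) -> None:
    """Pass each delta to on_delta right away; raise if [DONE] never arrives."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        choices = json.loads(payload).get("choices") or []
        text = choices[0].get("delta", {}).get("content") if choices else None
        if text:
            on_delta(text)  # render immediately instead of buffering
    raise IncompleteStream("stream ended before [DONE]")


def with_retry(
    open_stream: Callable[[], Iterable[str]],
    on_delta: Callable[[str], None],
    on_reset: Callable[[], None],
    max_attempts: int = 3,
) -> None:
    """Re-send the request when a stream is cut off, up to max_attempts times."""
    for attempt in range(1, max_attempts + 1):
        try:
            read_complete(open_stream(), on_delta)
            return
        except (IncompleteStream, ConnectionError):
            if attempt == max_attempts:
                raise
            on_reset()  # discard partial text already shown before retrying
```

A retry sends the whole request again, so the sketch clears the partial output through on_reset before the new stream starts rendering.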
