Production streaming fails in two predictable ways: users wait while the stream silently drops, and your logs say “timeout” without telling you where it actually broke.
This guide gives you a practical Go pattern for OpenAI Responses API streaming with strict timeout boundaries, safe retries, and useful telemetry.
1) Define retry boundaries first
Do not retry every error.
- Retryable: 429, 5xx, transient network resets, upstream gateway timeout
- Non-retryable: 401/403, invalid request payload, explicit context cancellation
- Conditional: mid-stream disconnects (depends on whether continuation is acceptable)
Retrying 4xx blindly is just automated self-harm.
2) Recommended timeout model in Go
Use three layers:
- Request context timeout (hard deadline, e.g. 45s)
- HTTP client timeout (connection guard, e.g. 50s)
- Stream idle timeout (abort when no token arrives for too long)
```go
ctx, cancel := context.WithTimeout(r.Context(), 45*time.Second)
defer cancel()

httpClient := &http.Client{Timeout: 50 * time.Second}
```
Keep the HTTP client timeout slightly larger than the context timeout so the context, not the transport, decides when to give up. Note that `http.Client.Timeout` covers the entire exchange including reading the streamed body, so for streams longer than this budget, rely on per-request contexts instead.
3) Exponential backoff with jitter
Practical defaults:
- Initial backoff: 300-500ms
- Exponential growth with cap (e.g. 5s)
- Add jitter to avoid synchronized retry spikes
- Max attempts: 3-5
```go
// backoff returns the delay before retry number attempt (0-based):
// exponential growth capped at 5s, plus up to 25% jitter.
func backoff(attempt int) time.Duration {
	base := 400 * time.Millisecond
	maxDelay := 5 * time.Second
	d := base * time.Duration(1<<attempt)
	if d > maxDelay || d <= 0 { // d <= 0 guards against shift overflow
		d = maxDelay
	}
	jitter := time.Duration(rand.Int63n(int64(d / 4)))
	return d + jitter
}
```
4) Minimum observability signals
At least emit these metrics:
- llm_stream_first_token_ms
- llm_stream_total_duration_ms
- llm_stream_tokens_in
- llm_stream_tokens_out
- llm_stream_retry_count
- llm_stream_error_total{code,type}
Attach trace_id per request so you can correlate request entry, model latency, and stream completion.
5) Retry-capable stream skeleton
```go
func StreamWithRetry(ctx context.Context, req *Request) error {
	var lastErr error
	for attempt := 0; attempt < 4; attempt++ {
		start := time.Now()
		err := streamOnce(ctx, req)
		recordMetrics(time.Since(start), attempt, err)
		if err == nil {
			return nil
		}
		if !isRetryable(err) {
			return err
		}
		lastErr = err
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(backoff(attempt)):
		}
	}
	return fmt.Errorf("stream failed after retries: %w", lastErr)
}
```
6) Common failure patterns
- Not canceling context when client disconnects (goroutine leaks)
- Buffering the whole answer before returning (not really streaming)
- Tracking failures only, not first-token latency
- Retrying without error-type filtering
7) Pre-launch validation checklist
- Inject 429 and verify backoff + retries
- Inject 5xx and verify final error observability
- Simulate short network drop and verify recovery
- Run concurrency tests (e.g. 50 and 100 parallel streams) and watch p95 first-token latency
Summary
Reliable Responses API streaming in Go is mostly engineering hygiene:
- Error boundaries
- Time boundaries
- Observability boundaries
Build these three first, then optimize prompt and model quality.