Production streaming fails in two predictable ways: users wait while the stream silently drops, and your logs say “timeout” without telling you where it actually broke.

This guide gives you a practical Go pattern for OpenAI Responses API streaming with strict timeout boundaries, safe retries, and useful telemetry.

1) Define retry boundaries first

Do not retry every error.

  • Retryable: 429, 5xx, transient network resets, upstream gateway timeout
  • Non-retryable: 401/403, invalid request payload, explicit context cancellation
  • Conditional: mid-stream disconnects (depends on whether continuation is acceptable)

Retrying 4xx blindly is just automated self-harm.
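
The retry skeleton in section 5 leans on an isRetryable helper. A minimal sketch of that classifier, where apiError and its StatusCode field stand in for whatever typed error your OpenAI client actually returns:

func isRetryable(err error) bool {
    // Our own cancellation or deadline: retrying inside a dead context is pointless.
    if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) {
        return false
    }
    // apiError is a placeholder for your client library's typed error.
    var apiErr *apiError
    if errors.As(err, &apiErr) {
        return apiErr.StatusCode == 429 || apiErr.StatusCode >= 500
    }
    // Network timeouts are worth another attempt; extend this to connection
    // resets if you want those retried as well.
    var netErr net.Error
    if errors.As(err, &netErr) && netErr.Timeout() {
        return true
    }
    return false
}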

2) Layer your timeouts

Do not rely on a single timeout. Use three layers:

  1. Request context timeout (hard deadline, e.g. 45s)
  2. HTTP client timeout (connection guard, e.g. 50s)
  3. Stream idle timeout (abort when no token arrives for too long)

// Layer 1: hard deadline for the whole streaming request.
ctx, cancel := context.WithTimeout(r.Context(), 45*time.Second)
defer cancel()

// Layer 2: client-level guard that backstops the context deadline.
httpClient := &http.Client{Timeout: 50 * time.Second}

Keep the HTTP client timeout slightly larger than the context timeout so the context deadline always fires first and failures surface as a context error at the call site rather than an opaque client timeout.
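
The first two layers are one-liners; the idle timeout is not, because neither the context nor the HTTP client can tell that no token has arrived for a while. A minimal sketch, assuming the stream is delivered as a channel of events (the Event type and events channel are illustrative, not a specific SDK API):

func consumeWithIdleTimeout(ctx context.Context, events <-chan Event, idle time.Duration, handle func(Event)) error {
    timer := time.NewTimer(idle)
    defer timer.Stop()

    for {
        select {
        case <-ctx.Done():
            return ctx.Err()
        case <-timer.C:
            return errors.New("stream idle timeout: no event received")
        case ev, ok := <-events:
            if !ok {
                return nil // upstream closed the stream normally
            }
            handle(ev)
            // Reset the idle clock on every event actually received.
            if !timer.Stop() {
                select {
                case <-timer.C:
                default:
                }
            }
            timer.Reset(idle)
        }
    }
}

An idle timeout of 10-20s is usually much tighter than the 45s hard deadline, so a stalled stream fails fast instead of burning the whole request budget.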

3) Exponential backoff with jitter

Practical defaults:

  • Initial backoff: 300-500ms
  • Exponential growth with cap (e.g. 5s)
  • Add jitter to avoid synchronized retry spikes
  • Max attempts: 3-5

// backoff returns the wait before the given retry attempt (0-indexed):
// 400ms doubled per attempt, capped at 5s, plus up to 25% jitter.
func backoff(attempt int) time.Duration {
    base := 400 * time.Millisecond
    maxDelay := 5 * time.Second
    d := base * time.Duration(1<<attempt)
    if d > maxDelay {
        d = maxDelay
    }
    jitter := time.Duration(rand.Int63n(int64(d / 4)))
    return d + jitter
}
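
With these defaults the pre-jitter schedule is 400ms, 800ms, 1.6s, 3.2s, then the 5s cap. A quick way to eyeball it (exact values vary because of the jitter):

func main() {
    for attempt := 0; attempt < 5; attempt++ {
        fmt.Printf("attempt %d: wait %v\n", attempt, backoff(attempt))
    }
}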

4) Minimum observability signals

At least emit these metrics:

  • llm_stream_first_token_ms
  • llm_stream_total_duration_ms
  • llm_stream_tokens_in
  • llm_stream_tokens_out
  • llm_stream_retry_count
  • llm_stream_error_total{code,type}

Attach trace_id per request so you can correlate request entry, model latency, and stream completion.
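
A lightweight way to carry these signals is one record per request, updated as tokens arrive and flushed when the stream ends. Here emit is a placeholder for whatever metrics client you use (Prometheus, OTel, statsd):

// emit(name, value, traceID) is a placeholder for your metrics backend.
type streamStats struct {
    TraceID    string
    Start      time.Time
    FirstToken time.Time // zero until the first delta arrives
    TokensOut  int
    Retries    int
}

func (s *streamStats) onToken() {
    if s.FirstToken.IsZero() {
        s.FirstToken = time.Now()
        emit("llm_stream_first_token_ms", s.FirstToken.Sub(s.Start).Milliseconds(), s.TraceID)
    }
    s.TokensOut++
}

func (s *streamStats) onDone(err error) {
    emit("llm_stream_total_duration_ms", time.Since(s.Start).Milliseconds(), s.TraceID)
    emit("llm_stream_tokens_out", int64(s.TokensOut), s.TraceID)
    emit("llm_stream_retry_count", int64(s.Retries), s.TraceID)
    if err != nil {
        emit("llm_stream_error_total", 1, s.TraceID) // label with code and type in your backend
    }
}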

5) Retry-capable stream skeleton

// StreamWithRetry wraps streamOnce with bounded retries and backoff between
// retryable failures. The shared ctx still enforces the overall 45s deadline.
func StreamWithRetry(ctx context.Context, req *Request) error {
    const maxAttempts = 4
    var lastErr error
    for attempt := 0; attempt < maxAttempts; attempt++ {
        start := time.Now()
        err := streamOnce(ctx, req)
        recordMetrics(time.Since(start), attempt, err)

        if err == nil {
            return nil
        }
        if !isRetryable(err) {
            return err
        }
        lastErr = err

        if attempt == maxAttempts-1 {
            break // last attempt failed; don't sleep just to give up
        }
        select {
        case <-ctx.Done():
            return ctx.Err()
        case <-time.After(backoff(attempt)):
        }
    }
    return fmt.Errorf("stream failed after retries: %w", lastErr)
}
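
The skeleton leaves streamOnce abstract. Its shape, reusing the idle-timeout consumer and streamStats from earlier sections; openStream, writeDelta, Event, and the req.TraceID field are all placeholders, not a specific client API:

func streamOnce(ctx context.Context, req *Request) error {
    stats := &streamStats{TraceID: req.TraceID, Start: time.Now()}

    // openStream stands in for the Responses API streaming call in your client
    // library, assumed to return a channel that closes when the response is done.
    events, err := openStream(ctx, req)
    if err != nil {
        stats.onDone(err)
        return err // setup failure; isRetryable decides whether another attempt makes sense
    }

    err = consumeWithIdleTimeout(ctx, events, 15*time.Second, func(ev Event) {
        stats.onToken() // records first-token latency on the first delta
        writeDelta(ev)  // placeholder: forward the delta to the end user
    })
    stats.onDone(err)
    return err
}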

6) Common failure patterns

  • Not canceling context when client disconnects (goroutine leaks)
  • Buffering the whole answer before returning (not really streaming)
  • Tracking failures only, not first-token latency
  • Retrying without error-type filtering
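
The first two patterns are both handler-boundary issues: derive the stream context from r.Context() so a client disconnect cancels everything downstream, and flush every delta as it arrives. A sketch (the consume loop is inlined here for brevity; openStream, buildRequest, and ev.Text are placeholders):

func handleStream(w http.ResponseWriter, r *http.Request) {
    // A client disconnect cancels r.Context(), which stops the upstream call
    // and any goroutines derived from it instead of leaking them.
    ctx, cancel := context.WithTimeout(r.Context(), 45*time.Second)
    defer cancel()

    flusher, ok := w.(http.Flusher)
    if !ok {
        http.Error(w, "streaming unsupported", http.StatusInternalServerError)
        return
    }
    w.Header().Set("Content-Type", "text/event-stream")

    events, err := openStream(ctx, buildRequest(r))
    if err != nil {
        http.Error(w, "upstream error", http.StatusBadGateway)
        return
    }
    for ev := range events {
        fmt.Fprintf(w, "data: %s\n\n", ev.Text)
        flusher.Flush() // push each delta immediately; never buffer the whole answer
    }
}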

7) Pre-launch validation checklist

  • Inject 429 and verify backoff + retries
  • Inject 5xx and verify final error observability
  • Simulate short network drop and verify recovery
  • Run a concurrency test (e.g. 50 and 100 parallel streams) and watch p95 first-token latency
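
For the first item, a fake upstream built with net/http/httptest keeps the check self-contained. Plain GETs stand in for the real streaming client here; in a full suite you would point the client's base URL at srv.URL instead:

func TestBackoffOn429(t *testing.T) {
    var attempts int32
    srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        // First two attempts get rate-limited, the third succeeds.
        if atomic.AddInt32(&attempts, 1) <= 2 {
            w.WriteHeader(http.StatusTooManyRequests)
            return
        }
        w.WriteHeader(http.StatusOK)
    }))
    defer srv.Close()

    for attempt := 0; attempt < 4; attempt++ {
        resp, err := http.Get(srv.URL)
        if err != nil {
            t.Fatal(err)
        }
        resp.Body.Close()
        if resp.StatusCode == http.StatusOK {
            if got := atomic.LoadInt32(&attempts); got != 3 {
                t.Fatalf("expected 3 attempts, got %d", got)
            }
            return
        }
        time.Sleep(backoff(attempt))
    }
    t.Fatal("never recovered from injected 429s")
}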

Summary

Reliable Responses API streaming in Go is mostly engineering hygiene:

  1. Error boundaries
  2. Time boundaries
  3. Observability boundaries

Build these three first, then optimize prompt and model quality.