OpenAI Responses Streaming in Production: Backpressure, Chunk Reassembly, and Timeout Budget

Most streaming failures are not about “can it stream” but “does it stay stable under load”: broken chunks, stuck clients, timeout cascades, and retry storms. ...

March 27, 2026 · 2 min · mengboy

OpenAI Responses + Go Stream Recovery: Delta Persistence, Resume Tokens, and Duplicate Chunk Dedup

In production, the painful part is not “streaming is slow.” It’s “streaming breaks and then duplicates output after reconnect.” This guide gives you a practical recovery loop: delta persistence + resume token + idempotent dedup, so reconnection does not replay garbage. ...

March 23, 2026 · 4 min · mengboy

OpenAI Responses in Go Multi-Tenant Quota Governance: Token Buckets, Budget Circuit Breakers, and Cost Attribution

Most multi-tenant AI platforms fail for two boring reasons: one tenant saturates shared capacity, and finance discovers the burn too late. This guide gives you a practical Go blueprint: token-bucket throttling, budget circuit breakers, and request-level cost attribution. ...

March 20, 2026 · 4 min · mengboy

Go + OpenAI Responses Agent Memory Layering: Short-Term Context, Long-Term Index, and Cost Caps

In production Go agents, the first thing that breaks is usually not model quality. It is memory management: context grows, bills spike, and answers drift. Use a 3-layer memory design: L1: short-term conversational window (seconds); L2: rolling summary (minutes); L3: long-term retrieval memory (days) ...

March 18, 2026 · 3 min · mengboy

OpenAI Responses + GitHub Actions PR Risk Gate: Automated Evals, Tiered Blocking, and One-Click Rollback

You don’t need an AI reviewer that “sounds smart.” You need a gate that stops risky PRs before they hit main. This post shows a production-ready minimum setup: OpenAI Responses generates structured risk output, GitHub Actions enforces tiered policies, and critical failures can trigger a one-click rollback. ...

March 16, 2026 · 3 min · mengboy