Go + OpenAI Responses Agent Memory Layering: Short-Term Context, Long-Term Index, and Cost Caps

In production Go agents, the first thing that breaks is usually not model quality. It is memory management: context grows, bills spike, and answers drift. Use a 3-layer memory design: L1, a short-term conversational window (seconds); L2, a rolling summary (minutes); L3, long-term retrieval memory (days) ...

March 18, 2026 · 3 min · mengboy

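Go + OpenAI Responses Agent Memory Layering in Practice: Short-Term Context, Long-Term Index, and Cost Caps

When you build an agent in Go, what most often goes wrong is not reasoning ability but "memory" getting out of control: the context keeps growing, the bill keeps climbing, and the answers keep drifting. This post gives you a practical three-layer scheme: L1, short-term conversation context (seconds, highly relevant); L2, mid-term summary memory (minutes, compressed); L3, long-term retrieval memory (days, vector index) ...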

March 18, 2026 · 3 min · mengboy

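Handling OpenAI 429/5xx Storms in a Go Service: Token Bucket, Exponential Backoff, and Circuit-Breaker Recovery

You are not being beaten by the OpenAI API "occasionally returning errors"; you are being beaten by a retry storm amplified by concurrency. ...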

March 18, 2026 · 3 min · mengboy

Handling OpenAI 429/5xx Storms in Go: Token Bucket, Exponential Backoff, and Circuit Breakers

Most Go teams are not killed by a single API error. They are killed by a retry storm they created themselves. ...

March 18, 2026 · 3 min · mengboy

OpenAI Responses + GitHub Actions PR Risk Gate: Automated Evals, Tiered Blocking, and One-Click Rollback

You don’t need an AI reviewer that “sounds smart.” You need a gate that stops risky PRs before they hit main. This post shows a production-ready minimum setup: OpenAI Responses generates structured risk output, GitHub Actions enforces tiered policies, and critical failures can trigger a one-click rollback. ...

March 16, 2026 · 3 min · mengboy

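OpenAI Responses + GitHub Actions PR Risk Gate: Automated Evals, Tiered Blocking, and One-Click Rollback

You don't need an AI reviewer that "can chat"; you need a risk gate that blocks bad changes from reaching the main branch. This post offers a minimal, deployable setup: OpenAI Responses generates structured review conclusions, GitHub Actions handles tiered blocking, and high-risk findings trigger an automatic rollback to a safe commit. ...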

March 16, 2026 · 3 min · mengboy

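Cutting Costs with OpenAI Batch API + Go: Offline Batch Splitting, Failure Replay, and Cost Boundaries

One-sentence conclusion: if your calls can be delayed, batched, and replayed, you should move online requests down to the Batch API. The savings are most obvious there, but only if you first build the batch splitting, failure routing, and replay pipeline. Many teams use the Batch API as a "cheap synchronous endpoint", and the result is not savings but failed samples piling up into an incident pool. The truly conservative approach is to define cost boundaries and SLOs first, then do offline batch splitting and failure replay. ...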

March 13, 2026 · 3 min · mengboy

OpenAI Batch API with Go: Offline Batching, Failure Replay, and Cost Boundaries

Short answer: if your workload is delay-tolerant, batchable, and replay-safe, move it from online calls to Batch API. The savings are real, but only if you design splitting, failure routing, and replay first. Many teams treat Batch API as a cheaper sync endpoint. That usually creates a replay mess instead of stable savings. A conservative rollout starts with cost boundaries and SLOs, then implements offline batching and controlled replay. ...

March 13, 2026 · 3 min · mengboy

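OpenAI Responses Structured Outputs + Go: Schema Evolution, Bad-Sample Fallbacks, and Canary Rollback

Where Structured Outputs most easily goes wrong is not that "the model won't obey", but that you treat the schema as an unchangeable decree. Once production enters a period of version evolution, the most common incidents are: old consumers crash after a field is added, validation rejects valid values after an enum is extended, and bad samples drag down the whole chain, until the only option is a midnight rollback, like writing yourself a thriller. ...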

March 11, 2026 · 4 min · mengboy

OpenAI Responses Structured Outputs with Go: Schema Evolution, Bad-Case Fallbacks, and Gradual Rollback

The hardest part of Structured Outputs is not getting JSON once. It is surviving schema changes without turning production into a small fire with excellent logs and terrible business results. Once a Go service starts evolving prompts and response contracts, the usual failure modes show up fast: a new required field breaks older consumers, an enum expands and strict validation kills valid requests, or one bad sample drags the whole chain into retries and rollback panic. ...

March 11, 2026 · 6 min · mengboy