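Governing Context Explosion for OpenAI Assistants/Responses in Go: Truncation Strategies, Summary Backfill, and Cost Caps

Any agent that runs in production long enough hits the same pitfall: the context keeps growing, latency spikes, costs spiral out of control, and answers drift off target more easily. This is not the model getting “dumber”. Usually context governance was never done: content that should have been kept was not, content that should have been deleted was not, and content that should have been summarized was summarized badly. ...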

March 2, 2026 · 2 min · mengboy

Taming Context Explosion in OpenAI Assistants/Responses with Go: Truncation, Summary Backfill, and Cost Caps

Long-running agent sessions usually fail the same way: context keeps growing, latency spikes, costs blow up, and answer quality gets worse. That is rarely a model-quality issue. It is almost always missing context governance. ...

March 2, 2026 · 2 min · mengboy

Go + OpenAI API Timeout Troubleshooting: DNS, TLS, Proxy, and Connection Pool

When OpenAI API calls start timing out in production, the real problem is usually not “OpenAI is down.” The real problem is you don’t know which hop is failing: DNS, TLS handshake, proxy path, or your own connection pool. ...

March 2, 2026 · 2 min · mengboy

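Troubleshooting Common Timeout Paths When Calling the OpenAI API from Go: DNS, TLS, Proxy, and Connection Pool Explained in One Pass

When calls to the OpenAI API start timing out in production, the most frustrating part is not the “occasional failure” but not knowing whether the request is stuck in DNS, TLS, the proxy, or your own connection pool. This post gives you a troubleshooting order you can apply directly: first determine which segment the timeout occurs in, then pinpoint it with metrics and minimal experiments, and finally use a copyable Go configuration template to keep the same kind of incident from recurring. ...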

March 2, 2026 · 2 min · mengboy

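OpenAI Agents SDK + Go Implementation Guide: Tool Calling, Session Memory, and Error Recovery

Many teams have already wired an LLM into their business logic, but things get out of control as soon as they reach “multi-step tasks + tool calls + retries on failure”: logs are unreadable, state cannot be restored, and costs soar. This post gives you a minimum viable approach that drops straight into a Go service: a closed tool-calling loop, layered session memory, and replayable error recovery. ...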

February 25, 2026 · 2 min · mengboy

OpenAI Agents SDK with Go: Tool Calling, Session Memory, and Error Recovery

Most teams can connect an LLM in a demo. The real pain starts in production: multi-step tasks, flaky tool calls, unclear retries, and rising cost. This guide gives you a pragmatic Go-first blueprint for shipping an Agent workflow that can survive real incidents. ...

February 25, 2026 · 3 min · mengboy

OpenAI Responses API Streaming in Go: Timeouts, Retries, and Observability

Production streaming fails in two predictable ways: users wait while the stream silently drops, and your logs say “timeout” without telling you where it actually broke. This guide gives you a practical Go pattern for OpenAI Responses API streaming with strict timeout boundaries, safe retries, and useful telemetry. ...

February 23, 2026 · 2 min · mengboy

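Engineering OpenAI Responses API Streaming Output in Go: Timeouts, Retries, and Observability

Streaming generation in production has two worst cases: the user is still waiting when your connection drops, and the logs are full of errors but you cannot tell which layer blew up. This post gives you a Go engineering template you can apply directly: turn OpenAI Responses API streaming calls into a production-grade path with timeouts, retries, and observability. ...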

February 23, 2026 · 2 min · mengboy

Go Memory Leak Triage in Production: pprof + FlameGraph Step by Step

If your Go service RSS keeps climbing, drops after restart, then climbs again, you likely have a memory retention problem (or an actual leak pattern). Do not start with random code edits. Run a clean evidence chain: metrics trend check → pprof snapshots → FlameGraph comparison → object growth path → regression validation. ...

February 14, 2026 · 3 min · mengboy

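Locating Memory Leaks in Go Services in Practice: Pinpoint It in One Pass with pprof + FlameGraph

A production Go service’s RSS climbs steadily, recovers briefly after a restart, then keeps climbing a few hours later: this is the classic “suspected memory leak” scenario. Do not start by changing code on a hunch; run the evidence chain first: confirm with monitoring → pprof sampling → FlameGraph comparison → locate the object growth path → regression validation. Once you have run this process, you can usually turn a “mystical leak” into a “reproducible bug”. ...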

February 14, 2026 · 3 min · mengboy