Go | Mengboy 技术笔记

Go Dual-Provider LLM Routing (OpenAI + Claude): Timeout Tiers, Cost Caps, and Fallback Control

If your Go service relies on one LLM provider, two failures hurt the most, timeout spikes and billing spikes. A real production setup is not just “add another provider”, it is a single control plane for routing, timeout tiers, cost caps, and fallback. This guide gives you a practical OpenAI + Claude dual-provider pattern with one priority, keep uptime first, then optimize quality. ...

Go 服务双栈模型路由（OpenAI/Claude）：超时分层、成本上限与降级回退

线上接入单一模型供应商，最怕两件事，突发超时和账单失控。真正可落地的方案不是“多接一个模型”这么简单，而是把路由、超时、成本、回退放进同一个控制面。这篇给你一套 Go 可直接落地的双栈路由框架，目标是三件事，稳定性优先、成本可控、故障可快速止血。 ...

OpenAI Responses + Go Stream Recovery: Delta Persistence, Resume Tokens, and Duplicate Chunk Dedup

In production, the painful part is not “streaming is slow.” It’s “streaming breaks and then duplicates output after reconnect.” This guide gives you a practical recovery loop: delta persistence + resume token + idempotent dedup, so reconnection does not replay garbage. ...

OpenAI Responses + Go 的流式中断恢复：delta 持久化、resume token 与重复片段去重

生产里最难受的不是“流式返回慢”，而是“流式返回断了还重复”，用户看到半句、重连后又从中间重喷一遍。这篇给一套可落地的恢复闭环：delta 持久化 + resume token + 幂等去重，目标是“断线可续，重放不重字”。 ...

Go + OpenAI Responses Agent Memory Layering: Short-Term Context, Long-Term Index, and Cost Caps

In production Go agents, the first thing that breaks is usually not model quality. It is memory management: context grows, bills spike, and answers drift. Use a 3-layer memory design: L1: short-term conversational window (seconds) L2: rolling summary (minutes) L3: long-term retrieval memory (days) ...

Go + OpenAI Responses Agent 记忆分层实战：短期上下文、长期索引与成本封顶

你在 Go 里做 Agent，最容易翻车的不是推理能力，而是“记忆”失控：上下文越来越长、账单越来越高、回答却越来越飘。这篇给你一个可落地的三层方案： L1：短期会话上下文（秒级，强相关） L2：中期摘要记忆（分钟级，压缩） L3：长期检索记忆（天级，向量索引） ...

Go 服务调用 OpenAI 的 429/5xx 风暴应对：令牌桶、指数退避与熔断恢复

你不是被 OpenAI API「偶尔报错」打败的；你是被并发放大后的重试风暴打败的。 ...

Handling OpenAI 429/5xx Storms in Go: Token Bucket, Exponential Backoff, and Circuit Breakers

Most Go teams are not killed by a single API error. They are killed by a retry storm they created themselves. ...

OpenAI Batch API + Go 降本实战：离线拆批、失败重放与成本边界

一句话结论：如果你的调用是可延迟、可批处理、可回放，就该把在线请求下沉到 Batch API；省钱最明显，但前提是你把拆批、失败分流和回放链路先做好。很多团队把 Batch API 当“便宜版同步接口”来用，结果不是省钱，而是把失败样本堆成事故池。真正的保守做法是：先定义成本边界和SLO，再做离线拆批与失败回放。 ...

OpenAI Batch API with Go: Offline Batching, Failure Replay, and Cost Boundaries

Short answer: if your workload is delay-tolerant, batchable, and replay-safe, move it from online calls to Batch API. The savings are real, but only if you design splitting, failure routing, and replay first. Many teams treat Batch API as a cheaper sync endpoint. That usually creates a replay mess instead of stable savings. A conservative rollout starts with cost boundaries and SLOs, then implements offline batching and controlled replay. ...