<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Cost Control on Mengboy 技术笔记</title>
    <link>https://www.mfun.ink/tags/cost-control/</link>
    <description>Recent content in Cost Control on Mengboy 技术笔记</description>
    <generator>Hugo -- 0.156.0</generator>
    <language>zh-cn</language>
    <lastBuildDate>Wed, 18 Mar 2026 16:33:52 +0000</lastBuildDate>
    <atom:link href="https://www.mfun.ink/tags/cost-control/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Go &#43; OpenAI Responses Agent Memory Layering: Short-Term Context, Long-Term Index, and Cost Caps</title>
      <link>https://www.mfun.ink/english/post/go-openai-responses-agent-memory-layering/</link>
      <pubDate>Wed, 18 Mar 2026 16:33:52 +0000</pubDate>
      <guid>https://www.mfun.ink/english/post/go-openai-responses-agent-memory-layering/</guid>
      <description>&lt;p&gt;In production Go agents, the first thing that breaks is usually not model quality. It is memory management: context grows, bills spike, and answers drift.&lt;/p&gt;
&lt;p&gt;Use a 3-layer memory design:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;L1: short-term conversational window (seconds)&lt;/li&gt;
&lt;li&gt;L2: rolling summary (minutes)&lt;/li&gt;
&lt;li&gt;L3: long-term retrieval memory (days)&lt;/li&gt;
&lt;/ul&gt;</description>
    </item>
    <item>
      <title>Go &#43; OpenAI Responses Agent Memory Layering in Practice: Short-Term Context, Long-Term Index, and Cost Caps</title>
      <link>https://www.mfun.ink/2026/03/18/go-openai-responses-agent-memory-layering/</link>
      <pubDate>Wed, 18 Mar 2026 16:33:52 +0000</pubDate>
      <guid>https://www.mfun.ink/2026/03/18/go-openai-responses-agent-memory-layering/</guid>
      <description>&lt;p&gt;When you build an agent in Go, the first thing to break is usually not reasoning quality but runaway &amp;ldquo;memory&amp;rdquo;: the context keeps growing, the bill keeps climbing, and the answers drift further off target.&lt;/p&gt;
&lt;p&gt;This post gives you a practical three-layer design:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;L1: short-term conversational context (seconds, highly relevant)&lt;/li&gt;
&lt;li&gt;L2: mid-term summary memory (minutes, compressed)&lt;/li&gt;
&lt;li&gt;L3: long-term retrieval memory (days, vector index)&lt;/li&gt;
&lt;/ul&gt;</description>
    </item>
    <item>
      <title>OpenAI Batch API &#43; Go Cost Reduction in Practice: Offline Batch Splitting, Failure Replay, and Cost Boundaries</title>
      <link>https://www.mfun.ink/2026/03/13/openai-batch-api-go-cost-control-offline-batching-failure-replay/</link>
      <pubDate>Fri, 13 Mar 2026 01:08:00 +0000</pubDate>
      <guid>https://www.mfun.ink/2026/03/13/openai-batch-api-go-cost-control-offline-batching-failure-replay/</guid>
      <description>&lt;p&gt;One-sentence takeaway: if your calls are &lt;strong&gt;delay-tolerant, batchable, and replayable&lt;/strong&gt;, you should move those online requests down to the Batch API. The savings are the most visible win, but only if you build batch splitting, failure routing, and the replay pipeline first.&lt;/p&gt;
&lt;p&gt;Many teams treat the Batch API as a &amp;ldquo;cheap synchronous endpoint&amp;rdquo;, and instead of saving money they pile failed samples into an incident pool. The genuinely conservative approach: define cost boundaries and SLOs first, then implement offline batch splitting and failure replay.&lt;/p&gt;</description>
    </item>
    <item>
      <title>OpenAI Batch API with Go: Offline Batching, Failure Replay, and Cost Boundaries</title>
      <link>https://www.mfun.ink/english/post/openai-batch-api-go-cost-control-offline-batching-failure-replay/</link>
      <pubDate>Fri, 13 Mar 2026 01:08:00 +0000</pubDate>
      <guid>https://www.mfun.ink/english/post/openai-batch-api-go-cost-control-offline-batching-failure-replay/</guid>
      <description>&lt;p&gt;Short answer: if your workload is &lt;strong&gt;delay-tolerant, batchable, and replay-safe&lt;/strong&gt;, move it from online calls to Batch API. The savings are real, but only if you design splitting, failure routing, and replay first.&lt;/p&gt;
&lt;p&gt;Many teams treat Batch API as a cheaper sync endpoint. That usually creates a replay mess instead of stable savings. A conservative rollout starts with cost boundaries and SLOs, then implements offline batching and controlled replay.&lt;/p&gt;</description>
    </item>
  </channel>
</rss>
