Go + OpenAI Responses Agent Memory Layering: Short-Term Context, Long-Term Index, and Cost Caps

Wed, 18 Mar 2026 16:33:52 +0000

In production Go agents, the first thing that breaks is usually not model quality. It is memory management: context grows, bills spike, and answers drift.

Use a 3-layer memory design:

L1: short-term conversational window (seconds)
L2: rolling summary (minutes)
L3: long-term retrieval memory (days)
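
The three layers can be sketched in Go roughly as follows. This is a minimal illustration, not the post's implementation: the `Memory` type, the `Summarize` and `Retrieve` hooks, and the character budget (a crude stand-in for a token budget) are all assumptions made for the example. In a real agent, `Summarize` would typically be a model call and `Retrieve` a vector or keyword index lookup.

```go
package main

import "strings"

// Turn is one message in the conversation.
type Turn struct {
	Role string
	Text string
}

// Memory holds the three layers. All names are hypothetical.
type Memory struct {
	Window    []Turn // L1: most recent turns, sent verbatim
	MaxWindow int    // cap on the number of turns kept in L1
	Summary   string // L2: rolling summary of turns evicted from L1
	// Summarize folds evicted turns into the previous summary (assumed hook).
	Summarize func(prev string, evicted []Turn) string
	// Retrieve returns up to k long-term memories for a query (assumed hook, L3).
	Retrieve func(query string, k int) []string
}

// Add appends a turn and folds any overflow from L1 into L2.
func (m *Memory) Add(t Turn) {
	m.Window = append(m.Window, t)
	if over := len(m.Window) - m.MaxWindow; over > 0 {
		if m.Summarize != nil {
			m.Summary = m.Summarize(m.Summary, m.Window[:over])
		}
		m.Window = append([]Turn(nil), m.Window[over:]...)
	}
}

// BuildContext assembles the prompt context under a character budget.
// Budget priority is L1 (newest first), then L2, then L3; whatever does
// not fit is dropped, which is what keeps the cost per call bounded.
// Only the raw text is counted, so the budget is approximate.
func (m *Memory) BuildContext(query string, k, budget int) string {
	remaining := budget
	take := func(s string) bool {
		if s == "" || len(s) > remaining {
			return false
		}
		remaining -= len(s)
		return true
	}

	// L1: walk from newest to oldest until the budget runs out.
	start := len(m.Window)
	for i := len(m.Window) - 1; i >= 0; i-- {
		if !take(m.Window[i].Role + ": " + m.Window[i].Text) {
			break
		}
		start = i
	}

	var out []string
	// L2: include the rolling summary if it still fits.
	if take(m.Summary) {
		out = append(out, "summary: "+m.Summary)
	}
	// L3: include retrieved memories that still fit.
	if m.Retrieve != nil {
		for _, doc := range m.Retrieve(query, k) {
			if take(doc) {
				out = append(out, "memory: "+doc)
			}
		}
	}
	// Emit the surviving L1 turns last, in chronological order.
	for _, t := range m.Window[start:] {
		out = append(out, t.Role+": "+t.Text)
	}
	return strings.Join(out, "\n")
}
```

The design choice worth noting is the priority order: recent turns win the budget first, so a tight cap degrades recall of older material before it degrades the live conversation.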

OpenAI Batch API with Go: Offline Batching, Failure Replay, and Cost Boundaries

Fri, 13 Mar 2026 01:08:00 +0000

Short answer: if your workload is delay-tolerant, batchable, and replay-safe, move it from online calls to Batch API. The savings are real, but only if you design splitting, failure routing, and replay first.

Many teams treat the Batch API as a cheaper synchronous endpoint. That usually creates a replay mess instead of stable savings. A conservative rollout starts with cost boundaries and SLOs, then implements offline batching and controlled replay.
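
As a rough sketch of what "splitting, failure routing, and replay" might look like in Go: the `Job`, `Result`, `Split`, `EncodeJSONL`, and `Route` names are hypothetical, and the `maxPerBatch` and `maxAttempts` caps are placeholder parameters rather than documented limits. The JSONL line shape (`custom_id`, `method`, `url`, `body`) follows the Batch API's input file format as I understand it; the `/v1/responses` target and the `model`/`input` body fields are assumptions to check against the current API reference. Uploading the file and creating the batch are left out.

```go
package main

import (
	"bytes"
	"encoding/json"
)

// Job is one unit of offline work. ID must stay stable across replays
// so that results can be matched and deduplicated.
type Job struct {
	ID       string
	Input    string
	Attempts int // attempts already made before the current batch
}

// Result is a simplified per-job outcome parsed from a batch output file.
type Result struct {
	OK        bool
	Retryable bool // e.g. rate limit or server error, as opposed to a bad request
}

// batchLine mirrors one request line of the JSONL input file.
type batchLine struct {
	CustomID string         `json:"custom_id"`
	Method   string         `json:"method"`
	URL      string         `json:"url"`
	Body     map[string]any `json:"body"`
}

// Split cuts jobs into batches of at most maxPerBatch jobs each.
func Split(jobs []Job, maxPerBatch int) [][]Job {
	if maxPerBatch <= 0 {
		maxPerBatch = len(jobs)
	}
	var batches [][]Job
	for i := 0; i < len(jobs); i += maxPerBatch {
		end := i + maxPerBatch
		if end > len(jobs) {
			end = len(jobs)
		}
		batches = append(batches, jobs[i:end])
	}
	return batches
}

// EncodeJSONL renders one batch as JSONL, one request per line.
func EncodeJSONL(jobs []Job, model string) ([]byte, error) {
	var buf bytes.Buffer
	for _, j := range jobs {
		line, err := json.Marshal(batchLine{
			CustomID: j.ID,
			Method:   "POST",
			URL:      "/v1/responses", // assumed target endpoint
			Body:     map[string]any{"model": model, "input": j.Input},
		})
		if err != nil {
			return nil, err
		}
		buf.Write(line)
		buf.WriteByte('\n')
	}
	return buf.Bytes(), nil
}

// Route decides what happens to each job after a batch finishes.
// Jobs missing from results are treated as retryable. Retryable failures
// go to replay until maxAttempts is reached; everything else that failed
// goes to dead for manual inspection. Successful jobs appear in neither.
func Route(jobs []Job, results map[string]Result, maxAttempts int) (replay, dead []Job) {
	for _, j := range jobs {
		r, found := results[j.ID]
		if found && r.OK {
			continue
		}
		j.Attempts++
		if (!found || r.Retryable) && j.Attempts < maxAttempts {
			replay = append(replay, j)
		} else {
			dead = append(dead, j)
		}
	}
	return replay, dead
}
```

Capping attempts in `Route` is what keeps replay controlled: a job that keeps failing costs at most `maxAttempts` submissions before it lands in the dead list instead of cycling through further batches.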

Cost Control on Mengboy Tech Notes
