RAG on Mengboy 技术笔记

Go + OpenAI Responses Agent Memory Layering: Short-Term Context, Long-Term Index, and Cost Caps

Wed, 18 Mar 2026 16:33:52 +0000

In production Go agents, the first thing that breaks is usually not model quality. It is memory management: context grows, bills spike, and answers drift.

Use a 3-layer memory design:

L1: short-term conversational window (seconds)
L2: rolling summary (minutes)
L3: long-term retrieval memory (days)
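
The three layers above can be sketched roughly as follows. This is a minimal illustration, not the post's implementation: the `Memory` type, its fields, and the `Add`/`Prompt` methods are hypothetical names, and the "summary" step simply concatenates evicted turns where a real agent would presumably call a model to compress them.

```go
package main

import (
	"fmt"
	"strings"
)

// Turn is one message in the conversation.
type Turn struct {
	Role, Text string
}

// Memory is a hypothetical three-layer store (all names are illustrative).
type Memory struct {
	Window    []Turn   // L1: most recent turns, sent verbatim
	MaxWindow int      // cap on the number of turns kept in L1
	Summary   string   // L2: rolling summary of turns evicted from L1
	LongTerm  []string // L3: facts kept for later retrieval (not exercised here)
}

// Add appends a turn to L1 and folds any overflow into the L2 summary.
// Assumption: concatenation stands in for a real summarization call.
func (m *Memory) Add(t Turn) {
	m.Window = append(m.Window, t)
	for len(m.Window) > m.MaxWindow {
		old := m.Window[0]
		m.Window = m.Window[1:]
		m.Summary = strings.TrimSpace(m.Summary + " " + old.Role + ": " + old.Text)
	}
}

// Prompt assembles the context: the L2 summary first, then the L1 window.
func (m *Memory) Prompt() string {
	var b strings.Builder
	if m.Summary != "" {
		b.WriteString("summary: " + m.Summary + "\n")
	}
	for _, t := range m.Window {
		b.WriteString(t.Role + ": " + t.Text + "\n")
	}
	return b.String()
}

func main() {
	m := &Memory{MaxWindow: 2}
	m.Add(Turn{"user", "hi"})
	m.Add(Turn{"assistant", "hello"})
	m.Add(Turn{"user", "what is RAG?"})
	fmt.Print(m.Prompt())
}
```

Capping `MaxWindow` is what keeps the per-request context, and therefore the cost, bounded in this sketch; older turns survive only in compressed form.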

Go + OpenAI Responses Agent Memory Layering in Practice: Short-Term Context, Long-Term Index, and Cost Caps

Wed, 18 Mar 2026 16:33:52 +0000

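When you build an agent in Go, what most often goes wrong is not reasoning ability but "memory" running out of control: the context keeps growing, the bill keeps climbing, and the answers keep drifting.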

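This post gives you a practical three-layer design:

L1: short-term conversation context (seconds, highly relevant)
L2: mid-term summary memory (minutes, compressed)
L3: long-term retrieval memory (days, vector index)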
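
As a rough illustration of the L3 layer, the sketch below ranks stored vectors by cosine similarity to a query and returns the top k. It is a brute-force scan under stated assumptions: the `Doc` type and `TopK` function are hypothetical, the vectors are hand-written rather than produced by an embedding model, and a production system would likely use a dedicated vector index instead.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Doc is a hypothetical long-term memory entry with a precomputed embedding.
type Doc struct {
	ID  string
	Vec []float64
}

// cosine returns the cosine similarity of a and b,
// or 0 when the lengths differ or either vector is all zeros.
func cosine(a, b []float64) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// TopK returns the IDs of the k docs most similar to query, best first.
func TopK(query []float64, docs []Doc, k int) []string {
	type scored struct {
		id string
		s  float64
	}
	all := make([]scored, 0, len(docs))
	for _, d := range docs {
		all = append(all, scored{d.ID, cosine(query, d.Vec)})
	}
	// Stable sort keeps insertion order for equal scores.
	sort.SliceStable(all, func(i, j int) bool { return all[i].s > all[j].s })
	if k < 0 {
		k = 0
	}
	if k > len(all) {
		k = len(all)
	}
	ids := make([]string, 0, k)
	for _, s := range all[:k] {
		ids = append(ids, s.id)
	}
	return ids
}

func main() {
	docs := []Doc{
		{"a", []float64{1, 0}},
		{"b", []float64{0, 1}},
		{"c", []float64{1, 1}},
	}
	fmt.Println(TopK([]float64{1, 0}, docs, 2))
}
```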

RAG Accuracy Playbook: Retrieval Recall, Re-Ranking, and Evaluation Loop

Tue, 17 Feb 2026 10:56:00 +0800

If your RAG system feels unreliable, switching to a more expensive LLM is usually the wrong first move. In most cases, the bottleneck is retrieval quality: weak recall, poor ranking, and no measurement loop.

This guide gives a practical path: make recall broader, make ranking sharper, then close the loop with offline + online evaluation.
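
To make the offline side of such an evaluation loop concrete, here is a minimal sketch of two common retrieval metrics, recall@k and reciprocal rank. The function names and the labeled data are illustrative assumptions, not taken from the guide, and the code assumes the retrieved IDs are unique.

```go
package main

import "fmt"

// RecallAtK is the fraction of relevant IDs found in the first k results.
// Assumes retrieved contains no duplicate IDs.
func RecallAtK(retrieved []string, relevant map[string]bool, k int) float64 {
	if len(relevant) == 0 || k <= 0 {
		return 0
	}
	if k > len(retrieved) {
		k = len(retrieved)
	}
	hits := 0
	for _, id := range retrieved[:k] {
		if relevant[id] {
			hits++
		}
	}
	return float64(hits) / float64(len(relevant))
}

// ReciprocalRank is 1/rank of the first relevant result (1-based),
// or 0 if no relevant result was retrieved.
func ReciprocalRank(retrieved []string, relevant map[string]bool) float64 {
	for i, id := range retrieved {
		if relevant[id] {
			return 1 / float64(i+1)
		}
	}
	return 0
}

func main() {
	// Hypothetical labeled query: d1 and d2 are the relevant documents.
	retrieved := []string{"d3", "d1", "d7", "d2"}
	relevant := map[string]bool{"d1": true, "d2": true}

	fmt.Println(RecallAtK(retrieved, relevant, 3))
	fmt.Println(ReciprocalRank(retrieved, relevant))
}
```

Averaging these per-query values over a fixed labeled set gives a number that can be tracked before and after each retrieval change.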

What to Do When RAG Is Inaccurate: A Practical Guide to Retrieval Recall, Re-Ranking, and a Closed Evaluation Loop

Tue, 17 Feb 2026 10:56:00 +0800

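Many teams' first reaction with RAG is to "swap the embedding for a more expensive model"; costs go up, but quality stays unstable. The real problem is usually not generation but the retrieval pipeline: incomplete recall, inaccurate ranking, and missing evaluation.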

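This post gives a set of practices you can apply directly: first broaden recall, then sharpen re-ranking, and finally use offline + online metrics to form a continuous optimization loop.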
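
One way to broaden recall before re-ranking is to merge candidates from several retrievers. The sketch below uses reciprocal rank fusion (RRF) for that merge; the excerpt does not say which method the post uses, so treat this as one possible technique. The `FuseRRF` name, the constant 60, and the sample lists are assumptions for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// FuseRRF merges ranked ID lists with reciprocal rank fusion:
// each list contributes 1/(k + rank) to an ID's score, rank being 1-based.
// Results are sorted by score descending, ties broken by ID.
func FuseRRF(k float64, lists ...[]string) []string {
	score := map[string]float64{}
	for _, list := range lists {
		for i, id := range list {
			score[id] += 1 / (k + float64(i+1))
		}
	}
	ids := make([]string, 0, len(score))
	for id := range score {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool {
		if score[ids[i]] != score[ids[j]] {
			return score[ids[i]] > score[ids[j]]
		}
		return ids[i] < ids[j]
	})
	return ids
}

func main() {
	// Hypothetical outputs of a keyword retriever and a vector retriever.
	keyword := []string{"a", "b", "c"}
	vector := []string{"b", "d", "a"}

	// 60 is a commonly used RRF constant; tune it for your data.
	fmt.Println(FuseRRF(60, keyword, vector))
}
```

The fused list is wider than either input, which helps recall; a re-ranker can then reorder the top candidates for precision.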