RAG on Mengboy Tech Notes

Go + OpenAI Responses Agent Memory Layering: Short-Term Context, Long-Term Index, and Cost Caps

Wed, 18 Mar 2026 16:33:52 +0000

In production Go agents, the first thing that breaks is usually not model quality. It is memory management: context grows, bills spike, and answers drift.

Use a three-layer memory design, with each layer covering a longer time horizon than the one before it:

L1: short-term conversational window (seconds)
L2: rolling summary (minutes)
L3: long-term retrieval memory (days)
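The sketch below shows how the three layers could fit together in Go, with a token cap applied when the context is assembled. Every name in it is a hypothetical illustration: the `Memory` and `Message` types, `Add`, `Recall`, and `BuildContext`. It also makes three simplifying assumptions. The summarizer just concatenates evicted turns, where a real agent would ask the model to compress them. Recall uses keyword overlap as a stand-in for an embedding index. The cost cap relies on a rough ~4 characters per token estimate rather than a real tokenizer. No OpenAI API calls are made.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Message is one conversational turn (hypothetical type for this sketch).
type Message struct {
	Role, Text string
}

// Memory holds the three layers; all names are illustrative assumptions.
type Memory struct {
	Window    []Message // L1: recent turns, kept verbatim
	MaxWindow int       // turns L1 keeps before folding the oldest into L2
	Summary   string    // L2: rolling summary of evicted turns
	LongTerm  []string  // L3: durable notes (a real system would use a vector index)
}

// approxTokens is a crude ~4 chars/token heuristic, not a real tokenizer.
func approxTokens(s string) int { return (len(s) + 3) / 4 }

// Add appends a turn to L1 and folds the oldest turns into L2 on overflow.
func (m *Memory) Add(msg Message) {
	m.Window = append(m.Window, msg)
	for len(m.Window) > m.MaxWindow {
		old := m.Window[0]
		m.Window = m.Window[1:]
		// Placeholder summarizer: a production agent would call the model here.
		m.Summary = strings.TrimSpace(m.Summary + " " + old.Role + ": " + old.Text)
	}
}

// Recall returns up to k L3 notes ranked by naive keyword overlap with query.
func (m *Memory) Recall(query string, k int) []string {
	terms := strings.Fields(strings.ToLower(query))
	type scored struct {
		note  string
		score int
	}
	var hits []scored
	for _, n := range m.LongTerm {
		lower, s := strings.ToLower(n), 0
		for _, t := range terms {
			if strings.Contains(lower, t) {
				s++
			}
		}
		if s > 0 {
			hits = append(hits, scored{n, s})
		}
	}
	sort.SliceStable(hits, func(i, j int) bool { return hits[i].score > hits[j].score })
	var out []string
	for i := 0; i < len(hits) && i < k; i++ {
		out = append(out, hits[i].note)
	}
	return out
}

// BuildContext assembles prompt parts without exceeding budget tokens.
// Newest L1 turns claim the budget first; L2 and L3 fill what remains.
func (m *Memory) BuildContext(query string, budget int) []string {
	used := 0
	fits := func(s string) bool {
		if t := approxTokens(s); used+t <= budget {
			used += t
			return true
		}
		return false
	}
	var turns []string
	for i := len(m.Window) - 1; i >= 0; i-- {
		s := m.Window[i].Role + ": " + m.Window[i].Text
		if !fits(s) {
			break
		}
		turns = append([]string{s}, turns...) // prepend to keep chronological order
	}
	var ctx []string
	if m.Summary != "" && fits("summary: "+m.Summary) {
		ctx = append(ctx, "summary: "+m.Summary)
	}
	for _, n := range m.Recall(query, 3) {
		if fits("note: " + n) {
			ctx = append(ctx, "note: "+n)
		}
	}
	return append(ctx, turns...)
}

func main() {
	m := &Memory{MaxWindow: 2, LongTerm: []string{
		"user prefers Go examples",
		"billing account uses monthly invoices",
	}}
	m.Add(Message{"user", "hi"})
	m.Add(Message{"assistant", "hello"})
	m.Add(Message{"user", "show a Go example"})
	for _, line := range m.BuildContext("go example", 200) {
		fmt.Println(line)
	}
}
```

The design choice is to let recent turns claim the budget first. They matter most for the next reply, and a growing summary or a noisy recall result can then never crowd the live conversation out of the window. Lowering `budget` drops L3 notes and the summary before it drops the latest turns.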

RAG Accuracy Playbook: Retrieval Recall, Re-Ranking, and Evaluation Loop

Tue, 17 Feb 2026 10:56:00 +0800

If your RAG system feels unreliable, switching to a more expensive LLM is usually the wrong first move. In most cases, the bottleneck is retrieval quality: weak recall, poor ranking, and no measurement loop.
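"Weak recall" and "poor ranking" only become actionable once they are measured. A small Go sketch of two common offline metrics follows, computed over a hand-labeled query set. Recall@k is the fraction of relevant documents that appear in the top k results. Reciprocal rank is 1 divided by the position of the first relevant hit, and MRR is its mean across queries. The `EvalCase` type and the document IDs are hypothetical, and the code assumes each retrieved list contains no duplicate IDs.

```go
package main

import "fmt"

// EvalCase is one labeled query: the retriever's ranked doc IDs plus the IDs
// a human marked as relevant (hypothetical type for this sketch).
type EvalCase struct {
	Retrieved []string
	Relevant  map[string]bool
}

// RecallAtK is the fraction of relevant docs that appear in the top k results.
func RecallAtK(c EvalCase, k int) float64 {
	if len(c.Relevant) == 0 {
		return 0
	}
	hits := 0
	for i := 0; i < k && i < len(c.Retrieved); i++ {
		if c.Relevant[c.Retrieved[i]] {
			hits++
		}
	}
	return float64(hits) / float64(len(c.Relevant))
}

// ReciprocalRank is 1/rank of the first relevant result, or 0 if none appears.
func ReciprocalRank(c EvalCase) float64 {
	for i, id := range c.Retrieved {
		if c.Relevant[id] {
			return 1 / float64(i+1)
		}
	}
	return 0
}

func main() {
	cases := []EvalCase{
		{Retrieved: []string{"d3", "d1", "d7"}, Relevant: map[string]bool{"d1": true, "d9": true}},
		{Retrieved: []string{"d2", "d5", "d4"}, Relevant: map[string]bool{"d2": true}},
	}
	var recall, mrr float64
	for _, c := range cases {
		recall += RecallAtK(c, 3)
		mrr += ReciprocalRank(c)
	}
	n := float64(len(cases))
	fmt.Printf("recall@3=%.2f MRR=%.2f\n", recall/n, mrr/n) // prints "recall@3=0.75 MRR=0.75"
}
```

Low recall@k points at the retrieval stage (the right chunk never reaches the model). Good recall combined with low MRR points at ranking (the chunk is present but buried).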

This guide gives a practical path: make recall broader, make ranking sharper, then close the loop with offline and online evaluation.
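One common way to implement "broader recall, then sharper ranking" is sketched below in Go. Note that this is a swapped-in illustrative technique, not necessarily the guide's own method. The first step merges candidate lists from several retrievers (for example keyword and vector search) with reciprocal rank fusion (RRF), which scores each document as the sum of 1/(k + rank) across lists; k = 60 is the commonly used constant. The second step passes only the top fused candidates to a more expensive re-ranker. `FuseRRF`, `Rerank`, and the `Scorer` type are hypothetical names, and the toy scorer stands in for a real re-ranker such as a cross-encoder.

```go
package main

import (
	"fmt"
	"sort"
)

// Scorer stands in for a re-ranker such as a cross-encoder (hypothetical).
type Scorer func(query, docID string) float64

// FuseRRF merges ranked lists with reciprocal rank fusion:
// score(d) = sum over lists of 1/(k + rank), with rank starting at 1.
func FuseRRF(lists [][]string, k float64) []string {
	score := map[string]float64{}
	for _, list := range lists {
		for i, id := range list {
			score[id] += 1 / (k + float64(i+1))
		}
	}
	ids := make([]string, 0, len(score))
	for id := range score {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(a, b int) bool {
		if score[ids[a]] != score[ids[b]] {
			return score[ids[a]] > score[ids[b]]
		}
		return ids[a] < ids[b] // deterministic tie-break
	})
	return ids
}

// Rerank rescores the top n fused candidates and keeps the best topK.
func Rerank(query string, cands []string, n, topK int, s Scorer) []string {
	if len(cands) > n {
		cands = cands[:n]
	}
	pool := append([]string(nil), cands...) // copy so the caller's slice is untouched
	scores := make(map[string]float64, len(pool))
	for _, id := range pool {
		scores[id] = s(query, id) // call the (expensive) re-ranker once per candidate
	}
	sort.SliceStable(pool, func(a, b int) bool { return scores[pool[a]] > scores[pool[b]] })
	if len(pool) > topK {
		pool = pool[:topK]
	}
	return pool
}

func main() {
	keyword := []string{"d1", "d2", "d3"}
	vector := []string{"d3", "d4", "d1"}
	fused := FuseRRF([][]string{keyword, vector}, 60)
	fmt.Println(fused) // prints "[d1 d3 d2 d4]"

	// Toy scorer: pretend d4 is the best match for this query.
	toy := func(q, id string) float64 {
		if id == "d4" {
			return 1
		}
		return 0
	}
	fmt.Println(Rerank("refund policy", fused, 4, 2, toy)) // prints "[d4 d1]"
}
```

RRF needs only ranks, not comparable scores, so it can merge BM25 and embedding results without calibration. The re-ranker is applied to a bounded pool (`n`), which keeps its cost predictable. Both steps can be checked with the recall@k and MRR metrics before any change reaches online traffic.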