RAG Accuracy Playbook: Retrieval Recall, Re-Ranking, and Evaluation Loop

If your RAG system feels unreliable, switching to a more expensive LLM is usually the wrong first move. In most cases, the bottleneck is retrieval quality: weak recall, poor ranking, and no measurement loop. This guide gives a practical path: broaden recall, sharpen ranking, then close the loop with offline and online evaluation. ...

February 17, 2026 · 3 min · mengboy

OpenAI Responses API + MCP in Practice: From Function Calling to Agent Workflows

If you’ve already used function calling but keep writing glue code for every non-trivial task, you’re likely at the point where Responses API + MCP makes more sense. This guide is practical: how to move from single tool calls to a scalable agent workflow where retrieval, execution, validation, and write-back follow a consistent structure. ...

February 11, 2026 · 3 min · mengboy

Claude vs Codex vs OpenAI CLI: Which Workflow Actually Improves Dev Productivity

If you only use AI as a chatbot, these tools feel similar. In real engineering workflows, they behave very differently. My conclusion first: use Codex for repo-native coding changes, Claude for deep reasoning and long-form planning, and OpenAI CLI for standardized automation pipelines. ...

February 9, 2026 · 2 min · mengboy

Software Engineering History: From Software Crisis to AI Co-Creation

Large language models are changing how we clarify requirements, generate code, and design tests, and many teams feel that traditional workflows are being rewritten. To understand what is truly changing, it helps to place today inside the longer history of software engineering. This article walks through the major stages of software engineering and ends with the AI-era variables and a simple checklist so you can map your current problems to the right time scale. ...

December 31, 2025 · 3 min · mengboy