The most expensive outage is not a single failure; it is a failure amplified by retries.
In an OpenAI Responses + Go tool-calling stack, missing idempotency keys, unjittered backoff, and absent breaker thresholds can turn 10 failing requests into 1,000 downstream calls within minutes.
TL;DR: You need all three guardrails:
- Idempotency key: one business action should apply once.
- Backoff + jitter: retries must spread out, not synchronize.
- Circuit breaker threshold: fail fast when error budget is blown.
How retry storms usually start
Common bad setup:
- HTTP timeout too short (for example, 3 seconds)
- Gateway retries 3 times + service retries 3 times
- No idempotency control in tool execution
- Fixed retry interval on all instances (no jitter)
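For concreteness, a sketch of that setup in Go (url and body stand in for the real request; the gateway's own three retries stack on top of this loop):
client := &http.Client{Timeout: 3 * time.Second} // too short for slow tool calls
for i := 0; i < 3; i++ { // service-level retries; the gateway adds its own 3
    resp, err := client.Post(url, "application/json", bytes.NewReader(body))
    if err == nil {
        resp.Body.Close()
        if resp.StatusCode < 500 {
            break
        }
    }
    time.Sleep(500 * time.Millisecond) // fixed interval: every instance retries in lockstep
}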
What happens next:
- A tiny upstream hiccup is amplified 9x (3 × 3 across two retry layers) to 27x (with a third retrying client in the path)
- P95 latency spikes and queues pile up
- Alerts fan out across API errors, DB lock contention, and cache misses
Go implementation: idempotency keys
Recommended key format:
idem:{tenant}:{workflow}:{biz_id}:{step}
Rules:
- Build from business-unique fields (not random UUIDs)
- TTL must cover your max retry window (for example, 15 minutes)
- Store status, response hash, first-seen and last-updated timestamps
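A minimal key builder following these rules (the field names are illustrative; use whatever uniquely identifies the business action):
func buildIdemKey(tenant, workflow, bizID, step string) string {
    // Stable business fields, never random UUIDs: every retry of the same
    // action must map to the same key.
    return fmt.Sprintf("idem:%s:%s:%s:%s", tenant, workflow, bizID, step)
}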
Redis example (SETNX + TTL):
// rdb is a *redis.Client from github.com/redis/go-redis/v9.
ok, err := rdb.SetNX(ctx, idemKey, "PENDING", 15*time.Minute).Result()
if err != nil {
    return err
}
if !ok {
    // An execution already exists for this key: suppress the duplicate
    // and serve the cached outcome instead of re-running the tool.
    return ErrDuplicateSuppressed
}
Write a result summary after success; the value carries a digest of the tool result (toolResultHash here) so duplicates can be replayed:
_ = rdb.Set(ctx, idemKey, "DONE:"+toolResultHash, 15*time.Minute).Err()
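On the duplicate-suppressed path, a sketch of replaying that stored outcome (serveCachedResult and ErrDuplicateInFlight are placeholder names for your own lookup and error):
// SetNX returned false, so read what the first execution recorded.
val, err := rdb.Get(ctx, idemKey).Result()
if err != nil {
    return err
}
if hash, ok := strings.CutPrefix(val, "DONE:"); ok { // strings.CutPrefix, Go 1.20+
    return serveCachedResult(hash) // replay the recorded tool result
}
// Still "PENDING": another worker is executing; fail fast or poll briefly.
return ErrDuplicateInFlight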
Go implementation: exponential backoff with full jitter
Wrong: fixed sleep(500ms).
Right: exponential backoff + full jitter:
// backoff returns a full-jitter delay: uniform in [0, min(cap, base<<attempt)).
// rand is math/rand.
func backoff(attempt int, base, cap time.Duration) time.Duration {
    d := base << attempt // exponential growth: base * 2^attempt
    if d > cap || d <= 0 { // d <= 0 guards against bit-shift overflow
        d = cap
    }
    return time.Duration(rand.Int63n(int64(d)))
}
Conservative defaults:
- base = 200ms
- cap = 5s
- maxAttempts = 4
- Retry only retryable classes (429, 5xx, transient network errors)
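Wired together with those defaults, a retry loop might look like this (callTool and isRetryable are placeholders for your tool invocation and error classification):
func callWithRetry(ctx context.Context, maxAttempts int) error {
    const base, ceiling = 200 * time.Millisecond, 5 * time.Second
    var err error
    for attempt := 0; attempt < maxAttempts; attempt++ {
        if err = callTool(ctx); err == nil {
            return nil
        }
        if !isRetryable(err) || attempt == maxAttempts-1 {
            return err // non-retryable class, or out of attempts
        }
        select {
        case <-time.After(backoff(attempt, base, ceiling)): // jittered wait
        case <-ctx.Done():
            return ctx.Err()
        }
    }
    return err
}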
Go implementation: breaker thresholds with error budget
Use a 30-second sliding window:
- requests >= 50
- error rate >= 25%
- trigger in 2 consecutive windows → open for 20 seconds
Pseudo code:
if window.Req >= 50 && window.ErrRate() >= 0.25 {
    breaker.Trip(20 * time.Second)
}
if breaker.Open() {
    return ErrFastFail
}
Fallback policy when open:
- Return cached summary or last known good result
- Skip non-critical tools
- Tell users output may be partial
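In the request path, the open-state branch can degrade instead of erroring outright (ToolResult, toolName, and lastGood, a sync.Map of recent successes, are illustrative names):
if breaker.Open() {
    if cached, ok := lastGood.Load(toolName); ok {
        // Serve the last known good result and mark the response as partial.
        return ToolResult{Output: cached.(string), Partial: true}, nil
    }
    return ToolResult{}, ErrFastFail // non-critical tool: skip it entirely
}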
Metrics you must ship
At minimum:
- tool_call_total{tool,status}
- retry_total{reason}
- idempotency_suppressed_total
- breaker_open_total
- llm_latency_ms_p95
- cost_usd_total
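With the standard Prometheus Go client, the first two counters might be declared like this; the rest follow the same pattern, and label values are whatever your tool runner emits:
// github.com/prometheus/client_golang/prometheus and .../prometheus/promauto
var (
    toolCallTotal = promauto.NewCounterVec(prometheus.CounterOpts{
        Name: "tool_call_total",
        Help: "Tool invocations by tool name and terminal status.",
    }, []string{"tool", "status"})

    retryTotal = promauto.NewCounterVec(prometheus.CounterOpts{
        Name: "retry_total",
        Help: "Retries by reason (429, 5xx, network).",
    }, []string{"reason"})
)
// Usage: toolCallTotal.WithLabelValues("search_docs", "ok").Inc()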
Alert ideas:
- retry_total > 3x baseline in 5 minutes
- sudden jump in idempotency_suppressed_total
- sustained breaker_open_total > 0
Troubleshooting checklist
- Check 429/5xx ratio in the last 15 minutes.
- Confirm you do not have double retry layers.
- Sample failing requests and verify the idempotency key stays stable across retries of the same action.
- Verify retries are jittered, not fixed sleep.
- Check breaker open/half-open recovery behavior.
- Reconcile duplicate writes or duplicate charges.
Summary
Retries are not free.
In production Responses + Go pipelines, the practical order is idempotency first, jittered retries second, circuit breaker third: it turns a potential avalanche into controlled degradation.
If you can do only one thing today, add idempotency keys; they usually deliver the highest immediate ROI.