Most Go teams are not killed by a single API error. They are killed by a retry storm they created themselves.
TL;DR
When OpenAI starts returning 429 and 5xx under pressure, the stable pattern is:
- Rate-limit at ingress with a token bucket.
- Retry only recoverable failures with exponential backoff + jitter.
- Trip a circuit breaker on sustained failure windows.
- Use request budgets + idempotency keys to prevent runaway retries and duplicate side effects.
- Track the right metrics (429 ratio, retry depth, breaker state).
Why simple retries fail
A common production loop looks like this:
- traffic spike raises concurrency
- upstream rate limit returns 429
- app retries immediately (without jitter)
- retries multiply load
- 5xx increases
- even more retries
That is a positive feedback loop, not resilience.
1) Token bucket first: control admission rate before anything else
import "golang.org/x/time/rate"
var limiter = rate.NewLimiter(rate.Limit(8), 16) // 8 rps, burst 16
func allow(ctx context.Context) error {
return limiter.Wait(ctx)
}
Practical baseline:
- online traffic: per-tenant + global limit (sketched below)
- batch jobs: isolated queue and quota (never steal online quota)
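A minimal sketch of that per-tenant + global combination, assuming a mutex-guarded limiter map (tenantLimiter and the per-tenant numbers are illustrative, not a prescription):

import (
	"context"
	"sync"

	"golang.org/x/time/rate"
)

// tenantLimiter gates each request on its tenant's quota first,
// then on the shared global quota.
type tenantLimiter struct {
	mu      sync.Mutex
	global  *rate.Limiter
	tenants map[string]*rate.Limiter
}

func (t *tenantLimiter) Wait(ctx context.Context, tenant string) error {
	t.mu.Lock()
	lim, ok := t.tenants[tenant]
	if !ok {
		lim = rate.NewLimiter(rate.Limit(2), 4) // example per-tenant rate
		t.tenants[tenant] = lim
	}
	t.mu.Unlock()
	if err := lim.Wait(ctx); err != nil { // tenant quota first
		return err
	}
	return t.global.Wait(ctx) // then the shared global quota
}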
2) Backoff with jitter: retry only what is retryable
import (
	"math/rand"
	"time"
)

// backoff returns an "equal jitter" delay: half of the capped
// exponential value plus a random amount up to that half.
func backoff(attempt int) time.Duration {
	base := 200 * time.Millisecond
	maxDelay := 5 * time.Second
	d := base * time.Duration(1<<attempt)
	if d > maxDelay {
		d = maxDelay
	}
	jitter := time.Duration(rand.Int63n(int64(d / 2)))
	return d/2 + jitter
}

// retryable reports whether a status is worth retrying:
// 429 (rate limited) and transient 5xx.
func retryable(status int) bool {
	return status == 429 || (status >= 500 && status <= 599)
}
Rules that save real systems:
- don’t retry most 4xx
- respect Retry-After when present (see the sketch below)
- cap retries to 2–3 attempts per request
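One way to honor Retry-After (retryDelay is a helper name introduced here; the header can carry either a number of seconds or an HTTP date):

import (
	"net/http"
	"strconv"
	"time"
)

// retryDelay prefers the server's Retry-After hint and falls back
// to jittered exponential backoff.
func retryDelay(resp *http.Response, attempt int) time.Duration {
	if resp != nil {
		if ra := resp.Header.Get("Retry-After"); ra != "" {
			if secs, err := strconv.Atoi(ra); err == nil {
				return time.Duration(secs) * time.Second
			}
			if t, err := http.ParseTime(ra); err == nil {
				return time.Until(t)
			}
		}
	}
	return backoff(attempt)
}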
3) Circuit breaker: fail fast when failure ratio is sustained
Using sony/gobreaker:
import "github.com/sony/gobreaker"

cb := gobreaker.NewCircuitBreaker(gobreaker.Settings{
	Name:     "openai-responses",
	Interval: 30 * time.Second, // stat window reset while closed
	Timeout:  20 * time.Second, // how long the breaker stays open
	ReadyToTrip: func(c gobreaker.Counts) bool {
		if c.Requests < 20 {
			return false // too few samples to judge
		}
		return float64(c.TotalFailures)/float64(c.Requests) >= 0.5
	},
})
Open state protects both upstream and your own workers from meltdown.
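When the breaker is open, cb.Execute fails fast with gobreaker.ErrOpenState, which is the hook for graceful degradation; a sketch (call and fallbackResponse are hypothetical names, and "errors" must be imported):

if _, err := cb.Execute(call); err != nil {
	if errors.Is(err, gobreaker.ErrOpenState) {
		return fallbackResponse() // e.g. cached answer or a "try later" reply
	}
	return nil, err
}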
4) Request budget: retries must fit your SLA, not exceed it
For an 8-second end-to-end budget:
- first attempt: 3.5s
- two retries: 1.5s each
- remaining headroom (≈1.5s): network tail and serialization
Budget is a hard boundary, not a suggestion.
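A sketch of enforcing that split, assuming the caller's context carries the 8-second deadline (attemptTimeout is a helper name introduced here):

// attemptTimeout returns the planned slice for this attempt,
// clamped to whatever remains of the end-to-end deadline.
func attemptTimeout(ctx context.Context, attempt int) time.Duration {
	want := 3500 * time.Millisecond // first attempt
	if attempt > 0 {
		want = 1500 * time.Millisecond // retries
	}
	if dl, ok := ctx.Deadline(); ok {
		if rem := time.Until(dl); rem < want {
			return rem // never exceed the budget
		}
	}
	return want
}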
5) Idempotency keys: prevent duplicate cost and duplicate writes
Build the key from business identity plus a payload fingerprint (construction sketched below):

idempotency_key = sha256(user_id + task_id + payload_hash)

- cache results with a short TTL
- return the cached result on replay
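A sketch of the key construction (field names follow the formula above; the TTL cache itself is left out):

import (
	"crypto/sha256"
	"encoding/hex"
)

// idempotencyKey fingerprints business identity plus payload so a
// replayed request maps to the same cached result.
func idempotencyKey(userID, taskID string, payload []byte) string {
	h := sha256.New()
	h.Write([]byte(userID))
	h.Write([]byte(taskID))
	h.Write(payload)
	return hex.EncodeToString(h.Sum(nil))
}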
Production incident checklist
- check 15-minute metrics: 429_rate, 5xx_rate, retry_attempt_avg (counters sketched below)
- if 429_rate > 5%: reduce the token bucket rate by 20%
- if 5xx_rate > 10%: open the breaker and pause non-critical traffic
- verify jitter is enabled
- verify Retry-After handling
- verify batch traffic cannot consume online capacity
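A stdlib-only sketch of the counters behind those ratios, using expvar (metric names are illustrative; an existing metrics client works just as well):

import "expvar"

var (
	reqTotal   = expvar.NewInt("openai_requests_total")
	resp429    = expvar.NewInt("openai_429_total")
	retryDepth = expvar.NewInt("openai_retry_attempts_total")
)

// record is called once per attempt; /debug/vars exposes the values
// so a dashboard can derive 429_rate and retry_attempt_avg.
func record(status, attempt int) {
	reqTotal.Add(1)
	if status == 429 {
		resp429.Add(1)
	}
	retryDepth.Add(int64(attempt))
}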
Minimal middleware skeleton
// CallOpenAI wraps one logical request with rate limiting, a circuit
// breaker, a per-attempt timeout, and at most two retries.
func CallOpenAI(ctx context.Context, req *http.Request) (*http.Response, error) {
	if err := allow(ctx); err != nil { // token bucket at ingress
		return nil, err
	}
	var lastErr error
	for attempt := 0; attempt <= 2; attempt++ {
		resp, err := cb.Execute(func() (interface{}, error) {
			cctx, cancel := context.WithTimeout(ctx, 3500*time.Millisecond)
			defer cancel()
			// Clone per attempt; replaying the body requires req.GetBody,
			// which http.NewRequest sets for buffer-backed bodies.
			attemptReq := req.Clone(cctx)
			if req.GetBody != nil {
				body, err := req.GetBody()
				if err != nil {
					return nil, err
				}
				attemptReq.Body = body
			}
			r, err := client.Do(attemptReq)
			if err != nil {
				return nil, err
			}
			if retryable(r.StatusCode) {
				r.Body.Close() // fail inside Execute so the breaker counts it
				return nil, fmt.Errorf("retryable status=%d", r.StatusCode)
			}
			return r, nil
		})
		if err == nil {
			return resp.(*http.Response), nil
		}
		lastErr = err
		if attempt < 2 {
			time.Sleep(backoff(attempt))
		}
	}
	return nil, lastErr
}
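A call-site sketch (payload, apiKey, and ctx assumed in scope): building the request from a bytes.Reader makes http.NewRequest set GetBody, which is what lets the skeleton above replay the body on retries.

body := bytes.NewReader(payload) // serialized JSON for the Responses API
req, err := http.NewRequest("POST", "https://api.openai.com/v1/responses", body)
if err != nil {
	return nil, err
}
req.Header.Set("Authorization", "Bearer "+apiKey)
req.Header.Set("Content-Type", "application/json")
return CallOpenAI(ctx, req)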
Final takeaway
If you only “retry harder,” you amplify failures. If you control flow, budget retries, and break circuits on bad windows, your Go + OpenAI stack behaves like a production system instead of a lucky demo.