Short answer: if your workload is delay-tolerant, batchable, and replay-safe, move it from online calls to Batch API. The savings are real, but only if you design splitting, failure routing, and replay first.

Many teams treat Batch API as a cheaper sync endpoint. That usually creates a replay mess instead of stable savings. A conservative rollout starts with cost boundaries and SLOs, then implements offline batching and controlled replay.

Practical rollout plan (conservative mode)

Use this order:

  1. Split tasks into “offline-safe” vs “must-be-realtime”.
  2. Define batch windows and max batch size.
  3. Attach custom_id to every task for traceability and idempotency.
  4. Route outcomes into success, retryable failure, non-retryable failure.
  5. Canary first, full switch later.

1) Define cost boundaries before coding

Answer three questions first:

  • Latency tolerance: how late can results arrive (1h / 6h / 24h)?
  • Failure tolerance: what replay ratio is acceptable?
  • Budget ceiling: what is the hard daily spending cap?
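
These answers are worth pinning down in code rather than in a doc. A minimal sketch; the type and field names are illustrative, not from any framework:

import "time"

// CostBoundaries records the three answers so the limits are explicit.
type CostBoundaries struct {
    MaxResultDelay time.Duration // latency tolerance, e.g. 6 * time.Hour
    MaxReplayRatio float64       // acceptable replay ratio, e.g. 0.03
    DailyBudgetCap float64       // hard daily spending cap
}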

Minimal budget dashboard with local logs:

# Per-file summary: date, input count, ok count, fail count
jq -r '.date + "\t" + (.input|tostring) + "\t" + (.ok|tostring) + "\t" + (.fail|tostring)' /Users/wow/dev/book/mengboy/tmp/batch-metrics/*.json

python3 - <<'PY'
import json, glob

# Alert when any day's failure rate crosses 3%.
for f in glob.glob('/Users/wow/dev/book/mengboy/tmp/batch-metrics/*.json'):
    d = json.load(open(f))
    rate = d['fail'] / max(d['input'], 1)  # guard against zero-input days
    if rate > 0.03:
        print('ALERT', f, f'{rate:.2%}')
PY
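
Both commands assume one JSON object per metrics file with exactly these fields; a healthy day might look like:

{"date": "2026-03-13", "input": 1200, "ok": 1180, "fail": 20}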

2) Offline splitting: optimize for stable throughput, not peak throughput

Conservative defaults:

  • Start with small batches (for example 200-500 records)
  • Use fixed windows (for example every 15 minutes)
  • Submit on either threshold: window timeout or item cap (see the submit-loop sketch below)

In Go, normalize tasks and enforce a unique custom_id:

import (
    "encoding/json"
    "fmt"
)

// BatchTask is one line of the JSONL input file.
type BatchTask struct {
    CustomID string          `json:"custom_id"` // unique per task; drives dedupe and replay
    Method   string          `json:"method"`
    URL      string          `json:"url"`
    Body     json.RawMessage `json:"body"`
}

// buildCustomID derives a stable ID from the business key and the task's
// original timestamp. Replays must reuse the same ID, so never regenerate
// ts at replay time.
func buildCustomID(bizKey string, ts int64) string {
    return fmt.Sprintf("%s-%d", bizKey, ts)
}
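
The dual submit trigger from the defaults above fits in one loop. A minimal sketch, assuming tasks arrive on a channel and submit is your upload function; the cap and window reuse the example values:

import "time"

// submitLoop flushes the buffer when the item cap is hit or the
// fixed window fires, whichever comes first.
func submitLoop(tasks <-chan BatchTask, submit func([]BatchTask)) {
    const itemCap = 500
    window := time.NewTicker(15 * time.Minute)
    defer window.Stop()

    buf := make([]BatchTask, 0, itemCap)
    flush := func() {
        if len(buf) == 0 {
            return
        }
        submit(buf)
        buf = make([]BatchTask, 0, itemCap)
    }
    for {
        select {
        case t, ok := <-tasks:
            if !ok {
                flush() // drain remaining tasks on shutdown
                return
            }
            buf = append(buf, t)
            if len(buf) >= itemCap {
                flush()
            }
        case <-window.C:
            flush() // fixed window elapsed
        }
    }
}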

Persist JSONL input for audit and replay:

mkdir -p /Users/wow/dev/book/mengboy/tmp/batch-input
ls -lh /Users/wow/dev/book/mengboy/tmp/batch-input
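
Writing that input is one JSON object per line. A sketch; writeJSONL is a hypothetical helper, pointed at a file under the directory above:

import (
    "encoding/json"
    "os"
)

// writeJSONL persists one task per line so the exact submitted
// input can be audited and replayed later.
func writeJSONL(path string, tasks []BatchTask) error {
    f, err := os.Create(path)
    if err != nil {
        return err
    }
    defer f.Close()
    enc := json.NewEncoder(f) // Encode writes a trailing newline per value
    for _, t := range tasks {
        if err := enc.Encode(t); err != nil {
            return err
        }
    }
    return nil
}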

3) Failure replay: replay only retryable failures

Do not replay everything blindly. Use three buckets:

  • SUCCESS: store result and close task
  • RETRYABLE_FAIL: replay with capped attempts (for example 3)
  • FINAL_FAIL: move to manual queue

Keep replay rules centralized:

// retryable decides whether a failure code may be replayed automatically.
// Keep this list in one place so submit, replay, and reporting agree.
func retryable(code string) bool {
    switch code {
    case "rate_limit", "timeout", "server_error":
        return true
    default:
        return false
    }
}
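
On top of retryable, routing each result into one of the three buckets is mechanical. A sketch; the empty-code-means-success convention and the field names are assumptions:

const maxReplays = 3 // cap from the bullet above

// route places one batch result into a bucket. A failure only goes
// back into replay while it is retryable and under the attempt cap.
func route(code string, replayCount int) string {
    switch {
    case code == "": // assumption: no error code means success
        return "SUCCESS"
    case retryable(code) && replayCount < maxReplays:
        return "RETRYABLE_FAIL"
    default:
        return "FINAL_FAIL"
    }
}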

Replay generation should be executable as a standalone command:

python3 /Users/wow/dev/book/mengboy/scripts/rebuild_batch_replay.py \
  --failed /Users/wow/dev/book/mengboy/tmp/batch-failed/failed-2026-03-13.jsonl \
  --out /Users/wow/dev/book/mengboy/tmp/batch-replay/replay-2026-03-13.jsonl
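
The script itself is Python; its core logic, sketched here in Go for consistency with the rest of the section, is just a filter over the failed file. The failedRecord shape is an assumption, not the script's real schema:

import (
    "bufio"
    "encoding/json"
    "os"
)

// failedRecord is an assumed shape for one failed JSONL line.
type failedRecord struct {
    Task        BatchTask `json:"task"`
    Code        string    `json:"code"`
    ReplayCount int       `json:"replay_count"`
}

// rebuildReplay keeps retryable failures under the attempt cap and
// writes them out, with the count bumped, as the next replay input.
func rebuildReplay(failedPath, outPath string) error {
    in, err := os.Open(failedPath)
    if err != nil {
        return err
    }
    defer in.Close()
    out, err := os.Create(outPath)
    if err != nil {
        return err
    }
    defer out.Close()

    enc := json.NewEncoder(out)
    sc := bufio.NewScanner(in)
    for sc.Scan() {
        var r failedRecord
        if err := json.Unmarshal(sc.Bytes(), &r); err != nil {
            continue // malformed line: leave it for manual review
        }
        if route(r.Code, r.ReplayCount) == "RETRYABLE_FAIL" {
            r.ReplayCount++
            if err := enc.Encode(r); err != nil {
                return err
            }
        }
    }
    return sc.Err()
}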

4) Idempotency first, automation second

The classic replay incident is duplicate inserts: the same task, replayed, writes a second row.

Minimum requirements:

  • unique key on business table (custom_id)
  • upsert writes
  • track replay_count

Enforce the unique key at the database level:

ALTER TABLE ai_batch_result
ADD CONSTRAINT uk_custom_id UNIQUE (custom_id);
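
With the key in place, result writes become upserts that also carry the replay counter. A sketch with database/sql, using Postgres ON CONFLICT syntax; the result column is illustrative:

import "database/sql"

// upsertResult is idempotent: a replayed task updates its own row
// and bumps replay_count instead of inserting a duplicate.
func upsertResult(db *sql.DB, customID, result string) error {
    _, err := db.Exec(`
        INSERT INTO ai_batch_result (custom_id, result, replay_count)
        VALUES ($1, $2, 0)
        ON CONFLICT (custom_id)
        DO UPDATE SET result = EXCLUDED.result,
                      replay_count = ai_batch_result.replay_count + 1`,
        customID, result)
    return err
}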

5) Deployment acceptance metrics

Do not use “request success” as your only KPI. Track at least:

  1. daily cost stays within the target range
  2. failure ratio stays under the agreed threshold (for example the 3% alert above)
  3. post-replay success ratio is stable across cycles
  4. manual fallback volume stays small enough to handle by hand

For conservative execution, keep part of the critical path on realtime calls until offline replay has stayed stable for several consecutive cycles.

MVP checklist

If time is tight, do these five first:

  1. migrate only non-realtime tasks to Batch API
  2. enforce custom_id on every task
  3. split failures into retryable/non-retryable
  4. cap replay attempts at 3, then route to manual fallback
  5. publish daily cost + failure report

This is usually enough to reduce cost significantly without compromising stability.