Short answer: if your workload is delay-tolerant, batchable, and replay-safe, move it from online calls to Batch API. The savings are real, but only if you design splitting, failure routing, and replay first.

Many teams treat Batch API as a cheaper sync endpoint. That usually creates a replay mess instead of stable savings. A conservative rollout starts with cost boundaries and SLOs, then implements offline batching and controlled replay.

Practical rollout plan (conservative mode)

Use this order:

  1. Split tasks into “offline-safe” vs “must-be-realtime”.
  2. Define batch windows and max batch size.
  3. Attach custom_id to every task for traceability and idempotency.
  4. Route outcomes into success, retryable failure, non-retryable failure.
  5. Canary first, full switch later.

1) Define cost boundaries before coding

Answer three questions first:

  • Latency tolerance: how late can results arrive (1h / 6h / 24h)?
  • Failure tolerance: what replay ratio is acceptable?
  • Budget ceiling: what is the hard daily spending cap?
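
These answers are worth pinning down in code rather than in a doc. A minimal sketch; the type and field names are illustrative, not from any framework:

import "time"

// CostBoundaries records the three answers so the limits are explicit.
type CostBoundaries struct {
    MaxResultDelay time.Duration // latency tolerance, e.g. 6 * time.Hour
    MaxReplayRatio float64       // acceptable replay ratio, e.g. 0.03
    DailyBudgetCap float64       // hard daily spending cap
}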

Minimal budget dashboard with local logs:

# Per-file summary: date, input count, ok count, fail count
jq -r '.date + "\t" + (.input|tostring) + "\t" + (.ok|tostring) + "\t" + (.fail|tostring)' /Users/wow/dev/book/mengboy/tmp/batch-metrics/*.json

python3 - <<'PY'
import json, glob

# Alert when any day's failure rate crosses 3%.
for f in glob.glob('/Users/wow/dev/book/mengboy/tmp/batch-metrics/*.json'):
    d = json.load(open(f))
    rate = d['fail'] / max(d['input'], 1)  # guard against zero-input days
    if rate > 0.03:
        print('ALERT', f, f'{rate:.2%}')
PY
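
Both commands assume one JSON object per metrics file with exactly these fields; a healthy day might look like:

{"date": "2026-03-13", "input": 1200, "ok": 1180, "fail": 20}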

2) Offline splitting: optimize for stable throughput, not peak throughput

Conservative defaults:

  • Start with small batches (for example 200-500 records)
  • Use fixed windows (for example every 15 minutes)
  • Submit on either threshold: window timeout or item cap (see the submit-loop sketch below)

In Go, normalize tasks and enforce a unique custom_id:

import (
    "encoding/json"
    "fmt"
)

// BatchTask is one line of the JSONL input file.
type BatchTask struct {
    CustomID string          `json:"custom_id"` // unique per task; drives dedupe and replay
    Method   string          `json:"method"`
    URL      string          `json:"url"`
    Body     json.RawMessage `json:"body"`
}

// buildCustomID derives a stable ID from the business key and the task's
// original timestamp. Replays must reuse the same ID, so never regenerate
// ts at replay time.
func buildCustomID(bizKey string, ts int64) string {
    return fmt.Sprintf("%s-%d", bizKey, ts)
}
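
The dual submit trigger from the defaults above fits in one loop. A minimal sketch, assuming tasks arrive on a channel and submit is your upload function; the cap and window reuse the example values:

import "time"

// submitLoop flushes the buffer when the item cap is hit or the
// fixed window fires, whichever comes first.
func submitLoop(tasks <-chan BatchTask, submit func([]BatchTask)) {
    const itemCap = 500
    window := time.NewTicker(15 * time.Minute)
    defer window.Stop()

    buf := make([]BatchTask, 0, itemCap)
    flush := func() {
        if len(buf) == 0 {
            return
        }
        submit(buf)
        buf = make([]BatchTask, 0, itemCap)
    }
    for {
        select {
        case t, ok := <-tasks:
            if !ok {
                flush() // drain remaining tasks on shutdown
                return
            }
            buf = append(buf, t)
            if len(buf) >= itemCap {
                flush()
            }
        case <-window.C:
            flush() // fixed window elapsed
        }
    }
}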

Persist JSONL input for audit and replay:

mkdir -p /Users/wow/dev/book/mengboy/tmp/batch-input
ls -lh /Users/wow/dev/book/mengboy/tmp/batch-input
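
Writing that input is one JSON object per line. A sketch; writeJSONL is a hypothetical helper, pointed at a file under the directory above:

import (
    "encoding/json"
    "os"
)

// writeJSONL persists one task per line so the exact submitted
// input can be audited and replayed later.
func writeJSONL(path string, tasks []BatchTask) error {
    f, err := os.Create(path)
    if err != nil {
        return err
    }
    defer f.Close()
    enc := json.NewEncoder(f) // Encode writes a trailing newline per value
    for _, t := range tasks {
        if err := enc.Encode(t); err != nil {
            return err
        }
    }
    return nil
}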

3) Failure replay: replay only retryable failures

Do not replay everything blindly. Use three buckets:

  • SUCCESS: store result and close task
  • RETRYABLE_FAIL: replay with capped attempts (for example 3)
  • FINAL_FAIL: move to manual queue

Keep replay rules centralized:

// retryable decides whether a failure code may be replayed automatically.
// Keep this list in one place so submit, replay, and reporting agree.
func retryable(code string) bool {
    switch code {
    case "rate_limit", "timeout", "server_error":
        return true
    default:
        return false
    }
}
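
On top of retryable, routing each result into one of the three buckets is mechanical. A sketch; the empty-code-means-success convention and the field names are assumptions:

const maxReplays = 3 // cap from the bullet above

// route places one batch result into a bucket. A failure only goes
// back into replay while it is retryable and under the attempt cap.
func route(code string, replayCount int) string {
    switch {
    case code == "": // assumption: no error code means success
        return "SUCCESS"
    case retryable(code) && replayCount < maxReplays:
        return "RETRYABLE_FAIL"
    default:
        return "FINAL_FAIL"
    }
}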

Replay generation should be executable as a standalone command:

python3 /Users/wow/dev/book/mengboy/scripts/rebuild_batch_replay.py \
  --failed /Users/wow/dev/book/mengboy/tmp/batch-failed/failed-2026-03-13.jsonl \
  --out /Users/wow/dev/book/mengboy/tmp/batch-replay/replay-2026-03-13.jsonl
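
The script itself is Python; its core logic, sketched here in Go for consistency with the rest of the section, is just a filter over the failed file. The failedRecord shape is an assumption, not the script's real schema:

import (
    "bufio"
    "encoding/json"
    "os"
)

// failedRecord is an assumed shape for one failed JSONL line.
type failedRecord struct {
    Task        BatchTask `json:"task"`
    Code        string    `json:"code"`
    ReplayCount int       `json:"replay_count"`
}

// rebuildReplay keeps retryable failures under the attempt cap and
// writes them out, with the count bumped, as the next replay input.
func rebuildReplay(failedPath, outPath string) error {
    in, err := os.Open(failedPath)
    if err != nil {
        return err
    }
    defer in.Close()
    out, err := os.Create(outPath)
    if err != nil {
        return err
    }
    defer out.Close()

    enc := json.NewEncoder(out)
    sc := bufio.NewScanner(in)
    for sc.Scan() {
        var r failedRecord
        if err := json.Unmarshal(sc.Bytes(), &r); err != nil {
            continue // malformed line: leave it for manual review
        }
        if route(r.Code, r.ReplayCount) == "RETRYABLE_FAIL" {
            r.ReplayCount++
            if err := enc.Encode(r); err != nil {
                return err
            }
        }
    }
    return sc.Err()
}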

4) Idempotency first, automation second

The classic replay incident is duplicate inserts: the same task, replayed, writes a second row.

Minimum requirements:

  • unique key on business table (custom_id)
  • upsert writes
  • track replay_count

Enforce the unique key at the database level:

ALTER TABLE ai_batch_result
ADD CONSTRAINT uk_custom_id UNIQUE (custom_id);
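
With the key in place, result writes become upserts that also carry the replay counter. A sketch with database/sql, using Postgres ON CONFLICT syntax; the result column is illustrative:

import "database/sql"

// upsertResult is idempotent: a replayed task updates its own row
// and bumps replay_count instead of inserting a duplicate.
func upsertResult(db *sql.DB, customID, result string) error {
    _, err := db.Exec(`
        INSERT INTO ai_batch_result (custom_id, result, replay_count)
        VALUES ($1, $2, 0)
        ON CONFLICT (custom_id)
        DO UPDATE SET result = EXCLUDED.result,
                      replay_count = ai_batch_result.replay_count + 1`,
        customID, result)
    return err
}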

5) Deployment acceptance metrics

Do not use “request success” as your only KPI. Track at least:

  1. daily cost stays within the target range
  2. failure ratio stays under the agreed threshold (for example the 3% alert above)
  3. post-replay success ratio is stable across cycles
  4. manual fallback volume stays small enough to handle by hand

For conservative execution, keep part of the critical path on realtime calls until offline replay has stayed stable for several consecutive cycles.

MVP checklist

If time is tight, do these five first:

  1. migrate only non-realtime tasks to Batch API
  2. enforce custom_id on every task
  3. split failures into retryable/non-retryable
  4. cap replay attempts at 3, then route to manual fallback
  5. publish daily cost + failure report

This is usually enough to reduce cost significantly without compromising stability.