Short answer: if your workload is delay-tolerant, batchable, and replay-safe, move it from online calls to Batch API. The savings are real, but only if you design splitting, failure routing, and replay first.
Many teams treat Batch API as a cheaper sync endpoint. That usually creates a replay mess instead of stable savings. A conservative rollout starts with cost boundaries and SLOs, then implements offline batching and controlled replay.
Practical rollout plan (conservative mode)
Use this order:
- Split tasks into “offline-safe” vs “must-be-realtime”.
- Define batch windows and max batch size.
- Attach custom_id to every task for traceability and idempotency.
- Route outcomes into success, retryable failure, non-retryable failure.
- Canary first, full switch later.
1) Define cost boundaries before coding
Answer three questions first:
- Latency tolerance: how late can results arrive (1h / 6h / 24h)?
- Failure tolerance: what replay ratio is acceptable?
- Budget ceiling: what is the hard daily spending cap?
Minimal budget dashboard with local logs:
jq -r '.date + "\t" + (.input|tostring) + "\t" + (.ok|tostring) + "\t" + (.fail|tostring)' /Users/wow/dev/book/mengboy/tmp/batch-metrics/*.json
python3 - <<'PY'
import json, glob

# Alert when any daily metrics file shows a failure rate above 3%.
for f in glob.glob('/Users/wow/dev/book/mengboy/tmp/batch-metrics/*.json'):
    d = json.load(open(f))
    rate = d['fail'] / max(d['input'], 1)
    if rate > 0.03:
        print('ALERT', f, f'{rate:.2%}')
PY
2) Offline splitting: optimize for stable throughput, not max peak
Conservative defaults:
- Start with small batches (for example 200-500 records)
- Use fixed windows (for example every 15 minutes)
- Submit on either threshold: window timeout or item cap
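The two submit triggers above can be sketched as a small accumulator. This is a minimal sketch, assuming a caller-supplied submit callback; the names (BatchAccumulator, flush, MAX_ITEMS) are illustrative, not from any SDK:

```python
import time

MAX_ITEMS = 500        # conservative item cap per batch
WINDOW_SECONDS = 900   # fixed 15-minute window

class BatchAccumulator:
    """Collects tasks and submits on whichever threshold hits first."""

    def __init__(self, flush):
        self.flush = flush            # callback that submits one batch
        self.items = []
        self.window_start = time.monotonic()

    def add(self, task):
        self.items.append(task)
        if len(self.items) >= MAX_ITEMS:
            self._submit("item_cap")

    def tick(self):
        """Call periodically; submits on window timeout even when not full."""
        if self.items and time.monotonic() - self.window_start >= WINDOW_SECONDS:
            self._submit("window_timeout")

    def _submit(self, reason):
        self.flush(self.items, reason)
        self.items = []
        self.window_start = time.monotonic()
```

Submitting on whichever threshold fires first keeps throughput stable: quiet periods still flush on the window, busy periods never exceed the cap.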
In Go, normalize tasks and enforce unique custom_id:
// BatchTask is one line of the JSONL batch input file.
type BatchTask struct {
	CustomID string          `json:"custom_id"`
	Method   string          `json:"method"`
	URL      string          `json:"url"`
	Body     json.RawMessage `json:"body"`
}

// buildCustomID derives a stable ID from a business key and a timestamp.
// Use the task's original creation timestamp so a replayed task reuses
// the same custom_id instead of minting a new one.
func buildCustomID(bizKey string, ts int64) string {
	return fmt.Sprintf("%s-%d", bizKey, ts)
}
Persist JSONL input for audit and replay:
mkdir -p /Users/wow/dev/book/mengboy/tmp/batch-input
ls -lh /Users/wow/dev/book/mengboy/tmp/batch-input
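The JSONL writer itself can be a short script. A minimal sketch, assuming tasks are dicts with the BatchTask fields shown above; the function name and file naming are illustrative:

```python
import json
from pathlib import Path

def write_batch_input(tasks, out_dir, batch_name):
    """Persist one batch as JSONL so it can be audited and replayed later.

    Each task must carry a unique custom_id; duplicates are rejected here,
    before submission, rather than discovered in the batch results.
    """
    seen = set()
    out_path = Path(out_dir) / f"{batch_name}.jsonl"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", encoding="utf-8") as f:
        for task in tasks:
            cid = task["custom_id"]
            if cid in seen:
                raise ValueError(f"duplicate custom_id: {cid}")
            seen.add(cid)
            f.write(json.dumps(task, ensure_ascii=False) + "\n")
    return out_path
```

Failing fast on a duplicate custom_id is cheaper than untangling ambiguous results after the batch completes.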
3) Failure replay: replay only retryable failures
Do not replay everything blindly. Use three buckets:
- SUCCESS: store result and close task
- RETRYABLE_FAIL: replay with capped attempts (for example 3)
- FINAL_FAIL: move to manual queue
Keep replay rules centralized:
// retryable centralizes the replay decision so every consumer
// classifies failure codes the same way.
func retryable(code string) bool {
	switch code {
	case "rate_limit", "timeout", "server_error":
		return true
	default:
		return false
	}
}
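The three-bucket routing can be sketched end to end. The error codes mirror the Go retryable helper above; the result fields (ok, error_code, replay_count) and the MAX_ATTEMPTS cap are assumptions for illustration:

```python
RETRYABLE_CODES = {"rate_limit", "timeout", "server_error"}
MAX_ATTEMPTS = 3  # replay cap before manual fallback

def route_outcome(result):
    """Map one batch result record to SUCCESS / RETRYABLE_FAIL / FINAL_FAIL.

    `result` is a dict with keys: custom_id, ok (bool), error_code,
    and replay_count (attempts already made).
    """
    if result.get("ok"):
        return "SUCCESS"
    if (result.get("error_code") in RETRYABLE_CODES
            and result.get("replay_count", 0) < MAX_ATTEMPTS):
        return "RETRYABLE_FAIL"
    return "FINAL_FAIL"
```

Note that the attempt cap is part of the routing decision: a retryable error on its fourth attempt goes straight to the manual queue.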
Replay generation should be executable as a standalone command:
python3 /Users/wow/dev/book/mengboy/scripts/rebuild_batch_replay.py \
--failed /Users/wow/dev/book/mengboy/tmp/batch-failed/failed-2026-03-13.jsonl \
--out /Users/wow/dev/book/mengboy/tmp/batch-replay/replay-2026-03-13.jsonl
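A sketch of the core of such a replay builder, under the assumption that each failed JSONL line carries error_code and replay_count fields (the field names are illustrative, not a fixed schema):

```python
import json

RETRYABLE_CODES = {"rate_limit", "timeout", "server_error"}
MAX_ATTEMPTS = 3

def build_replay_lines(failed_lines):
    """Keep only retryable failures under the attempt cap, bumping replay_count.

    Non-retryable failures and exhausted tasks are dropped here; they
    belong in the manual queue, not in the replay file.
    """
    replay = []
    for line in failed_lines:
        rec = json.loads(line)
        if (rec.get("error_code") in RETRYABLE_CODES
                and rec.get("replay_count", 0) < MAX_ATTEMPTS):
            rec["replay_count"] = rec.get("replay_count", 0) + 1
            replay.append(json.dumps(rec, ensure_ascii=False))
    return replay
```

Carrying replay_count inside the record makes the attempt cap survive process restarts, because the state lives in the files, not in memory.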
4) Idempotency first, automation second
The classic replay incident is duplicate inserts: a replayed task succeeds again and writes the same result twice.
Minimum requirements:
- unique key on the business table (custom_id)
- upsert writes
- track replay_count
ALTER TABLE ai_batch_result
ADD CONSTRAINT uk_custom_id UNIQUE (custom_id);
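A minimal upsert sketch using sqlite3, with table and column names mirroring the SQL above (SQLite syntax shown; server databases phrase ON CONFLICT slightly differently):

```python
import sqlite3

def upsert_result(conn, custom_id, payload):
    """Insert once; a replayed duplicate updates in place and bumps replay_count."""
    conn.execute(
        """
        INSERT INTO ai_batch_result (custom_id, payload, replay_count)
        VALUES (?, ?, 0)
        ON CONFLICT (custom_id) DO UPDATE SET
            payload = excluded.payload,
            replay_count = ai_batch_result.replay_count + 1
        """,
        (custom_id, payload),
    )
```

With the unique key in place, a replay can never create a second row; at worst it overwrites the payload and leaves an audit trail in replay_count.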
5) Deployment acceptance metrics
Do not use “request success” as your only KPI. Track at least:
- daily cost within target range
- controllable failure ratio
- stable post-replay success ratio
- manageable manual fallback volume
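These acceptance checks can be computed from the same daily metrics files as the dashboard above. The thresholds here are illustrative defaults, not prescriptions; the field names (cost, budget_cap, replay_ok, replayed, manual) are assumptions:

```python
def acceptance_report(day):
    """Evaluate one day's metrics dict against conservative thresholds.

    `day` keys: cost, budget_cap, input, fail, replay_ok, replayed, manual.
    Returns pass/fail flags for the four acceptance metrics.
    """
    fail_ratio = day["fail"] / max(day["input"], 1)
    replay_success = day["replay_ok"] / max(day["replayed"], 1)
    return {
        "cost_in_budget": day["cost"] <= day["budget_cap"],
        "failure_controllable": fail_ratio <= 0.03,       # mirrors the 3% alert
        "replay_stable": replay_success >= 0.90,
        "manual_manageable": day["manual"] <= 0.005 * day["input"],
    }
```

Publishing all four flags daily makes "multiple stable cycles" an objective condition rather than a judgment call.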
For conservative execution, keep critical-path traffic on the realtime endpoint until offline replay has stayed stable for multiple cycles.
MVP checklist
If time is tight, do these five first:
- migrate only non-realtime tasks to Batch API
- enforce custom_id on every task
- split failures into retryable/non-retryable
- cap replay attempts to 3, then manual fallback
- publish daily cost + failure report
This is usually enough to reduce cost significantly without compromising stability.