The hardest part of Structured Outputs is not getting JSON once. It is surviving schema changes without turning production into a small fire with excellent logs and terrible business results.
Once a Go service starts evolving prompts and response contracts, the usual failure modes show up fast: a new required field breaks older consumers, an enum expands and strict validation kills valid requests, or one bad sample drags the whole chain into retries and rollback panic.
TL;DR: conservative rollout needs four guardrails
- Version every schema instead of editing in place.
- Decode on two tracks so strict parsing can fall back safely.
- Ramp traffic gradually instead of flipping 100% at once.
- Prepare rollback commands first instead of inventing them during the outage.
If you are using OpenAI Responses Structured Outputs from Go, the winning mindset is not “maximum strictness”. It is evolvable, observable, and reversible.
How these incidents usually start
The pattern is painfully common:
- you add two new required fields
- you expand an enum that downstream code does not recognize
- you change `steps` from a string to an object array
- you assume model output always matches the latest schema
- you return HTTP 500 on strict validation failure
Then the boring disaster begins:
- the new schema hits production and old traffic starts failing
- retries amplify bad samples into a storm
- logs only say `invalid json schema output`
- the real problem is not the model: it is overconfident rollout
1) Never do in-place schema surgery
Use explicit versions for structural changes.
A practical payload shape looks like this:
```json
{
  "schema_version": "v2",
  "ticket_type": "billing",
  "priority": "high",
  "summary": "duplicate charge after retry",
  "actions": [
    "check idempotency key",
    "review payment callback logs"
  ]
}
```
In Go, keep versioned structs instead of one “forever” type:
```go
type TicketV1 struct {
	SchemaVersion string `json:"schema_version"`
	Type          string `json:"ticket_type"`
	Summary       string `json:"summary"`
}

type TicketV2 struct {
	SchemaVersion string   `json:"schema_version"`
	Type          string   `json:"ticket_type"`
	Priority      string   `json:"priority"`
	Summary       string   `json:"summary"`
	Actions       []string `json:"actions"`
}
```
Normalize and diff schemas before rollout instead of eyeballing them:
```sh
jq -S . schema/ticket_v1.json > /tmp/ticket_v1.norm.json
jq -S . schema/ticket_v2.json > /tmp/ticket_v2.norm.json
diff -u /tmp/ticket_v1.norm.json /tmp/ticket_v2.norm.json
```
The risky questions are not “did the schema change?” but:
- did `required` expand?
- did an enum become narrower?
- did a field type change?
- did an old field disappear?
The first two break production the fastest.
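Those two checks are also easy to automate next to the jq step. A minimal sketch: `newlyRequired` and `requiredOf` are hypothetical helpers, and the file paths reuse the ones from the diff commands above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// newlyRequired returns fields required by the new schema but not the old one:
// the change class that breaks old traffic fastest.
func newlyRequired(oldReq, newReq []string) []string {
	old := make(map[string]bool, len(oldReq))
	for _, f := range oldReq {
		old[f] = true
	}
	var out []string
	for _, f := range newReq {
		if !old[f] {
			out = append(out, f)
		}
	}
	return out
}

// requiredOf reads the top-level "required" list from a JSON Schema file.
func requiredOf(path string) []string {
	raw, err := os.ReadFile(path)
	if err != nil {
		return nil
	}
	var s struct {
		Required []string `json:"required"`
	}
	_ = json.Unmarshal(raw, &s)
	return s.Required
}

func main() {
	added := newlyRequired(requiredOf("schema/ticket_v1.json"), requiredOf("schema/ticket_v2.json"))
	for _, f := range added {
		fmt.Printf("WARNING: newly required field %q\n", f)
	}
}
```

Wire this into CI so a widened `required` set fails the build instead of production.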
2) Bad-case fallback: strict first, compatible second
A conservative decode chain is simple:
- layer 1: decode strictly against the current version
- layer 2: on failure, try a compatible or tolerant path
- layer 3: if that still fails, degrade to a readable summary instead of wasting the whole request
Example:
```go
func decodeTicket(raw []byte) (any, error) {
	var peek struct {
		SchemaVersion string `json:"schema_version"`
	}
	if err := json.Unmarshal(raw, &peek); err != nil {
		return nil, fmt.Errorf("peek version: %w", err)
	}

	switch peek.SchemaVersion {
	case "v2":
		var out TicketV2
		if err := json.Unmarshal(raw, &out); err == nil {
			return out, nil
		}
		return decodeTicketCompat(raw)
	case "v1":
		var out TicketV1
		if err := json.Unmarshal(raw, &out); err != nil {
			return nil, err
		}
		return out, nil
	default:
		return decodeTicketCompat(raw)
	}
}
```
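The shape of `decodeTicketCompat` is up to you. One tolerant sketch, where the field choices and the `unknown` default are assumptions rather than the one true implementation:

```go
package main

import "encoding/json"

// decodeTicketCompat is the tolerant layer: accept whatever known fields
// parse, fill safe defaults, and degrade to a summary-only view as the
// last resort instead of returning a hard error.
func decodeTicketCompat(raw []byte) (map[string]any, error) {
	var loose map[string]any
	if err := json.Unmarshal(raw, &loose); err != nil {
		// layer 3: not even loosely parseable -> readable summary, never a 500
		return map[string]any{"decode_mode": "summary", "summary": string(raw)}, nil
	}
	if _, ok := loose["priority"].(string); !ok {
		loose["priority"] = "unknown" // assumed safe default
	}
	loose["decode_mode"] = "compat"
	return loose, nil
}

func main() {}
```

Returning a loose map (rather than a versioned struct) is deliberate here: the compat path exists to keep the request alive, not to pretend the contract held.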
Do not silently swallow fallback. At minimum, record:
- `schema_version_detected`
- `decode_mode=strict|compat|summary`
- `fallback_reason`
- `model`
- `prompt_version`
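One way to make those fields impossible to forget is a single helper that every fallback site goes through. This sketch uses the standard library `log/slog`; `fallbackAttrs` is an assumed name, the keys are what matter.

```go
package main

import (
	"log/slog"
	"os"
)

// fallbackAttrs collects the minimum fallback context in one place so no
// call site can drop a key.
func fallbackAttrs(version, mode, reason, model, promptVersion string) map[string]string {
	return map[string]string{
		"schema_version_detected": version,
		"decode_mode":             mode, // strict|compat|summary
		"fallback_reason":         reason,
		"model":                   model,
		"prompt_version":          promptVersion,
	}
}

func main() {
	logger := slog.New(slog.NewJSONHandler(os.Stderr, nil))
	attrs := fallbackAttrs("v2", "compat", "missing field: priority", "example-model", "p17")
	args := make([]any, 0, len(attrs)*2)
	for k, v := range attrs {
		args = append(args, k, v)
	}
	logger.Warn("structured output fallback", args...)
}
```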
Keep a replay folder for bad cases. It pays for itself quickly:
```sh
mkdir -p /Users/wow/dev/book/mengboy/tmp/structured-output-samples/bad
cp bad_case.json /Users/wow/dev/book/mengboy/tmp/structured-output-samples/bad/
go test ./... -run TestDecodeTicketCompat -v
```
3) Canary rollout: do not full-send on day one
Structured Outputs problems often appear only on real long-tail inputs.
So “staging looked fine” is not evidence for a full production flip. It is evidence that staging is staging.
A practical rollout ladder:
- 5%: watch parse success and bad-case distribution
- 20%: inspect business field correctness
- 50%: observe cost, latency, and fallback ratio
- 100%: switch the main path only after stable windows
The dumbest and most useful rollout control is still an environment variable:
```sh
export STRUCTURED_OUTPUT_SCHEMA_VERSION=v2
export STRUCTURED_OUTPUT_CANARY_PERCENT=20
export STRUCTURED_OUTPUT_COMPAT_FALLBACK=1
./bin/api
```
Check the effective runtime config:
```sh
curl -s http://127.0.0.1:8080/debug/config | jq '.structured_output'
```
If compat traffic keeps growing, do not call it “good enough”. It usually means:
- prompt and schema have drifted apart
- certain samples hit model output boundaries
- your supposedly required field is not reliably produced
4) Gradual rollback: write the commands before the incident
A useful rollback is not “Git has history”. A useful rollback is “production can recover in five minutes”.
I recommend keeping two rollback modes ready:
- configuration rollback: switch `v2` back to `v1`
- logic rollback: keep `v2` generation but force consumers onto `compat`
Minimum rollback playbook:
```sh
export STRUCTURED_OUTPUT_SCHEMA_VERSION=v1
export STRUCTURED_OUTPUT_CANARY_PERCENT=0
export STRUCTURED_OUTPUT_COMPAT_FALLBACK=1
./bin/api
```
If you are on Kubernetes or another orchestrator, make it explicit rather than memorable:
```sh
kubectl set env deploy/agent-api \
  STRUCTURED_OUTPUT_SCHEMA_VERSION=v1 \
  STRUCTURED_OUTPUT_CANARY_PERCENT=0 \
  STRUCTURED_OUTPUT_COMPAT_FALLBACK=1
```
5) Success rate is not enough
Structured Outputs is sneaky because you can get HTTP 200 while business structure is already broken.
Ship at least these metrics:
- `structured_decode_total{mode,version}`
- `structured_decode_fail_total{reason}`
- `structured_fallback_total{from,to}`
- `structured_summary_degrade_total`
- `structured_field_missing_total{field}`
- `structured_enum_unknown_total{field,value}`
Three quick checks during an incident:
```sh
grep -R "decode_mode=compat" /var/log/agent-api | tail -n 50
grep -R "fallback_reason" /var/log/agent-api | sort | uniq -c
grep -R "schema_version=v2" /var/log/agent-api | tail -n 100
```
If summary degradation is rising, the system is not “more strict”. It is just more fragile.
6) A conservative Go rollout order that actually works
If you need stable Structured Outputs within a week, my order is:
- add `schema_version`
- implement strict + compatible decoding
- add bad-case replay tests
- only then move to full traffic
Do not reverse this. Full rollout before fallback is how teams create personal character-building exercises at 2 a.m.
Common failures and troubleshooting
1) Success rate drops after a new required field
Check first:
- does the prompt clearly request the field?
- do historical samples often omit it already?
- can the compat path still accept the older shape?
2) The model returns an unknown enum value
Safer handling:
- map it to `unknown` first
- log the raw value for audit
- do not kill the entire request over one fresh enum
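A sketch of that handling, with an assumed allow-list for the `priority` enum:

```go
package main

import "fmt"

// knownPriorities is an assumed allow-list; values outside it degrade to
// "unknown" instead of failing the request.
var knownPriorities = map[string]bool{"low": true, "medium": true, "high": true}

func normalizePriority(v string) string {
	if knownPriorities[v] {
		return v
	}
	// keep the raw value around for audit before degrading it
	fmt.Printf("audit: unknown priority enum %q\n", v)
	return "unknown"
}

func main() {}
```

When the audit log shows the same fresh value repeatedly, that is your signal to promote it into the schema on the next version, not to keep eating it silently.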
3) Canary shows HTTP 200 but downstream crashes on nil
That usually means decode succeeded but semantics failed.
Add:
- field-level business validation
- default value policy
- nil assertions before downstream consumption
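A minimal semantic gate might look like this; the rules are illustrative assumptions, and `TicketV2` is repeated from section 1 so the sketch stands alone.

```go
package main

import (
	"errors"
	"fmt"
)

// TicketV2 mirrors the struct from section 1.
type TicketV2 struct {
	SchemaVersion string   `json:"schema_version"`
	Type          string   `json:"ticket_type"`
	Priority      string   `json:"priority"`
	Summary       string   `json:"summary"`
	Actions       []string `json:"actions"`
}

// validateTicket is the semantic check after a successful decode: HTTP 200
// plus empty business fields should fail here, not in a downstream nil panic.
func validateTicket(t TicketV2) error {
	if t.Summary == "" {
		return errors.New("summary must not be empty")
	}
	if t.Priority == "" {
		return errors.New("priority missing: apply default before consumption")
	}
	if len(t.Actions) == 0 {
		return fmt.Errorf("ticket %q has no actions", t.Type)
	}
	return nil
}

func main() {}
```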
Summary
The real challenge of Structured Outputs is not generating JSON. It is keeping the system alive while schemas keep moving.
For a conservative target, the advice is blunt:
- version the schema
- use dual-path decoding
- ramp traffic by percentage
- prepare rollback commands in advance
Minimum viable plan
If you can only fix one round today:
- add `schema_version`
- implement `strict -> compat -> summary` decoding in Go
- keep `v2` at a 5% canary
- monitor `structured_fallback_total`
First make the system hard to break. Pretty structure can wait. Production is not an art gallery.