The hardest part of Structured Outputs is not getting JSON once. It is surviving schema changes without turning production into a small fire with excellent logs and terrible business results.

Once a Go service starts evolving prompts and response contracts, the usual failure modes show up fast: a new required field breaks older consumers, an enum expands and strict validation kills valid requests, or one bad sample drags the whole chain into retries and rollback panic.

TL;DR: conservative rollout needs four guardrails

  1. Version every schema instead of editing in place.
  2. Decode on two tracks so strict parsing can fall back safely.
  3. Ramp traffic gradually instead of flipping 100% at once.
  4. Prepare rollback commands first instead of inventing them during the outage.

If you are using OpenAI Responses Structured Outputs from Go, the winning mindset is not “maximum strictness”. It is evolvable, observable, and reversible.

How these incidents usually start

The pattern is painfully common:

  • you add two new required fields
  • you expand an enum that downstream code does not recognize
  • you change steps from a string to an array of objects
  • you assume model output always matches the latest schema
  • you return HTTP 500 on strict validation failure

Then the boring disaster begins:

  • the new schema hits production and old traffic starts failing
  • retries amplify bad samples into a storm
  • logs only say “invalid json schema output”
  • the real problem is not the model — it is overconfident rollout

1) Never do in-place schema surgery

Use explicit versions for structural changes.

A practical payload shape looks like this:

{
  "schema_version": "v2",
  "ticket_type": "billing",
  "priority": "high",
  "summary": "duplicate charge after retry",
  "actions": [
    "check idempotency key",
    "review payment callback logs"
  ]
}

In Go, keep versioned structs instead of one “forever” type:

type TicketV1 struct {
    SchemaVersion string `json:"schema_version"`
    Type          string `json:"ticket_type"`
    Summary       string `json:"summary"`
}

type TicketV2 struct {
    SchemaVersion string   `json:"schema_version"`
    Type          string   `json:"ticket_type"`
    Priority      string   `json:"priority"`
    Summary       string   `json:"summary"`
    Actions       []string `json:"actions"`
}

Normalize and diff schemas before rollout instead of eyeballing them:

jq -S . schema/ticket_v1.json > /tmp/ticket_v1.norm.json
jq -S . schema/ticket_v2.json > /tmp/ticket_v2.norm.json
diff -u /tmp/ticket_v1.norm.json /tmp/ticket_v2.norm.json

The risky question is not “did the schema change?” but:

  • did required expand?
  • did an enum become narrower?
  • did a field type change?
  • did an old field disappear?

The first two break production the fastest.
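That check can run in CI instead of in someone's head. Here is a minimal sketch that flags newly required fields; the `schemaDoc` shape and the inline payloads are assumptions for illustration, not a full JSON Schema parser:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// schemaDoc captures just the parts of a JSON Schema this check needs.
type schemaDoc struct {
	Required   []string                   `json:"required"`
	Properties map[string]json.RawMessage `json:"properties"`
}

// newlyRequired returns fields required in next but not in prev:
// the single most dangerous kind of schema change for old traffic.
func newlyRequired(prev, next schemaDoc) []string {
	seen := map[string]bool{}
	for _, f := range prev.Required {
		seen[f] = true
	}
	var out []string
	for _, f := range next.Required {
		if !seen[f] {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	var v1, v2 schemaDoc
	json.Unmarshal([]byte(`{"required":["schema_version","ticket_type","summary"]}`), &v1)
	json.Unmarshal([]byte(`{"required":["schema_version","ticket_type","priority","summary","actions"]}`), &v2)
	fmt.Println("newly required in v2:", newlyRequired(v1, v2)) // [priority actions]
}
```

Fail the pipeline when `newlyRequired` is non-empty unless the change is explicitly acknowledged; narrowing enums deserves the same treatment.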

2) Bad-case fallback: strict first, compatible second

A conservative decode chain is simple:

  • layer 1: decode strictly against the current version
  • layer 2: on failure, try a compatible or tolerant path
  • layer 3: if that still fails, degrade to a readable summary instead of failing the whole request

Example:

func decodeTicket(raw []byte) (any, error) {
    var peek struct {
        SchemaVersion string `json:"schema_version"`
    }
    if err := json.Unmarshal(raw, &peek); err != nil {
        return nil, fmt.Errorf("peek version: %w", err)
    }

    switch peek.SchemaVersion {
    case "v2":
        var out TicketV2
        if err := json.Unmarshal(raw, &out); err == nil {
            return out, nil
        }
        return decodeTicketCompat(raw)
    case "v1":
        var out TicketV1
        if err := json.Unmarshal(raw, &out); err != nil {
            return nil, err
        }
        return out, nil
    default:
        return decodeTicketCompat(raw)
    }
}

Do not silently swallow fallback. At minimum, record:

  • schema_version_detected
  • decode_mode=strict|compat|summary
  • fallback_reason
  • model
  • prompt_version

Keep a replay folder for bad cases. It pays for itself quickly:

mkdir -p /Users/wow/dev/book/mengboy/tmp/structured-output-samples/bad
cp bad_case.json /Users/wow/dev/book/mengboy/tmp/structured-output-samples/bad/
go test ./... -run TestDecodeTicketCompat -v

3) Canary rollout: do not full-send on day one

Structured Outputs problems often appear only on real long-tail inputs.

So “staging looked fine” is not evidence for a full production flip. It is evidence that staging is staging.

A practical rollout ladder:

  • 5%: watch parse success and bad-case distribution
  • 20%: inspect business field correctness
  • 50%: observe cost, latency, and fallback ratio
  • 100%: switch the main path only after stable windows
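
The percentage split itself should be deterministic per caller, not a coin flip per request. A minimal sketch, assuming the env var names used below and a stable FNV hash of a request key:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"os"
	"strconv"
)

// inCanary buckets a request by a stable hash of its key (user id,
// ticket id, ...) so the same caller sees the same schema version
// for the whole ramp stage.
func inCanary(key string, percent int) bool {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32()%100) < percent
}

// canaryPercent reads the env var used elsewhere in this article
// and fails closed on bad config: unknown config means no canary.
func canaryPercent() int {
	p, err := strconv.Atoi(os.Getenv("STRUCTURED_OUTPUT_CANARY_PERCENT"))
	if err != nil || p < 0 || p > 100 {
		return 0
	}
	return p
}

func main() {
	percent := canaryPercent()
	for _, id := range []string{"user-1", "user-2", "user-3"} {
		fmt.Printf("%s -> v2=%v (percent=%d)\n", id, inCanary(id, percent), percent)
	}
}
```

Stable bucketing matters because a caller flapping between v1 and v2 shapes mid-conversation is its own incident.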

The dumbest and most useful rollout control is still an environment variable:

export STRUCTURED_OUTPUT_SCHEMA_VERSION=v2
export STRUCTURED_OUTPUT_CANARY_PERCENT=20
export STRUCTURED_OUTPUT_COMPAT_FALLBACK=1
./bin/api

Check the effective runtime config:

curl -s http://127.0.0.1:8080/debug/config | jq '.structured_output'

If compat traffic keeps growing, do not call it “good enough”. It usually means:

  • prompt and schema have drifted apart
  • certain samples hit model output boundaries
  • your supposedly required field is not reliably produced

4) Gradual rollback: write the commands before the incident

A useful rollback is not “Git has history”. A useful rollback is “production can recover in five minutes”.

I recommend keeping two rollback modes ready:

  1. configuration rollback: switch v2 back to v1
  2. logic rollback: keep v2 generation but force consumers onto compat

Minimum rollback playbook:

export STRUCTURED_OUTPUT_SCHEMA_VERSION=v1
export STRUCTURED_OUTPUT_CANARY_PERCENT=0
export STRUCTURED_OUTPUT_COMPAT_FALLBACK=1
./bin/api

If you are on Kubernetes or another orchestrator, make the rollback an explicit command rather than something you have to remember:

kubectl set env deploy/agent-api \
  STRUCTURED_OUTPUT_SCHEMA_VERSION=v1 \
  STRUCTURED_OUTPUT_CANARY_PERCENT=0 \
  STRUCTURED_OUTPUT_COMPAT_FALLBACK=1

5) Success rate is not enough

Structured Outputs is sneaky because you can get HTTP 200 while business structure is already broken.

Ship at least these metrics:

  • structured_decode_total{mode,version}
  • structured_decode_fail_total{reason}
  • structured_fallback_total{from,to}
  • structured_summary_degrade_total
  • structured_field_missing_total{field}
  • structured_enum_unknown_total{field,value}
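
Wiring those counters does not need a heavy dependency on day one. A tiny label-aware stand-in is sketched below; in production you would more likely reach for a real metrics library such as Prometheus client_golang, and the counter names follow the list above:

```go
package main

import (
	"fmt"
	"sync"
)

// counterVec is a minimal label-aware counter, standing in for a
// real metrics library's CounterVec.
type counterVec struct {
	mu sync.Mutex
	m  map[string]int
}

func (c *counterVec) Inc(labels ...string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.m == nil {
		c.m = map[string]int{}
	}
	c.m[fmt.Sprint(labels)]++
}

var (
	structuredDecodeTotal   = &counterVec{} // labels: mode, version
	structuredFallbackTotal = &counterVec{} // labels: from, to
)

func main() {
	structuredDecodeTotal.Inc("strict", "v2") // strict v2 decode succeeded
	structuredDecodeTotal.Inc("compat", "v2") // strict failed, compat took over
	structuredFallbackTotal.Inc("strict", "compat")
	fmt.Println(structuredDecodeTotal.m, structuredFallbackTotal.m)
}
```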

Three quick checks during an incident:

grep -R "decode_mode=compat" /var/log/agent-api | tail -n 50

grep -R "fallback_reason" /var/log/agent-api | sort | uniq -c

grep -R "schema_version=v2" /var/log/agent-api | tail -n 100

If summary degradation is rising, the system is not “more strict”. It is just more fragile.

6) A conservative Go rollout order that actually works

If you need stable Structured Outputs within a week, my order is:

  1. add schema_version
  2. implement strict + compatible decoding
  3. add bad-case replay tests
  4. only then move to full traffic

Do not reverse this. Full rollout before fallback is how teams create personal character-building exercises at 2 a.m.

Common failures and troubleshooting

1) Success rate drops after a new required field

Check first:

  • does the prompt clearly request the field?
  • do historical samples often omit it already?
  • can the compat path still accept the older shape?

2) The model returns an unknown enum value

Safer handling:

  • map it to unknown first
  • log the raw value for audit
  • do not kill the entire request over one fresh enum
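
Those three steps fit in one small function. A sketch, assuming a priority enum like the one in the earlier payload (the known-value set and "unknown" sentinel are policy choices, not anything the API mandates):

```go
package main

import (
	"fmt"
	"log"
)

// knownPriorities is the enum your downstream code actually handles.
var knownPriorities = map[string]bool{
	"low": true, "medium": true, "high": true,
}

// normalizePriority maps unrecognized enum values to "unknown"
// instead of failing the request, and logs the raw value for audit.
func normalizePriority(raw string) string {
	if knownPriorities[raw] {
		return raw
	}
	log.Printf("structured_enum_unknown field=priority value=%q", raw)
	return "unknown"
}

func main() {
	fmt.Println(normalizePriority("high"))   // high
	fmt.Println(normalizePriority("urgent")) // unknown
}
```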

3) Canary shows HTTP 200 but downstream crashes on nil

That usually means decode succeeded but semantics failed.

Add:

  • field-level business validation
  • default value policy
  • nil assertions before downstream consumption

Summary

The real challenge of Structured Outputs is not generating JSON. It is keeping the system alive while schemas keep moving.

For a conservative target, the advice is blunt:

  • version the schema
  • use dual-path decoding
  • ramp traffic by percentage
  • prepare rollback commands in advance

Minimum viable plan

If you can only fix one round today:

  1. add schema_version
  2. implement strict -> compat -> summary decoding in Go
  3. keep v2 at 5% canary
  4. monitor structured_fallback_total

First make the system hard to break. Pretty structure can wait. Production is not an art gallery.