You don’t need an AI reviewer that “sounds smart.” You need a gate that stops risky PRs before they hit main.
This post shows a production-ready minimum setup: the OpenAI Responses API generates a structured risk report, GitHub Actions enforces tiered policies, and critical failures can trigger a one-click rollback.
Goal and Use Cases
This setup is for teams that:
- Handle many PRs and miss high-risk changes in manual review
- Already run CI but lack semantic quality checks (API compatibility, data leakage, privilege issues)
- Need AI output in auditable JSON, not free-form prose
Conservative objective: block obvious high-risk changes first, then expand coverage.
Architecture (Conservative and Reversible)
Use three layers:
- Hard-rule layer: lint/test/secret scan
- AI semantic eval layer: Responses returns structured risk report
- Policy decision layer: pass/warn/block based on risk level
Recommended levels:
- low: pass
- medium: pass with mandatory human confirmation
- high: block and request fixes
- critical: block + trigger rollback playbook
Step 1: Define an Executable Risk Schema
Turn subjective review into machine-enforceable fields.
```json
{
  "risk_level": "low|medium|high|critical",
  "summary": "string",
  "findings": [
    {
      "type": "security|compatibility|data-loss|performance|compliance",
      "severity": "low|medium|high|critical",
      "file": "string",
      "evidence": "string",
      "fix": "string"
    }
  ],
  "confidence": 0.0,
  "block": true,
  "rollback_recommended": false
}
```
Key rules:
- Never trust `block` blindly; the policy layer re-checks it
- If `confidence` < 0.6, downgrade the risk level to `medium` to avoid false positives
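Both rules can be enforced as a small post-processing step on the model's report (the function name is illustrative; here only `high` is softened on low confidence — whether to also soften `critical` is a policy choice for your team):

```python
def apply_policy_overrides(report: dict) -> dict:
    """Re-check the model's own verdict before the gate acts on it."""
    severities = [f.get("severity") for f in report.get("findings", [])]

    # Rule 1: never trust `block` blindly; recompute it from the findings.
    report["block"] = any(s in ("high", "critical") for s in severities)

    # Rule 2: a low-confidence `high` verdict becomes `medium`, so a shaky
    # evaluation asks for human confirmation instead of hard-blocking.
    if report.get("confidence", 0.0) < 0.6 and report.get("risk_level") == "high":
        report["risk_level"] = "medium"
        report["block"] = False

    return report
```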
Step 2: Call the Evaluator in GitHub Actions
Minimal workflow fragment:
```yaml
name: pr-risk-gate
on:
  pull_request:
    types: [opened, synchronize, reopened]
jobs:
  gate:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # full history so the three-dot diff can find a merge base
      - name: Collect PR diff
        run: |
          git fetch origin ${{ github.base_ref }}
          git diff --unified=0 origin/${{ github.base_ref }}...HEAD > pr.diff
      - name: AI risk evaluation
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          python3 scripts/eval_pr_risk.py \
            --diff pr.diff \
            --schema scripts/risk_schema.json \
            --out risk.json
      - name: Decide gate policy
        run: python3 scripts/gate_decision.py --in risk.json
```
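A minimal sketch of `scripts/eval_pr_risk.py` follows. The flags match the workflow; the prompt wording, model choice, and character budget are assumptions, and the structured-output call uses the OpenAI Python SDK's Responses API with a JSON-schema response format:

```python
"""Sketch of scripts/eval_pr_risk.py: diff in, schema-validated risk report out."""
import argparse
import json
import sys


def build_prompt(diff: str, budget_chars: int = 40_000) -> str:
    # Keep the request small: send only the (truncated) diff, not the repo.
    return (
        "Review this PR diff and report risks as JSON matching the schema. "
        "Only report findings you can tie to a concrete changed line.\n\n"
        + diff[:budget_chars]
    )


def main() -> None:
    from openai import OpenAI  # requires OPENAI_API_KEY in the environment

    p = argparse.ArgumentParser()
    p.add_argument("--diff", required=True)
    p.add_argument("--schema", required=True)
    p.add_argument("--out", required=True)
    args = p.parse_args()

    diff = open(args.diff).read()
    schema = json.load(open(args.schema))

    client = OpenAI()
    resp = client.responses.create(
        model="gpt-4o-mini",  # model choice is an assumption
        input=build_prompt(diff),
        text={"format": {"type": "json_schema",
                         "name": "risk_report", "schema": schema}},
    )
    json.dump(json.loads(resp.output_text), open(args.out, "w"), indent=2)


# Only run the CLI when invoked with arguments, as in the workflow step.
if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```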
Step 3: Enforce Tiered Blocking
Recommended `gate_decision.py` behavior:
- critical: exit 2 (hard stop)
- high: exit 1 (block)
- medium: exit 0 + PR comment + required reviewer
- low: exit 0
Example:
```bash
# Capture the exit code explicitly: Actions runs bash with -e,
# so a bare failing command would abort the step before `code=$?`.
code=0
python3 scripts/gate_decision.py --in risk.json || code=$?
if [ "$code" -eq 2 ]; then
  echo "critical risk, trigger rollback plan"
  bash scripts/rollback_to_last_green.sh
  exit 1
elif [ "$code" -eq 1 ]; then
  echo "high risk, block merge"
  exit 1
fi
```
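A matching `gate_decision.py` can be little more than an exit-code map (the PR-comment and required-reviewer side effects are left out of this sketch):

```python
import argparse
import json
import sys

# Exit codes mirror the tiered policy: 2 = critical, 1 = high, 0 = pass.
EXIT_CODES = {"critical": 2, "high": 1, "medium": 0, "low": 0}


def decide(report: dict) -> int:
    level = report.get("risk_level", "high")  # missing/unknown input fails closed
    if level == "medium":
        # This is where a PR comment would be posted and a required
        # reviewer requested; the gate itself still passes.
        pass
    return EXIT_CODES.get(level, 1)


# Only run the CLI when invoked with arguments, as in the workflow step.
if __name__ == "__main__" and len(sys.argv) > 1:
    p = argparse.ArgumentParser()
    p.add_argument("--in", dest="path", required=True)
    args = p.parse_args()
    sys.exit(decide(json.load(open(args.path))))
```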
Step 4: One-Click Rollback (Only for Mainline Incidents)
Rollback should require both:
- Current deployment is tied to the risky change
- Last green commit is traceable
```bash
#!/usr/bin/env bash
set -euo pipefail
LAST_GREEN_SHA=$(cat .ci/last-green.sha)
git fetch origin --depth=20
git checkout main
git reset --hard "$LAST_GREEN_SHA"
git push origin main --force-with-lease
```
Use `--force-with-lease` only inside a controlled incident process with audit logs.
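For the rollback script to work, something must keep `.ci/last-green.sha` current. One option (step name and trigger condition are assumptions) is a step that runs on the mainline pipeline only after all required checks pass:

```yaml
# Runs at the end of the main-branch pipeline, after all required jobs succeed.
- name: Record last green commit
  if: github.ref == 'refs/heads/main'
  run: |
    echo "${GITHUB_SHA}" > .ci/last-green.sha
    # Persist this file somewhere outside the repo history (an artifact or
    # object store), so rollback does not depend on the broken branch state.
```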
Common Failure Modes and Fixes
1) Too many false positives
Cause: broad prompt + loose schema.
Fix: narrow to three classes first: security/compatibility/data-loss.
2) Slow pipeline
Cause: sending full repository context.
Fix: evaluate incremental diff plus short file summaries with token budgets.
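A crude but effective budget: cap the diff by character count (a rough proxy for tokens, about 3 to 4 characters per token for code) and keep whole per-file chunks until the budget runs out. Splitting on `diff --git ` headers is an assumption about the diff format produced above:

```python
def budget_diff(diff: str, max_chars: int = 40_000) -> str:
    """Keep whole per-file diff chunks until the character budget is spent."""
    files = diff.split("diff --git ")
    kept, used = [], 0
    for chunk in files:
        if not chunk:
            continue  # skip the empty prefix before the first header
        piece = "diff --git " + chunk
        if used + len(piece) > max_chars:
            kept.append("# ...remaining files omitted for token budget...\n")
            break
        kept.append(piece)
        used += len(piece)
    return "".join(kept)
```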
3) Blocking without actionable fixes
Cause: evaluator reports issues but no repair steps.
Fix: require executable fix guidance; if missing, auto-downgrade to medium.
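That auto-downgrade can be enforced mechanically (field names follow the schema in Step 1; the function name is illustrative):

```python
def downgrade_unactionable(report: dict) -> dict:
    """Findings without concrete fix guidance cannot justify a hard block."""
    actionable = all(f.get("fix", "").strip()
                     for f in report.get("findings", []))
    if not actionable and report.get("risk_level") in ("high", "critical"):
        report["risk_level"] = "medium"  # route to human triage, not a block
        report["block"] = False
    return report
```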
MVP Rollout Checklist
- PR diff extraction
- Structured risk schema
- Gate decision script with exit-code policy
- PR comment template for fast human triage
- Rollback script + last-green tracking
Ship these five first. Then iterate.
Summary
The right pattern is not “AI decides.” It is “AI provides structured evidence, policy layer makes the final call.” That is what scales safely.