When CI keeps failing, the real risk is not “slow fixes” — it is “fast bad fixes.” This guide gives you a practical GitHub Actions + AI Agent auto-fix pipeline with failure tiering, strict edit boundaries, and merge-time gates.
Bottom line: auto-fix only low-risk, verifiable failures
Use three severity levels:
- P0 (high risk): security, data consistency, migrations → no auto-fix
- P1 (medium risk): dependency conflicts, type errors, lint failures → AI patch allowed, human approval required
- P2 (low risk): formatting/docs/simple test fixes → can auto-merge after gates
Rule: speed is good, unauthorized changes are not.
1) Tier failures before invoking the agent
```yaml
name: ci-auto-fix
on:
  workflow_run:
    workflows: ["CI"]
    types: [completed]

jobs:
  classify:
    if: ${{ github.event.workflow_run.conclusion == 'failure' }}
    runs-on: ubuntu-latest
    outputs:
      severity: ${{ steps.cls.outputs.severity }}
    steps:
      - uses: actions/checkout@v4
      - name: classify failure
        id: cls
        run: |
          python3 scripts/classify_failure.py > result.txt
          cat result.txt
          echo "severity=$(cat result.txt)" >> "$GITHUB_OUTPUT"
```
A minimal `classify_failure.py` can map keywords to tiers:
- `security`, `migration`, `destructive` → P0
- `test failed`, `type error`, `dependency` → P1
- `markdownlint`, `prettier`, `ruff format` → P2
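A sketch of that mapping, assuming the failure log arrives as an optional file argument (how you fetch the failed run's log is up to you); the keyword lists are illustrative, not exhaustive:

```python
import sys

# Ordered keyword → severity rules: the first matching tier wins,
# so higher-risk keywords are checked first.
RULES = [
    ("P0", ["security", "migration", "destructive"]),
    ("P1", ["test failed", "type error", "dependency"]),
    ("P2", ["markdownlint", "prettier", "ruff format"]),
]

def classify(log_text: str) -> str:
    text = log_text.lower()
    for severity, keywords in RULES:
        if any(kw in text for kw in keywords):
            return severity
    return "P0"  # unknown failures default to the safest tier

if __name__ == "__main__":
    log = open(sys.argv[1]).read() if len(sys.argv) > 1 else ""
    print(classify(log))
```

Note the deliberate default: anything the classifier does not recognize is treated as P0 and routed to humans.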
2) Put hard limits on what the agent can edit
Never hand over the full repository. Enforce scope:
```bash
ALLOWED_PATHS="src/ tests/ .github/"
MAX_CHANGED_LINES=200

git diff --name-only > changed_files.txt
python3 scripts/check_scope.py changed_files.txt "$ALLOWED_PATHS" "$MAX_CHANGED_LINES"
```
If scope check fails, stop and route to humans.
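A minimal `check_scope.py` matching that invocation might look like this; the prefix-matching rule and the use of `git diff --numstat` for the line budget are assumptions you can tighten:

```python
import subprocess
import sys

def out_of_scope(changed_files, allowed_prefixes):
    """Return the files that fall outside every allowed path prefix."""
    return [f for f in changed_files
            if not any(f.startswith(p) for p in allowed_prefixes)]

def main(list_path, allowed, max_lines):
    files = [line.strip() for line in open(list_path) if line.strip()]
    bad = out_of_scope(files, allowed.split())
    if bad:
        print("out of scope:", *bad)
        return 1
    # Count changed lines (additions + deletions) via git's numstat output.
    numstat = subprocess.run(["git", "diff", "--numstat"],
                             capture_output=True, text=True).stdout
    changed = sum(int(a) + int(d)
                  for a, d, _ in (line.split("\t") for line in numstat.splitlines())
                  if a.isdigit() and d.isdigit())
    if changed > int(max_lines):
        print(f"too many changed lines: {changed} > {max_lines}")
        return 1
    return 0

if __name__ == "__main__" and len(sys.argv) == 4:
    sys.exit(main(*sys.argv[1:4]))
```

A non-zero exit code fails the step, which is exactly the "stop and route to humans" behavior you want.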
3) Open a fix PR (never push directly to main)
```yaml
propose-fix:
  needs: classify
  if: ${{ needs.classify.outputs.severity != 'P0' }}
  runs-on: ubuntu-latest
  permissions:
    contents: write
    pull-requests: write
  steps:
    - uses: actions/checkout@v4
    - name: run agent patch
      run: |
        git checkout -b "agent-fix-${{ github.run_id }}"
        # run_agent_fix.sh is expected to commit its patch on this branch
        ./scripts/run_agent_fix.sh "${{ needs.classify.outputs.severity }}"
    - name: open PR
      env:
        GH_TOKEN: ${{ github.token }}  # gh needs a token to call the API
      run: |
        git push -u origin "agent-fix-${{ github.run_id }}"
        gh pr create \
          --title "ci: agent fix for failed run ${{ github.run_id }}" \
          --body "Auto-generated patch. Requires gated checks."
```
4) Required merge gates (all must pass)
- Regression tests (unit + critical smoke e2e)
- Security scans (CodeQL/SAST + secret scanning)
- Policy checks (scope and sensitive-file protection)
```yaml
- name: gated regression
  run: make test-critical
- name: security scan
  run: make security-scan
- name: policy check
  run: python3 scripts/policy_guard.py
```
5) Common failure modes and fixes
A) Retry storm: the agent keeps patching the same error
- Limit to 1 auto-fix per failed run
- Cap retries per error fingerprint (e.g., 3 in 24h)
```bash
python3 scripts/retry_budget.py --fingerprint "$ERR_FP" --max 3 --window 24h
```
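A sketch of the fingerprinting and budget logic; the digit-normalizing regex and the in-memory state dict are assumptions (in CI you would persist the state, e.g. as a cache artifact):

```python
import hashlib
import re
import time

def fingerprint(error_text: str) -> str:
    """Normalize volatile parts (line numbers, ids, timestamps) so the
    same underlying error always hashes to the same fingerprint."""
    normalized = re.sub(r"\d+", "N", error_text.lower())
    return hashlib.sha256(normalized.encode()).hexdigest()[:16]

def within_budget(state, fp, max_tries, window_s, now=None):
    """Record an attempt for fp; return False once max_tries is
    exceeded inside the sliding time window."""
    now = time.time() if now is None else now
    attempts = [t for t in state.get(fp, []) if now - t < window_s]
    attempts.append(now)
    state[fp] = attempts
    return len(attempts) <= max_tries
```

Without the normalization step, every stack trace with a different line number would count as a "new" error and the budget would never trigger.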
B) Functional fix but hidden performance regression
- Add benchmark smoke tests to gates
- Enforce P95 thresholds on key endpoints
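One way to enforce such a threshold in a gate step, using the nearest-rank P95 (the sample source and threshold values are assumptions):

```python
import math

def p95(samples_ms):
    """Nearest-rank 95th-percentile latency of a non-empty sample list."""
    ordered = sorted(samples_ms)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]

def latency_gate(samples_ms, threshold_ms):
    """Return (passed, observed_p95); fail the CI step when passed is False."""
    value = p95(samples_ms)
    return value <= threshold_ms, value
```

Run the benchmark smoke test on both the base commit and the patched commit so the threshold compares like with like.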
C) Agent modifies forbidden infrastructure files
- Hard denylist in `policy_guard.py`: `terraform/`, `migrations/`, `secrets/`
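Inside `policy_guard.py`, the denylist check can be a few lines; matching both top-level and nested directories (as below) is an assumption to adjust for your layout:

```python
# Hard denylist of infrastructure paths the agent may never touch.
DENYLIST = ("terraform/", "migrations/", "secrets/")

def violations(changed_files):
    """Return changed files that hit the denylist, at any nesting depth."""
    return [f for f in changed_files
            if any(f.startswith(d) or f"/{d}" in f for d in DENYLIST)]
```

Any non-empty result should fail the policy step outright, with no retry.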
6) MVP rollout plan
If you need to ship today:
- Start with P2 only
- Force all patches through PRs
- Enable regression + security gates
- Track metrics: auto-fix success rate, rollback rate, MTTR
Expand to P1 only after rollback rate stays below 3% for two weeks.
Summary
The winning pattern is not “smarter model first.” It is failure tiering + permission boundaries + hard validation gates. Start conservative, then widen safely.