When CI keeps failing, the real risk is not "slow fixes" but "fast bad fixes." This guide walks through a practical GitHub Actions + AI agent auto-fix pipeline with failure tiering, strict edit boundaries, and merge-time gates.

Bottom line: auto-fix only low-risk, verifiable failures

Use three severity levels:

  • P0 (high risk): security, data consistency, migrations → no auto-fix
  • P1 (medium risk): dependency conflicts, type errors, lint failures → AI patch allowed, human approval required
  • P2 (low risk): formatting/docs/simple test fixes → can auto-merge after gates

Rule: speed is good, unauthorized changes are not.

1) Tier failures before invoking the agent

name: ci-auto-fix
on:
  workflow_run:
    workflows: ["CI"]
    types: [completed]

jobs:
  classify:
    if: ${{ github.event.workflow_run.conclusion == 'failure' }}
    runs-on: ubuntu-latest
    outputs:
      severity: ${{ steps.cls.outputs.severity }}
    steps:
      - uses: actions/checkout@v4
      - name: classify failure
        id: cls
        run: |
          python3 scripts/classify_failure.py > result.txt
          cat result.txt
          echo "severity=$(cat result.txt)" >> "$GITHUB_OUTPUT"

A minimal classify_failure.py can map:

  • security, migration, destructive → P0
  • test failed, type error, dependency → P1
  • markdownlint, prettier, ruff format → P2
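A sketch of that classifier in Python, assuming a prior step saves the failing job's log to a file and passes its path as an argument (the rule phrases and log-path convention are illustrative, not a fixed API). Anything unrecognized falls through to P0, so the classifier fails closed:

```python
import re
import sys
from pathlib import Path

# Ordered rules: the first matching tier wins, so P0 patterns are checked first.
RULES = [
    ("P0", re.compile(r"security|migration|destructive", re.I)),
    ("P1", re.compile(r"test failed|type\s?error|dependency", re.I)),
    ("P2", re.compile(r"markdownlint|prettier|ruff format", re.I)),
]

def classify(log_text: str) -> str:
    for severity, pattern in RULES:
        if pattern.search(log_text):
            return severity
    return "P0"  # anything unrecognized is treated as high risk

if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumes a prior workflow step wrote the failed job's log to this path.
    print(classify(Path(sys.argv[1]).read_text()))
```

Defaulting unknown failures to P0 is deliberate: a classifier that guesses optimistically would route risky failures into the auto-fix path.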

2) Put hard limits on what the agent can edit

Never hand over the full repository. Enforce scope:

ALLOWED_PATHS="src/ tests/ .github/"
MAX_CHANGED_LINES=200

git diff --name-only > changed_files.txt
python3 scripts/check_scope.py changed_files.txt "$ALLOWED_PATHS" "$MAX_CHANGED_LINES"

If scope check fails, stop and route to humans.
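A minimal check_scope.py could look like the sketch below. The `scope_violations` helper and its signature are my own illustration; in practice the caller would compute the changed-line count from `git diff --shortstat` before invoking it:

```python
import sys

def scope_violations(changed_files, allowed_prefixes, changed_lines, max_lines):
    """Return violation messages; an empty list means the patch is in bounds."""
    bad = [f"out of scope: {p}" for p in changed_files
           if not any(p.startswith(a) for a in allowed_prefixes)]
    if changed_lines > max_lines:
        bad.append(f"patch too large: {changed_lines} > {max_lines} changed lines")
    return bad

if __name__ == "__main__" and len(sys.argv) == 4:
    files = [line.strip() for line in open(sys.argv[1]) if line.strip()]
    allowed = sys.argv[2].split()
    # Changed-line counting is left to the caller in this sketch; passing 0
    # here enforces only the path allowlist.
    problems = scope_violations(files, allowed, 0, int(sys.argv[3]))
    if problems:
        print("\n".join(problems))
        sys.exit(1)
```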

3) Open a fix PR (never push directly to main)

  propose-fix:
    needs: classify
    if: ${{ needs.classify.outputs.severity != 'P0' }}
    runs-on: ubuntu-latest
    permissions:
      contents: write
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
        with:
          # Check out the commit that actually failed, not the default branch.
          ref: ${{ github.event.workflow_run.head_sha }}
      - name: run agent patch
        run: |
          ./scripts/run_agent_fix.sh "${{ needs.classify.outputs.severity }}"
      - name: open PR
        env:
          GH_TOKEN: ${{ github.token }}
        run: |
          git config user.name "ci-auto-fix-bot"
          git config user.email "ci-auto-fix-bot@users.noreply.github.com"
          git checkout -b "ci-auto-fix/${{ github.run_id }}"
          git add -A
          git commit -m "ci: agent fix for failed run ${{ github.run_id }}"
          git push origin HEAD
          gh pr create \
            --title "ci: agent fix for failed run ${{ github.run_id }}" \
            --body "Auto-generated patch. Requires gated checks."

4) Required merge gates (all must pass)

  1. Regression tests (unit + critical smoke e2e)
  2. Security scans (CodeQL/SAST + secret scanning)
  3. Policy checks (scope and sensitive-file protection)

In the fix PR's workflow, these gates map to steps like:

- name: gated regression
  run: make test-critical

- name: security scan
  run: make security-scan

- name: policy check
  run: python3 scripts/policy_guard.py

5) Common failure modes and fixes

A) Retry storm: the agent keeps patching the same error

  • Limit to 1 auto-fix per failed run
  • Cap retries per error fingerprint (e.g., 3 in 24h)

python3 scripts/retry_budget.py --fingerprint "$ERR_FP" --max 3 --window 24h

B) Functional fix but hidden performance regression

  • Add benchmark smoke tests to gates
  • Enforce P95 thresholds on key endpoints
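As a sketch, a nearest-rank P95 check over benchmark samples could gate the merge. The endpoint names and thresholds below are illustrative, and the benchmark step is assumed to emit per-request latencies:

```python
import math

def p95(samples):
    """Nearest-rank 95th percentile of a list of latency samples (ms)."""
    ordered = sorted(samples)
    rank = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return ordered[rank]

def over_threshold(latencies_by_endpoint, thresholds_ms):
    """Return endpoints whose P95 latency exceeds its configured ceiling."""
    return [ep for ep, samples in latencies_by_endpoint.items()
            if p95(samples) > thresholds_ms.get(ep, float("inf"))]
```

A gate step would fail the job whenever `over_threshold(...)` returns a non-empty list.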

C) Agent modifies forbidden infrastructure files

  • Hard denylist in policy_guard.py: terraform/, migrations/, secrets/
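A sketch of that denylist check inside policy_guard.py (`forbidden_changes` and the `--check-git` flag are illustrative names, and the diff range assumes the agent's patch is the most recent commit):

```python
import subprocess
import sys

# Paths the agent must never touch, regardless of severity tier.
DENYLIST = ("terraform/", "migrations/", "secrets/")

def forbidden_changes(changed_files, denylist=DENYLIST):
    """Return changed paths that fall under a hard-denied prefix."""
    return [p for p in changed_files if p.startswith(tuple(denylist))]

if __name__ == "__main__" and "--check-git" in sys.argv:
    # Assumes the agent's patch is the most recent commit on this branch.
    diff = subprocess.run(["git", "diff", "--name-only", "HEAD~1"],
                          capture_output=True, text=True, check=True)
    hits = forbidden_changes(diff.stdout.splitlines())
    if hits:
        print("forbidden paths touched:", ", ".join(hits))
        sys.exit(1)
```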

6) MVP rollout plan

If you need to ship today:

  1. Start with P2 only
  2. Force all patches through PRs
  3. Enable regression + security gates
  4. Track metrics: auto-fix success rate, rollback rate, MTTR

Expand to P1 only after rollback rate stays below 3% for two weeks.
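That expansion criterion can be made mechanical. A sketch, assuming you log per-day counts of merged auto-fix PRs and later rollbacks (the tuple format is an assumption):

```python
def rollback_rate(merged, rolled_back):
    """Fraction of merged auto-fix PRs that were later rolled back."""
    return 0.0 if merged == 0 else rolled_back / merged

def ready_for_p1(daily_stats, threshold=0.03, days=14):
    """daily_stats: list of (merged, rolled_back) per day, most recent last."""
    window = daily_stats[-days:]
    return len(window) == days and all(
        rollback_rate(m, r) < threshold for m, r in window)
```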

Summary

The winning pattern is not “smarter model first.” It is failure tiering + permission boundaries + hard validation gates. Start conservative, then widen safely.