The practical problem with AI agent testing in CI/CD is not just "does the output look okay?" It is "did my agent behavior change in a way that should block this merge?" EvalView is built for that workflow — it integrates with GitHub Actions, posts PR comments with regression alerts, and provides proper exit codes for CI gating.
Every CI run checks six dimensions of agent behavior against the approved golden baseline.
Add this workflow to your repository. It runs regression checks on every PR and push to main:
```yaml
# .github/workflows/evalview.yml
name: EvalView Regression Check

on:
  pull_request:
  push:
    branches: [main]

jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Check for regressions
        uses: hidai25/eval-view@v0.6.0
        with:
          openai-api-key: ${{ secrets.OPENAI_API_KEY }}
          fail-on: REGRESSION
```
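If you want tool-signature changes to block merges as well, the comma-separated statuses from the CLI section below may also work in the action input. This is an assumption that the action forwards fail-on verbatim to evalview check, not something the docs above state:

```yaml
        with:
          openai-api-key: ${{ secrets.OPENAI_API_KEY }}
          # Assumption: comma-separated statuses, as accepted by the CLI,
          # pass through to `evalview check --fail-on` unchanged.
          fail-on: REGRESSION,TOOLS_CHANGED
```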
When EvalView detects changes, it posts a detailed PR comment with cost spike alerts, latency spike alerts, model change detection, and a collapsible diff of all behavior changes. On clean runs it shows a simple PASSED summary. Comments are deduplicated — EvalView updates its existing comment instead of creating new ones on each push.
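Update-instead-of-create is a common PR-comment pattern. Here is a minimal sketch of it using the gh CLI — this is illustrative, not EvalView's internals, and MARKER, PR_NUMBER, and REPORT are hypothetical placeholders:

```bash
# Illustrative dedup sketch (not EvalView's code).
# GITHUB_REPOSITORY is set by Actions; PR_NUMBER and REPORT are placeholders.
MARKER="<!-- evalview-report -->"
comment_id=$(gh api "repos/$GITHUB_REPOSITORY/issues/$PR_NUMBER/comments" \
  --jq ".[] | select(.body | contains(\"$MARKER\")) | .id" | head -n 1)
if [ -n "$comment_id" ]; then
  # A marker comment already exists: update it in place.
  gh api -X PATCH "repos/$GITHUB_REPOSITORY/issues/comments/$comment_id" \
    -f body="$MARKER $REPORT"
else
  # First run on this PR: create the marker comment.
  gh api "repos/$GITHUB_REPOSITORY/issues/$PR_NUMBER/comments" \
    -f body="$MARKER $REPORT"
fi
```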
Exit codes make the gate scriptable: exit 0 means all tests passed, exit 1 means regressions were detected. The --fail-on flag controls which statuses block the pipeline:
```bash
# Only fail on score regressions (default)
evalview check --fail-on REGRESSION

# Also fail on tool changes
evalview check --fail-on REGRESSION,TOOLS_CHANGED

# Fail on any change (strictest)
evalview check --strict
```
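Because the exit code is the contract, the same check slots into any shell-based pipeline, not just GitHub Actions. A minimal sketch:

```bash
# Gate a custom pipeline on EvalView's exit code (0 = pass, 1 = regression).
if evalview check --fail-on REGRESSION,TOOLS_CHANGED; then
  echo "No blocking changes; continuing."
else
  echo "Regression detected; aborting." >&2
  exit 1
fi
```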
If you want regression blocking without CI setup, EvalView supports Git pre-push hooks:
```bash
evalview install-hooks   # Pre-push regression blocking
```
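Like any pre-push hook, this one can be skipped with Git's standard escape hatch when you need to push regardless:

```bash
# Standard Git behavior: --no-verify bypasses pre-push hooks entirely.
git push --no-verify
```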