The practical problem with AI agent testing in CI/CD is not just "does the output look okay?" It is "did my agent behavior change in a way that should block this merge?" EvalView is built for that workflow — it integrates with GitHub Actions, posts PR comments with regression alerts, and provides proper exit codes for CI gating.
Every CI run checks six dimensions of agent behavior against the approved golden baseline.
Add this workflow to your repository. It runs regression checks on every PR and push to main:
```yaml
# .github/workflows/evalview.yml
name: EvalView Regression Check

on:
  pull_request:
  push:
    branches: [main]

jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Check for regressions
        uses: hidai25/eval-view@v0.6.0
        with:
          openai-api-key: ${{ secrets.OPENAI_API_KEY }}
          fail-on: REGRESSION
```
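If you want tool-signature changes to block merges as well, the comma-separated statuses from the CLI section below may also work in the action input. This is an assumption that the action forwards fail-on verbatim to evalview check, not something the docs above state:

```yaml
        with:
          openai-api-key: ${{ secrets.OPENAI_API_KEY }}
          # Assumption: comma-separated statuses, as accepted by the CLI,
          # pass through to `evalview check --fail-on` unchanged.
          fail-on: REGRESSION,TOOLS_CHANGED
```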
When EvalView detects changes, it posts a detailed PR comment with cost spike alerts, latency spike alerts, model change detection, and a collapsible diff of all behavior changes. On clean runs it shows a simple PASSED summary. Comments are deduplicated — EvalView updates its existing comment instead of creating new ones on each push.
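Update-instead-of-create is a common PR-comment pattern. Here is a minimal sketch of it using the gh CLI — this is illustrative, not EvalView's internals, and MARKER, PR_NUMBER, and REPORT are hypothetical placeholders:

```bash
# Illustrative dedup sketch (not EvalView's code).
# GITHUB_REPOSITORY is set by Actions; PR_NUMBER and REPORT are placeholders.
MARKER="<!-- evalview-report -->"
comment_id=$(gh api "repos/$GITHUB_REPOSITORY/issues/$PR_NUMBER/comments" \
  --jq ".[] | select(.body | contains(\"$MARKER\")) | .id" | head -n 1)
if [ -n "$comment_id" ]; then
  # A marker comment already exists: update it in place.
  gh api -X PATCH "repos/$GITHUB_REPOSITORY/issues/comments/$comment_id" \
    -f body="$MARKER $REPORT"
else
  # First run on this PR: create the marker comment.
  gh api "repos/$GITHUB_REPOSITORY/issues/$PR_NUMBER/comments" \
    -f body="$MARKER $REPORT"
fi
```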
Exit codes make the gate scriptable: exit 0 means all tests passed, exit 1 means regressions were detected. The --fail-on flag controls which statuses block the pipeline:
```bash
# Only fail on score regressions (default)
evalview check --fail-on REGRESSION

# Also fail on tool changes
evalview check --fail-on REGRESSION,TOOLS_CHANGED

# Fail on any change (strictest)
evalview check --strict
```
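Because the exit code is the contract, the same check slots into any shell-based pipeline, not just GitHub Actions. A minimal sketch:

```bash
# Gate a custom pipeline on EvalView's exit code (0 = pass, 1 = regression).
if evalview check --fail-on REGRESSION,TOOLS_CHANGED; then
  echo "No blocking changes; continuing."
else
  echo "Regression detected; aborting." >&2
  exit 1
fi
```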
If you want regression blocking without CI setup, EvalView supports Git pre-push hooks:
```bash
evalview install-hooks   # Pre-push regression blocking
```
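Like any pre-push hook, this one can be skipped with Git's standard escape hatch when you need to push regardless:

```bash
# Standard Git behavior: --no-verify bypasses pre-push hooks entirely.
git push --no-verify
```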