Snapshot behavior, detect regressions, block broken agents before production.
EvalView sends test queries to your agent, records everything (tool calls, parameters, sequence, output, cost, latency), and diffs it against a golden baseline. When something changes, you know immediately.
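The snapshot-and-diff idea is simple to sketch. The following is an illustrative toy, not EvalView's actual API; the `snapshot`/`diff` names and the run dict shape are invented for this example:

```python
def snapshot(run):
    """Record the comparable parts of an agent run: which tools were
    called, with what parameters, in what order, and the final output."""
    return {
        "tools": [(call["name"], call["params"]) for call in run["tool_calls"]],
        "output": run["output"],
    }

def diff(baseline, current):
    """Compare a new run against the golden baseline snapshot."""
    changes = []
    if baseline["tools"] != current["tools"]:
        changes.append("TOOLS_CHANGED")
    if baseline["output"] != current["output"]:
        changes.append("OUTPUT_CHANGED")
    return changes or ["PASSED"]

# Save a baseline once, then diff every later run against it.
baseline = snapshot({
    "tool_calls": [{"name": "search", "params": {"q": "refund policy"}}],
    "output": "Refunds take 5 days.",
})
print(diff(baseline, baseline))  # -> ['PASSED']
```

The real tool also tracks cost and latency per run; the point here is only that a baseline is a recorded snapshot and a regression is a diff against it.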
Normal tests catch crashes. Tracing shows what happened after the fact. EvalView catches the harder class of failure: the agent returns 200 but silently takes the wrong tool path, skips a clarification step, or degrades output quality after a model update.
Quick Start
pip install evalview
evalview init # Detect agent, create starter suite
evalview snapshot # Save current behavior as baseline
evalview check # Catch regressions after every change
evalview demo # See it live, no API key needed
What It Catches
PASSED — Behavior matches baseline. Ship with confidence.
TOOLS_CHANGED — Different tools called. Review the diff.
OUTPUT_CHANGED — Same tools, output shifted. Review the diff.
REGRESSION — Score dropped significantly. Fix before shipping.
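One way to picture how the four statuses relate: a significant score drop outranks a mere diff. This is a hedged sketch; the threshold value and function signature are invented for illustration, not EvalView's configuration:

```python
def classify(tools_match: bool, output_match: bool,
             score: float, baseline_score: float,
             regression_threshold: float = 0.15) -> str:
    """Assign a status to a test run (illustrative priority order)."""
    # A significant score drop trumps everything: fix before shipping.
    if baseline_score - score > regression_threshold:
        return "REGRESSION"
    if not tools_match:
        return "TOOLS_CHANGED"
    if not output_match:
        return "OUTPUT_CHANGED"
    return "PASSED"

print(classify(True, True, 0.91, 0.92))   # -> PASSED
print(classify(True, True, 0.60, 0.92))   # -> REGRESSION
```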
EvalView scores each test with layered checks:
Tool call diffing — tools, parameters, and call sequence (offline, free)
Output diffing — output compared against the baseline (offline, free)
Semantic similarity — output meaning via embeddings (~$0.00004/test)
LLM-as-judge — output quality scored by GPT, Claude, Gemini, DeepSeek, or Ollama (~$0.01/test)
The first two layers alone catch most regressions — fully offline, zero cost.
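The semantic-similarity layer reduces to cosine similarity between embedding vectors. A minimal sketch with toy 3-dimensional vectors; in practice an embedding model supplies the real, high-dimensional ones:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy "embeddings" of a baseline output and a new output.
baseline_vec = [0.20, 0.70, 0.10]
candidate_vec = [0.21, 0.69, 0.12]
print(cosine_similarity(baseline_vec, candidate_vec))  # close to 1.0: meaning preserved
```

A score near 1.0 means the wording may differ but the meaning matches the baseline; a low score flags a semantic drift worth reviewing.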
Key Features
Golden baseline regression detection with tool call and parameter diffing
Multi-turn conversation testing with per-turn judge scoring
Multi-reference baselines (up to 5 variants for non-deterministic agents)
Production monitoring with Slack alerts
Statistical testing with pass@k reliability metrics
Real traffic capture via proxy
Test generation from live agents
CI/CD with GitHub Actions, PR comments, cost/latency/model change alerts
SKILL.md validation for Claude Code and OpenAI Codex
MCP contract testing for interface drift detection
Works fully offline with Ollama
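Among the features above, pass@k has a standard unbiased estimator worth spelling out: given n sampled runs of which c passed, it estimates the probability that at least one of k fresh runs would pass. This sketch shows the usual formula, not necessarily EvalView's exact implementation:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of P(at least one of k runs passes),
    given c of n observed runs passed."""
    if n - c < k:
        return 1.0  # too few failures to fill a k-sample with all failures
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(10, 3, 1))  # -> 0.3, single-run reliability
print(pass_at_k(10, 3, 5))  # much higher: retries mask flakiness
```

This is why pass@k matters for non-deterministic agents: an agent that passes 3 of 10 runs looks fine under retries (high pass@5) while being unreliable in production (pass@1 of 0.3).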
How EvalView Compares
vs. LangSmith
LangSmith is for observability — it shows what your agent did. EvalView is for regression testing — it tells you whether your agent broke. They're complementary.
vs. Braintrust
Braintrust scores agent quality. EvalView automatically detects behavior changes through golden baseline diffing, and it is fully free and open source.