Tool-Calling Agent Testing

The hardest agent bugs are often not in the final text. They are in the hidden trajectory: wrong tool, wrong order, missing clarification, or dangerous tool use in the wrong scenario.

What to assert

How EvalView helps

Back to EvalView homepage | View on GitHub