Monitoring Your Agent

Evaluations

Evaluations are automatic checks that run when an agent finishes a turn. Use them to grade qualities that should hold across many tasks, such as helpfulness, policy compliance, or whether the agent asked for missing information.

How to use evaluations
  1. Open the agent's Evaluations area.
  2. Add the behavior you want to check and the scoring criteria.
  3. Enable the evaluation and review results on completed tasks.

What evaluations are for

  • Quality gates: Check whether the answer is complete, grounded, or useful.
  • Policy checks: Flag missing disclaimers, unsafe actions, or skipped approvals.
  • Workflow checks: Confirm the agent followed required steps before finishing.

Evaluation examples

  • answer-is-helpful: The reply directly addresses the user's question.
  • asks-for-missing-info: The agent asks a follow-up question when required inputs are absent.
  • uses-approved-tone: The response follows the team's tone and formatting rules.
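
Each example above pairs a slug with a single criterion. As a rough illustration only, the same three evaluations can be written down as data. The `Evaluation` class and its field names here are hypothetical, chosen for this sketch, and are not the product's API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evaluation:
    """Hypothetical record: one slug plus one observable criterion."""
    slug: str             # stable identifier for the evaluation
    criteria: str         # the outcome the evaluation checks for
    enabled: bool = True  # assumed flag mirroring the "enable" step


# The three examples from the list above, expressed as records.
EVALUATIONS = [
    Evaluation(
        "answer-is-helpful",
        "The reply directly addresses the user's question.",
    ),
    Evaluation(
        "asks-for-missing-info",
        "The agent asks a follow-up question when required inputs are absent.",
    ),
    Evaluation(
        "uses-approved-tone",
        "The response follows the team's tone and formatting rules.",
    ),
]

for ev in EVALUATIONS:
    print(f"{ev.slug}: {ev.criteria}")
```

Writing the set out this way makes it easy to see that each evaluation covers exactly one behavior.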

Best practices

  • Write criteria as observable outcomes, not vague preferences.
  • Keep each evaluation focused on one behavior.
  • Use stable slugs so results are easy to compare over time.
  • Pair evaluations with benchmarks when you need both automatic scoring and repeatable test cases.
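
To see why stable slugs help, consider a small sketch that groups results by slug and computes a pass rate for each. The `(slug, passed)` row shape and the sample rows are made up for illustration, and how results are actually exported is an assumption:

```python
from collections import defaultdict

# Hypothetical result rows: one (slug, passed) pair per evaluated turn.
results = [
    ("answer-is-helpful", True),
    ("answer-is-helpful", False),
    ("asks-for-missing-info", True),
    ("answer-is-helpful", True),
]


def pass_rates(rows):
    """Return the fraction of passing results for each slug."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for slug, passed in rows:
        totals[slug] += 1
        if passed:
            passes[slug] += 1
    return {slug: passes[slug] / totals[slug] for slug in totals}


rates = pass_rates(results)
for slug, rate in sorted(rates.items()):
    print(f"{slug}: {rate:.0%}")
```

In a grouping like this, a renamed slug would appear as a separate key, so its earlier and later results would no longer be counted together.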