Monitoring Your Agent
Evaluations
Evaluations are automatic checks that run when an agent finishes a turn. Use them to grade qualities that should hold across many tasks, such as helpfulness, policy compliance, or whether the agent asked for missing information.
How to use evaluations
1. Open the agent's Evaluations area.
2. Add the behavior you want to check and the scoring criteria.
3. Enable the evaluation and review results on completed tasks.
What evaluations are for
- Quality gates: Check whether the answer is complete, grounded, or useful.
- Policy checks: Flag missing disclaimers, unsafe actions, or skipped approvals.
- Workflow checks: Confirm the agent followed required steps before finishing.
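To make the workflow-check idea concrete, here is a minimal sketch of the kind of logic such a check expresses: did the required steps happen, in order, before the agent finished? The function name and the action names (`lookup_order`, `request_approval`, `issue_refund`) are hypothetical and used only for illustration. In the product you describe the behavior and scoring criteria rather than writing code.

```python
def followed_required_steps(actions: list[str], required: list[str]) -> bool:
    """Return True if every required step appears in `actions`, in order.

    Other actions may occur between the required ones.
    Both arguments are hypothetical action names, not a product API.
    """
    remaining = iter(actions)
    # `step in remaining` advances the iterator past the match, so each
    # required step must be found after the previous one.
    return all(step in remaining for step in required)


# Hypothetical turn: approval was requested before the refund was issued.
turn = ["lookup_order", "request_approval", "issue_refund"]
print(followed_required_steps(turn, ["request_approval", "issue_refund"]))
```

Writing the rule this way also shows why criteria should be observable: "approval was requested before the refund" can be checked against what the agent did, while "the agent was careful" cannot.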
Evaluation examples
- answer-is-helpful: The reply directly addresses the user's question.
- asks-for-missing-info: The agent asks a follow-up question when required inputs are absent.
- uses-approved-tone: The response follows the team's tone and formatting rules.
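If you keep a copy of your evaluation definitions outside the product, for review or version control, the three examples above could be recorded as plain data. This is a sketch under stated assumptions: the `Evaluation` class, its fields, and the slug rule (lowercase words joined by hyphens) are hypothetical and are not the product's configuration format.

```python
import re
from dataclasses import dataclass

# Assumed slug rule for illustration: lowercase letters or digits,
# with words joined by single hyphens.
SLUG_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


@dataclass(frozen=True)
class Evaluation:
    """Hypothetical record pairing a slug with its scoring criteria."""

    slug: str
    criteria: str
    enabled: bool = True

    def __post_init__(self) -> None:
        if not SLUG_PATTERN.fullmatch(self.slug):
            raise ValueError(f"invalid slug: {self.slug!r}")
        if not self.criteria.strip():
            raise ValueError("criteria must not be empty")


EVALUATIONS = [
    Evaluation("answer-is-helpful",
               "The reply directly addresses the user's question."),
    Evaluation("asks-for-missing-info",
               "The agent asks a follow-up question when required inputs are absent."),
    Evaluation("uses-approved-tone",
               "The response follows the team's tone and formatting rules."),
]
```

Each record holds exactly one slug and one criterion, which mirrors the advice to keep every evaluation focused on a single behavior.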
Best practices
- Write criteria as observable outcomes, not vague preferences.
- Keep each evaluation focused on one behavior.
- Use stable slugs so results are easy to compare over time.
- Pair evaluations with benchmarks when you need both automatic scoring and repeatable test cases.
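Stable slugs pay off when you compare results over time. The sketch below assumes you have collected results as `(slug, passed)` pairs. That shape is a hypothetical one chosen for illustration, not a documented export format.

```python
from collections import defaultdict


def pass_rate_by_slug(results):
    """Map each slug to the fraction of its results that passed.

    `results` is an iterable of (slug, passed) pairs (an assumed shape).
    """
    totals = defaultdict(lambda: [0, 0])  # slug -> [passed_count, total_count]
    for slug, passed in results:
        totals[slug][0] += int(passed)
        totals[slug][1] += 1
    return {slug: p / t for slug, (p, t) in totals.items()}


results = [
    ("answer-is-helpful", True),
    ("answer-is-helpful", False),
    ("asks-for-missing-info", True),
]
print(pass_rate_by_slug(results))
# {'answer-is-helpful': 0.5, 'asks-for-missing-info': 1.0}
```

If a slug is renamed partway through, its history splits into two keys and the trend is lost, which is the practical reason to keep slugs unchanged once results exist.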