Policies
Policies define the rules that govern how evaluations are handled for each agent. Configure them in the admin dashboard under Settings > Policies.
Risk Thresholds
Each agent has two configurable thresholds:
| Threshold | Default | Description |
|---|---|---|
| Auto-approve | 0.3 | Outputs with a risk score below this value are auto-approved |
| Reject | 0.9 | Outputs with a risk score above this value are rejected |
Outputs between the two thresholds are escalated for human review.
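The three-way decision can be sketched in a few lines of Python. The `decide` function and `Outcome` enum are hypothetical names for illustration, not part of the platform's API. How the platform treats a score exactly equal to a threshold is not specified above, so this sketch assumes such scores are escalated, since the table only says "below" and "above".

```python
from enum import Enum


class Outcome(Enum):
    AUTO_APPROVE = "auto_approve"
    ESCALATE = "escalate"
    REJECT = "reject"


def decide(risk_score: float, auto_approve: float = 0.3, reject: float = 0.9) -> Outcome:
    """Map a risk score to an outcome using an agent's two thresholds.

    Assumption: a score exactly equal to either threshold is escalated.
    """
    if not auto_approve < reject:
        raise ValueError("auto-approve threshold must be lower than the reject threshold")
    if risk_score < auto_approve:
        return Outcome.AUTO_APPROVE
    if risk_score > reject:
        return Outcome.REJECT
    # Everything in [auto_approve, reject] goes to human review.
    return Outcome.ESCALATE


print(decide(0.1).value)   # auto_approve
print(decide(0.5).value)   # escalate
print(decide(0.95).value)  # reject
```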
Adjusting Thresholds
Thresholds should reflect your risk tolerance:
| Use Case | Auto-approve | Reject | Notes |
|---|---|---|---|
| High-risk medical | 0.1 | 0.7 | Most outputs escalated for review |
| General support | 0.5 | 0.95 | Most outputs auto-approved |
| Internal tools | 0.7 | 0.99 | Minimal escalation |
As an agent's health score improves over time, you can safely raise the auto-approve threshold.
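Before changing thresholds, it can help to replay an agent's recent risk scores against candidate settings and see how many outputs each would escalate. The sketch below uses the presets from the table; `simulate`, `outcome`, and the `history` scores are hypothetical, and boundary scores are assumed to escalate.

```python
from collections import Counter

# Presets from the table above, as (auto_approve, reject).
PRESETS = {
    "high_risk_medical": (0.1, 0.7),
    "general_support": (0.5, 0.95),
    "internal_tools": (0.7, 0.99),
}


def outcome(score: float, auto_approve: float, reject: float) -> str:
    # Three-way split; a score equal to a threshold escalates (assumption).
    if score < auto_approve:
        return "auto_approve"
    if score > reject:
        return "reject"
    return "escalate"


def simulate(scores, auto_approve: float, reject: float) -> Counter:
    """Count how a batch of past risk scores would be handled under given thresholds."""
    return Counter(outcome(s, auto_approve, reject) for s in scores)


# Hypothetical risk scores from an agent's recent evaluations.
history = [0.05, 0.2, 0.35, 0.6, 0.75, 0.92, 0.97]
for name, (low, high) in PRESETS.items():
    print(name, dict(simulate(history, low, high)))
```

Comparing the escalation counts across presets gives a rough estimate of reviewer load before a threshold change goes live.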
Escalation Routing
Configure who receives escalations:
- Default reviewers — All escalations for an agent go to a reviewer group
- Field-based routing — Route specific output fields to specialized reviewers (e.g., `icd_code` to medical coding experts)
- Risk-based routing — High-risk escalations go to senior reviewers
Routing is configured per-agent in the dashboard.
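This page does not say how the three routing modes combine when more than one applies. The sketch below assumes risk-based routing takes precedence, then field-based, then the default group. `RoutingConfig`, `route`, the group names, and the 0.8 "high-risk" cut-off are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class RoutingConfig:
    """Hypothetical shape of one agent's routing settings."""
    default_group: str
    field_groups: Dict[str, str] = field(default_factory=dict)  # output field -> reviewer group
    senior_group: Optional[str] = None
    senior_min_risk: float = 0.8  # assumed cut-off for "high-risk" escalations


def route(config: RoutingConfig, risk_score: float, flagged_field: Optional[str] = None) -> str:
    """Pick a reviewer group for one escalation (assumed precedence: risk, field, default)."""
    if config.senior_group is not None and risk_score >= config.senior_min_risk:
        return config.senior_group
    if flagged_field is not None and flagged_field in config.field_groups:
        return config.field_groups[flagged_field]
    return config.default_group


config = RoutingConfig(
    default_group="support-reviewers",
    field_groups={"icd_code": "medical-coding"},
    senior_group="senior-reviewers",
)
print(route(config, 0.5, "icd_code"))   # medical-coding
print(route(config, 0.85, "icd_code"))  # senior-reviewers
print(route(config, 0.5, "diagnosis"))  # support-reviewers
```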
Metric Requirements
Define which metrics must be computed for each agent's evaluations:
```json
{
  "agent_id": "claims-bot",
  "required_metrics": {
    "diagnosis": ["semantic_similarity", "f1"],
    "icd_code": ["exact_match"],
    "recommendation": ["semantic_similarity"]
  }
}
```
When `evaluate()` is called without the required metrics, the platform adds them automatically.
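One plausible reading of "adds them automatically" is a per-field union: keep whatever metrics the caller requested and append any required ones that are missing. This is a sketch under that assumption; `with_required_metrics` is a hypothetical helper, not the platform's implementation.

```python
# Required metrics from the claims-bot policy above.
REQUIRED_METRICS = {
    "diagnosis": ["semantic_similarity", "f1"],
    "icd_code": ["exact_match"],
    "recommendation": ["semantic_similarity"],
}


def with_required_metrics(requested: dict, required: dict) -> dict:
    """Return requested metrics per field plus any missing required ones.

    Order is preserved: the caller's metrics first, then missing required metrics.
    """
    merged = {name: list(metrics) for name, metrics in requested.items()}
    for name, metrics in required.items():
        current = merged.setdefault(name, [])
        for metric in metrics:
            if metric not in current:
                current.append(metric)
    return merged


# The caller asked only for f1 on "diagnosis"; the rest is filled in.
print(with_required_metrics({"diagnosis": ["f1"]}, REQUIRED_METRICS))
```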
Policy Inheritance
Policies can be set at two levels:
- Organization default — Applies to all agents unless overridden
- Agent-specific — Overrides the organization default for a specific agent
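Resolving an agent's effective policy can be sketched as layering its overrides on top of the organization default. This assumes overrides apply key by key, so an agent that overrides only its thresholds still inherits the default reviewer group; whether the platform merges this way or replaces the whole policy is not stated on this page. `ORG_DEFAULT`, `AGENT_OVERRIDES`, `effective_policy`, and the group name are hypothetical.

```python
# Organization default, using the default thresholds from the table above.
ORG_DEFAULT = {"auto_approve": 0.3, "reject": 0.9, "reviewers": "default-reviewers"}

# Agent-specific overrides; claims-bot overrides only its thresholds.
AGENT_OVERRIDES = {
    "claims-bot": {"auto_approve": 0.1, "reject": 0.7},
}


def effective_policy(agent_id: str) -> dict:
    """Organization default with any agent-specific keys layered on top (assumed key-level merge)."""
    return {**ORG_DEFAULT, **AGENT_OVERRIDES.get(agent_id, {})}


print(effective_policy("claims-bot"))
print(effective_policy("internal-helper"))  # no overrides, so the org default applies
```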
Best Practices
- Start conservative — Begin with low auto-approve thresholds and increase as the agent proves reliable.
- Monitor escalation volume — If too many outputs are escalated, reviewers get overwhelmed. Adjust thresholds or improve the agent.
- Use field-based routing — Different output fields may need different domain expertise.
- Review policies quarterly — As agents improve, update thresholds to reduce unnecessary escalations.