
Policies

Policies define the rules that govern how evaluations are handled for each agent. Configure them in the admin dashboard under Settings > Policies.


Risk Thresholds

Each agent has two configurable thresholds:

Threshold      Default   Description
Auto-approve   0.3       Outputs with a risk score below this value are auto-approved
Reject         0.9       Outputs with a risk score above this value are rejected

Outputs between the two thresholds are escalated for human review.

Risk Score:  0.0 ─────── 0.3 ─────── 0.9 ─────── 1.0
             │ auto_approved │  escalated  │  rejected  │
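
The decision rule above can be sketched in Python. This is a minimal illustration, not the platform's implementation: the function name classify is hypothetical, and treating a score exactly equal to a threshold as escalated is an assumption, since the text only specifies "below" and "above".

```python
def classify(risk_score: float, auto_approve: float = 0.3, reject: float = 0.9) -> str:
    """Map a risk score in [0.0, 1.0] to a decision label (hypothetical helper)."""
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score must be between 0.0 and 1.0")
    if auto_approve >= reject:
        raise ValueError("auto_approve must be lower than reject")
    if risk_score < auto_approve:
        return "auto_approved"
    if risk_score > reject:
        return "rejected"
    # Assumption: scores on or between the thresholds go to human review.
    return "escalated"
```

With the defaults, classify(0.1) returns "auto_approved", classify(0.5) returns "escalated", and classify(0.95) returns "rejected".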

Adjusting Thresholds

Thresholds should reflect your risk tolerance:

Use Case            Auto-approve   Reject   Notes
High-risk medical   0.1            0.7      Most outputs escalated for review
General support     0.5            0.95     Most outputs auto-approved
Internal tools      0.7            0.99     Minimal escalation

As an agent's health score improves over time, you can raise the auto-approve threshold incrementally to reduce unnecessary escalations.


Escalation Routing

Configure who receives escalations:

  • Default reviewers — All escalations for an agent go to a reviewer group
  • Field-based routing — Route specific output fields to specialized reviewers (e.g., icd_code to medical coding experts)
  • Risk-based routing — High-risk escalations go to senior reviewers

Routing is configured per-agent in the dashboard.
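
One way the three routing options could combine is sketched below. Routing is configured in the dashboard, so this code only illustrates the logic. The names RoutingConfig, route, and senior_risk_floor are hypothetical, and the precedence (field-based first, then risk-based, then default) is an assumption that the text above does not specify.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class RoutingConfig:
    """Hypothetical per-agent routing settings."""
    default_group: str
    field_groups: Dict[str, str] = field(default_factory=dict)
    senior_group: Optional[str] = None
    senior_risk_floor: float = 0.8  # assumed cut-off for "high-risk"


def route(config: RoutingConfig, output_field: str, risk_score: float) -> str:
    """Pick the reviewer group for one escalated output field."""
    # Assumption: a field-specific reviewer group takes precedence.
    if output_field in config.field_groups:
        return config.field_groups[output_field]
    # Otherwise, high-risk escalations go to senior reviewers if configured.
    if config.senior_group is not None and risk_score >= config.senior_risk_floor:
        return config.senior_group
    return config.default_group
```

For example, with field_groups={"icd_code": "medical-coding"}, an escalated icd_code field goes to the medical coding group regardless of its risk score under this assumed precedence.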


Metric Requirements

Define which metrics must be computed for each agent's evaluations:

{
    "agent_id": "claims-bot",
    "required_metrics": {
        "diagnosis": ["semantic_similarity", "f1"],
        "icd_code": ["exact_match"],
        "recommendation": ["semantic_similarity"]
    }
}

When evaluate() is called without one or more of the required metrics, the platform adds the missing metrics automatically.
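
The effect of this automatic addition can be approximated as a per-field merge. The sketch below is an assumption about the behavior, not the platform's code: merge_required_metrics is a hypothetical helper, and keeping caller-requested metrics first is an arbitrary ordering choice.

```python
from typing import Dict, List


def merge_required_metrics(
    requested: Dict[str, List[str]],
    required: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Return the requested metrics plus any required ones that are missing."""
    # Copy so the caller's input is not modified.
    merged = {name: list(metrics) for name, metrics in requested.items()}
    for name, metrics in required.items():
        current = merged.setdefault(name, [])
        for metric in metrics:
            if metric not in current:
                current.append(metric)
    return merged
```

With the claims-bot policy above, a call that requests only f1 for diagnosis would end up with f1 and semantic_similarity for diagnosis, exact_match for icd_code, and semantic_similarity for recommendation.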


Policy Inheritance

Policies can be set at two levels:

  1. Organization default — Applies to all agents unless overridden
  2. Agent-specific — Overrides the organization default for a specific agent
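
The two levels can be pictured as a lookup with a fallback. In this sketch, resolve_policy and the dictionary layout are hypothetical, and it assumes overrides apply per setting (an agent that overrides only one threshold inherits the other). The platform may instead replace the whole policy.

```python
from typing import Dict


def resolve_policy(
    org_default: Dict[str, float],
    agent_overrides: Dict[str, Dict[str, float]],
    agent_id: str,
) -> Dict[str, float]:
    """Return the effective policy for an agent (hypothetical helper)."""
    policy = dict(org_default)  # start from the organization default
    # Assumption: agent-specific settings override individual keys.
    policy.update(agent_overrides.get(agent_id, {}))
    return policy
```

For example, with an organization default of auto_approve 0.3 and reject 0.9, an agent that overrides only auto_approve to 0.1 would resolve to auto_approve 0.1 and reject 0.9.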

Best Practices

  1. Start conservative — Begin with low auto-approve thresholds and increase as the agent proves reliable.
  2. Monitor escalation volume — If too many outputs are escalated, reviewers get overwhelmed. Adjust thresholds or improve the agent.
  3. Use field-based routing — Different output fields may need different domain expertise.
  4. Review policies quarterly — As agents improve, update thresholds to reduce unnecessary escalations.
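
To make practice 2 concrete, you could estimate how a threshold change would affect reviewer load before applying it. The sketch below assumes you have a list of historical risk scores for an agent. The helper escalation_rate is hypothetical, and it treats scores equal to a threshold as escalated, which is an assumption.

```python
from typing import Sequence


def escalation_rate(scores: Sequence[float], auto_approve: float, reject: float) -> float:
    """Fraction of scores that would be escalated under the given thresholds."""
    if not scores:
        return 0.0
    # Escalated means neither below auto_approve nor above reject.
    escalated = sum(1 for s in scores if auto_approve <= s <= reject)
    return escalated / len(scores)
```

Comparing the rate under the current thresholds with the rate under a proposed pair gives a rough sense of the change in review volume, provided the historical scores are representative of future traffic.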