Policies

Policies define the rules that govern how evaluations are handled for each agent. Configure them in the admin dashboard under Settings > Policies.

Risk Thresholds

Each agent has two configurable thresholds:

Threshold	Default	Description
Auto-approve	0.3	Outputs with a risk score below this value are auto-approved
Reject	0.9	Outputs with a risk score above this value are rejected

Outputs between the two thresholds are escalated for human review.

Risk Score:  0.0 ─────────── 0.3 ─────────── 0.9 ─────────── 1.0
              │ auto_approved │   escalated   │   rejected    │
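
The decision rule above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the platform's implementation: the function name `classify` is hypothetical, and scores exactly equal to a threshold are treated as escalated because the text only specifies "below" and "above".

```python
def classify(risk_score: float, auto_approve: float = 0.3, reject: float = 0.9) -> str:
    """Map a risk score in [0.0, 1.0] to a decision label."""
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score must be between 0.0 and 1.0")
    if risk_score < auto_approve:
        return "auto_approved"
    if risk_score > reject:
        return "rejected"
    # Assumption: a score equal to either threshold falls into the escalation band.
    return "escalated"


for score in (0.1, 0.3, 0.5, 0.95):
    print(score, classify(score))
```

With the default thresholds, 0.1 is auto-approved, 0.5 is escalated, and 0.95 is rejected.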

Adjusting Thresholds

Thresholds should reflect your risk tolerance:

Use Case	Auto-approve	Reject	Notes
High-risk medical	0.1	0.7	Most outputs escalated for review
General support	0.5	0.95	Most outputs auto-approved
Internal tools	0.7	0.99	Minimal escalation

As an agent's health score improves over time, you can raise the auto-approve threshold gradually so that fewer outputs are escalated.
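
To compare the profiles in the table, it can help to look at the width of the escalation band (reject minus auto-approve). The sketch below is illustrative only: the `Thresholds` class and the profile keys are hypothetical names, and the check that auto-approve must sit below reject is an assumption implied by the text rather than a documented platform rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    auto_approve: float
    reject: float

    def __post_init__(self) -> None:
        # Assumption: an escalation band only exists if auto_approve < reject.
        if not 0.0 <= self.auto_approve < self.reject <= 1.0:
            raise ValueError("expected 0.0 <= auto_approve < reject <= 1.0")

    @property
    def escalation_band(self) -> float:
        """Width of the score range that goes to human review."""
        return round(self.reject - self.auto_approve, 2)


# Threshold values copied from the table above; the keys are illustrative.
PROFILES = {
    "high_risk_medical": Thresholds(0.1, 0.7),
    "general_support": Thresholds(0.5, 0.95),
    "internal_tools": Thresholds(0.7, 0.99),
}

for name, thresholds in PROFILES.items():
    print(name, thresholds.escalation_band)
```

A wider band does not by itself mean more escalations. The actual volume depends on how the agent's risk scores are distributed.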

Escalation Routing

Configure who receives escalations:

Default reviewers — All escalations for an agent go to a reviewer group
Field-based routing — Route specific output fields to specialized reviewers (e.g., icd_code to medical coding experts)
Risk-based routing — High-risk escalations go to senior reviewers

Routing is configured per agent in the dashboard.
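
One way the three routing modes might combine is sketched below. Everything here is hypothetical: the config keys, the reviewer group names, the 0.8 cutoff for "high-risk", and the precedence (field routes replace the default group, senior reviewers are added on top) are assumptions for illustration, since routing is actually set in the dashboard.

```python
from typing import List

# Hypothetical per-agent routing config; key names and group names are made up.
ROUTING = {
    "default_reviewers": "claims-reviewers",
    "field_routes": {"icd_code": "medical-coding-experts"},
    "senior_reviewers": "senior-reviewers",
    "senior_risk_floor": 0.8,  # assumed cutoff for "high-risk" escalations
}


def route_escalation(fields: List[str], risk_score: float, routing: dict = ROUTING) -> List[str]:
    """Return the reviewer groups that should see an escalated output."""
    groups: List[str] = []
    for field in fields:
        # Field-based routing: use the specialist group if one is configured,
        # otherwise fall back to the agent's default reviewers.
        group = routing["field_routes"].get(field, routing["default_reviewers"])
        if group not in groups:
            groups.append(group)
    if not groups:
        groups.append(routing["default_reviewers"])
    # Risk-based routing: also notify senior reviewers for high-risk escalations.
    senior = routing["senior_reviewers"]
    if risk_score >= routing["senior_risk_floor"] and senior not in groups:
        groups.append(senior)
    return groups


print(route_escalation(["diagnosis", "icd_code"], 0.5))
print(route_escalation(["icd_code"], 0.85))
```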

Metric Requirements

Define which metrics must be computed for each agent's evaluations:

{
    "agent_id": "claims-bot",
    "required_metrics": {
        "diagnosis": ["semantic_similarity", "f1"],
        "icd_code": ["exact_match"],
        "recommendation": ["semantic_similarity"]
    }
}

When evaluate() is called without the required metrics, the platform adds them automatically.
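
The effect of that automatic step can be pictured as a merge between the metrics a caller requested and the metrics the policy requires. The sketch below is not the platform's code: `with_required_metrics` is a hypothetical helper, and adding metrics for fields the caller did not mention at all is an assumption.

```python
REQUIRED_METRICS = {
    "diagnosis": ["semantic_similarity", "f1"],
    "icd_code": ["exact_match"],
    "recommendation": ["semantic_similarity"],
}


def with_required_metrics(requested: dict, required: dict = REQUIRED_METRICS) -> dict:
    """Return per-field metrics with any missing required metrics appended."""
    # Copy the lists so the caller's input is not modified.
    merged = {field: list(metrics) for field, metrics in requested.items()}
    for field, metrics in required.items():
        current = merged.setdefault(field, [])
        for metric in metrics:
            if metric not in current:
                current.append(metric)
    return merged


print(with_required_metrics({"diagnosis": ["f1"], "icd_code": ["exact_match"]}))
```

Here the caller asked only for `f1` on `diagnosis`, so `semantic_similarity` is appended for that field and the `recommendation` field is added with its required metric.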

Policy Inheritance

Policies can be set at two levels:

Organization default — Applies to all agents unless overridden
Agent-specific — Overrides the organization default for a specific agent
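
The override behavior can be sketched as a dictionary merge in which agent-specific values win. The names `ORG_DEFAULT`, `AGENT_OVERRIDES`, and `effective_policy` are hypothetical, and merging key by key (rather than replacing the whole policy) is an assumption the text does not confirm.

```python
ORG_DEFAULT = {"auto_approve": 0.3, "reject": 0.9}

# Hypothetical agent-level overrides; only the keys listed here replace the default.
AGENT_OVERRIDES = {
    "claims-bot": {"auto_approve": 0.1, "reject": 0.7},
}


def effective_policy(agent_id: str) -> dict:
    """Resolve an agent's policy: agent-specific values win over the organization default."""
    return {**ORG_DEFAULT, **AGENT_OVERRIDES.get(agent_id, {})}


print(effective_policy("claims-bot"))   # uses the agent-specific thresholds
print(effective_policy("support-bot"))  # falls back to the organization default
```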

Best Practices

Start conservative — Begin with low auto-approve thresholds and increase as the agent proves reliable.
Monitor escalation volume — If too many outputs are escalated, reviewers get overwhelmed. Adjust thresholds or improve the agent.
Use field-based routing — Different output fields may need different domain expertise.
Review policies quarterly — As agents improve, update thresholds to reduce unnecessary escalations.
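
Before changing thresholds, it can be useful to estimate how a candidate setting would affect escalation volume. The sketch below assumes you can export recent risk scores for an agent. The function name is hypothetical, the sample scores are made up, and scores equal to a threshold are counted as escalated.

```python
from typing import Iterable


def escalation_rate(risk_scores: Iterable[float], auto_approve: float, reject: float) -> float:
    """Fraction of scores that land in the escalation band (thresholds inclusive)."""
    scores = list(risk_scores)
    if not scores:
        return 0.0
    escalated = sum(1 for s in scores if auto_approve <= s <= reject)
    return escalated / len(scores)


recent_scores = [0.05, 0.2, 0.35, 0.6, 0.92]  # made-up sample, not real data
print(escalation_rate(recent_scores, 0.3, 0.9))  # current thresholds
print(escalation_rate(recent_scores, 0.5, 0.9))  # candidate thresholds
```

On this sample, raising the auto-approve threshold from 0.3 to 0.5 halves the share of escalated outputs, from 0.4 to 0.2.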