# Protect
Real-time guardrails for AI agent outputs.
The Protect pillar assesses every AI agent output for risk before it is served to end users. Low-risk outputs are auto-approved, high-risk outputs are escalated for human review, and outputs whose risk exceeds a rejection threshold are rejected outright.
## How It Works
```mermaid
graph TD
    A[Agent produces output] --> B[evaluate]
    B -->|"risk < threshold"| C[Auto-approved]
    B -->|"threshold <= risk <= reject_threshold"| D[Escalated]
    B -->|"risk > reject_threshold"| E[Rejected]
    C --> G[Serve output]
    E --> H[Fallback]
    D --> F[Human reviewer]
    F -->|approved| G
    F -->|rejected| H
    F -->|corrected| I[Serve corrected output]
    I --> J[Metrics computed]
    J --> K[Health score updated]
```
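The three-way routing in the diagram can be sketched as a pure function of the risk score. This is a minimal illustration, not the Coalex implementation: the name `route_by_risk`, the assumption that risk is a float, and the example threshold values are hypothetical. The returned strings match the `decision.status` values used later on this page.

```python
def route_by_risk(risk: float, threshold: float, reject_threshold: float) -> str:
    """Map a risk score to one of the three outcomes in the diagram.

    Hypothetical sketch: assumes threshold <= reject_threshold.
    """
    if threshold > reject_threshold:
        raise ValueError("threshold must not exceed reject_threshold")
    if risk < threshold:
        return "auto_approved"
    if risk > reject_threshold:
        return "rejected"
    # Remaining case: threshold <= risk <= reject_threshold
    return "escalated"


if __name__ == "__main__":
    # Example thresholds are illustrative only.
    for score in (0.1, 0.5, 0.95):
        print(score, route_by_risk(score, threshold=0.3, reject_threshold=0.9))
```

Keeping the escalation band inclusive at both ends means every score maps to exactly one outcome, which the overlapping labels `risk >= threshold` and `risk > reject_threshold` would not guarantee on their own.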
## Key Features
| Feature | Description |
|---|---|
| Risk-based evaluation | Automated risk scoring based on agent health score |
| Three-outcome decisions | Auto-approve, escalate, or reject |
| Human-in-the-loop | Route high-risk outputs to domain experts |
| Quality metrics | Computed from human corrections (F1, semantic similarity, etc.) |
| Feedback loop | Metrics computed from human corrections update the agent's health score over time |
| Full audit trail | Every evaluation and resolution is stored for compliance |
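The quality metrics listed above, such as F1, are computed from human corrections. One common definition is token-overlap F1, popularized by SQuAD-style question-answering evaluation, which compares the agent's original answer with the reviewer's corrected answer. The sketch below assumes that definition and a simple lowercase word tokenization that drops punctuation; the function name `token_f1` is hypothetical, and Coalex's actual tokenization and normalization may differ.

```python
import re
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between an agent answer and a human correction."""
    # Lowercase word tokens; punctuation is dropped by the \w+ pattern.
    pred_tokens = re.findall(r"\w+", prediction.lower())
    ref_tokens = re.findall(r"\w+", reference.lower())
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a perfect match; exactly one empty scores zero.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token min(count_a, count_b) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

For the dosage example on this page, `token_f1("Take 500mg of ibuprofen twice daily.", "Take 400mg of ibuprofen twice daily with food.")` returns 5/7 (about 0.714): five tokens are shared out of six in the agent answer and eight in the correction.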
## The Evaluate-Resolve Loop
### 1. Evaluate
Submit agent output for risk assessment:
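```python
# `coalex` is the Coalex SDK client (setup not shown on this page).
output = {"answer": "Take 500mg of ibuprofen twice daily."}

decision = coalex.evaluate(
    request_id="req-123",
    input={"question": "What medication is recommended?"},
    output=output,
    # Metrics to compute per output field when a reviewer submits corrections
    metrics={"answer": ["f1", "semantic_similarity"]},
)
```

### 2. Handle the decision

```python
# serve_response, route_to_reviewer and serve_fallback are your own application code.
if decision.status == "auto_approved":
    serve_response(output)
elif decision.status == "escalated":
    route_to_reviewer(decision.escalation_id)
elif decision.status == "rejected":
    serve_fallback()
```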
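To exercise the decision-handling branches without calling the service, you can substitute a minimal stand-in for the decision object. The `Decision` dataclass and the `handle_decision` function below are hypothetical: they mirror only the two attributes used on this page (`status` and `escalation_id`), and the real SDK object may carry more fields. Injecting the three handlers as callables keeps the branching testable in isolation.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Decision:
    """Hypothetical stand-in for the object returned by coalex.evaluate()."""
    status: str
    escalation_id: Optional[str] = None


def handle_decision(
    decision: Decision,
    output: dict,
    serve_response: Callable[[dict], None],
    route_to_reviewer: Callable[[Optional[str]], None],
    serve_fallback: Callable[[], None],
) -> None:
    """Dispatch on decision.status using caller-supplied handlers."""
    if decision.status == "auto_approved":
        serve_response(output)
    elif decision.status == "escalated":
        route_to_reviewer(decision.escalation_id)
    elif decision.status == "rejected":
        serve_fallback()
    else:
        # Fail closed: never serve an output whose status is not recognized.
        raise ValueError(f"unexpected decision status: {decision.status!r}")


if __name__ == "__main__":
    events = []
    handle_decision(
        Decision(status="escalated", escalation_id="esc-1"),  # illustrative ID
        {"answer": "..."},
        serve_response=lambda out: events.append(("served", out)),
        route_to_reviewer=lambda eid: events.append(("routed", eid)),
        serve_fallback=lambda: events.append(("fallback",)),
    )
    print(events)  # [('routed', 'esc-1')]
```

Raising on an unrecognized status, rather than silently falling through, is a design choice: if the service ever returns a value this code does not know about, no unreviewed output is served.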
### 3. Resolve (when escalated)
```python
result = coalex.resolve(
    escalation_id=decision.escalation_id,
    decision="corrected",  # reviewer outcome: "approved", "rejected", or "corrected"
    corrections={"answer": "Take 400mg of ibuprofen twice daily with food."},
    reviewer={"name": "Dr. Smith", "email": "dr.smith@hospital.org"},
    reason="Dosage adjusted and food requirement added.",
)
```
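Before calling `coalex.resolve()`, you may want to validate a review locally. The helper below is a hypothetical sketch: the name `build_resolution` is illustrative, it assumes the three reviewer outcomes shown in the diagram (`approved`, `rejected`, `corrected`), and it assumes `corrections` belong only with a `corrected` decision. Confirm the actual rules in the Resolve reference.

```python
from typing import Optional

# Assumed set of reviewer outcomes, taken from the diagram on this page.
VALID_DECISIONS = {"approved", "rejected", "corrected"}


def build_resolution(
    escalation_id: str,
    decision: str,
    reviewer: dict,
    reason: Optional[str] = None,
    corrections: Optional[dict] = None,
) -> dict:
    """Assemble keyword arguments for coalex.resolve(), checking them first."""
    if decision not in VALID_DECISIONS:
        raise ValueError(f"decision must be one of {sorted(VALID_DECISIONS)}")
    if decision == "corrected" and not corrections:
        raise ValueError("a 'corrected' decision needs non-empty corrections")
    if decision != "corrected" and corrections:
        raise ValueError("corrections only apply to a 'corrected' decision")
    kwargs = {"escalation_id": escalation_id, "decision": decision, "reviewer": reviewer}
    # Optional fields are included only when provided.
    if reason:
        kwargs["reason"] = reason
    if corrections:
        kwargs["corrections"] = corrections
    return kwargs
```

Usage would look like `result = coalex.resolve(**build_resolution(decision.escalation_id, "approved", reviewer={"name": "Dr. Smith"}))`, so malformed reviews fail locally before any request is sent.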
## Sections

- Evaluate: how evaluation and risk scoring work
- Escalations: managing escalated outputs
- Resolve: submitting human review decisions
- Policies: configuring risk thresholds and routing
## SDK Reference

- `evaluate()`: Submit an output for evaluation
- `resolve()`: Submit a human review decision