
Protect

Real-time guardrails for AI agent outputs.

The Protect pillar ensures that every AI agent output is assessed for risk before being served to end users. Low-risk outputs are auto-approved; high-risk outputs are escalated for human review.


How It Works

```mermaid
graph TD
    A[Agent produces output] --> B[evaluate]
    B -->|risk < threshold| C[Auto-approved]
    B -->|threshold <= risk <= reject_threshold| D[Escalated]
    B -->|risk > reject_threshold| E[Rejected]
    D --> F[Human reviewer]
    F -->|approved| G[Serve output]
    F -->|rejected| H[Fallback]
    F -->|corrected| I[Serve corrected output]
    I --> J[Metrics computed]
    J --> K[Health score updated]
```
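The three-way split in the diagram can be sketched as a small routing function. This is a minimal illustration, not the Coalex implementation: `route`, `Outcome`, and the two threshold parameters are hypothetical names, and the sketch assumes `threshold <= reject_threshold`, with scores between the two bounds (inclusive) escalated. The outcome strings mirror the `decision.status` values used when handling the decision.

```python
from enum import Enum


class Outcome(str, Enum):
    """Hypothetical outcome labels, matching the status strings in step 2."""
    AUTO_APPROVED = "auto_approved"
    ESCALATED = "escalated"
    REJECTED = "rejected"


def route(risk: float, threshold: float, reject_threshold: float) -> Outcome:
    """Map a risk score to one of the three outcomes in the diagram.

    Assumption: scores in [threshold, reject_threshold] are escalated.
    """
    if threshold > reject_threshold:
        raise ValueError("threshold must not exceed reject_threshold")
    if risk < threshold:
        return Outcome.AUTO_APPROVED
    if risk > reject_threshold:
        return Outcome.REJECTED
    return Outcome.ESCALATED


# Thresholds chosen for illustration only.
for score in (0.1, 0.5, 0.95):
    print(score, route(score, threshold=0.3, reject_threshold=0.9).value)
```

Keeping both bounds explicit makes the escalation band easy to reason about: widening it sends more outputs to reviewers, narrowing it auto-approves or rejects more of them.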

Key Features

| Feature | Description |
| --- | --- |
| Risk-based evaluation | Automated risk scoring based on the agent's health score |
| Three-outcome decisions | Auto-approve, escalate, or reject |
| Human-in-the-loop | Route high-risk outputs to domain experts |
| Quality metrics | Computed from human corrections (F1, semantic similarity, etc.) |
| Feedback loop | Corrections improve the agent's health score over time |
| Full audit trail | Every evaluation and resolution is stored for compliance |
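Token-overlap F1 is one common way to score an agent answer against a human correction. The sketch below uses the SQuAD-style definition (lowercase, split on word characters, F1 over the multiset of shared tokens); Coalex's own normalization may differ, and `token_f1` is a hypothetical helper. Semantic similarity usually relies on an embedding model and is not shown.

```python
import re
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between an agent answer and a human correction."""
    # Lowercase and keep runs of word characters, dropping punctuation.
    pred_tokens = re.findall(r"\w+", prediction.lower())
    ref_tokens = re.findall(r"\w+", reference.lower())
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a perfect match; only one empty as no match.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection: each shared token counts at most min(count) times.
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


agent = "Take 500mg of ibuprofen twice daily."
human = "Take 400mg of ibuprofen twice daily with food."
print(round(token_f1(agent, human), 3))  # 0.714
```

Here 5 of the agent's 6 tokens appear in the correction's 8, so precision is 5/6, recall is 5/8, and F1 is 10/14.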

The Evaluate-Resolve Loop

1. Evaluate

Submit agent output for risk assessment:

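```python
# Assumes `coalex` is the imported, configured Coalex SDK client.
output = {"answer": "Take 500mg of ibuprofen twice daily."}

decision = coalex.evaluate(
    request_id="req-123",                                # identifier for this request
    input={"question": "What medication is recommended?"},
    output=output,                                       # the agent output to assess
    metrics={"answer": ["f1", "semantic_similarity"]},   # quality metrics per output field
)
```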
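The `metrics` argument appears to map each field of `output` to the metrics computed for it once a reviewer supplies a correction. Assuming that reading, a pre-flight check like the hypothetical `check_metrics_spec` can catch a misspelled field name before the request is sent. It deliberately does not validate metric names, since the full list of supported metrics is not given here.

```python
def check_metrics_spec(output: dict, metrics: dict) -> list[str]:
    """Return problems with a metrics spec; an empty list means it looks fine.

    Assumption: each key of `metrics` names a field of `output`, and each
    value is a non-empty list of metric names.
    """
    problems = []
    for field, names in metrics.items():
        if field not in output:
            problems.append(f"metrics field {field!r} is not in output")
        if not isinstance(names, list) or not names:
            problems.append(f"metrics for {field!r} must be a non-empty list")
    return problems


output = {"answer": "Take 500mg of ibuprofen twice daily."}
print(check_metrics_spec(output, {"answer": ["f1", "semantic_similarity"]}))  # []
print(check_metrics_spec(output, {"anwser": ["f1"]}))  # typo in the field name
```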

2. Handle the decision

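```python
# serve_response, route_to_reviewer and serve_fallback are placeholders for your
# application's own handlers; `output` is the agent output submitted in step 1.
if decision.status == "auto_approved":
    serve_response(output)
elif decision.status == "escalated":
    route_to_reviewer(decision.escalation_id)  # a reviewer later calls resolve()
elif decision.status == "rejected":
    serve_fallback()
```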
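Because the branching depends only on `decision.status` and `decision.escalation_id`, it can be exercised offline with a stand-in object. In this sketch `FakeDecision` and `handle` are hypothetical test helpers, `handle` returns a label instead of calling your handlers, `"esc-42"` is a made-up ID, and the unknown-status branch is an added safeguard rather than documented SDK behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeDecision:
    """Stand-in for the object returned by coalex.evaluate (test double only)."""
    status: str
    escalation_id: Optional[str] = None


def handle(decision, output: dict) -> str:
    """Mirror the branching above, returning a label instead of side effects."""
    if decision.status == "auto_approved":
        return f"serve: {output['answer']}"
    if decision.status == "escalated":
        return f"queue for review: {decision.escalation_id}"
    if decision.status == "rejected":
        return "serve fallback"
    # Fail loudly on statuses this code does not know about.
    raise ValueError(f"unexpected decision status: {decision.status!r}")


output = {"answer": "Take 500mg of ibuprofen twice daily."}
print(handle(FakeDecision("escalated", "esc-42"), output))
```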

3. Resolve (when escalated)

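```python
# Called once a human reviewer has examined the escalated output.
result = coalex.resolve(
    escalation_id=decision.escalation_id,  # from the escalated decision
    decision="corrected",                  # reviewer outcome: approved, rejected or corrected
    corrections={"answer": "Take 400mg of ibuprofen twice daily with food."},
    reviewer={"name": "Dr. Smith", "email": "dr.smith@hospital.org"},
    reason="Dosage adjusted and food requirement added.",  # stored in the audit trail
)
```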
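Before calling `resolve`, it may help to check that the payload is internally consistent. The rules in this sketch are assumptions inferred from the diagram and the example above (three reviewer outcomes, corrections only for `"corrected"`, a reviewer email for the audit trail); `validate_resolution` is a hypothetical helper, not part of the SDK.

```python
from typing import Optional

# Reviewer outcomes taken from the diagram; the SDK may accept others.
VALID_DECISIONS = {"approved", "rejected", "corrected"}


def validate_resolution(
    decision: str,
    corrections: Optional[dict],
    reviewer: dict,
) -> None:
    """Raise ValueError if a resolve() payload looks inconsistent.

    Assumed rules (not confirmed by the SDK): corrections are required for
    "corrected" and not allowed otherwise; a reviewer email is required.
    """
    if decision not in VALID_DECISIONS:
        raise ValueError(f"unknown decision {decision!r}")
    if decision == "corrected" and not corrections:
        raise ValueError("a 'corrected' decision needs corrections")
    if decision != "corrected" and corrections:
        raise ValueError("corrections only apply to a 'corrected' decision")
    if not reviewer.get("email"):
        raise ValueError("reviewer email is required for the audit trail")


# Passes silently for the example payload above.
validate_resolution(
    "corrected",
    {"answer": "Take 400mg of ibuprofen twice daily with food."},
    {"name": "Dr. Smith", "email": "dr.smith@hospital.org"},
)
```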

Sections

  • Evaluate: how evaluation and risk scoring work
  • Escalations: managing escalated outputs
  • Resolve: submitting human review decisions
  • Policies: configuring risk thresholds and routing

SDK Reference