Key Concepts
This page covers the core concepts you need to understand when integrating Coalex into your AI agent.
Agents
An agent is any AI-powered system that produces outputs for end users. In Coalex, agents are first-class entities with a unique agent_id, display name, and lifecycle status.
Register each agent with declare_agent() so the dashboard recognizes it before traces arrive.
Each agent has its own health score, escalation history, and policy configuration.
Traces & Spans
A trace represents a single end-to-end request through your agent. It contains one or more spans — individual operations like LLM calls, retrieval queries, or tool invocations.
Coalex uses OpenTelemetry and OpenInference to capture spans automatically when you call auto_instrument().
    Trace: claims-bot / req-456
    ├── coalex_context (ROOT)
    │   ├── retrieve_documents (RETRIEVER)
    │   ├── ChatOpenAI (LLM)
    │   │     model: gpt-4o
    │   │     tokens_in: 214, tokens_out: 143
    │   └── parse_response (CHAIN)
    └── evaluate (internal)
Use coalex_context() to create a root span that tags all child spans with the agent ID and request ID.
Evaluations
An evaluation is a risk assessment of your agent's output. Call evaluate() with the agent's input, output, and the metrics you want computed:
    decision = coalex.evaluate(
        request_id="req-456",
        input={"question": "What is my deductible?"},
        output={"answer": "Your deductible is $500."},
        metrics={"answer": ["semantic_similarity", "f1"]},
    )
The Coalex platform assigns a risk score (0.0 – 1.0) based on the agent's health score and returns one of three statuses:
| Status | Meaning |
|---|---|
| auto_approved | Low risk: safe to serve to the user |
| escalated | Medium/high risk: requires human review |
| rejected | High risk: do not serve to the user |
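Application code typically branches on the returned status. The following is a minimal sketch of that branching, assuming the status is available as a plain string; `route_decision`, the action labels, and `FALLBACK_MESSAGE` are hypothetical names chosen for illustration and are not part of the SDK.

```python
FALLBACK_MESSAGE = "We could not answer that automatically. A specialist will follow up."


def route_decision(status, output):
    """Map one of the three documented statuses to an (action, payload) pair."""
    if status == "auto_approved":
        # Low risk: the agent's output can go straight to the user.
        return ("serve", output)
    if status == "escalated":
        # Hold the output until a human reviewer has resolved the escalation.
        return ("hold_for_review", output)
    if status == "rejected":
        # High risk: never show the agent's output; use a safe fallback.
        return ("fallback", {"answer": FALLBACK_MESSAGE})
    raise ValueError(f"unknown status: {status!r}")
```

Raising on an unknown status keeps an unexpected value from being silently treated as safe to serve.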
Escalations
When an evaluation returns status == "escalated", an escalation is created. Escalations represent agent outputs that need human review before being served to users.
Each escalation has:
- A unique escalation_id
- The original input and output
- A risk score
- A status (pending → approved / rejected / corrected)
Route escalations to human reviewers through your own UI, Slack, email, or any notification system.
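If you mirror escalations in your own review UI, the lifecycle above can be modelled as a small record with a one-way status transition. This is an illustrative sketch only: `EscalationRecord` and its `close` method are hypothetical, and the fields simply follow the list above rather than any documented SDK type.

```python
from dataclasses import dataclass

# A pending escalation ends in exactly one of these final states.
FINAL_STATUSES = {"approved", "rejected", "corrected"}


@dataclass
class EscalationRecord:
    """Hypothetical local copy of an escalation, for use in a review queue."""
    escalation_id: str
    input: dict
    output: dict
    risk_score: float
    status: str = "pending"

    def close(self, new_status):
        """Move from pending to a final status; any other move is an error."""
        if self.status != "pending":
            raise ValueError(f"escalation already {self.status}")
        if new_status not in FINAL_STATUSES:
            raise ValueError(f"invalid final status: {new_status!r}")
        self.status = new_status


record = EscalationRecord(
    escalation_id="esc-001",
    input={"question": "What is my deductible?"},
    output={"answer": "Your deductible is $500."},
    risk_score=0.62,  # made-up value for the example
)
record.close("corrected")
```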
Resolutions
A resolution is the human reviewer's decision on an escalation. Call resolve() to submit the decision:
| Decision | Meaning |
|---|---|
| approved | The output is correct as-is |
| rejected | The output is incorrect: do not serve |
| corrected | The reviewer provides a corrected version |
When a reviewer submits corrections, Coalex computes quality metrics (F1, semantic similarity, etc.) by comparing the original output against the corrections. These metrics feed back into the agent's health score.
    result = coalex.resolve(
        escalation_id="esc-001",
        decision="corrected",
        corrections={"answer": "Your deductible is $1,000 for in-network providers."},
        reviewer={"name": "Dr. Smith", "email": "dr.smith@hospital.org"},
        reason="Incorrect deductible amount.",
    )
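Once a decision is recorded, your application still has to choose what the end user sees. A minimal sketch of that step follows, assuming corrections are keyed by the same field names as the original output; `output_after_review` is a hypothetical helper on the application side, not an SDK function.

```python
def output_after_review(decision, original_output, corrections=None):
    """Pick what the end user receives once a reviewer has decided."""
    if decision == "approved":
        return original_output
    if decision == "corrected":
        if not corrections:
            raise ValueError("a 'corrected' decision needs corrections")
        # Corrected fields replace the originals; untouched fields are kept.
        return {**original_output, **corrections}
    if decision == "rejected":
        # Nothing from the agent may be served; the caller should fall back.
        return None
    raise ValueError(f"unknown decision: {decision!r}")


served = output_after_review(
    "corrected",
    {"answer": "Your deductible is $500.", "source": "policy.pdf"},
    {"answer": "Your deductible is $1,000 for in-network providers."},
)
```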
Metrics
Metrics are quality scores computed when human reviewers provide corrections. They measure how close the agent's original output was to the corrected version.
| Metric | Description |
|---|---|
| f1 | Token-level precision and recall |
| semantic_similarity | Cosine similarity between embeddings |
| exact_match | Binary: did the output match exactly? |
| rouge_l | Longest common subsequence overlap |
| bleu | N-gram overlap (machine translation standard) |
| word_overlap | Fraction of expected words present |
| contains | Binary: is the expected text contained in the output? |
| levenshtein | Normalized edit distance |
Metrics are declared at evaluate-time and computed at resolve-time. See the Metrics Catalog for the full schema.
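To make a few of these definitions concrete, here are textbook versions of four of them in plain Python. They are sketches, not the platform's implementations: the tokenization (lowercased whitespace split) and the choice to report Levenshtein as a similarity, 1 minus the distance divided by the longer length, are assumptions, so Coalex's own numbers may differ.

```python
from collections import Counter


def exact_match(output, expected):
    """1.0 if the two strings are identical, else 0.0."""
    return 1.0 if output == expected else 0.0


def contains(output, expected):
    """1.0 if the expected text appears anywhere in the output, else 0.0."""
    return 1.0 if expected in output else 0.0


def token_f1(output, expected):
    """Harmonic mean of token-level precision and recall."""
    out_tokens = output.lower().split()
    exp_tokens = expected.lower().split()
    if not out_tokens or not exp_tokens:
        return 1.0 if out_tokens == exp_tokens else 0.0
    # Multiset intersection counts each shared token at most as often as it
    # appears on both sides.
    overlap = sum((Counter(out_tokens) & Counter(exp_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(out_tokens)
    recall = overlap / len(exp_tokens)
    return 2 * precision * recall / (precision + recall)


def levenshtein_similarity(output, expected):
    """1 - edit_distance / max(len), so identical strings score 1.0."""
    if not output and not expected:
        return 1.0
    prev = list(range(len(expected) + 1))
    for i, a in enumerate(output, 1):
        curr = [i]
        for j, b in enumerate(expected, 1):
            cost = 0 if a == b else 1
            # Deletion, insertion, or substitution, whichever is cheapest.
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return 1.0 - prev[-1] / max(len(output), len(expected))
```

For the deductible example, `token_f1("Your deductible is $500.", "Your deductible is $1,000.")` shares three of four tokens on each side, so precision, recall, and F1 are all 0.75.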
Policies
Policies define the rules that govern how evaluations are handled:
- Risk thresholds — what risk score triggers escalation vs. auto-approval
- Escalation routing — which reviewers receive which types of escalations
- Metric requirements — which metrics must be computed for each agent
Policies are configured per-agent in the dashboard. See Policies.
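The risk-threshold rule can be illustrated with a short sketch. The platform applies policies server-side, so this is not code you need to write; `RiskThresholds`, `status_for`, and the cut-off values 0.3 and 0.8 are hypothetical and only show how two thresholds partition the 0.0 to 1.0 risk range into the three statuses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskThresholds:
    """Hypothetical per-agent cut-offs; real values live in the dashboard."""
    escalate_at: float = 0.3
    reject_at: float = 0.8

    def __post_init__(self):
        if not 0.0 <= self.escalate_at <= self.reject_at <= 1.0:
            raise ValueError("need 0.0 <= escalate_at <= reject_at <= 1.0")


def status_for(risk_score, thresholds):
    """Map a risk score in [0.0, 1.0] to one of the three statuses."""
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score must be between 0.0 and 1.0")
    if risk_score >= thresholds.reject_at:
        return "rejected"
    if risk_score >= thresholds.escalate_at:
        return "escalated"
    return "auto_approved"
```

Lowering `escalate_at` sends more outputs to human review, which is the conservative setting for a newly deployed agent.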
Health Score
The health score is a rolling measure of agent reliability (0.0 – 1.0). It is computed from:
- Historical evaluation outcomes (approval rate, rejection rate)
- Quality metric trends (improving or degrading)
- Escalation resolution patterns
A high health score means the agent is consistently producing correct outputs. A declining health score triggers more escalations and may change auto-approval thresholds.
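The exact formula is not given here, so the following is purely an illustration of what a rolling measure over recent outcomes can look like. Everything in it is an assumption: the `RollingHealth` class, the window size, and the credit assigned to each decision (1.0 for approved, the quality metric for corrected, 0.0 for rejected).

```python
from collections import deque


class RollingHealth:
    """Illustrative rolling score: mean credit over the last `window` outcomes."""

    def __init__(self, window=50):
        # deque with maxlen drops the oldest outcome once the window is full.
        self._credits = deque(maxlen=window)

    def record(self, decision, quality=None):
        """Add one resolved outcome; `quality` is a 0.0 to 1.0 metric value."""
        if decision == "approved":
            credit = 1.0
        elif decision == "corrected":
            # Assumed: partial credit equal to the quality metric, 0.5 if absent.
            credit = 0.5 if quality is None else quality
        elif decision == "rejected":
            credit = 0.0
        else:
            raise ValueError(f"unknown decision: {decision!r}")
        if not 0.0 <= credit <= 1.0:
            raise ValueError("quality must be between 0.0 and 1.0")
        self._credits.append(credit)

    @property
    def score(self):
        """Mean credit in the window, or None before any outcome is recorded."""
        if not self._credits:
            return None
        return sum(self._credits) / len(self._credits)


health = RollingHealth(window=3)
health.record("approved")
health.record("rejected")
```

Because only the most recent outcomes count, a run of corrections or rejections pulls the score down quickly, and a run of approvals restores it.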
The Evaluate-Resolve Loop
The core workflow in Coalex is the evaluate-resolve loop:
    graph TD
        A[Agent produces output] --> B[evaluate]
        B -->|auto_approved| C[Serve to user]
        B -->|escalated| D[Human reviews]
        B -->|rejected| E[Fallback response]
        D -->|approved| C
        D -->|rejected| E
        D -->|corrected| F[Serve corrected output]
        F --> G[Metrics computed]
        G --> H[Health score updated]
        H --> B
Over time, as the health score improves, fewer outputs are escalated — your agent graduates from "pilot" to "production" with a full audit trail.
Next Steps
- Quickstart — Get started in 5 minutes
- Installation — Install the SDK
- SDK Reference — Full API documentation