Key Concepts

This page covers the core concepts you need to understand when integrating Coalex into your AI agent.

Agents

An agent is any AI-powered system that produces outputs for end users. In Coalex, agents are first-class entities with a unique agent_id, display name, and lifecycle status.

Register each agent with declare_agent() so that the dashboard recognizes it before its traces arrive:

coalex.declare_agent(agent_id="claims-bot", display_name="Claims Bot")

Each agent has its own health score, escalation history, and policy configuration.

Traces & Spans

A trace represents a single end-to-end request through your agent. It contains one or more spans — individual operations like LLM calls, retrieval queries, or tool invocations.

Coalex uses OpenTelemetry and OpenInference to capture spans automatically when you call auto_instrument().

Trace: claims-bot / req-456
  ├── coalex_context (ROOT)
  │   ├── retrieve_documents (RETRIEVER)
  │   ├── ChatOpenAI (LLM)
  │   │     model: gpt-4o
  │   │     tokens_in: 214, tokens_out: 143
  │   └── parse_response (CHAIN)
  └── evaluate (internal)

Use coalex_context() to create a root span that tags all child spans with the agent ID and request ID.
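To make the tagging behavior concrete, here is a minimal, library-free sketch of how a root context can pass identifiers down to child spans. It is not Coalex's implementation: `request_context`, `record_span`, and the `SPANS` list are hypothetical names used only for illustration, built on Python's standard `contextvars` module.

```python
import contextvars
from contextlib import contextmanager

# Hypothetical stand-ins for illustration; none of these names come from the Coalex SDK.
_current = contextvars.ContextVar("current_request", default=None)
SPANS = []  # spans in the order they were recorded


@contextmanager
def request_context(agent_id, request_id):
    """Open a root span and expose its IDs to everything inside the block."""
    tags = {"agent_id": agent_id, "request_id": request_id}
    token = _current.set(tags)
    SPANS.append({"name": "root", "kind": "ROOT", **tags})
    try:
        yield
    finally:
        # Restore the previous context so later spans are not mis-tagged.
        _current.reset(token)


def record_span(name, kind):
    """Record a child span, tagged with the IDs of the enclosing root (if any)."""
    tags = _current.get() or {}
    SPANS.append({"name": name, "kind": kind, **tags})


with request_context("claims-bot", "req-456"):
    record_span("retrieve_documents", "RETRIEVER")
    record_span("ChatOpenAI", "LLM")
```

Every span recorded inside the `with` block carries the same `agent_id` and `request_id`, which is what allows a trace to be grouped by agent and request afterwards.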

Evaluations

An evaluation is a risk assessment of your agent's output. Call evaluate() with the agent's input, output, and the metrics you want computed:

decision = coalex.evaluate(
    request_id="req-456",
    input={"question": "What is my deductible?"},
    output={"answer": "Your deductible is $500."},
    metrics={"answer": ["semantic_similarity", "f1"]},
)

The Coalex platform assigns a risk score (0.0 – 1.0) based on the agent's health score and returns one of three statuses:

Status	Meaning
`auto_approved`	Low risk — safe to serve to the user
`escalated`	Medium/high risk — requires human review
`rejected`	High risk — do not serve to the user

Escalations

When an evaluation returns status == "escalated", an escalation is created. Escalations represent agent outputs that need human review before being served to users.

Each escalation has:

A unique escalation_id
The original input and output
A risk score
A status (pending → approved / rejected / corrected)

Route escalations to human reviewers through your own UI, Slack, email, or any notification system.
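The escalation fields and status lifecycle listed above can be sketched as a small record type. This is an assumption-laden illustration, not the SDK's data model: the `Escalation` class, its `transition` method, and the sample risk score are hypothetical.

```python
from dataclasses import dataclass

# Allowed moves, taken from the lifecycle above: pending -> approved / rejected / corrected.
TRANSITIONS = {"pending": {"approved", "rejected", "corrected"}}


@dataclass
class Escalation:
    """Hypothetical in-memory record; field names mirror the list above."""
    escalation_id: str
    input: dict
    output: dict
    risk_score: float
    status: str = "pending"

    def transition(self, new_status: str) -> None:
        """Move to a new status, refusing anything the lifecycle does not allow."""
        allowed = TRANSITIONS.get(self.status, set())
        if new_status not in allowed:
            raise ValueError(f"cannot move from {self.status!r} to {new_status!r}")
        self.status = new_status


esc = Escalation(
    escalation_id="esc-001",
    input={"question": "What is my deductible?"},
    output={"answer": "Your deductible is $500."},
    risk_score=0.62,  # illustrative value only
)
esc.transition("corrected")
```

Treating approved, rejected, and corrected as terminal states means a resolved escalation cannot be reopened by accident in this sketch; whether Coalex enforces the same rule is not stated on this page.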

Resolutions

A resolution is the human reviewer's decision on an escalation. Call resolve() to submit one of three decisions:

Decision	Meaning
`approved`	The output is correct as-is
`rejected`	The output is incorrect — do not serve
`corrected`	The reviewer provides a corrected version

When a reviewer submits corrections, Coalex computes quality metrics (F1, semantic similarity, etc.) by comparing the original output against the corrections. These metrics feed back into the agent's health score.

result = coalex.resolve(
    escalation_id="esc-001",
    decision="corrected",
    corrections={"answer": "Your deductible is $1,000 for in-network providers."},
    reviewer={"name": "Dr. Smith", "email": "dr.smith@hospital.org"},
    reason="Incorrect deductible amount.",
)

Metrics

Metrics are quality scores computed when human reviewers provide corrections. They measure how close the agent's original output was to the corrected version.

Metric	Description
`f1`	Token-level precision and recall
`semantic_similarity`	Cosine similarity between embeddings
`exact_match`	Binary: did the output match exactly?
`rouge_l`	Longest common subsequence overlap
`bleu`	N-gram overlap (machine translation standard)
`word_overlap`	Fraction of expected words present
`contains`	Binary: is the expected text contained in the output?
`levenshtein`	Normalized edit distance

Metrics are declared at evaluate-time and computed at resolve-time. See the Metrics Catalog for the full schema.
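For intuition about what a few of these scores measure, the sketch below implements textbook versions of `f1`, `exact_match`, `contains`, and `levenshtein` using only the standard library. These are not Coalex's implementations: the tokenization (lowercased whitespace split) and the Levenshtein normalization (distance divided by the longer string, so 0.0 means identical) are assumptions made for the example.

```python
from collections import Counter


def f1(expected: str, actual: str) -> float:
    """Token-level F1 over lowercased, whitespace-separated tokens."""
    exp, act = expected.lower().split(), actual.lower().split()
    if not exp or not act:
        return float(exp == act)
    overlap = sum((Counter(exp) & Counter(act)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(act)
    recall = overlap / len(exp)
    return 2 * precision * recall / (precision + recall)


def exact_match(expected: str, actual: str) -> float:
    """1.0 if the strings are identical after trimming whitespace, else 0.0."""
    return float(expected.strip() == actual.strip())


def contains(expected: str, actual: str) -> float:
    """1.0 if the expected text appears verbatim inside the output, else 0.0."""
    return float(expected in actual)


def levenshtein(expected: str, actual: str) -> float:
    """Edit distance divided by the longer length (assumed normalization)."""
    if not expected and not actual:
        return 0.0
    prev = list(range(len(actual) + 1))
    for i, ce in enumerate(expected, start=1):
        curr = [i]
        for j, ca in enumerate(actual, start=1):
            cost = 0 if ce == ca else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1] / max(len(expected), len(actual))


original = "Your deductible is $500."
corrected = "Your deductible is $1,000 for in-network providers."
scores = {
    "f1": f1(corrected, original),
    "exact_match": exact_match(corrected, original),
    "contains": contains(corrected, original),
    "levenshtein": levenshtein(corrected, original),
}
```

With the deductible example from the Resolutions section, three of the four original tokens appear in the seven-token correction, so the token-level F1 works out to 6/11 (about 0.55), while `exact_match` and `contains` are both 0.0.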

Policies

Policies define the rules that govern how evaluations are handled:

Risk thresholds — what risk score triggers escalation vs. auto-approval
Escalation routing — which reviewers receive which types of escalations
Metric requirements — which metrics must be computed for each agent

Policies are configured per-agent in the dashboard. See Policies.
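As a rough mental model of risk thresholds, the sketch below maps a risk score to one of the three evaluation statuses. The `RiskPolicy` class, its field names, and the threshold values 0.3 and 0.8 are all hypothetical; real thresholds are set per agent in the dashboard and may follow different rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskPolicy:
    """Hypothetical per-agent thresholds; the default values are illustrative only."""
    escalate_at: float = 0.3  # scores at or above this need human review
    reject_at: float = 0.8    # scores at or above this are not served

    def __post_init__(self):
        if not 0.0 <= self.escalate_at <= self.reject_at <= 1.0:
            raise ValueError("thresholds must satisfy 0 <= escalate_at <= reject_at <= 1")

    def status_for(self, risk_score: float) -> str:
        """Return the evaluation status implied by a risk score in [0.0, 1.0]."""
        if not 0.0 <= risk_score <= 1.0:
            raise ValueError("risk_score must be in [0.0, 1.0]")
        if risk_score >= self.reject_at:
            return "rejected"
        if risk_score >= self.escalate_at:
            return "escalated"
        return "auto_approved"


policy = RiskPolicy()
statuses = [policy.status_for(score) for score in (0.1, 0.5, 0.9)]
```

Checking the rejection threshold first keeps the three bands non-overlapping, so each score maps to exactly one status.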

Health Score

The health score is a rolling measure of agent reliability (0.0 – 1.0). It is computed from:

Historical evaluation outcomes (approval rate, rejection rate)
Quality metric trends (improving or degrading)
Escalation resolution patterns

A high health score means the agent is consistently producing correct outputs. A declining health score triggers more escalations and may change auto-approval thresholds.
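One simple way to picture a rolling score is an exponential moving average over recent outcomes. The sketch below is only that, a picture: the `OUTCOME_CREDIT` weights, the `alpha` smoothing factor, and the `update_health` function are hypothetical, and the sketch ignores the metric trends and resolution patterns that the real health score also takes into account.

```python
# Hypothetical credit per outcome; the real weighting is not documented on this page.
OUTCOME_CREDIT = {
    "auto_approved": 1.0,
    "approved": 1.0,
    "corrected": 0.5,
    "rejected": 0.0,
}


def update_health(health: float, outcome: str, alpha: float = 0.1) -> float:
    """Blend the newest outcome into the rolling score (exponential moving average)."""
    credit = OUTCOME_CREDIT[outcome]
    return (1 - alpha) * health + alpha * credit


health = 0.5  # illustrative starting point
for outcome in ["approved", "approved", "corrected", "rejected"]:
    health = update_health(health, outcome)
```

Because each update is a weighted average of two values in [0.0, 1.0], the score stays in that range, and a larger `alpha` makes it react faster to recent outcomes.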

The Evaluate-Resolve Loop

The core workflow in Coalex is the evaluate-resolve loop:

graph TD
    A[Agent produces output] --> B[evaluate]
    B -->|auto_approved| C[Serve to user]
    B -->|escalated| D[Human reviews]
    B -->|rejected| E[Fallback response]
    D -->|approved| C
    D -->|rejected| E
    D -->|corrected| F[Serve corrected output]
    F --> G[Metrics computed]
    G --> H[Health score updated]
    H --> B

As the health score improves over time, fewer outputs are escalated, and your agent graduates from "pilot" to "production" with a full audit trail.
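The branches of the diagram can be written as a small decision function, which is a convenient shape for the glue code around evaluate() and resolve(). This is a sketch under assumptions: `next_action` and the action names it returns are hypothetical, and only the status and decision strings come from this page.

```python
from typing import Optional


def next_action(status: str, review: Optional[str] = None) -> str:
    """Map an evaluation status (and reviewer decision, if any) to what to do next."""
    if status == "auto_approved":
        return "serve_output"
    if status == "rejected":
        return "fallback_response"
    if status == "escalated":
        if review is None:
            return "await_review"  # a human has not decided yet
        if review == "approved":
            return "serve_output"
        if review == "rejected":
            return "fallback_response"
        if review == "corrected":
            return "serve_corrected_output"
        raise ValueError(f"unknown reviewer decision: {review!r}")
    raise ValueError(f"unknown evaluation status: {status!r}")


actions = [
    next_action("auto_approved"),
    next_action("escalated"),
    next_action("escalated", "corrected"),
    next_action("rejected"),
]
```

Raising on unknown values, rather than defaulting to serving the output, keeps an unexpected status from reaching users unreviewed.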

Next Steps

Quickstart — Get started in 5 minutes
Installation — Install the SDK
SDK Reference — Full API documentation