Observability Metrics

Coalex automatically captures operational metrics for every trace: token usage, cost estimates, latency, and environmental impact.

Not to be confused with quality metrics

This page covers observability metrics (tokens, cost, latency, sustainability). For quality metrics used in evaluations (F1, semantic similarity, etc.), see the Metrics Catalog.

Token Metrics

Captured automatically by auto-instrumentation for every LLM call.

| Metric | Unit | Description |
|---|---|---|
| `input_tokens` | count | Prompt/input token count |
| `output_tokens` | count | Completion/output token count |
| `total_tokens` | count | Sum of input and output tokens |
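The sketch below shows how the three token metrics relate. It assumes a hypothetical `TokenMetrics` container that is not part of any Coalex SDK. Here `total_tokens` is derived from the other two values rather than stored separately:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenMetrics:
    """Hypothetical container mirroring the token metrics in the table above."""

    input_tokens: int
    output_tokens: int

    @property
    def total_tokens(self) -> int:
        # total_tokens is the sum of input and output tokens.
        return self.input_tokens + self.output_tokens


usage = TokenMetrics(input_tokens=1200, output_tokens=350)
print(usage.total_tokens)  # 1550
```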

Cost Metrics

Estimated by the Transformer based on model pricing tables.

| Metric | Unit | Description |
|---|---|---|
| `cost_total` | USD | Estimated total cost for the LLM call |
| `cost_per_token` | USD | Average cost per token |
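The sketch below shows one way to compute such an estimate. It assumes a hypothetical pricing table keyed by model, with separate input and output prices in USD per million tokens. The model name and prices are placeholders, not Coalex's actual tables. The sketch reads "average cost per token" as `cost_total` divided by `total_tokens`:

```python
# Hypothetical prices in USD per 1M tokens; real pricing tables differ.
PRICING_USD_PER_1M = {
    "example-model": {"input": 3.00, "output": 15.00},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> dict[str, float]:
    """Return cost_total and cost_per_token for one LLM call (illustrative only)."""
    prices = PRICING_USD_PER_1M[model]
    # Input and output tokens are usually priced differently, so price them separately.
    cost_total = (
        input_tokens * prices["input"] + output_tokens * prices["output"]
    ) / 1_000_000
    total_tokens = input_tokens + output_tokens
    # Average over all tokens; guard against calls that report zero tokens.
    cost_per_token = cost_total / total_tokens if total_tokens else 0.0
    return {"cost_total": cost_total, "cost_per_token": cost_per_token}


result = estimate_cost("example-model", input_tokens=1200, output_tokens=350)
print(result)
```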

Performance Metrics

| Metric | Unit | Description |
|---|---|---|
| `latency` | ms | End-to-end span duration |
| `time_to_first_token` | ms | Time from request to first streamed token |
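The sketch below illustrates the difference between the two timings by timing a streamed response on the client side. `fake_stream` is a hypothetical stand-in for a real streaming client. The clock starts before the request is issued. `time_to_first_token` stops at the first chunk, and `latency` stops once the stream is exhausted:

```python
import time
from collections.abc import Callable, Iterable, Iterator
from typing import Optional


def fake_stream(chunks: list[str], delay_s: float) -> Iterator[str]:
    """Hypothetical stand-in for a streaming LLM response."""
    for chunk in chunks:
        time.sleep(delay_s)  # simulate generation delay before each chunk
        yield chunk


def measure_stream(start_request: Callable[[], Iterable[str]]) -> dict[str, Optional[float]]:
    """Return latency and time_to_first_token in milliseconds."""
    start = time.perf_counter()  # start before the request is issued
    first_chunk_at: Optional[float] = None
    for _chunk in start_request():
        if first_chunk_at is None:
            first_chunk_at = time.perf_counter()
    end = time.perf_counter()
    return {
        "latency": (end - start) * 1000,
        # None when the stream produced no chunks at all.
        "time_to_first_token": None if first_chunk_at is None else (first_chunk_at - start) * 1000,
    }


metrics = measure_stream(lambda: fake_stream(["Hel", "lo"], delay_s=0.05))
print(metrics)
```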

Sustainability Metrics

Powered by the EcoLogits library and computed by the Transformer for every LLM call.

| Metric | Unit | Description |
|---|---|---|
| `energy` | kWh | Energy consumption |
| `gwp` | kgCO2eq | Global Warming Potential (carbon footprint) |
| `adpe` | kgSbeq | Abiotic Depletion Potential for Elements |
| `pe` | MJ | Primary Energy consumption |
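Energy, emissions, and resource depletion add up across calls, so an agent-level summary can be built by summing the per-call values. The sketch below assumes each call is available as a dict keyed by the metric names above, with one value per metric. The figures are made-up placeholders, not real measurements:

```python
SUSTAINABILITY_METRICS = ("energy", "gwp", "adpe", "pe")


def summarize_impacts(calls: list[dict[str, float]]) -> dict[str, float]:
    """Sum each sustainability metric across calls; missing values count as 0."""
    totals = {name: 0.0 for name in SUSTAINABILITY_METRICS}
    for call in calls:
        for name in SUSTAINABILITY_METRICS:
            # Each metric is only summed with itself, so units are preserved.
            totals[name] += call.get(name, 0.0)
    return totals


# Illustrative per-call values only (kWh, kgCO2eq, kgSbeq, MJ).
calls = [
    {"energy": 0.002, "gwp": 0.001, "adpe": 1e-8, "pe": 0.02},
    {"energy": 0.003, "gwp": 0.002, "adpe": 2e-8, "pe": 0.03},
]
print(summarize_impacts(calls))
```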

Viewing Metrics

Dashboard

The admin dashboard displays the following metrics on the Agent Detail page:

- Token usage over time
- Cost breakdown by model
- Latency percentiles (P50, P95, P99)
- Sustainability impact summary
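For reference, the sketch below applies one common percentile definition (nearest rank) to a list of `latency` values. The dashboard's exact method is not specified here; for example, it may interpolate between samples. The sample values are illustrative:

```python
import math


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not values:
        raise ValueError("percentile of an empty list is undefined")
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]  # ranks are 1-based; clamp p=0 to the minimum


# Illustrative latency samples in ms, including one slow outlier.
latencies_ms = [120, 95, 110, 300, 105, 98, 2400, 130, 101, 115]
for p in (50, 95, 99):
    print(f"P{p}: {percentile(latencies_ms, p)} ms")
```

With these samples, P50 is 110 ms, while the single 2400 ms outlier sets both P95 and P99.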

API

Query metrics for a specific agent:

```bash
curl "https://your-org.coalex.ai/v1/metrics?agent_id=support-bot" \
  -H "Authorization: Bearer $COALEX_API_KEY"
```
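The sketch below makes the same request from Python using only the standard library. `build_metrics_request` is a hypothetical helper, and the host is the placeholder from the curl example. The response format is not documented on this page, so the sketch stops at building the request:

```python
import os
import urllib.parse
import urllib.request

BASE_URL = "https://your-org.coalex.ai/v1/metrics"  # replace your-org with your organization


def build_metrics_request(agent_id: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for one agent's metrics with bearer-token auth."""
    query = urllib.parse.urlencode({"agent_id": agent_id})  # URL-encodes the agent ID
    return urllib.request.Request(
        f"{BASE_URL}?{query}",
        headers={"Authorization": f"Bearer {api_key}"},
    )


request = build_metrics_request("support-bot", os.environ.get("COALEX_API_KEY", ""))
print(request.full_url)
# Send it with urllib.request.urlopen(request) once COALEX_API_KEY is set.
```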

SQL (DuckLake)

Average each metric for one agent over the last 7 days:

```sql
SELECT
    metric_type,
    metric_id,
    AVG(value) AS avg_value,
    COUNT(*) AS samples
FROM metrics
WHERE agent_id = 'support-bot'
  AND created_at >= CURRENT_DATE - INTERVAL '7 days'
GROUP BY metric_type, metric_id
ORDER BY metric_type, metric_id;
```
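To try the query shape locally without a DuckLake catalog, the sketch below uses Python's built-in `sqlite3` module and a toy `metrics` table with the columns the query references. This swaps in SQLite, which has no `INTERVAL` literal, so the date filter becomes `date('now', '-7 days')`. The `metric_type` and `metric_id` values and all rows are hypothetical:

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metrics ("
    "agent_id TEXT, metric_type TEXT, metric_id TEXT, value REAL, created_at TEXT)"
)

today = date.today()
rows = [  # hypothetical rows; ISO dates compare correctly as text
    ("support-bot", "cost", "cost_total", 0.004, today.isoformat()),
    ("support-bot", "cost", "cost_total", 0.006, (today - timedelta(days=2)).isoformat()),
    ("support-bot", "cost", "cost_total", 0.050, (today - timedelta(days=30)).isoformat()),  # outside window
    ("other-bot", "cost", "cost_total", 1.000, today.isoformat()),  # different agent
]
conn.executemany("INSERT INTO metrics VALUES (?, ?, ?, ?, ?)", rows)

query = """
SELECT metric_type, metric_id, AVG(value) AS avg_value, COUNT(*) AS samples
FROM metrics
WHERE agent_id = ?
  AND created_at >= date('now', '-7 days')
GROUP BY metric_type, metric_id
ORDER BY metric_type, metric_id
"""
results = conn.execute(query, ("support-bot",)).fetchall()
print(results)
```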

Best Practices

- Monitor cost trends: set up alerts when daily cost exceeds a threshold.
- Track token efficiency: compare input/output token ratios across prompt versions.
- Use sustainability data: report your carbon footprint to support EU AI Act compliance.
- Benchmark latency: use P95 latency, not the average, as your SLA metric, because the average can mask slow tail requests.
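The sketch below illustrates the first practice. It assumes you have already exported `(day, cost_total)` pairs, for example from the metrics table. `days_over_budget`, the threshold, and the sample costs are hypothetical:

```python
from collections import defaultdict
from datetime import date


def days_over_budget(calls: list[tuple[date, float]], threshold_usd: float) -> dict[date, float]:
    """Sum cost_total per calendar day and return the days whose total exceeds the threshold."""
    daily: defaultdict[date, float] = defaultdict(float)
    for day, cost_total in calls:
        daily[day] += cost_total
    return {day: total for day, total in sorted(daily.items()) if total > threshold_usd}


# Illustrative per-call costs in USD.
calls = [
    (date(2024, 5, 1), 4.20),
    (date(2024, 5, 1), 7.10),
    (date(2024, 5, 2), 3.00),
]
print(days_over_budget(calls, threshold_usd=10.0))
```

In practice, the returned days would feed whatever alerting channel you already use.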