Observability Metrics

Coalex automatically captures operational metrics for every trace — token usage, cost estimates, latency, and environmental impact.

Not to be confused with quality metrics

This page covers observability metrics (tokens, cost, latency, and sustainability). For quality metrics used in evaluations (F1, semantic similarity, etc.), see the Metrics Catalog.


Token Metrics

Captured by auto-instrumentation for every LLM call.

Metric         Unit   Description
input_tokens   count  Prompt/input token count
output_tokens  count  Completion/output token count
total_tokens   count  Sum of input and output tokens

Cost Metrics

Estimated by the Transformer based on model pricing tables.

Metric          Unit  Description
cost_total      USD   Estimated total cost for the LLM call
cost_per_token  USD   Average cost per token
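
As a rough illustration of how a pricing-table estimate can work, the sketch below derives both cost metrics from token counts. The model name, the prices, and the estimate_cost helper are hypothetical, and treating cost_per_token as cost_total divided by total tokens is an assumption; the Transformer's actual pricing tables and formula may differ.

```python
from dataclasses import dataclass

# Hypothetical prices in USD per 1M tokens; real pricing tables vary by provider and date.
PRICING_PER_MILLION = {
    "example-model": {"input": 3.00, "output": 15.00},
}


@dataclass
class CostEstimate:
    cost_total: float      # USD for the whole call
    cost_per_token: float  # USD, averaged over input and output tokens


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> CostEstimate:
    """Estimate call cost from token counts and a per-model price table."""
    prices = PRICING_PER_MILLION[model]
    cost_total = (
        input_tokens * prices["input"] + output_tokens * prices["output"]
    ) / 1_000_000
    total_tokens = input_tokens + output_tokens
    # Assumption: average cost per token is total cost over total tokens.
    cost_per_token = cost_total / total_tokens if total_tokens else 0.0
    return CostEstimate(cost_total, cost_per_token)


estimate = estimate_cost("example-model", input_tokens=1_000, output_tokens=500)
print(f"cost_total={estimate.cost_total:.6f} USD")
print(f"cost_per_token={estimate.cost_per_token:.9f} USD")
```

Because input and output tokens are usually priced differently, the average cost per token shifts with the input/output mix even when the model stays the same.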

Performance Metrics

Metric               Unit  Description
latency              ms    End-to-end span duration
time_to_first_token  ms    Time from request to first streamed token
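
To make the difference between the two timings concrete, here is a minimal sketch that measures both while consuming a streamed response. The measure_stream helper and its injectable clock are hypothetical and only illustrate the definitions above; Coalex records these values itself through auto-instrumentation.

```python
import time
from typing import Callable, Iterable, Optional


def measure_stream(
    chunks: Iterable[str], clock: Callable[[], float] = time.perf_counter
) -> dict:
    """Consume a token stream and report latency and time to first token in ms."""
    start = clock()
    first_chunk_at: Optional[float] = None
    count = 0
    for _ in chunks:
        if first_chunk_at is None:
            first_chunk_at = clock()  # the first streamed token has arrived
        count += 1
    end = clock()
    return {
        "latency": (end - start) * 1000.0,
        "time_to_first_token": (
            None if first_chunk_at is None else (first_chunk_at - start) * 1000.0
        ),
        "chunks": count,
    }


# Deterministic demo: a fake clock returning start, first-token, and end times.
ticks = iter([0.0, 0.25, 1.0])
timings = measure_stream(["Hello", " world"], clock=lambda: next(ticks))
print(timings)
```

With a real stream, omit the clock argument so the default time.perf_counter is used.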

Sustainability Metrics

Powered by the EcoLogits library (the ecologits package). Computed by the Transformer for every LLM call.

Metric  Unit     Description
energy  kWh      Energy consumption
gwp     kgCO2eq  Global Warming Potential (carbon footprint)
adpe    kgSbeq   Abiotic Depletion Potential for Elements
pe      MJ       Primary Energy consumption

Viewing Metrics

Dashboard

The admin dashboard displays metrics on the Agent Detail page:

  • Token usage over time
  • Cost breakdown by model
  • Latency percentiles (P50, P95, P99)
  • Sustainability impact summary

API

Query metrics for a specific agent:

curl "https://your-org.coalex.ai/v1/metrics?agent_id=support-bot" \
  -H "Authorization: Bearer $COALEX_API_KEY"
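
The same request can be built from Python using only the standard library. This is a sketch: the build_metrics_request helper is hypothetical, and because the response schema is not described on this page, the example stops at constructing the request and leaves sending and parsing to the caller.

```python
import os
import urllib.parse
import urllib.request


def build_metrics_request(
    base_url: str, agent_id: str, api_key: str
) -> urllib.request.Request:
    """Build a GET request for the metrics endpoint shown in the curl example."""
    query = urllib.parse.urlencode({"agent_id": agent_id})  # escapes the value safely
    url = f"{base_url}/v1/metrics?{query}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})


request = build_metrics_request(
    "https://your-org.coalex.ai",
    "support-bot",
    os.environ.get("COALEX_API_KEY", ""),
)
print(request.get_method(), request.full_url)

# To send the request (requires network access and a valid key):
# with urllib.request.urlopen(request) as response:
#     body = response.read().decode("utf-8")
```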

SQL (DuckLake)

-- Average value and sample count per metric for one agent over the last 7 days
SELECT
    metric_type,
    metric_id,
    AVG(value) AS avg_value,
    COUNT(*) AS samples
FROM metrics
WHERE agent_id = 'support-bot'
  AND created_at >= CURRENT_DATE - INTERVAL '7 days'
GROUP BY metric_type, metric_id
ORDER BY metric_type, metric_id;

Best Practices

  1. Monitor cost trends: set up alerts that fire when daily cost exceeds a threshold.
  2. Track token efficiency: compare input/output token ratios across prompt versions.
  3. Use sustainability data: include carbon footprint figures in sustainability reporting, for example documentation prepared for EU AI Act compliance.
  4. Benchmark latency: use P95 latency as your SLA metric rather than the average, because an average hides slow outliers.
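
Practices 1 and 4 can be sketched in a few lines. The sample values, the threshold, and both helper functions are made up for illustration, and the nearest-rank percentile used here is one common definition that may not match how the dashboard computes its percentiles.

```python
import math
from statistics import mean


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("percentile() requires at least one value")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def days_over_budget(daily_cost_usd, threshold_usd):
    """Return the days whose total cost exceeded the threshold, in sorted order."""
    return sorted(day for day, cost in daily_cost_usd.items() if cost > threshold_usd)


# Hypothetical latency samples in ms: nine typical calls and one slow outlier.
latencies_ms = [120, 130, 125, 140, 135, 128, 132, 138, 127, 2400]
print("mean:", mean(latencies_ms))
print("P50:", percentile(latencies_ms, 50))
print("P95:", percentile(latencies_ms, 95))

# Hypothetical daily cost totals in USD, checked against a 10 USD threshold.
daily_cost = {"2025-01-01": 4.2, "2025-01-02": 12.7, "2025-01-03": 9.9}
print("over budget:", days_over_budget(daily_cost, threshold_usd=10.0))
```

In this sample the single outlier pulls the mean well above the median, so the mean describes neither the typical call nor the slow tail, while P50 and P95 report each separately.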