What Monocle tracks

The dashboard has four tabs:

- Overview: total calls, token usage, estimated costs, and error rates at a glance
- Models: per-model breakdown with latency, token consumption, and cost
- Tools: tool call stats with success/error rates and duration
- Conversations: multi-turn conversation flows with drill-down into individual runs
Token tracking

Monocle captures detailed token usage per LLM call, including provider-specific breakdowns when available:

- Input and output tokens
- Cached input tokens (cache hits)
- Cache write tokens
- Reasoning tokens (for models like o1 and o3)
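The breakdown above can be modeled as a small record. A hedged TypeScript sketch — the `TokenUsage` shape and field names are illustrative, not Monocle's actual types:

```typescript
// Illustrative record mirroring the per-call token breakdown described
// above; not Monocle's API.
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
  cachedInputTokens?: number; // cache hits
  cacheWriteTokens?: number;  // cache writes
  reasoningTokens?: number;   // reasoning models such as o1 and o3
}

// Total follows the dashboard convention: input + output.
function totalTokens(u: TokenUsage): number {
  return u.inputTokens + u.outputTokens;
}

const usage: TokenUsage = { inputTokens: 1200, outputTokens: 300, cachedInputTokens: 800 };
console.log(totalTokens(usage)); // 1500
```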
Conversation tracking
You can group multiple LLM calls into a single conversation by setting a conversation ID. This enables the Conversations tab, where you can follow the full flow of an agentic loop: which models were called, which tools were invoked, and how many tokens were consumed at each step.

Trace viewer integration
When viewing a trace that contains AI spans, an AI tab appears in the span details sidebar. It shows token breakdowns, input/output previews, and model information for that specific call.

Span attributes
All AI instrumentations must emit these attributes for the AI dashboard to work correctly.

Standard GenAI semconv attributes
These follow the official OpenTelemetry GenAI Semantic Conventions:

| Attribute | Description |
|---|---|
| gen_ai.operation.name | Operation type (generate_text, stream_text, invoke_agent, execute_tool, etc.) |
| gen_ai.system | LLM provider name |
| gen_ai.request.model | Requested model ID |
| gen_ai.response.model | Model ID used in the response |
| gen_ai.response.id | Provider response ID |
| gen_ai.response.finish_reasons | Array of finish reasons |
| gen_ai.request.temperature | Temperature setting |
| gen_ai.request.max_tokens | Max tokens setting |
| gen_ai.request.top_p | Top-p setting |
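For illustration, the attributes above might appear on a single generate_text span roughly as follows; every value here is made up, not a default:

```typescript
// Illustrative attribute set for one generate_text call, keyed by the
// semconv names above. All values are examples.
const spanAttributes: Record<string, string | number | string[]> = {
  "gen_ai.operation.name": "generate_text",
  "gen_ai.system": "openai",
  "gen_ai.request.model": "gpt-4o",
  "gen_ai.response.model": "gpt-4o-2024-08-06",
  "gen_ai.response.id": "resp_123",
  "gen_ai.response.finish_reasons": ["stop"],
  "gen_ai.request.temperature": 0.7,
  "gen_ai.request.max_tokens": 1024,
  "gen_ai.request.top_p": 1,
};
console.log(spanAttributes["gen_ai.operation.name"]); // "generate_text"
```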
Token usage attributes
| Attribute | Description |
|---|---|
| gen_ai.usage.input_tokens | Input tokens consumed |
| gen_ai.usage.output_tokens | Output tokens generated |
| gen_ai.usage.total_tokens | Computed total (input + output) |
| gen_ai.usage.input_tokens.cached | Cached/cache-hit input tokens |
| gen_ai.usage.input_tokens.cache_write | Cache write tokens |
| gen_ai.usage.output_tokens.reasoning | Reasoning tokens |
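A dashboard can derive cost estimates from these attributes. A hedged sketch — the per-million-token rates are made up, and the cached-input discount is an assumption about how such a calculation could work, not Monocle's actual formula:

```typescript
// Hypothetical pricing table; rates are examples, not real provider prices.
interface Pricing {
  inputPerM: number;        // $ per 1M input tokens
  outputPerM: number;       // $ per 1M output tokens
  cachedInputPerM?: number; // discounted rate for cache hits (assumption)
}

// Estimate cost from gen_ai.usage.* values, billing cached input tokens
// at the discounted rate and the rest at the full input rate.
function estimateCostUSD(
  usage: { input: number; output: number; cachedInput?: number },
  p: Pricing,
): number {
  const cached = usage.cachedInput ?? 0;
  const freshInput = usage.input - cached;
  const cachedRate = p.cachedInputPerM ?? p.inputPerM;
  return (
    (freshInput * p.inputPerM + cached * cachedRate + usage.output * p.outputPerM) /
    1_000_000
  );
}

const cost = estimateCostUSD(
  { input: 10_000, output: 2_000, cachedInput: 8_000 },
  { inputPerM: 2.5, outputPerM: 10, cachedInputPerM: 1.25 },
);
console.log(cost); // 0.035
```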
Tool call attributes
| Attribute | Description |
|---|---|
| gen_ai.tool.name | Tool name |
| gen_ai.tool.call.id | Tool call ID |
| gen_ai.tool.type | Always "function" |
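The per-tool stats on the Tools tab amount to an aggregation over these attributes. A hedged sketch, where the `ToolSpan` shape is hypothetical rather than Monocle's internal model:

```typescript
// Group tool-call spans by gen_ai.tool.name and compute each tool's
// error rate. ToolSpan is an illustrative shape, not Monocle's.
interface ToolSpan {
  toolName: string;   // from gen_ai.tool.name
  toolCallId: string; // from gen_ai.tool.call.id
  status: "ok" | "error";
  durationMs: number;
}

function toolErrorRates(spans: ToolSpan[]): Map<string, number> {
  const totals = new Map<string, { calls: number; errors: number }>();
  for (const s of spans) {
    const t = totals.get(s.toolName) ?? { calls: 0, errors: 0 };
    t.calls += 1;
    if (s.status === "error") t.errors += 1;
    totals.set(s.toolName, t);
  }
  const rates = new Map<string, number>();
  totals.forEach((t, name) => rates.set(name, t.errors / t.calls));
  return rates;
}

const rates = toolErrorRates([
  { toolName: "search", toolCallId: "c1", status: "ok", durationMs: 12 },
  { toolName: "search", toolCallId: "c2", status: "error", durationMs: 30 },
]);
console.log(rates.get("search")); // 0.5
```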
Monocle attributes
These custom attributes power specific dashboard features:

| Attribute | Dashboard Feature |
|---|---|
| gen_ai.conversation.id | Conversations tab, grouping multi-turn interactions |
| gen_ai.response.model | Models tab, per-model cost and usage breakdown |
| gen_ai.tool.name | Tools tab, per-tool call stats and error rates |
| gen_ai.usage.*_tokens | Token consumption charts and cost estimation |
| gen_ai.usage.input_tokens.cached | Cache hit rate display and cost savings calculation |
| gen_ai.function_id | Pipeline name display |
| ai.streaming | Distinguishing streaming vs. non-streaming calls |
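The gen_ai.conversation.id grouping behind the Conversations tab can be sketched as follows; the `AISpan` shape is hypothetical, and the real dashboard works on full trace spans:

```typescript
// Spans that share a gen_ai.conversation.id form one multi-turn
// conversation; spans without an ID are not grouped.
interface AISpan {
  conversationId?: string; // gen_ai.conversation.id
  model: string;
  totalTokens: number;
}

function groupByConversation(spans: AISpan[]): Map<string, AISpan[]> {
  const groups = new Map<string, AISpan[]>();
  for (const s of spans) {
    if (!s.conversationId) continue; // no ID: excluded from grouping
    const list = groups.get(s.conversationId) ?? [];
    list.push(s);
    groups.set(s.conversationId, list);
  }
  return groups;
}

const grouped = groupByConversation([
  { conversationId: "conv-1", model: "gpt-4o", totalTokens: 900 },
  { conversationId: "conv-1", model: "gpt-4o", totalTokens: 450 },
  { model: "gpt-4o", totalTokens: 100 }, // ungrouped
]);
console.log(grouped.get("conv-1")?.length); // 2
```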
Supported SDKs
Vercel AI SDK
Auto-instruments generateText, streamText, generateObject, streamObject, embed, embedMany, and rerank.