What is MCP server observability, and how is it different from monitoring an MCP client?

MCP server observability means instrumenting the server that exposes tools, resources, and prompts so every initialize, tools/list, and tools/call shows up as a trace and a metric. Monitoring an MCP client is the consumer side: it tracks how an agent calls servers. The producer side, covered here, captures what happens inside your server: tool latency, error rates, payload sizes, and the isError tool failures that never surface as JSON-RPC errors.

Which OpenTelemetry attributes should I set on MCP server spans?

Follow the OpenTelemetry MCP semantic conventions: mcp.method.name (required, for example tools/call), mcp.session.id, mcp.protocol.version, and mcp.resource.uri for resource operations. Tool identity uses gen_ai.tool.name, not mcp.tool.name (which is not defined). The JSON-RPC id is jsonrpc.request.id. Set error.type and span status ERROR on failures. The conventions are still at Development stability, so pin the version you build against.

How do I propagate trace context from an MCP client to my server?

Per MCP SEP-414, the client injects W3C trace context into the request's _meta object (params._meta) using the keys traceparent, tracestate, and baggage, written unprefixed. The server reads params._meta, extracts the context, and starts its span as a child of the client span. This works over both stdio and Streamable HTTP because the context travels inside the JSON-RPC message rather than in HTTP headers.

How do I get rate, error, and duration (RED) metrics per MCP tool?

Run the OpenTelemetry Collector with the spanmetrics connector. It reads your trace spans and emits a calls counter and a duration histogram, with errors captured as a status.code dimension on the call count rather than a separate metric. Add mcp.method.name and gen_ai.tool.name as dimensions so the metrics break down per method and per tool, then export them to a backend like OpenObserve and chart p50/p95/p99 latency and error rate.

How do I trace a stdio MCP server that has no HTTP headers?

stdio servers have no HTTP layer, so header-based propagation does not apply. Carry trace context inside the MCP _meta field instead (SEP-414). For export, send spans over OTLP to a local OpenTelemetry Collector or directly to your backend. Keep all logging on stderr: anything written to stdout corrupts the JSON-RPC message stream.

OpenTelemetry MCP Distributed Tracing LLM Observability AI Agents

MCP Server Observability: How to Trace, Monitor, and Debug Model Context Protocol Servers

Gorakhnath Yadav

June 25, 2026

MCP server observability architecture with OpenTelemetry and OpenObserve

TLDR

MCP server observability is the producer side of the Model Context Protocol: instrumenting the server that exposes tools so every initialize, tools/list, and tools/call becomes a trace and a metric. This is the companion to running observability MCP servers, which is the consumer side.
Most MCP server failures hide in places the JSON-RPC layer never reports: tool executions that return isError: true as a successful response, bloated tools/list payloads that waste LLM tokens, and transport timeouts on stdio.
OpenTelemetry now ships MCP semantic conventions. They are merged but at Development (experimental) stability, and they reuse gen_ai.* for tool identity. Use mcp.method.name and gen_ai.tool.name, not mcp.tool.name, which the convention does not define.
Trace context crosses from client to server through the MCP _meta field (SEP-414), not HTTP headers, so the same approach works on stdio and Streamable HTTP.
The spanmetrics connector turns your spans into per-tool RED metrics with zero extra instrumentation, and a small Docker Compose stack puts the whole pipeline on your laptop.

Why MCP servers are black boxes

When you run an agent against a Model Context Protocol server, you can usually see the client side. The host application logs which tools the model picked and what came back. The server side is where you are blind. A tool call that takes four seconds, a tools/list response that quietly grew to 80 tools, a downstream API that returns a 500 wrapped in a polite isError result: none of that shows up unless you put it there.

This is the gap between consuming MCP servers and producing them. The consumer-side guide to observability MCP servers covers connecting an assistant like Claude or Cursor to your logs, metrics, and traces so you can ask questions in natural language. This post is the other direction: you wrote an MCP server, you put it in front of an agent, and now you need to know what it is doing under load, why a tool is slow, and which call actually failed.

The reason this is harder than instrumenting a normal HTTP service is that MCP is a stateful, JSON-RPC protocol with its own lifecycle, its own error model, and two transports that behave very differently. You cannot bolt on a generic web-framework middleware and call it done. You have to instrument the protocol. The good news is that OpenTelemetry gives you everything you need, and there is now an MCP semantic convention, still experimental, that tells you exactly which attributes to set.

The MCP request lifecycle, and where it breaks

MCP is built on JSON-RPC 2.0. Every message is a request, a response, or a notification, and the protocol defines a strict order in which they flow. To instrument a server well, you need to know which methods exist and where each one tends to fail.

MCP request lifecycle from initialize and notifications/initialized to tools/list and tools/call, carried over stdio or Streamable HTTP

Session initialization and capability negotiation

A session opens with an initialize request from the client, which carries the client's protocolVersion, its capabilities, and clientInfo. The server replies with its own protocolVersion, capabilities, and serverInfo. The client then sends a notifications/initialized notification, and only after that does normal operation begin. The current protocol revision is 2025-11-25, a calendar-versioned string the client and server have to agree on.

Failures here are subtle, and version negotiation in particular does not behave like an error path. If the server does not support the protocolVersion the client requested, it does not reject the request. Per the spec it replies with a successful initialize result carrying a version it does support, usually its latest, and the client is the one that decides to disconnect when it cannot speak that version. (The spec does show a -32602 "Unsupported protocol version" error as one possible path, but a normal mismatch is the successful counter-offer, not an error.) So a wave of incompatible clients produces no tool calls and, on the common path, no errors in your handlers either. If you do not trace initialize, those version-mismatch disconnects look like silence: just clients that never come back.
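It helps to see how little logic sits behind that counter-offer. The sketch below is a minimal, hypothetical version of the server-side choice: `SUPPORTED_VERSIONS` and `negotiate_protocol_version` are illustrative names, not part of any MCP SDK, and the code assumes revisions compare correctly as YYYY-MM-DD strings. The returned flag is what you would record on the `initialize` span (mcp.protocol.version, plus a custom attribute in a namespace you own) so mismatch-driven disconnects stop looking like silence.

```python
# Hypothetical: the protocol revisions this server implements, oldest first.
SUPPORTED_VERSIONS = ["2025-03-26", "2025-06-18", "2025-11-25"]


def negotiate_protocol_version(requested: str, supported: list = SUPPORTED_VERSIONS) -> tuple:
    """Pick the protocolVersion for a *successful* initialize result.

    Returns (version, mismatched). On a mismatch the server counter-offers its
    latest version instead of raising; the client decides whether to disconnect.
    """
    if requested in supported:
        return requested, False
    # Revisions are YYYY-MM-DD strings, so the lexical max is the newest one.
    return max(supported), True
```

A client asking for `2024-11-05` gets `("2025-11-25", True)`: a normal-looking initialize response, and a span attribute that tells you it was a mismatch.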

Tool discovery and registry bloat

Before calling anything, the client asks for the catalog with tools/list. The server returns each tool's name along with its description, full inputSchema, and optional fields such as title, outputSchema, and annotations. This response is paginated, but in practice many servers return everything at once.

Here is the part most monitoring misses: that entire list is loaded into the model's context by the host application on the turns where it matters. A registry that grows from 8 tools to 60, each with a verbose description and a deep JSON Schema, silently inflates token usage on every request and degrades the model's ability to pick the right tool. The tools/list call itself may be fast, so latency alone will not warn you. You have to measure the size of the response: tool count and serialized bytes. We come back to this in its own section, because response size deserves to be tracked as a first-class signal.

Tool invocation and the isError trap

Tool calls use the tools/call method, with params.name and params.arguments. The result is a CallToolResult: a content array of text, image, audio, or resource blocks (a resource link or an embedded resource), an optional structuredContent, and a boolean isError.

That isError field is the single most important thing to understand about MCP error handling. A tool that fails (a payment API that times out, a database that rejects a query) does not return a JSON-RPC error. It returns a perfectly successful JSON-RPC response with isError: true and the error text inside the content array. That design is deliberate: the failure is meant to be handed back to the model so it can self-correct. But it means any observability that only watches JSON-RPC-level errors will record every failed tool execution as a success. Your dashboards will read 100% healthy while half the tool calls are returning errors to the model. You must surface that failure in your telemetry: read result.isError where you get the wire result, or catch the raised exception in frameworks like FastMCP that signal failures that way, and set your span status from it.
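A small classifier makes the two failure modes concrete. This is a sketch over a decoded JSON-RPC response dict, assuming you can see the wire result; `classify_tools_call` is a hypothetical helper and its label strings are arbitrary. Both non-`ok` outcomes should set span status ERROR, but keeping them distinct lets you chart `isError` failures separately from protocol errors.

```python
def classify_tools_call(response: dict) -> str:
    """Classify a decoded JSON-RPC response to a tools/call request.

    "protocol_error": a JSON-RPC error object (unknown method, bad params, crash).
    "tool_error":     a successful response whose CallToolResult has isError: true.
    "ok":             anything else.
    """
    if "error" in response:
        return "protocol_error"
    result = response.get("result") or {}
    if result.get("isError") is True:
        return "tool_error"
    return "ok"


# A failed tool call still looks like success at the JSON-RPC layer.
failed_tool = {
    "jsonrpc": "2.0",
    "id": 7,
    "result": {
        "content": [{"type": "text", "text": "Upstream API returned 500"}],
        "isError": True,
    },
}
```

A monitor that only checks for `"error"` in the response would count `failed_tool` as a success; `classify_tools_call(failed_tool)` returns `"tool_error"`.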

JSON-RPC errors, by contrast, are reserved for protocol problems: an unknown tool, malformed parameters, an unhandled exception in the server. Both matter, and they mean different things.

Transports: stdio versus Streamable HTTP

MCP defines two standard transports, and timeouts and corruption happen in different ways on each.

On stdio, the client launches your server as a subprocess and exchanges newline-delimited JSON over stdin and stdout, where a single message must never contain an embedded newline. The one rule that breaks more stdio servers than anything else: your server must never write anything but valid MCP messages to stdout. A stray print(), a logging library that defaults to stdout, a dependency that prints a banner on import, any of these corrupts the JSON-RPC framing and kills the session. All logs go to stderr. This is also why your instrumentation must never export telemetry to stdout on a stdio server.
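The sketch below shows the two habits that keep a stdio server's framing intact: one compact JSON object per line on stdout, and logging pinned explicitly to stderr. `frame` and `send` are hypothetical helpers for illustration; real MCP SDKs do the framing for you, so read this as a description of what they must guarantee rather than a replacement.

```python
import json
import logging
import sys

# Explicitly pin logging to stderr; stdout carries nothing but MCP messages.
logging.basicConfig(stream=sys.stderr, level=logging.INFO)
log = logging.getLogger("weather-mcp-server")


def frame(message: dict) -> str:
    """Serialize one JSON-RPC message as exactly one newline-terminated line."""
    # json.dumps without indent emits no raw newlines; newlines inside string
    # values are escaped as \n, so the message cannot span two lines.
    line = json.dumps(message, separators=(",", ":"))
    if "\n" in line:
        raise ValueError("message would break newline-delimited framing")
    return line + "\n"


def send(message: dict) -> None:
    """Write one framed message to stdout and flush so the client sees it now."""
    sys.stdout.write(frame(message))
    sys.stdout.flush()
    log.info("sent %s", message.get("method", "response"))  # goes to stderr
```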

On Streamable HTTP (which replaced the older HTTP+SSE transport), the client POSTs JSON-RPC to a single endpoint and may hold open an SSE stream for server-initiated messages. Sessions are tracked with the MCP-Session-Id header, and every post-initialization request carries an MCP-Protocol-Version header. Failures show up as 400s for a missing session id or an invalid MCP-Protocol-Version, 404s for an expired session, and connections that drop mid-stream. The spec is explicit that a dropped SSE connection is not a cancellation: the client has to send an explicit notifications/cancelled.
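Those 400s and 404s come from a handful of checks, and naming them lets you count each outcome separately. This is a hypothetical sketch, not any SDK's API: `check_streamable_http_request`, `ACCEPTED_VERSIONS`, and the in-memory session set are assumptions, and it models a server that requires sessions. It tolerates a missing MCP-Protocol-Version header, since the spec has servers assume an earlier default revision in that case, and rejects an unsupported value.

```python
from http import HTTPStatus

# Hypothetical: MCP-Protocol-Version values this server accepts.
ACCEPTED_VERSIONS = {"2025-06-18", "2025-11-25"}


def check_streamable_http_request(headers: dict, method: str, live_sessions: set) -> HTTPStatus:
    """Return the status a Streamable HTTP endpoint would answer with."""
    h = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    if method == "initialize":
        return HTTPStatus.OK  # the session id is issued in the initialize response
    session_id = h.get("mcp-session-id")
    if session_id is None:
        return HTTPStatus.BAD_REQUEST  # post-initialization request without a session
    if session_id not in live_sessions:
        return HTTPStatus.NOT_FOUND  # expired or unknown session: client must re-initialize
    version = h.get("mcp-protocol-version")
    # A missing header is tolerated; an unsupported value is rejected.
    if version is not None and version not in ACCEPTED_VERSIONS:
        return HTTPStatus.BAD_REQUEST
    return HTTPStatus.OK
```

Tagging the span or a counter with the returned status makes "intermittent 400/404 under load" separable into missing sessions, expired sessions, and version problems.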

Symptom	Layer	What to capture
Clients connect then vanish	Session init	`initialize` spans, protocol-version attribute, disconnect rate
Token usage creeps up every release	Tool discovery	`tools/list` tool count and serialized byte size
Dashboards green, model retrying constantly	Tool execution	`isError` rate per tool, span status from `isError`
Random hangs on local servers	stdio transport	request timeouts, subprocess exit, stderr capture
Intermittent 400/404 under load	Streamable HTTP	session id presence, protocol-version header, stream drops

What to measure: the three observability layers

It helps to think about an MCP server in three layers, from the wire up to the agent. Each layer has its own signals.

Three MCP observability layers, transport and protocol, tool execution, and agentic, with RED metrics feeding the tool execution layer

Transport and protocol layer. Handshake success rate, JSON-RPC error rates by code, active session count, request timeouts, and transport-level drops. This is your foundation. If the handshake is failing, nothing above it matters.

Tool execution layer. This is where RED metrics (rate, errors, duration) live, broken down per tool. Add payload sizes and the isError rate. A single slow or flapping tool can stall an entire agent run, so per-tool granularity is not optional.

Agentic layer. How tool behavior rolls up into the agent's experience: how many tool calls a task needed, how often the model retried, whether a slow tool dragged out total reasoning time. This is where MCP observability meets monitoring AI agents in production, and it is the layer that connects a backend regression to a degraded user experience.

For the tool execution layer, set latency targets and alert on them, but derive the numbers from your own baseline rather than a universal table: a tool that calls an external API has a very different floor than one that reads from memory, initialize and tools/list should be near-instant, and serving over Streamable HTTP adds network round-trip time on top. Capture p50/p95/p99 per method over a few days of normal traffic, then alert on regressions against that baseline.
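The baseline arithmetic is simple enough to sketch. In practice your backend computes these percentiles from the duration histogram; the helpers below only show the calculation on raw samples with the standard library's `statistics.quantiles`. `percentiles` and `regressed` are hypothetical names, and the 1.5x tolerance is an arbitrary assumption to tune against your own traffic.

```python
import statistics


def percentiles(samples_ms: list) -> dict:
    """p50/p95/p99 from raw latency samples (needs at least two samples)."""
    # quantiles(n=100) returns 99 cut points; index k-1 is the k-th percentile.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


def regressed(baseline: dict, current: dict, tolerance: float = 1.5) -> list:
    """Percentiles where current latency exceeds the baseline by more than `tolerance`x."""
    return [p for p in ("p50", "p95", "p99") if current[p] > baseline[p] * tolerance]
```

Alerting on p95 and p99 rather than the mean is the point: a tool whose median is unchanged but whose tail doubled still stalls agent runs.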

For the protocol layer, the JSON-RPC error code tells you what to do.

Code	Meaning	What it usually tells you
`-32700`	Parse error	Malformed JSON reached the server; a framing or encoding bug
`-32600`	Invalid Request	Not a valid JSON-RPC object
`-32601`	Method not found	A JSON-RPC method the server does not implement (for example `resources/read` when you expose no resources). An unknown tool name usually comes back as `-32602` or an `isError` result instead, depending on the SDK
`-32602`	Invalid params	Bad arguments to a method or tool; many SDKs also return this when a `tools/call` names a tool you do not expose (a stale registry or a model hallucinating a tool name)
`-32603`	Internal error	An unhandled server-side exception; page someone
result with `isError: true`	Tool execution error	The tool ran and failed; invisible at the JSON-RPC layer, fed back to the model to self-correct
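If you want the triage table in code, for example to attach a readable hint to error spans or alert messages, a lookup is enough. This is a sketch: `describe_error_code` is a hypothetical helper, and it also covers the -32000 to -32099 range that JSON-RPC 2.0 reserves for implementation-defined server errors.

```python
JSONRPC_ERRORS = {
    -32700: ("Parse error", "malformed JSON; look for a framing or encoding bug"),
    -32600: ("Invalid Request", "not a valid JSON-RPC object"),
    -32601: ("Method not found", "a method the server does not implement"),
    -32602: ("Invalid params", "bad arguments, or a tool name the server does not expose"),
    -32603: ("Internal error", "unhandled server-side exception; page someone"),
}


def describe_error_code(code: int) -> str:
    """Map a JSON-RPC error code to a short triage hint."""
    if code in JSONRPC_ERRORS:
        name, hint = JSONRPC_ERRORS[code]
        return f"{code} {name}: {hint}"
    if -32099 <= code <= -32000:
        return f"{code} Server error: implementation-defined, check your SDK"
    return f"{code} Application error: defined by the server, not JSON-RPC"
```

Remember that the most common failure, a tool returning `isError: true`, never reaches this function, because it carries no error code at all.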

The OpenTelemetry semantic conventions for MCP

You do not have to invent your own span names and attributes. OpenTelemetry has MCP semantic conventions: they were first merged into the main semantic-conventions repository in January 2026, then moved to the dedicated semantic-conventions-genai repository, where the GenAI and MCP conventions are now maintained together. One caveat to set expectations: they are at Development stability, OpenTelemetry's most experimental tier. The names can still change, so pin the version you build against and expect to revisit it.

The conventions deliberately do not defer to the generic RPC conventions. An MCP span is its own thing, with this shape:

Span name: {mcp.method.name} {target}, where the target is the tool or prompt name when there is one. So a tool call is named tools/call get_weather, and a handshake is just initialize.
Span kind: CLIENT on the client side, SERVER on the server side.
Span status: ERROR whenever error.type is set.
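The naming and status rules are mechanical, which makes them easy to get slightly wrong across handlers. A minimal sketch, assuming a decoded JSON-RPC request dict: `span_name_for` and `span_status` are hypothetical helpers, and only `tools/call` and `prompts/get` are treated as carrying a target (the tool or prompt name), per the rule above.

```python
from typing import Optional

# Methods whose span name carries a target (the tool or prompt name).
TARGETED_METHODS = {"tools/call", "prompts/get"}


def span_name_for(request: dict) -> str:
    """Build '{mcp.method.name} {target}', or just the method when there is no target."""
    method = request["method"]
    params = request.get("params") or {}
    target: Optional[str] = params.get("name") if method in TARGETED_METHODS else None
    return f"{method} {target}" if target else method


def span_status(attributes: dict) -> str:
    """Span status is ERROR whenever error.type is set; otherwise leave it UNSET."""
    return "ERROR" if "error.type" in attributes else "UNSET"
```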

The attributes you actually set fall into a few groups.

Attribute	Notes
`mcp.method.name`	Required on every MCP span, for example `tools/call` or `tools/list`
`mcp.session.id`	The session identifier, for correlation
`mcp.protocol.version`	For example `2025-11-25`
`mcp.resource.uri`	Required for resource operations
`gen_ai.tool.name`	The tool name; this is where tool identity lives
`gen_ai.operation.name`	`execute_tool` for tool calls
`jsonrpc.request.id`	The JSON-RPC request id
`error.type`	Set on failure; drives span status

The trap to avoid: mcp.tool.name does not exist in the convention. Tool identity comes from gen_ai.tool.name, because MCP reuses the GenAI conventions for anything tool-shaped. Watch out for mcp.tool.name, mcp.request.id, and rpc.system="jsonrpc": none of those are in the spec, and if you copy them your spans will not line up with anyone else's tooling. The mcp.* namespace defines exactly four attributes; tool identity lives in gen_ai.*, and JSON-RPC specifics live in jsonrpc.*, with one exception: the JSON-RPC error code uses the generic rpc.response.status_code.
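One cheap guard is a lint check over the attribute names your instrumentation sets, run in a unit test. This is a hypothetical sketch built only from the names discussed above, not an official validator; update the lists whenever you bump the convention version you pin.

```python
# Names that are NOT in the MCP convention, mapped to the replacement where one exists.
NON_CONVENTION = {
    "mcp.tool.name": "gen_ai.tool.name",
    "mcp.request.id": "jsonrpc.request.id",
    "rpc.system": None,
}
# The four attributes the convention defines in the mcp.* namespace.
MCP_ATTRIBUTES = {"mcp.method.name", "mcp.session.id", "mcp.protocol.version", "mcp.resource.uri"}


def lint_attributes(attributes: dict) -> list:
    """Return a human-readable problem for each non-convention attribute name."""
    problems = []
    for key in attributes:
        if key in NON_CONVENTION:
            fix = NON_CONVENTION[key]
            problems.append(f"{key}: not in the convention" + (f", use {fix}" if fix else ""))
        elif key.startswith("mcp.") and key not in MCP_ATTRIBUTES:
            problems.append(f"{key}: unknown mcp.* attribute; use your own namespace")
    return problems
```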

Instrument a Python MCP server (FastMCP middleware)

FastMCP, the high-level Python framework whose 1.0 release was folded into the official MCP Python SDK, has a middleware system that is the cleanest place to hang instrumentation. One install note: this middleware lives in the standalone FastMCP 2.x package (pip install fastmcp), not in the FastMCP bundled with the official mcp SDK. You subclass Middleware, override the hook for the operation you care about, and register it once. The on_call_tool hook wraps the full tool execution and hands you the tool name and arguments going in. One FastMCP detail to know: in middleware a failing tool surfaces as a raised exception rather than an isError flag (that flag lives on the wire-level CallToolResult the model eventually sees), so the hook catches exceptions to mark the span.

First, the OpenTelemetry setup. Export OTLP over HTTP via the standard OTEL_EXPORTER_OTLP_ENDPOINT environment variable, so the same code points at a local Collector or straight at OpenObserve. Read more about the moving parts in the OTLP exporters guide.

from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

resource = Resource.create({"service.name": "weather-mcp-server"})
provider = TracerProvider(resource=resource)
# With no arguments the exporter reads OTEL_EXPORTER_OTLP_ENDPOINT and appends
# /v1/traces. In the local stack below that points at the Collector, which
# forwards to OpenObserve. To skip the Collector and export straight to
# OpenObserve, pass the full path and auth instead:
#   OTLPSpanExporter(
#       endpoint="http://localhost:5080/api/default/v1/traces",
#       headers={"Authorization": "Basic <base64(email:password)>"},
#   )
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("mcp.server")

Now the middleware. It extracts any upstream trace context from _meta, opens a SERVER span named per the conventions, sets the MCP attributes, and marks the span ERROR if the tool raises.

from fastmcp.server.middleware import Middleware, MiddlewareContext
from opentelemetry import propagate
from opentelemetry.trace import SpanKind, Status, StatusCode

class OTelMiddleware(Middleware):
    async def on_call_tool(self, context: MiddlewareContext, call_next):
        tool_name = context.message.name

        # Continue the client's trace: traceparent lives in _meta (SEP-414).
        # context.message.meta is a pydantic Meta object, so dump it to a
        # plain dict before the propagator can read the carrier keys.
        raw_meta = getattr(context.message, "meta", None)
        meta = raw_meta.model_dump(by_alias=True) if raw_meta is not None else {}
        parent_ctx = propagate.extract(meta)

        with tracer.start_as_current_span(
            f"tools/call {tool_name}",
            kind=SpanKind.SERVER,
            context=parent_ctx,
        ) as span:
            span.set_attribute("mcp.method.name", "tools/call")
            span.set_attribute("gen_ai.tool.name", tool_name)
            span.set_attribute("gen_ai.operation.name", "execute_tool")
            # The isError trap: in FastMCP middleware a failed tool raises
            # rather than returning isError, so catch it to mark the span.
            try:
                result = await call_next(context)
            except Exception as exc:
                span.set_status(Status(StatusCode.ERROR))
                span.set_attribute("error.type", exc.__class__.__name__)
                span.record_exception(exc)
                raise
            return result

from fastmcp import FastMCP

mcp = FastMCP("weather-mcp-server")
mcp.add_middleware(OTelMiddleware())

The same pattern extends to the other hooks. Override on_list_tools to capture tools/list size, on_read_resource to set mcp.resource.uri, and on_message if you want a span on every inbound message. FastMCP ships built-in timing, logging, and error-handling middleware you can read for reference.

Instrument a TypeScript MCP server

The TypeScript SDK has no general middleware chain, so you instrument by wrapping the handler you register. Every handler receives a second argument, RequestHandlerExtra, which carries sessionId, requestId, and the incoming _meta. That _meta is where you read the upstream traceparent.

import {
  trace,
  context,
  propagation,
  SpanKind,
  SpanStatusCode,
} from "@opentelemetry/api";

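const tracer = trace.getTracer("mcp.server");

// `server`, `toolConfig`, and `getWeather` are defined elsewhere in your server:
// the McpServer instance, the tool's config object, and the tool implementation.
server.registerTool("get_weather", toolConfig, async (args, extra) => {
  // traceparent travels in _meta, not headers (works on stdio too).
  const parentCtx = propagation.extract(context.active(), extra._meta ?? {});

  return tracer.startActiveSpan(
    "tools/call get_weather",
    { kind: SpanKind.SERVER },
    parentCtx,
    async (span) => {
      span.setAttribute("mcp.method.name", "tools/call");
      span.setAttribute("gen_ai.tool.name", "get_weather");
      span.setAttribute("gen_ai.operation.name", "execute_tool");
      // Only set the session id when there is one (over stdio it is typically undefined).
      if (extra.sessionId) {
        span.setAttribute("mcp.session.id", extra.sessionId);
      }
      span.setAttribute("jsonrpc.request.id", String(extra.requestId));
      try {
        const result = await getWeather(args);
        // The isError trap: a failed tool still returns a successful response.
        if (result.isError) {
          span.setStatus({ code: SpanStatusCode.ERROR });
          span.setAttribute("error.type", "tool_execution_error");
        }
        return result;
      } catch (err) {
        span.setStatus({ code: SpanStatusCode.ERROR });
        span.setAttribute("error.type", err instanceof Error ? err.name : "unknown");
        span.recordException(err as Error);
        throw err;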
      } finally {
        span.end();
      }
    },
  );
});

One Node-specific setup detail: propagation.extract uses the globally registered propagator, which defaults to a no-op. Initialize the OpenTelemetry Node SDK (it registers the W3C trace context propagator by default) or call propagation.setGlobalPropagator(new W3CTraceContextPropagator()) yourself. Without it the extract call silently returns an empty context and your server span starts a brand-new trace instead of joining the client's.

If you register many tools, wrap this in a small withToolSpan(name, handler) helper so the instrumentation lives in one place instead of being copy-pasted into every tool. For lower-level servers, wrap the setRequestHandler(CallToolRequestSchema, ...) callback instead and read the same fields off extra.

Propagate trace context from client to server with _meta

A span on your server is useful. A span on your server that is a child of the client's span, joined into one trace that runs from the agent through the MCP call to your downstream API, is what actually lets you debug. The question is how the trace id crosses the boundary.

For a normal HTTP service you would put W3C traceparent in a request header. MCP is different, and the answer is now standardized. SEP-414, which reached Final status, documents that trace context travels in the MCP _meta field using the keys traceparent, tracestate, and baggage. The important detail: these keys are written unprefixed, as a documented exception to the usual reverse-DNS naming rule for _meta keys, specifically so they stay compatible with existing OpenTelemetry tooling.
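On the client, injecting context means writing that key into params._meta. You would normally let OpenTelemetry do it (for example `opentelemetry.propagate.inject(meta)` with the client span active); the sketch below builds the same wire shape by hand to show exactly what lands in the request. `make_traceparent` and `tools_call_request` are hypothetical helpers, and the random ids stand in for the active span's ids.

```python
import secrets


def make_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    """W3C traceparent: version 00, 32-hex trace id, 16-hex parent span id, flags."""
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"


def tools_call_request(request_id: int, name: str, arguments: dict, traceparent: str) -> dict:
    """A tools/call request carrying trace context in params._meta (SEP-414)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": name,
            "arguments": arguments,
            # Unprefixed W3C key, as SEP-414 specifies.
            "_meta": {"traceparent": traceparent},
        },
    }


# Random ids stand in for the active client span's trace and span ids.
tp = make_traceparent(secrets.token_hex(16), secrets.token_hex(8))
request = tools_call_request(7, "get_weather", {"location": "New York"}, tp)
```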

Trace context propagation from MCP client to MCP server through the _meta traceparent field

A tools/call request with context attached looks like this:

{
  "jsonrpc": "2.0",
  "id": 7,
  "method": "tools/call",
  "params": {
    "name": "get_weather",
    "arguments": { "location": "New York" },
    "_meta": {
      "traceparent": "00-0af7651916cd43dd8448eb211c80319c-00f067aa0ba902b7-01"
    }
  }
}

The traceparent value follows the W3C format {version}-{trace-id}-{parent-id}-{flags}, where the trailing 01 means the trace is sampled. On the server, the propagate.extract(meta) call in the Python example and propagation.extract(context.active(), extra._meta) in the TypeScript example pull that context out and make your span a child of it.
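Propagators do this parsing for you, but it helps to know what a server should do with a malformed or missing value: ignore it and start a new trace. The sketch below is a simplified parser for version-00 values only (it rejects future versions rather than parsing their extra fields); `parse_traceparent` is a hypothetical helper, not an OpenTelemetry API.

```python
import re
from typing import NamedTuple, Optional

_TRACEPARENT = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(meta: dict) -> Optional[TraceParent]:
    """Read traceparent from a _meta dict; None means 'start a fresh trace'."""
    value = meta.get("traceparent")
    if not isinstance(value, str):
        return None
    match = _TRACEPARENT.fullmatch(value)
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version ff and all-zero ids are invalid under W3C Trace Context.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    # Bit 0 of the flags byte is the sampled flag.
    return TraceParent(version, trace_id, parent_id, bool(int(flags, 16) & 0x01))
```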

Why _meta instead of HTTP headers? Two reasons, and both are the point of the design. First, stdio has no headers at all, so a header-based scheme would only ever work on one transport. Second, even over Streamable HTTP a single long-lived SSE connection can carry many tool calls; a header rides with the transport, while _meta travels in the message, giving every individual tools/call its own parent span. Putting context in the message instead of the transport is what makes one approach work everywhere.

Turn traces into RED metrics with the spanmetrics connector

You now have spans. You could write separate metric instruments by hand, but there is a better way: let the OpenTelemetry Collector derive the metrics from the spans you already emit. The spanmetrics connector reads every span on a traces pipeline and emits the RED metrics as two series: a calls counter and a duration histogram. Errors are not a separate metric; they are captured as a status.code dimension on the call count, so you read the error rate by filtering calls to status.code=STATUS_CODE_ERROR. The connector then feeds both into a metrics pipeline.
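Conceptually, the calls counter is a count of spans grouped by the configured dimensions plus status.code. The toy model below is not the connector's implementation: it assumes spans arrive as dicts, uses hypothetical helper names, and assumes the STATUS_CODE_* strings the connector writes into status.code. It shows how a per-tool error rate falls out of the grouped counts.

```python
from collections import Counter, defaultdict

DIMENSIONS = ("mcp.method.name", "gen_ai.tool.name")


def calls_counter(spans: list) -> Counter:
    """Count spans per (dimension values..., status.code), like the calls metric."""
    calls = Counter()
    for span in spans:
        dims = tuple(span["attributes"].get(d) for d in DIMENSIONS)
        calls[dims + (span["status_code"],)] += 1
    return calls


def error_rate_by_tool(calls: Counter) -> dict:
    """Error calls divided by all calls, per gen_ai.tool.name value."""
    totals, errors = defaultdict(int), defaultdict(int)
    for (_method, tool, status), n in calls.items():
        totals[tool] += n
        if status == "STATUS_CODE_ERROR":
            errors[tool] += n
    return {tool: errors[tool] / totals[tool] for tool in totals}
```

Spans without a tool (such as tools/list) group under `None`, which is why the gen_ai.tool.name dimension stays cheap: it only ever has as many values as your registry has tools.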

The spanmetrics connector deriving calls and duration metrics from spans and exporting them to OpenObserve

The key move is adding mcp.method.name and gen_ai.tool.name as custom dimensions, so the metrics break down per method and per tool rather than collapsing into one number.

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

connectors:
  span_metrics:
    namespace: mcp.server
    histogram:
      unit: ms
      explicit:
        buckets: [5ms, 10ms, 25ms, 50ms, 100ms, 250ms, 500ms, 1s, 2s, 5s]
    dimensions:
      - name: mcp.method.name
      - name: gen_ai.tool.name
      - name: error.type

exporters:
  otlphttp/openobserve:
    endpoint: http://openobserve:5080/api/default
    headers:
      Authorization: "Basic <base64(email:password)>"

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlphttp/openobserve, span_metrics]
    metrics:
      receivers: [span_metrics]
      exporters: [otlphttp/openobserve]

The connector sits as an exporter on the traces pipeline and a receiver on the metrics pipeline. Note the component name: recent Collector builds use span_metrics (snake_case), while older builds use the now-deprecated spanmetrics, so match whichever your distribution expects. With this in place you get p50/p95/p99 latency and error rate per tool without touching your server code again.

Instrument tools/list latency and registry bloat

tools/list is cheap to serve and easy to ignore, but its response size is a real cost: every tool name, description, and input schema in that response is loaded into the model's context by the host application, so a fat registry burns tokens on every turn and makes tool selection worse.

Instrument the discovery call to capture both its latency and its size. In FastMCP:

import json

from fastmcp.server.middleware import Middleware, MiddlewareContext
from opentelemetry.trace import SpanKind

# Reuses the `tracer` and `mcp` objects from the setup and middleware above.
class ToolsListMiddleware(Middleware):
    async def on_list_tools(self, context: MiddlewareContext, call_next):
        with tracer.start_as_current_span(
            "tools/list", kind=SpanKind.SERVER
        ) as span:
            span.set_attribute("mcp.method.name", "tools/list")
            tools = await call_next(context)
            # FastMCP's on_list_tools hands you its internal Tool objects;
            # convert to the wire type so the byte count matches what the
            # client actually receives over JSON-RPC.
            wire = [
                t.to_mcp_tool().model_dump(
                    mode="json", by_alias=True, exclude_none=True
                )
                for t in tools
            ]
            span.set_attribute("mcpserver.tools.count", len(tools))
            span.set_attribute(
                "mcpserver.tools.serialized_bytes", len(json.dumps(wire))
            )
            return tools

mcp.add_middleware(ToolsListMiddleware())

mcpserver.tools.count and mcpserver.tools.serialized_bytes are not in the semantic convention, so they live under a namespace you own rather than mcp.*, which the spec controls. The to_mcp_tool() conversion matters too: FastMCP's on_list_tools hands you its internal Tool objects, so you serialize the converted wire form to measure the bytes the client actually receives; calling model_dump() on the raw FastMCP objects both misses the wire shape and raises on non-JSON fields. Chart the serialized byte size over time and alert when it crosses a threshold. A registry that doubles after a release is a regression you want to catch before it shows up as a token bill or a drop in tool-selection accuracy. Watch the first tools/list after a cold start too, since lazy tool registration can make discovery slow exactly when a new client connects.
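The alerting rule can be as simple as two checks: an absolute ceiling and a growth ratio between releases. Both thresholds below are arbitrary assumptions, and `registry_size` and `bloat_alerts` are hypothetical helpers; the byte count mirrors the middleware's approach of measuring the serialized tools array.

```python
import json


def registry_size(tools: list) -> tuple:
    """(tool count, serialized bytes) for a tools/list result's tools array."""
    return len(tools), len(json.dumps(tools).encode("utf-8"))


def bloat_alerts(previous_bytes: int, current_bytes: int,
                 max_bytes: int = 50_000, max_growth: float = 2.0) -> list:
    """Flag an oversized registry, or one that grew too fast since the last release."""
    alerts = []
    if current_bytes > max_bytes:
        alerts.append(f"tools/list is {current_bytes} bytes (limit {max_bytes})")
    if previous_bytes and current_bytes >= previous_bytes * max_growth:
        alerts.append(f"tools/list grew {current_bytes / previous_bytes:.1f}x since last release")
    return alerts
```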

Sampling, PII redaction, and cardinality control

Three operational concerns separate a demo from something you run in production.

Sampling. Tracing every single tools/call at high volume is expensive and rarely necessary. Use head-based ratio sampling for a fixed percentage when you just need trends, and tail-based sampling in the Collector when you want to keep all the errors and slow calls while dropping the boring fast ones. Tail sampling is the better fit for MCP, because the traces you care about most are the failures and the long-tail latencies.
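The tail-sampling decision boils down to a rule evaluated once the whole trace is available. In the Collector you would express it with the tail_sampling processor's status_code, latency, and probabilistic policies; the sketch below only shows the logic, with a hypothetical `keep_trace` function and assumed thresholds (1 s, 10% baseline).

```python
import random
from typing import Callable


def keep_trace(
    spans: list,
    slow_ms: float = 1000.0,
    baseline_ratio: float = 0.1,
    rng: Callable[[], float] = random.random,
) -> bool:
    """Tail-sampling decision for one complete trace."""
    # Policy 1: keep every trace that contains an error span.
    if any(s.get("status_code") == "ERROR" for s in spans):
        return True
    # Policy 2: keep every trace with a slow span.
    if any(s.get("duration_ms", 0.0) >= slow_ms for s in spans):
        return True
    # Policy 3: keep a fixed fraction of the remaining, unremarkable traces.
    return rng() < baseline_ratio
```

Head sampling cannot make this choice, because at the moment the first span starts nobody knows yet whether the tool call will fail or run long.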

PII redaction. Tool arguments routinely contain exactly what you do not want in your telemetry: API keys, account numbers, raw user prompts. This is why the OpenTelemetry conventions make gen_ai.tool.call.arguments opt-in rather than default. If you record arguments, redact them first. Strip or hash sensitive fields in your middleware before they ever touch a span attribute, and never log full request bodies on a stdio server where they would also land in stderr.
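A minimal redaction pass, assuming tool arguments arrive as a JSON-like dict. The key list is hypothetical; extend it for your tools, and load the secret from a secret store rather than hard-coding it. It uses a keyed HMAC rather than a bare hash because short values such as account numbers can be brute-forced from an unkeyed SHA-256, while a stable keyed digest still lets you correlate repeated values across spans. Serialize the result before setting gen_ai.tool.call.arguments.

```python
import hashlib
import hmac

# Hypothetical list of argument names to treat as sensitive.
SENSITIVE_KEYS = frozenset({"api_key", "password", "token", "account_number"})


def redact_arguments(arguments: dict, secret: bytes, sensitive: frozenset = SENSITIVE_KEYS) -> dict:
    """Replace sensitive values with a short keyed digest, recursing into nested objects."""
    redacted = {}
    for key, value in arguments.items():
        if key.lower() in sensitive:
            digest = hmac.new(secret, str(value).encode("utf-8"), hashlib.sha256).hexdigest()
            redacted[key] = f"hmac:{digest[:12]}"
        elif isinstance(value, dict):
            redacted[key] = redact_arguments(value, secret, sensitive)
        elif isinstance(value, list):
            redacted[key] = [
                redact_arguments(item, secret, sensitive) if isinstance(item, dict) else item
                for item in value
            ]
        else:
            redacted[key] = value
    return redacted
```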

Cardinality control. Every distinct value of a span attribute that becomes a metric dimension multiplies your time series. mcp.method.name is safe: there are only about thirty methods in the whole spec. gen_ai.tool.name is usually safe. Raw tool arguments, user ids, and session ids are not: put them on spans, never on metric dimensions. This is the producer-side version of deciding what not to monitor, and getting it wrong is how an observability bill quietly triples.
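The arithmetic behind that warning is a product: the worst-case number of series for one metric is the number of distinct values of each dimension multiplied together. Here is a minimal sketch. The allowlist stands in for the dimensions list you give the spanmetrics connector, and the tool and session counts in the example are hypothetical. The real series count is higher still once you include histogram buckets and the connector's default dimensions, such as span.name and status.code.

```python
from math import prod

# Allowlist of span attributes that may become metric dimensions.
METRIC_DIMENSIONS = frozenset({"mcp.method.name", "gen_ai.tool.name"})


def to_metric_dimensions(span_attributes: dict[str, str]) -> dict[str, str]:
    """Keep only bounded attributes; everything else stays on the span only."""
    return {k: v for k, v in span_attributes.items() if k in METRIC_DIMENSIONS}


def worst_case_series(distinct_values: dict[str, int]) -> int:
    """Upper bound on series for one metric: product of distinct values per dimension."""
    return prod(distinct_values.values())


if __name__ == "__main__":
    attrs = {
        "mcp.method.name": "tools/call",
        "gen_ai.tool.name": "get_forecast",
        "mcp.session.id": "sess-0001",  # fine on the span, never a dimension
    }
    print(to_metric_dimensions(attrs))
    safe = {"mcp.method.name": 30, "gen_ai.tool.name": 40}
    print(worst_case_series(safe))                                # 1200
    print(worst_case_series({**safe, "mcp.session.id": 10_000}))  # 12000000
```

Adding one unbounded attribute turns 1,200 series into 12 million. That is why session ids belong on spans, where they cost one attribute per span, rather than on metrics.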

Put it together: a local Docker Compose stack

You can run the entire pipeline on your laptop: an instrumented MCP server, an OpenTelemetry Collector with the spanmetrics connector, and OpenObserve as the backend. Save the Collector config above as otel-collector.yaml, then use this docker-compose.yml.

services:
  openobserve:
    image: o2cr.ai/openobserve/openobserve-enterprise:latest
    environment:
      ZO_ROOT_USER_EMAIL: root@example.com
      ZO_ROOT_USER_PASSWORD: Complexpass#123
      ZO_DATA_DIR: /data
    ports:
      - "5080:5080"
    volumes:
      - o2-data:/data

  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    command: ["--config=/etc/otel-collector.yaml"]
    volumes:
      - ./otel-collector.yaml:/etc/otel-collector.yaml
    ports:
      - "4317:4317"
      - "4318:4318"
    depends_on:
      - openobserve

  mcp-server:
    build: ./server
    environment:
      OTEL_EXPORTER_OTLP_ENDPOINT: http://otel-collector:4318
      OTEL_EXPORTER_OTLP_PROTOCOL: http/protobuf
      OTEL_SERVICE_NAME: weather-mcp-server
    depends_on:
      - otel-collector

volumes:
  o2-data:

Point your server's OTLP exporter at the Collector (http://otel-collector:4318) instead of directly at OpenObserve, so the spanmetrics connector can do its work in between. Bring it up with one command:

docker compose up
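The compose file sets only the base endpoint. For OTLP over HTTP, the OpenTelemetry exporter specification has the SDK append the signal path to OTEL_EXPORTER_OTLP_ENDPOINT. A signal-specific variable such as OTEL_EXPORTER_OTLP_TRACES_ENDPOINT is used exactly as given. The sketch below restates that rule with the standard library so you can check where spans will go. It is not the SDK's own code, and traces_endpoint is a hypothetical helper.

Inside the mcp-server container, localhost is the container itself. That is why the compose file uses the otel-collector service name.

```python
import os
from collections.abc import Mapping

DEFAULT_OTLP_HTTP_ENDPOINT = "http://localhost:4318"


def traces_endpoint(env: Mapping[str, str] = os.environ) -> str:
    """Resolve the OTLP/HTTP traces URL following the exporter spec's rule."""
    specific = env.get("OTEL_EXPORTER_OTLP_TRACES_ENDPOINT")
    if specific:
        return specific  # signal-specific URL: used as-is, no path appended
    base = env.get("OTEL_EXPORTER_OTLP_ENDPOINT", DEFAULT_OTLP_HTTP_ENDPOINT)
    return base.rstrip("/") + "/v1/traces"  # base URL: append the signal path


if __name__ == "__main__":
    print(traces_endpoint({"OTEL_EXPORTER_OTLP_ENDPOINT": "http://otel-collector:4318"}))
    # http://otel-collector:4318/v1/traces
```

This rule explains a common mistake. If you set OTEL_EXPORTER_OTLP_ENDPOINT to a URL that already ends in /v1/traces, the exporter posts to /v1/traces/v1/traces, and the Collector never receives the spans.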

Open OpenObserve at http://localhost:5080, drive a few tool calls through your server, and you will see the traces arrive. A tools/call trace shows the server span with its child spans for any downstream work, and the per-tool RED metrics from the connector show up alongside. Correlating traces, metrics, and logs in one place is the whole reason to send them to the same backend, a point the logs, traces, and metrics correlation guide makes at length.
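The spanmetrics connector does this aggregation for you, but a small offline sketch shows what the per-tool numbers mean. It assumes you have server spans as (tool, duration, status) records; ServerSpan and red_by_tool are hypothetical names. The error check uses the STATUS_CODE_ERROR value that spanmetrics writes to its status.code dimension.

The sketch differs from what you see in OpenObserve in two ways:

- Rate here is calls divided by the window length.
- p95 is an exact nearest-rank percentile over raw durations, whereas a backend estimates it from the connector's histogram buckets.

```python
import math
from collections import defaultdict
from dataclasses import dataclass

ERROR = "STATUS_CODE_ERROR"  # the other values are STATUS_CODE_OK and STATUS_CODE_UNSET


@dataclass(frozen=True)
class ServerSpan:
    tool: str           # value of gen_ai.tool.name
    duration_ms: float
    status_code: str


def _nearest_rank(sorted_values: list[float], q: float) -> float:
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(q * len(sorted_values)))
    return sorted_values[rank - 1]


def red_by_tool(spans: list[ServerSpan], window_s: float) -> dict[str, dict[str, float]]:
    """Rate, error ratio, and p50/p95/p99 duration per tool over one time window."""
    by_tool: dict[str, list[ServerSpan]] = defaultdict(list)
    for span in spans:
        by_tool[span.tool].append(span)

    report: dict[str, dict[str, float]] = {}
    for tool, items in by_tool.items():
        durations = sorted(s.duration_ms for s in items)
        errors = sum(1 for s in items if s.status_code == ERROR)
        report[tool] = {
            "rate_per_s": len(items) / window_s,
            "error_ratio": errors / len(items),
            "p50_ms": _nearest_rank(durations, 0.50),
            "p95_ms": _nearest_rank(durations, 0.95),
            "p99_ms": _nearest_rank(durations, 0.99),
        }
    return report
```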

End-to-end MCP observability architecture: agent, MCP server with OTel middleware, OpenTelemetry Collector with spanmetrics, and OpenObserve

OpenObserve trace waterfall for a single tools/call, showing the SERVER span and its downstream child span

Per-tool RED metrics in OpenObserve built from the spanmetrics output: p95 latency and error rate by gen_ai.tool.name

Send MCP telemetry to OpenObserve

The pipeline here exports plain OTLP, so your MCP traces and the RED metrics from the spanmetrics connector land in one place; add a logs pipeline to the Collector and your server logs join them. From there you can open a tools/call trace, follow its child spans into the downstream API that was slow, and pivot to the per-tool latency chart without changing tools. Nothing in this setup is OpenObserve-specific: the same Collector and OTLP ingestion path points at any OTLP backend, so you are not locked in. If you want the hosted version, start with OpenObserve Cloud and send your first MCP trace in a few minutes.

Frequently Asked Questions

What is MCP server observability, and how is it different from monitoring an MCP client?

MCP server observability means instrumenting the server that exposes tools, resources, and prompts so every initialize, tools/list, and tools/call shows up as a trace and a metric. Monitoring an MCP client is the consumer side: it tracks how an agent calls servers. The producer side, covered here, captures what happens inside your server: tool latency, error rates, payload sizes, and the isError tool failures that never surface as JSON-RPC errors.

Which OpenTelemetry attributes should I set on MCP server spans?

Follow the OpenTelemetry MCP semantic conventions: mcp.method.name (required, for example tools/call), mcp.session.id, mcp.protocol.version, and mcp.resource.uri for resource operations. Tool identity uses gen_ai.tool.name, not mcp.tool.name (which is not defined). The JSON-RPC id is jsonrpc.request.id. Set error.type and span status ERROR on failures. The conventions are still at Development stability, so pin the version you build against.

How do I propagate trace context from an MCP client to my server?

Per MCP SEP-414, the client injects W3C trace context into the request's _meta object using the keys traceparent, tracestate, and baggage, written unprefixed. The server reads params._meta, extracts the context, and starts its span as a child of the client span. This works over both stdio and Streamable HTTP because the context travels inside the JSON-RPC message rather than in HTTP headers.

How do I get rate, error, and duration (RED) metrics per MCP tool?

Run the OpenTelemetry Collector with the spanmetrics connector. It reads your trace spans and emits a calls counter and a duration histogram. Errors are captured as a status.code dimension on the call count rather than as a separate metric. Add mcp.method.name and gen_ai.tool.name as dimensions so the metrics break down per method and per tool, then export them to a backend like OpenObserve and chart p50/p95/p99 latency and error rate.

How do I add observability to a stdio MCP server?

stdio servers have no HTTP layer, so header-based propagation does not apply. Carry trace context inside the MCP _meta field instead (SEP-414). For export, send spans over OTLP to a local OpenTelemetry Collector or directly to your backend. Keep all logging on stderr: anything written to stdout corrupts the JSON-RPC message stream.
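On the server, extracting the parent from _meta is normally one call to OpenTelemetry's opentelemetry.propagate.extract, with the _meta dict as the carrier. You then start the SERVER span with that context. To show what that call checks, here is a standard-library sketch that parses a version-00 traceparent following the W3C Trace Context format. RemoteParent and parent_from_meta are hypothetical names. The sketch ignores baggage and does not validate tracestate.

```python
import re
from dataclasses import dataclass
from typing import Any, Optional

# version-traceid-parentid-flags, lowercase hex only; this sketch only
# understands the 55-character version-00 layout.
_TRACEPARENT = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


@dataclass(frozen=True)
class RemoteParent:
    trace_id: str
    parent_span_id: str
    sampled: bool
    tracestate: Optional[str]


def parent_from_meta(params: dict[str, Any]) -> Optional[RemoteParent]:
    """Read the client's span context from params._meta, or None if absent or invalid."""
    meta = params.get("_meta")
    if not isinstance(meta, dict):
        return None
    header = meta.get("traceparent")
    if not isinstance(header, str):
        return None
    match = _TRACEPARENT.fullmatch(header.strip())
    if match is None:
        return None
    version, trace_id, span_id, flags = match.groups()
    # Invalid per the spec: version ff, or an all-zero trace id or span id.
    if version == "ff" or trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    tracestate = meta.get("tracestate")
    return RemoteParent(
        trace_id=trace_id,
        parent_span_id=span_id,
        sampled=bool(int(flags, 16) & 0x01),  # bit 0 is the sampled flag
        tracestate=tracestate if isinstance(tracestate, str) else None,
    )
```

When the result is None, start a new root span rather than failing the request. Missing or malformed trace context should never break a tool call.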

About the Author

Gorakhnath Yadav

Gorakhnath is a passionate developer advocate, working on bridging the gap between developers and the tools they use. He focuses on building communities and creating content that empowers developers to build better software.
