Cohere → OpenObserve

Capture token usage, latency, and model metadata for every Cohere inference call. The Cohere SDK does not have a dedicated OTel instrumentor, so traces are created by wrapping calls in manual spans and extracting usage data from the response.

Prerequisites

Python 3.8+
An OpenObserve account (cloud or self-hosted)
Your OpenObserve organisation ID and Base64-encoded auth token
A Cohere API key

Installation

pip install openobserve-telemetry-sdk cohere python-dotenv

Configuration

Create a .env file in your project root:

OPENOBSERVE_URL=https://api.openobserve.ai/
OPENOBSERVE_ORG=your_org_id
OPENOBSERVE_AUTH_TOKEN=Basic <your_base64_token>
COHERE_API_KEY=your-cohere-api-key

Instrumentation

Call openobserve_init() to set up the tracer provider, then wrap each Cohere call in a manual span and extract token counts from the response metadata.

from dotenv import load_dotenv
load_dotenv()

from openobserve import openobserve_init
openobserve_init()

import os
from opentelemetry import trace
import cohere

tracer = trace.get_tracer(__name__)
co = cohere.Client(api_key=os.environ["COHERE_API_KEY"])


def chat(message: str, model: str = "command-r-08-2024") -> str:
    with tracer.start_as_current_span("cohere.chat") as span:
        span.set_attribute("llm_model_name", model)
        span.set_attribute("input_value", message)
        response = co.chat(model=model, message=message)
        output = response.text
        span.set_attribute("output_value", output[:200])
        if hasattr(response, "meta") and response.meta and response.meta.tokens:
            span.set_attribute("llm_token_count_prompt", response.meta.tokens.input_tokens or 0)
            span.set_attribute("llm_token_count_completion", response.meta.tokens.output_tokens or 0)
        return output


result = chat("Explain distributed tracing in one sentence.")
print(result)

What Gets Captured

Each co.chat() call wrapped in a span produces the following attributes.

Attribute	Description
`llm_model_name`	Model used (e.g. `command-r-08-2024`)
`input_value`	Prompt sent to the model
`output_value`	First 200 characters of the response
`llm_token_count_prompt`	Input tokens consumed
`llm_token_count_completion`	Output tokens generated
`duration`	End-to-end span latency
`span_status`	`UNSET` on success, `ERROR` on failure

Viewing Traces

Log in to OpenObserve and navigate to Traces
Filter by operation name cohere.chat to find Cohere spans
Click any span to inspect token counts and the prompt and response

Cohere trace in OpenObserve

Next Steps

With Cohere instrumented, every chat call is recorded in OpenObserve. From here you can track token usage per model, monitor latency, and set alerts on error spans.