LiteLLM Proxy → OpenObserve

Automatically capture token usage, latency, and model metadata for every request routed through the LiteLLM Proxy. The proxy serves an OpenAI-compatible API, so instrumentation uses the standard OpenAI instrumentor pointed at your local proxy.

Prerequisites

  • Python 3.8+
  • Docker (to run the LiteLLM Proxy)
  • An OpenObserve account (cloud or self-hosted)
  • Your OpenObserve organisation ID and Base64-encoded auth token
  • An OpenAI API key (or any provider key supported by LiteLLM)

Installation

pip install openobserve-telemetry-sdk openinference-instrumentation-openai openai python-dotenv

Configuration

Start the LiteLLM proxy with Docker:

docker run -d --name litellm-proxy \
  -p 4000:4000 \
  -e OPENAI_API_KEY=your-openai-key \
  ghcr.io/berriai/litellm:main-latest \
  --model gpt-4o-mini \
  --drop_params
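
Before adding instrumentation, it is worth confirming the proxy is reachable. A minimal check, assuming the proxy exposes the OpenAI-compatible /v1/models route on port 4000 (the default for the container above):

from openai import OpenAI

# Any key works unless a master key is configured on the proxy; only base_url matters for this check.
client = OpenAI(api_key="not-needed", base_url="http://localhost:4000")
print([m.id for m in client.models.list().data])  # should include gpt-4o-mini if the proxy started correctly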

Create a .env file in your project root:

OPENOBSERVE_URL=https://api.openobserve.ai/
OPENOBSERVE_ORG=your_org_id
OPENOBSERVE_AUTH_TOKEN=Basic <your_base64_token>
LITELLM_PROXY_URL=http://localhost:4000
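
To confirm the file is actually picked up before running the full example, a quick sanity check using the variable names from this guide:

from dotenv import load_dotenv
import os

load_dotenv()
for var in ("OPENOBSERVE_URL", "OPENOBSERVE_ORG", "OPENOBSERVE_AUTH_TOKEN", "LITELLM_PROXY_URL"):
    # Report which variables are set without printing their values
    print(var, "set" if os.environ.get(var) else "MISSING")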

Instrumentation

Call OpenAIInstrumentor().instrument() before creating the OpenAI client. Point the client at your LiteLLM proxy.

from dotenv import load_dotenv
load_dotenv()

from openinference.instrumentation.openai import OpenAIInstrumentor
from openobserve import openobserve_init

# Instrument the OpenAI library and initialise the OpenObserve exporter
# before the client is created, so every proxied call is traced.
OpenAIInstrumentor().instrument()
openobserve_init()

import os
from openai import OpenAI

# Only base_url matters here; the key is a placeholder unless the proxy has a master key set.
client = OpenAI(
    api_key=os.environ.get("LITELLM_PROXY_API_KEY", "not-needed"),
    base_url=os.environ.get("LITELLM_PROXY_URL", "http://localhost:4000"),
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Explain distributed tracing in one sentence."}],
)
print(response.choices[0].message.content)
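
Streaming requests are captured the same way, but token counts on streamed responses generally only appear if usage is requested explicitly. A sketch, assuming the proxy forwards OpenAI's stream_options parameter:

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Explain distributed tracing in one sentence."}],
    stream=True,
    stream_options={"include_usage": True},  # assumption: the proxy honours this and emits a final usage chunk
)
for chunk in stream:
    # The final usage-only chunk has no choices, so guard before reading the delta
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")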

What Gets Captured

Attribute Description
llm_model_name Model routed by the proxy (e.g. gpt-4o-mini)
gen_ai_response_model Model that served the response
llm_system openai (the client library used)
llm_token_count_prompt Tokens in the prompt
llm_token_count_completion Tokens in the response
llm_token_count_total Total tokens consumed
llm_token_count_prompt_details_cache_read Tokens served from prompt cache
llm_token_count_completion_details_reasoning Reasoning tokens (if applicable)
llm_request_parameters_model Model name from request parameters
llm_invocation_parameters Full request parameters as JSON
llm_observation_type GENERATION
openinference_span_kind LLM
operation_name ChatCompletion
input_mime_type application/json
output_mime_type application/json
duration End-to-end request latency including proxy overhead
span_status OK or error status
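
The token-count attributes should mirror the usage block returned with each response, so you can cross-check a span against the raw API result:

# These fields correspond to llm_token_count_prompt / _completion / _total on the span
print(response.usage.prompt_tokens, response.usage.completion_tokens, response.usage.total_tokens)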

Viewing Traces

  1. Log in to OpenObserve and navigate to Traces
  2. Filter by operation_name = ChatCompletion to find proxy spans
  3. Click any span to inspect token counts, model, and latency
  4. Use llm_model_name to identify which model the proxy routed to

LiteLLM Proxy trace in OpenObserve

Next Steps

With LiteLLM Proxy instrumented, every proxied request is recorded in OpenObserve. From here you can compare latency and token usage across models routed by the proxy, monitor fallback behaviour, and set alerts on error rates.