LiteLLM Proxy → OpenObserve
Automatically capture token usage, latency, and model metadata for every request routed through the LiteLLM Proxy. The proxy serves an OpenAI-compatible API, so instrumentation uses the standard OpenAI instrumentor pointed at your local proxy.
Prerequisites
- Python 3.8+
- Docker (to run the LiteLLM Proxy)
- An OpenObserve account (cloud or self-hosted)
- Your OpenObserve organisation ID and Base64-encoded auth token
- An OpenAI API key (or any provider key supported by LiteLLM)
Installation
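Install the client libraries used in the Instrumentation section below. The package names are inferred from the imports in the example; in particular, confirm the name of the OpenObserve SDK package for your environment:

```bash
pip install openai python-dotenv openinference-instrumentation-openai openobserve
```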
Configuration
Start the LiteLLM proxy with Docker:
```bash
docker run -d --name litellm-proxy \
  -p 4000:4000 \
  -e OPENAI_API_KEY=your-openai-key \
  ghcr.io/berriai/litellm:main-latest \
  --model gpt-4o-mini \
  --drop_params
```
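Before wiring up tracing, you can sanity-check that the proxy is reachable. A minimal sketch, assuming the container above is running on port 4000 and no master key is configured on the proxy:

```python
from openai import OpenAI

# The proxy holds the real provider key, so any placeholder works client-side
client = OpenAI(api_key="not-needed", base_url="http://localhost:4000")

# The proxy's OpenAI-compatible models endpoint lists the models it can route
print([m.id for m in client.models.list().data])
```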
Create a `.env` file in your project root:

```
OPENOBSERVE_URL=https://api.openobserve.ai/
OPENOBSERVE_ORG=your_org_id
OPENOBSERVE_AUTH_TOKEN=Basic <your_base64_token>
LITELLM_PROXY_URL=http://localhost:4000
```
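If you still need to generate the auth token, OpenObserve's Basic auth value is typically the Base64 encoding of your login email and password, as shown under your Ingestion settings; a quick way to produce it, assuming those credentials:

```python
import base64

# Replace with your OpenObserve login email and password (placeholder values)
print("Basic " + base64.b64encode(b"you@example.com:your-password").decode())
```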
Instrumentation
Call `OpenAIInstrumentor().instrument()` before creating the OpenAI client. Point the client at your LiteLLM proxy.
```python
from dotenv import load_dotenv

load_dotenv()

from openinference.instrumentation.openai import OpenAIInstrumentor
from openobserve import openobserve_init

# Instrument the OpenAI client library before any client is created,
# then initialise the OpenObserve exporter
OpenAIInstrumentor().instrument()
openobserve_init()

import os
from openai import OpenAI

# Point the client at the LiteLLM proxy rather than api.openai.com.
# The proxy holds the real provider key, so the client-side key is a placeholder.
client = OpenAI(
    api_key=os.environ.get("LITELLM_PROXY_API_KEY", "not-needed"),
    base_url=os.environ.get("LITELLM_PROXY_URL", "http://localhost:4000"),
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Explain distributed tracing in one sentence."}],
)
print(response.choices[0].message.content)
```
What Gets Captured
| Attribute | Description |
|---|---|
| `llm_model_name` | Model routed by the proxy (e.g. `gpt-4o-mini`) |
| `gen_ai_response_model` | Model that served the response |
| `llm_system` | `openai` (the client library used) |
| `llm_token_count_prompt` | Tokens in the prompt |
| `llm_token_count_completion` | Tokens in the response |
| `llm_token_count_total` | Total tokens consumed |
| `llm_token_count_prompt_details_cache_read` | Tokens served from prompt cache |
| `llm_token_count_completion_details_reasoning` | Reasoning tokens (if applicable) |
| `llm_request_parameters_model` | Model name from request parameters |
| `llm_invocation_parameters` | Full request parameters as JSON |
| `llm_observation_type` | `GENERATION` |
| `openinference_span_kind` | `LLM` |
| `operation_name` | `ChatCompletion` |
| `input_mime_type` | `application/json` |
| `output_mime_type` | `application/json` |
| `duration` | End-to-end request latency including proxy overhead |
| `span_status` | `OK` or error status |
Viewing Traces
- Log in to OpenObserve and navigate to Traces
- Filter by `operation_name = ChatCompletion` to find proxy spans
- Click any span to inspect token counts, model, and latency
- Use `llm_model_name` to identify which model the proxy routed to
Next Steps
With LiteLLM Proxy instrumented, every proxied request is recorded in OpenObserve. From here you can compare latency and token usage across models routed by the proxy, monitor fallback behaviour, and set alerts on error rates.
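For example, if your proxy is configured to route more than one model, a rough sketch like the following sends the same prompt to each and prints latency and token usage; with the instrumentation above in place, the matching spans land in OpenObserve. The model names are placeholders and must match models configured on your proxy:

```python
import os
import time

from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("LITELLM_PROXY_API_KEY", "not-needed"),
    base_url=os.environ.get("LITELLM_PROXY_URL", "http://localhost:4000"),
)

for model in ["gpt-4o-mini", "gpt-4o"]:  # placeholder model names
    start = time.perf_counter()
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Explain distributed tracing in one sentence."}],
    )
    elapsed = time.perf_counter() - start
    usage = response.usage
    print(f"{model}: {elapsed:.2f}s, "
          f"{usage.prompt_tokens} prompt / {usage.completion_tokens} completion tokens")
```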