DeepSeek → OpenObserve
Automatically capture token usage, latency, and model metadata for every DeepSeek inference call in your Python application. DeepSeek exposes an OpenAI-compatible API, so instrumentation uses the standard OpenAI instrumentor pointed at the DeepSeek endpoint.
Prerequisites
- Python 3.8+
- An OpenObserve account (cloud or self-hosted)
- Your OpenObserve organisation ID and a Base64-encoded auth token (see the sketch after this list)
- A DeepSeek API key
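If you need to build the auth token yourself, the sketch below shows the usual construction. It assumes OpenObserve's ingestion endpoints use HTTP Basic auth over `email:password`; the email and password values are hypothetical placeholders, so verify the scheme against your OpenObserve deployment.

```python
import base64

# Assumption: OpenObserve Basic auth is base64("email:password").
email = "you@example.com"            # hypothetical placeholder
password = "your-openobserve-pass"   # hypothetical placeholder

token = base64.b64encode(f"{email}:{password}".encode()).decode()
print(f"Basic {token}")  # paste this value as OPENOBSERVE_AUTH_TOKEN
```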
Installation
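Install the packages the example below imports. `openai`, `python-dotenv`, and `openinference-instrumentation-openai` are the standard PyPI names; the package providing `openobserve_init` is assumed here to be published as `openobserve` — confirm the exact name in the OpenObserve docs.

```bash
pip install openai python-dotenv openinference-instrumentation-openai openobserve
```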
Configuration
Create a .env file in your project root:
```
OPENOBSERVE_URL=https://api.openobserve.ai/
OPENOBSERVE_ORG=your_org_id
OPENOBSERVE_AUTH_TOKEN=Basic <your_base64_token>
DEEPSEEK_API_KEY=your-deepseek-api-key
```
Instrumentation
Call `OpenAIInstrumentor().instrument()` before creating the OpenAI client. Point the client at the DeepSeek base URL and pass your DeepSeek API key.
```python
import os

from dotenv import load_dotenv

# Load OpenObserve and DeepSeek credentials from .env before the
# OpenObserve SDK is imported, in case it reads config at import time
load_dotenv()

from openinference.instrumentation.openai import OpenAIInstrumentor
from openobserve import openobserve_init

# Instrument before the client is created so every call is traced
OpenAIInstrumentor().instrument()
openobserve_init()

from openai import OpenAI

# DeepSeek exposes an OpenAI-compatible API, so the standard client works
client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com/v1",
)

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Explain distributed tracing in one sentence."}],
)
print(response.choices[0].message.content)
```
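Sampling parameters you pass on the request are recorded as span attributes as well — for example, `max_tokens` lands in `llm_request_parameters_max_tokens` (see the table below). Continuing with the `client` from the snippet above:

```python
# max_tokens is captured as llm_request_parameters_max_tokens on the span
response = client.chat.completions.create(
    model="deepseek-chat",
    max_tokens=64,
    messages=[{"role": "user", "content": "Summarise OpenTelemetry in one line."}],
)
```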
What Gets Captured
| Attribute | Description |
|---|---|
| `llm_provider` | `deepseek` |
| `llm_system` | `openai` (OpenAI-compatible client) |
| `llm_model_name` | Resolved model returned by the API (e.g. `deepseek-v4-flash`) |
| `llm_request_parameters_model` | Model name sent in the request (e.g. `deepseek-chat`) |
| `llm_request_parameters_max_tokens` | `max_tokens` value from the request |
| `gen_ai_response_model` | Same as `llm_model_name` |
| `llm_observation_type` | `GENERATION` |
| `llm_token_count_prompt` | Prompt tokens consumed |
| `llm_token_count_completion` | Completion tokens returned |
| `llm_token_count_total` | Total tokens consumed |
| `llm_token_count_prompt_details_cache_read` | Prompt cache read tokens |
| `llm_usage_tokens_input` | Input tokens (mirrors `llm_token_count_prompt`) |
| `llm_usage_tokens_output` | Output tokens (mirrors `llm_token_count_completion`) |
| `llm_usage_tokens_total` | Total tokens (mirrors `llm_token_count_total`) |
| `openinference_span_kind` | `LLM` |
| `operation_name` | `ChatCompletion` |
| `input_mime_type` | `application/json` |
| `output_mime_type` | `application/json` |
| `duration` | End-to-end request latency |
| `span_status` | `OK` on success, `ERROR` on failure |
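To sanity-check the captured token counts, compare them with the usage block the API itself returns — a standard field on OpenAI-compatible chat completions, reusing the `response` from the example above:

```python
# These should match llm_token_count_prompt / _completion / _total in OpenObserve
usage = response.usage
print(usage.prompt_tokens, usage.completion_tokens, usage.total_tokens)
```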
Viewing Traces
- Log in to OpenObserve and navigate to Traces
- Spans appear with `operation_name: ChatCompletion` and `llm_provider: deepseek`
- Note that `deepseek-chat` resolves to the actual model in `llm_model_name` (e.g. `deepseek-v4-flash`)
- Filter by `llm_provider` to isolate DeepSeek spans from other providers

Next Steps
With DeepSeek instrumented, every inference call is recorded in OpenObserve. From here you can monitor token usage, compare latency across model aliases, and set alerts on error spans.