# Ollama → OpenObserve
Automatically capture token usage, latency, and model metadata for every Ollama inference call in your Python application — no cloud API key required.
## Prerequisites
- Python 3.8+
- Ollama running locally (default: `http://localhost:11434`)
- An OpenObserve account (cloud or self-hosted)
- Your OpenObserve organisation ID and Base64-encoded auth token
Pull a model before running the examples:
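For example, to fetch `llama3.2`, the model used throughout this guide:

```shell
ollama pull llama3.2
```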
## Installation
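A minimal install sketch, assuming the Ollama instrumentation ships on PyPI as `opentelemetry-instrumentation-ollama` and the OpenObserve helper as `openobserve` (check the OpenObserve docs for the exact package names):

```shell
pip install opentelemetry-instrumentation-ollama openobserve
```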
## Configuration
Create a `.env` file in your project root:

```ini
# OpenObserve instance URL
# Default for self-hosted: http://localhost:5080
OPENOBSERVE_URL=https://api.openobserve.ai/

# Your OpenObserve organisation slug or ID
OPENOBSERVE_ORG=your_org_id

# Basic auth token — Base64-encoded "email:password"
OPENOBSERVE_AUTH_TOKEN="Basic <your_base64_token>"

# Ollama base URL (change if Ollama is running on a different host)
OLLAMA_HOST=http://localhost:11434
```
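The sketch below shows how these settings are picked up at runtime; it assumes `openobserve_init()` reads them from the process environment, so export them or load the `.env` file first (for example with `python-dotenv`):

```python
import os

# Fall back to the self-hosted defaults if the .env file was not loaded.
url = os.getenv("OPENOBSERVE_URL", "http://localhost:5080")
org = os.getenv("OPENOBSERVE_ORG", "default")
print(f"Exporting telemetry to {url} (org: {org})")
```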
## Instrumentation
Call `OllamaInstrumentor().instrument()` before any Ollama client is created.
```python
from opentelemetry.instrumentation.ollama import OllamaInstrumentor
from openobserve import openobserve_init

# Instrument before importing the Ollama client
OllamaInstrumentor().instrument()
openobserve_init()

import ollama

# Chat completion
response = ollama.chat(
    model="llama3.2",
    messages=[{"role": "user", "content": "Explain distributed tracing in one sentence."}],
)
print(response["message"]["content"])
```
### Streaming

```python
stream = ollama.chat(
    model="llama3.2",
    messages=[{"role": "user", "content": "Write a haiku about observability."}],
    stream=True,
)
for chunk in stream:
    print(chunk["message"]["content"], end="", flush=True)
```
## Using the OpenAI-compatible endpoint
If you use Ollama's OpenAI-compatible API (`/v1/chat/completions`), instrument it with the OpenAI instrumentor instead:
```python
from opentelemetry.instrumentation.openai import OpenAIInstrumentor
from openobserve import openobserve_init
from openai import OpenAI

OpenAIInstrumentor().instrument()
openobserve_init()

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```
## What Gets Captured
| Attribute | Description |
|---|---|
| `gen_ai_request_model` | Model name (e.g. `llama3.2`) |
| `gen_ai_usage_input_tokens` | Tokens in the prompt |
| `gen_ai_usage_output_tokens` | Tokens in the response |
| `llm_usage_tokens_total` | Total tokens consumed |
| `llm_usage_cost_input` | Estimated input cost in USD |
| `llm_usage_cost_output` | Estimated output cost in USD |
| `gen_ai_system` | `ollama` |
| `duration` | End-to-end request latency |
| `error` | Exception details if the request failed |
## Viewing Traces
- Log in to OpenObserve and navigate to **Traces**
- Click any span to inspect token counts and full request metadata
- Use `gen_ai_request_model` to compare latency across different locally hosted models
## Next Steps
With Ollama instrumented, every local inference call is automatically recorded in OpenObserve, with no cloud API key required. From here you can compare token throughput across models, monitor latency for different prompt sizes, and benchmark locally hosted models side by side.