Skip to content

Get Demo Star

Google Gemini → OpenObserve

Automatically capture token usage, latency, and model metadata for every Gemini API call in your Python application.

Prerequisites

Python 3.9+
An OpenObserve account (cloud or self-hosted)
Your OpenObserve organisation ID and Base64-encoded auth token
A Google AI API key (from Google AI Studio)

Installation

pip install openobserve-telemetry-sdk opentelemetry-instrumentation-google-generativeai google-genai python-dotenv

Configuration

Create a .env file in your project root:

# OpenObserve instance URL
# Default for self-hosted: http://localhost:5080
OPENOBSERVE_URL=https://api.openobserve.ai/

# Your OpenObserve organisation slug or ID
OPENOBSERVE_ORG=your_org_id

# Basic auth token — Base64-encoded "email:password"
OPENOBSERVE_AUTH_TOKEN="Basic <your_base64_token>"

# Google AI API key
GOOGLE_API_KEY=your-google-ai-key

Instrumentation

Call GoogleGenerativeAiInstrumentor().instrument() before any Gemini client is created.

from opentelemetry.instrumentation.google_generativeai import GoogleGenerativeAiInstrumentor
from openobserve import openobserve_init

# Instrument before importing the Gemini SDK
GoogleGenerativeAiInstrumentor().instrument()
openobserve_init()

from google import genai
import os

client = genai.Client(api_key=os.environ["GOOGLE_API_KEY"])

# Single-turn generation
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Explain observability in one sentence.",
)
print(response.text)

Multi-turn chat

chat = client.chats.create(model="gemini-2.5-flash")

response = chat.send_message("What is a trace span?")
print(response.text)

response = chat.send_message("And what is a trace ID?")
print(response.text)

Streaming

for chunk in client.models.generate_content_stream(
    model="gemini-2.5-flash",
    contents="Count to five.",
):
    print(chunk.text, end="", flush=True)

What Gets Captured

Attribute	Description
`gen_ai_request_model`	Requested model (e.g. `gemini-2.5-flash`)
`gen_ai_response_model`	Actual model used
`gen_ai_system`	Provider identifier (`Google`)
`gen_ai_usage_input_tokens`	Input tokens consumed
`gen_ai_usage_output_tokens`	Output tokens generated
`llm_usage_tokens_total`	Total tokens (includes thinking tokens for Gemini 2.5 models)
`llm_usage_cost_input`	Estimated input cost in USD
`llm_usage_cost_output`	Estimated output cost in USD
`llm_usage_cost_total`	Estimated total cost in USD
`duration`	End-to-end request latency
`error`	Exception details if the request failed

Viewing Traces

Log in to OpenObserve and navigate to Traces in the left sidebar
Click any span to inspect token counts, latency, and full request metadata
For chat sessions, each send_message call appears as a separate span

Gemini trace span attributes in OpenObserve

Next Steps

With Gemini instrumented, every model call in your application is automatically recorded in OpenObserve. From here you can track token usage including thinking tokens for Gemini 2.5 models, compare latency across model versions, and correlate Gemini spans with the rest of your application traces.

Read More