Groq → OpenObserve

Automatically capture token usage, latency, and model metadata for every Groq inference call in your Python application.

Prerequisites

  • Python 3.9+
  • A Groq account with an API key
  • An OpenObserve account (cloud or self-hosted)
  • Your OpenObserve organisation ID and Base64-encoded auth token

Installation

pip install openobserve-telemetry-sdk openinference-instrumentation-groq groq python-dotenv

Configuration

Create a .env file in your project root:

OPENOBSERVE_URL=https://api.openobserve.ai/
OPENOBSERVE_ORG=your_org_id
OPENOBSERVE_AUTH_TOKEN=Basic <your_base64_token>
GROQ_API_KEY=your-groq-api-key
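Missing or empty settings are easiest to catch at startup, before any instrumentation runs. A minimal sketch of such a check; the `missing_vars` helper and the `example_env` values are illustrative, not part of either SDK:

```python
REQUIRED_VARS = [
    "OPENOBSERVE_URL",
    "OPENOBSERVE_ORG",
    "OPENOBSERVE_AUTH_TOKEN",
    "GROQ_API_KEY",
]

def missing_vars(env: dict) -> list:
    """Return the names of required settings that are absent or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]

# Example: a partially filled environment (placeholder values)
example_env = {
    "OPENOBSERVE_URL": "https://api.openobserve.ai/",
    "GROQ_API_KEY": "gsk-...",
}
print(missing_vars(example_env))  # ['OPENOBSERVE_ORG', 'OPENOBSERVE_AUTH_TOKEN']
```

In a real application you would pass `os.environ` (after `load_dotenv()`) and raise if the returned list is non-empty.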

Instrumentation

Call GroqInstrumentor().instrument() before importing the Groq client.

from dotenv import load_dotenv
load_dotenv()  # load OPENOBSERVE_* and GROQ_API_KEY from .env

from openinference.instrumentation.groq import GroqInstrumentor
from openobserve import openobserve_init

GroqInstrumentor().instrument()  # patch Groq before the client is imported
openobserve_init()               # start exporting spans to OpenObserve

import os
from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

# This call is captured as an LLM span automatically
response = client.chat.completions.create(
    model="llama-3.1-8b-instant",
    messages=[{"role": "user", "content": "Explain observability in one sentence."}],
)
print(response.choices[0].message.content)
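Failed requests are still worth capturing: the instrumentor records the exception on the active span, so you only need to keep the exception from crashing your application. A sketch of one way to do that; `safe_completion` and `fake_create` are hypothetical helpers, not part of the Groq SDK:

```python
def safe_completion(create_fn, **kwargs):
    """Call create_fn, returning (response, error_message).

    The instrumented call still emits an ERROR span when create_fn raises;
    this wrapper just converts the exception into a return value.
    """
    try:
        return create_fn(**kwargs), None
    except Exception as exc:
        return None, str(exc)

# Usage with a stand-in for client.chat.completions.create:
def fake_create(**kwargs):
    raise RuntimeError("rate limit exceeded")

resp, err = safe_completion(fake_create, model="llama-3.1-8b-instant")
print(resp, err)  # None rate limit exceeded
```

With the real client you would pass `client.chat.completions.create` as `create_fn`.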

What Gets Captured

  • llm_model_name: Model requested (e.g. llama-3.1-8b-instant)
  • gen_ai_response_model: Model that served the response
  • llm_usage_tokens_input: Prompt token count
  • llm_usage_tokens_output: Completion token count
  • llm_usage_tokens_total: Total tokens consumed
  • openinference_span_kind: LLM
  • operation_name: Completions
  • span_status: OK or ERROR
  • status_message: Error details if the request failed
  • duration: Request latency
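Once exported, these attributes are plain key/value pairs, so aggregating them is straightforward. A sketch of summing token usage across successful spans; the `spans` records below are fabricated examples shaped like the attributes above, not real export output:

```python
# Hypothetical span records mirroring the captured attributes
spans = [
    {"llm_model_name": "llama-3.1-8b-instant", "llm_usage_tokens_input": 12,
     "llm_usage_tokens_output": 35, "span_status": "OK"},
    {"llm_model_name": "llama-3.1-8b-instant", "llm_usage_tokens_input": 20,
     "llm_usage_tokens_output": 0, "span_status": "ERROR"},
]

def total_tokens(spans):
    """Sum input + output tokens over spans that completed successfully."""
    return sum(
        s["llm_usage_tokens_input"] + s["llm_usage_tokens_output"]
        for s in spans
        if s["span_status"] == "OK"
    )

print(total_tokens(spans))  # 47
```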

Viewing Traces

  1. Log in to OpenObserve and navigate to Traces in the left sidebar
  2. Filter by llm_model_name to find Groq spans
  3. Click any span to inspect token counts, model name, and latency
  4. Error spans show span_status = ERROR with the full error message in status_message

[Screenshot: Groq trace in OpenObserve]

Next Steps

With Groq instrumented, every inference call is recorded in OpenObserve. From here you can compare latency across Groq models, track token consumption over time, and set alerts on error rates.
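Comparing latency across models, for example, reduces to grouping span durations by llm_model_name and averaging. A sketch under the assumption that you have exported spans as (model, duration) pairs; the `records` values are fabricated for illustration:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical exported spans: (llm_model_name, duration in ms)
records = [
    ("llama-3.1-8b-instant", 180.0),
    ("llama-3.1-8b-instant", 220.0),
    ("llama-3.3-70b-versatile", 640.0),
]

def mean_latency_by_model(records):
    """Group durations by model name and return the mean for each."""
    by_model = defaultdict(list)
    for model, duration_ms in records:
        by_model[model].append(duration_ms)
    return {model: mean(ds) for model, ds in by_model.items()}

print(mean_latency_by_model(records))
# {'llama-3.1-8b-instant': 200.0, 'llama-3.3-70b-versatile': 640.0}
```

In practice you would run this kind of aggregation directly in OpenObserve's query UI rather than in application code.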
