Fireworks AI → OpenObserve
Automatically capture token usage, latency, and model metadata for every Fireworks AI inference call in your Python application. Fireworks AI exposes an OpenAI-compatible API, so instrumentation uses the standard OpenAI instrumentor pointed at the Fireworks endpoint.
Prerequisites
- Python 3.9+
- An OpenObserve account (cloud or self-hosted)
- Your OpenObserve organisation ID and Base64-encoded auth token
- A Fireworks AI API key
Installation
Configuration
Create a .env file in your project root:
# OpenObserve instance URL
# Default for self-hosted: http://localhost:5080
OPENOBSERVE_URL=https://api.openobserve.ai/
# Your OpenObserve organisation slug or ID
OPENOBSERVE_ORG=your_org_id
# Basic auth token — Base64-encoded "email:password"
OPENOBSERVE_AUTH_TOKEN=Basic <your_base64_token>
# Fireworks AI API key
FIREWORKS_API_KEY=your-fireworks-api-key
Instrumentation
Call OpenAIInstrumentor().instrument() before creating the OpenAI client. Point the client at the Fireworks AI base URL and pass your Fireworks API key.
from dotenv import load_dotenv
load_dotenv()
from openinference.instrumentation.openai import OpenAIInstrumentor
from openobserve import openobserve_init
OpenAIInstrumentor().instrument()
openobserve_init()
import os
from openai import OpenAI
client = OpenAI(
api_key=os.environ["FIREWORKS_API_KEY"],
base_url="https://api.fireworks.ai/inference/v1",
)
response = client.chat.completions.create(
model="accounts/fireworks/models/llama-v3p3-70b-instruct",
messages=[{"role": "user", "content": "Explain distributed tracing in one sentence."}],
)
print(response.choices[0].message.content)
What Gets Captured
| Attribute | Description |
|---|---|
llm_model_name |
Model name (e.g. accounts/fireworks/models/llama-v3p3-70b-instruct) |
llm_provider |
fireworks |
llm_system |
openai (the client library used) |
llm_token_count_prompt |
Tokens in the prompt |
llm_token_count_completion |
Tokens in the response |
llm_token_count_total |
Total tokens consumed |
llm_token_count_prompt_details_cache_read |
Prompt tokens served from cache |
openinference_span_kind |
LLM |
operation_name |
ChatCompletion |
duration |
End-to-end request latency |
span_status |
OK or error status |
Viewing Traces
- Log in to OpenObserve and navigate to Traces
- Click any span to inspect token counts and request metadata
- Filter by
llm_model_nameto compare latency across different Fireworks models

Next Steps
With Fireworks AI instrumented, every inference call is recorded in OpenObserve. From here you can monitor latency across open-source model variants, track token consumption, and identify slow or failing requests.