
How to Monitor OpenAI API Costs and Token Usage with OpenTelemetry

TL;DR

  • To monitor OpenAI API costs in production, instrument every LLM call with OpenTelemetry and capture the gen_ai.* semantic convention attributes: model, input tokens, output tokens.
  • Track LLM token usage as metrics (for aggregation and alerting) and as span attributes (for per-request debugging). Emit a custom gen_ai.usage.cost_usd metric using a pricing table you control.
  • Attribute cost with custom labels on every span: feature, user_id, team, environment. This is what lets you answer "who is burning our budget."
  • Ship everything over OTLP to a backend that handles high-cardinality labels well. OpenObserve accepts standard OTLP with no proprietary SDK.
  • Alert on cost anomalies, not just static thresholds. Retry loops, prompt injection, and runaway agents show up as baseline deviations, not as "we crossed $100."

Monitoring OpenAI API cost and token usage with OpenTelemetry and OpenObserve

Why OpenAI bills are impossible to predict without instrumentation

Running an LLM app in production without instrumentation is a slow way to find out your margins are negative. Token consumption is non-obvious: a single user with a verbose system prompt and long chat history can cost 20x more per interaction than an average user. A bug in a retry loop can 10x your daily spend in an hour. A single new feature that adds RAG context to every call can double your input token count overnight.

The OpenAI dashboard tells you what you spent yesterday. It does not tell you which feature, which user, which prompt template, or which model variant drove the spend. By the time you notice a cost spike in your billing dashboard, you have already paid for it.

The fix is the same fix you use for any production system: emit structured telemetry at the point of the API call and make it queryable. OpenTelemetry gives you a vendor-neutral way to do this, and a growing set of GenAI-specific conventions means the fields you emit today will still be meaningful in two years. For a broader view of what good LLM observability looks like, see our guide on LLM monitoring best practices.

The three signals you actually need to track

For LLM cost monitoring, three signals carry almost all the value:

  1. Token usage tells you how much capacity you consumed. Track input tokens and output tokens separately, always, because they are priced differently.
  2. Cost is the dollar-denominated derivative of token usage. You compute it at emit time using a pricing table you control.
  3. Latency tells you how long users waited. For streaming endpoints, split this into time to first token and total duration.

Everything else (error rate, finish reason, response model) is useful context for these three. Start with the three and add context as you need it.

What OpenTelemetry's GenAI semantic conventions give you

OpenTelemetry has a dedicated set of semantic conventions for generative AI workloads, living under the gen_ai.* namespace. The point of conventions is that the same attribute names work across providers and observability backends, so your queries do not break when you swap from OpenAI to Anthropic or from one backend to another.

The attributes you will use most:

Attribute                        | What it holds
---------------------------------|--------------------------------------------------
gen_ai.provider.name             | Provider name: openai
gen_ai.request.model             | Model requested by your code: gpt-4o, gpt-4o-mini
gen_ai.response.model            | Model the provider actually used (can differ if the provider routes)
gen_ai.operation.name            | chat, text_completion, embeddings
gen_ai.usage.input_tokens        | Prompt tokens consumed
gen_ai.usage.output_tokens       | Completion tokens generated
gen_ai.request.temperature       | Temperature parameter (useful when debugging determinism)
gen_ai.request.max_tokens        | Max tokens parameter
gen_ai.response.finish_reasons   | Why the model stopped: stop, length, content_filter

One attribute worth noting: gen_ai.system has been renamed to gen_ai.provider.name in the current OTel GenAI spec. Most instrumentation libraries still emit gen_ai.system today. Your backend should accept both until library adoption catches up.
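
If you add manual spans while the rename settles, a small helper can emit both names so your queries keep working across instrumentation versions. This is a compatibility shim of our own, not part of the spec:

```python
def provider_attributes(provider: str) -> dict:
    """Return both the legacy and current provider attribute names,
    so dashboards keyed on either one keep working."""
    return {
        "gen_ai.system": provider,         # legacy name, still widely emitted
        "gen_ai.provider.name": provider,  # current spec name
    }

print(provider_attributes("openai"))
# {'gen_ai.system': 'openai', 'gen_ai.provider.name': 'openai'}
```

Pass the result to `span.set_attributes()` (or merge it into your metric labels) wherever you create spans by hand.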

[Image: OpenTelemetry GenAI semantic convention attributes attached at each stage of an LLM request]

Instrumenting a Python app with the official OTel OpenAI SDK

This guide uses opentelemetry-instrumentation-openai-v2, the official OTel package maintained in opentelemetry-python-contrib. It follows the GenAI semantic conventions closely and is the right choice for OpenAI instrumentation.

Install the three packages

pip install opentelemetry-distro
pip install opentelemetry-exporter-otlp
pip install opentelemetry-instrumentation-openai-v2

Then run the bootstrap command once to install auto-instrumentation for any other libraries in your app (Flask, FastAPI, requests, and so on):

opentelemetry-bootstrap --action=install

Set the OTLP endpoint for OpenObserve

Grab your OTLP HTTP endpoint and Authorization header from the OpenObserve UI under Data Sources -> Traces (OpenTelemetry) -> OTLP HTTP. Set these environment variables:

export OTEL_SERVICE_NAME=my-llm-app
export OTEL_EXPORTER_OTLP_ENDPOINT="https://api.openobserve.ai/api/<your-org>"
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Basic <your-auth-token>"
export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
export OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED=true

If you are self-hosting OpenObserve, the endpoint is typically http://localhost:5080/api/<your-org>.

Run with opentelemetry-instrument

Wrap your existing run command:

opentelemetry-instrument python app.py

No code changes to app.py. The OpenAI SDK is wrapped at import time, and every chat.completions.create call emits a span with the gen_ai.* attributes populated.

A minimal example app

# app.py
import os
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize observability in one sentence."}],
)

print(resp.choices[0].message.content)
print("Input tokens:", resp.usage.prompt_tokens)
print("Output tokens:", resp.usage.completion_tokens)

Run it with opentelemetry-instrument python app.py and check the Traces tab in OpenObserve. You should see a span named chat gpt-4o-mini with the token counts attached.

Capturing message content (and the privacy tradeoff)

The instrumentation does not capture the prompt or completion text by default. To enable it:

export OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true

This ships the full prompt and completion as log events. It is useful for debugging but has real privacy implications: you are now logging whatever your users typed, including anything they pasted in. If your app handles regulated data (health, finance, anything under GDPR or HIPAA), do not enable this globally. Enable it per-environment or per-feature flag, and scrub sensitive fields before the exporter sees them.
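
What the scrubbing step can look like, as a minimal sketch: a couple of regexes for common PII shapes, applied to message content before it is attached to any span or log event. A real deployment needs a much broader pattern set (names, phone numbers, card numbers) and likely a dedicated redaction library.

```python
import re

# Two illustrative patterns: email addresses and US SSNs.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def scrub(text: str) -> str:
    """Redact common PII shapes before the text reaches the exporter."""
    text = EMAIL.sub("[EMAIL]", text)
    text = SSN.sub("[SSN]", text)
    return text

print(scrub("Contact jane@example.com, SSN 123-45-6789"))
# Contact [EMAIL], SSN [SSN]
```

Run `scrub()` over every message before it is passed to the LLM call when content capture is enabled, so the raw values never enter your telemetry pipeline.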

[Image: OpenObserve Traces view showing LLM spans with token usage and cost attributes]

Instrumenting a Node.js app

For Node.js, the pattern is the same. Install the packages:

npm install @opentelemetry/api \
  @opentelemetry/sdk-node \
  @opentelemetry/exporter-trace-otlp-http \
  @opentelemetry/instrumentation-openai

Create a tracing.js bootstrap file:

// tracing.js
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');
const { OpenAIInstrumentation } = require('@opentelemetry/instrumentation-openai');
const { Resource } = require('@opentelemetry/resources');

const sdk = new NodeSDK({
  resource: new Resource({
    'service.name': 'my-llm-app-node',
    'deployment.environment': process.env.NODE_ENV || 'development',
  }),
  traceExporter: new OTLPTraceExporter({
    url: `${process.env.OTEL_EXPORTER_OTLP_ENDPOINT}/v1/traces`,
    headers: {
      Authorization: process.env.OTEL_EXPORTER_OTLP_HEADERS,
    },
  }),
  instrumentations: [new OpenAIInstrumentation()],
});

sdk.start();

Then preload it when you run your app:

node --require ./tracing.js app.js

Same result: every OpenAI call produces a span in OpenObserve with the GenAI attributes populated.

Building a cost calculation layer

OpenAI's SDK gives you token counts. It does not give you dollars. You have to multiply tokens by a price, and that price changes. Build this as a small, updatable module.

Pricing table as code

Keep this in source control. Review it every quarter, or every time a provider announces a price change.

# pricing.py
# Prices in USD per 1 million tokens, as of April 2026.
# Verify against provider pricing pages before each release.

MODEL_PRICING = {
    "gpt-4o":      {"input": 2.50,  "output": 10.00},
    "gpt-4o-mini": {"input": 0.15,  "output": 0.60},
    "o1":          {"input": 15.00, "output": 60.00},
    "o1-mini":     {"input": 3.00,  "output": 12.00},
}


def calculate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for a single LLM call."""
    pricing = MODEL_PRICING.get(model)
    if not pricing:
        # Unknown model. Emit 0 and alert separately so you can add pricing.
        return 0.0
    input_cost = (input_tokens / 1_000_000) * pricing["input"]
    output_cost = (output_tokens / 1_000_000) * pricing["output"]
    return round(input_cost + output_cost, 6)
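
To make the arithmetic concrete, here is the same calculation worked by hand for one hypothetical gpt-4o-mini call with 1,000 input and 500 output tokens, using the rates from the table above:

```python
# One gpt-4o-mini call: 1,000 input tokens at $0.15/M, 500 output tokens at $0.60/M.
input_cost = (1_000 / 1_000_000) * 0.15   # $0.00015
output_cost = (500 / 1_000_000) * 0.60    # $0.00030
total = round(input_cost + output_cost, 6)
print(total)  # 0.00045
```

Individual calls are fractions of a cent; the point of the metric is the sum across millions of calls, which is where the dollars appear.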

Emitting cost as a custom metric

The official -v2 package does not emit cost, only tokens. Add cost yourself with a thin wrapper that runs after each call:

# tracked_llm.py
import time
from opentelemetry import trace, metrics
from openai import OpenAI
from pricing import calculate_cost

tracer = trace.get_tracer("llm-cost")
meter = metrics.get_meter("llm-cost")

cost_histogram = meter.create_histogram(
    name="gen_ai.usage.cost_usd",
    description="Estimated cost of a single LLM call in USD",
    unit="USD",
)

client = OpenAI()


def tracked_chat(messages, model="gpt-4o-mini", feature="unknown", user_id="anon"):
    with tracer.start_as_current_span("gen_ai.chat") as span:
        span.set_attribute("gen_ai.provider.name", "openai")
        span.set_attribute("gen_ai.request.model", model)
        span.set_attribute("feature", feature)
        span.set_attribute("user_id", user_id)

        start = time.perf_counter()
        response = client.chat.completions.create(model=model, messages=messages)
        elapsed_ms = (time.perf_counter() - start) * 1000

        input_tokens = response.usage.prompt_tokens
        output_tokens = response.usage.completion_tokens
        cost = calculate_cost(model, input_tokens, output_tokens)

        # Span attributes for per-request investigation
        span.set_attribute("gen_ai.usage.input_tokens", input_tokens)
        span.set_attribute("gen_ai.usage.output_tokens", output_tokens)
        span.set_attribute("gen_ai.usage.cost_usd", cost)
        span.set_attribute("gen_ai.latency.duration_ms", elapsed_ms)
        span.set_attribute("gen_ai.response.model", response.model)

        # Metric for aggregation
        cost_histogram.record(cost, {
            "gen_ai.provider.name": "openai",
            "gen_ai.request.model": model,
            "feature": feature,
            "user_id": user_id,
        })

        return response

You now have cost on the span (for drill-down) and cost as a metric (for aggregation, alerting, and dashboards). Both are labeled with feature so you can break them down later.

Attributing cost to users, features, and teams

This is the section most readers came for. Raw token counts do not answer "who is spending our money." Attribution does.

Adding attributes on every span

Every LLM call should carry four attribution dimensions:

  • feature: which product path triggered the call (document_summary, chat_reply, rag_answer)
  • user_id: hashed user identifier for per-user rollups
  • team: which internal team or product area owns the feature
  • environment: prod, staging, dev

Wire them through as keyword arguments on your wrapper:

result = tracked_chat(
    messages=[{"role": "user", "content": prompt}],
    model="gpt-4o",
    feature="document_summary",
    user_id=hashed_user_id,
)
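
Once every call carries a feature label, per-feature rollups are a plain group-by. Here is a toy in-memory version of what your dashboard query computes (the call records are illustrative):

```python
from collections import defaultdict

# Illustrative cost records, one per LLM call, each carrying a feature label.
calls = [
    {"feature": "document_summary", "cost_usd": 0.012},
    {"feature": "chat_reply",       "cost_usd": 0.003},
    {"feature": "document_summary", "cost_usd": 0.020},
]

by_feature = defaultdict(float)
for call in calls:
    by_feature[call["feature"]] += call["cost_usd"]

print(round(by_feature["document_summary"], 3))  # 0.032
```

In production the backend does this aggregation over the `gen_ai.usage.cost_usd` metric grouped by `feature`; the logic is the same.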

Building the cost attribution dashboard

A complete LLM cost dashboard covers two concerns: spend attribution and token efficiency. Organize it across two tabs.

Tab 1: LLM Cost Overview

Four single-stat tiles at the top give you the headline numbers at a glance: Total LLM Cost ($), Total Input Tokens, Total Output Tokens, and Total LLM Calls. These are the first things you check when something looks off.

Below the tiles:

  • LLM Cost Over Time ($) — bar chart over the selected time range. Reveals bursty spend patterns and days that are trending above baseline.
  • Cost by Model — pie chart, one slice per gen_ai.request.model. Shows your model mix and whether a cheaper model is handling the bulk of traffic.
  • Input vs Output Cost Over Time ($) — grouped bar chart with two series, input_cost and output_cost. Output tokens cost 3–4x more than input tokens on most models; this panel tells you which side is driving cost growth.
  • Token Usage by Model — grouped bar chart of input_tokens and output_tokens per model. Cross-reference this with Cost by Model to spot models that are expensive relative to their token volume.
  • Token Usage Over Time — time series of token counts. Useful for capacity planning and catching prompt inflation.

Tab 2: Tool Monitoring

If your application uses function calling or tool use, track it separately. Tool calls are often the highest-cost path because they trigger multi-turn completions. Panels here should cover tool call volume by tool name, cost per tool invocation, and error rate.

[Image: LLM Cost Monitoring dashboard in OpenObserve showing total cost, token usage, cost by model, and input vs output cost over time]

Alerting on cost anomalies and rate-limit errors

Static budget thresholds are table stakes. The interesting failures are the ones that do not cross a static threshold until it is too late.

Threshold alerts vs anomaly detection

A threshold alert fires when daily spend exceeds $500. It works for the blunt cases. It misses three common failure modes:

  1. A retry loop that 3x's a specific feature's token usage in an hour. The daily threshold may still be fine by end of day, but you paid 3x for that hour.
  2. A prompt injection that triggers a long runaway completion on a single request, burning 100k output tokens in one call.
  3. Seasonal growth that quietly pushes baseline from $300/day to $600/day over a month, outpacing capacity plans.
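
The core of baseline comparison fits in a few lines. This is a deliberately simplified sketch: real anomaly detection uses longer training windows, seasonality, and statistical deviation rather than a fixed multiple.

```python
def is_cost_anomaly(history, current, factor=3.0):
    """Flag an hour whose spend exceeds `factor` times the trailing baseline.

    history: recent hourly cost totals (e.g., pulled from your metrics backend)
    current: this hour's cost total
    """
    if not history:
        return False
    baseline = sum(history) / len(history)
    return current > factor * baseline

hourly = [10.0, 12.0, 9.5, 11.0, 10.5]   # illustrative baseline: ~$10.6/hour
print(is_cost_anomaly(hourly, 11.8))     # False: within normal range
print(is_cost_anomaly(hourly, 40.0))     # True: roughly 4x baseline
```

A static $500/day threshold would miss both hours; the baseline comparison flags the second one immediately.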

Anomaly detection catches all three by comparing current behavior to historical baseline rather than to a fixed number. For a deep dive on how this works in practice, see our AI anomaly detection guide, which walks through a runaway LLM cost scenario specifically.

A daily budget threshold

Set this first. In OpenObserve, create an alert on the gen_ai.usage.cost_usd metric:

  • Trigger: SUM(gen_ai_usage_cost_usd) over 24h is greater than 500
  • Evaluation frequency: every 5 minutes
  • Action: Slack or PagerDuty, routed to the LLM-platform team

An anomaly-based alert for cost spikes

This is more valuable. Create an anomaly alert on gen_ai.usage.cost_usd grouped by feature, with a training window of the last 14 days and a sensitivity tuned to catch 3x deviations. A retry loop in the document_summary feature shows up in minutes, before it hits your daily threshold.

Alert on rate-limit errors (HTTP 429)

When OpenAI rate-limits you, downstream calls fail and retries pile up. Fire an alert when gen_ai.response.error.type = rate_limit_exceeded exceeds a low threshold (say, 5 in 5 minutes). This usually surfaces a runaway loop before a cost anomaly does.
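
The alert condition itself is simple windowed counting. Here is a plain-Python sketch of the logic your backend evaluates (illustrative only, not how OpenObserve implements it internally):

```python
import time
from collections import deque

class RateLimitWindow:
    """Track rate-limit errors in a sliding window and signal when the
    count exceeds the alert threshold."""

    def __init__(self, window_seconds=300, threshold=5):
        self.window = window_seconds
        self.threshold = threshold
        self.events = deque()

    def record(self, now=None):
        now = now if now is not None else time.time()
        self.events.append(now)
        # Drop events that have fallen out of the window.
        while self.events and self.events[0] < now - self.window:
            self.events.popleft()
        return len(self.events) > self.threshold  # True => fire the alert

w = RateLimitWindow()
for t in range(6):
    fired = w.record(now=1000 + t)  # six 429s within a few seconds
print(fired)  # True
```

Six errors inside the five-minute window crosses the "5 in 5 minutes" threshold, so the sixth `record` call returns True.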

Alerts are what turn passive observability into active cost control. Without them, you are checking dashboards manually — which means you find out about a runaway loop after it has already doubled your bill. A well-configured alert on gen_ai.usage.cost_usd catches the spike within minutes, before it becomes a billing surprise. For a step-by-step walkthrough of how to configure alerts in OpenObserve — including conditions, evaluation windows, and notification destinations — see Alerting 101: From Concept to Demo.

Reconciling estimated cost with the OpenAI billing API

Your OTel-derived cost is an estimate. It is usually within a couple of percent, but it drifts from the real bill for three reasons:

  1. Cached input tokens. Repeat prompts are billed at a discount. Your naive pricing math assumes full price.
  2. Reasoning tokens. o1 and similar models emit internal reasoning tokens that count toward billing but may not appear in the standard usage object.
  3. Batch API discounts. If you use the async batch endpoint, those requests are priced lower.

Reconcile monthly. Pull the OpenAI usage endpoint and compare total cost for the window against your OTel sum. If the drift is more than 5 percent, dig in and adjust your pricing table. This is the pattern production teams use: OTel for real-time signal, billing API for ground truth.
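
The reconciliation math is a one-liner. A sketch with illustrative numbers (the billed total would come from the OpenAI usage/billing endpoint, the estimate from summing your `gen_ai.usage.cost_usd` metric):

```python
def cost_drift_pct(otel_total: float, billed_total: float) -> float:
    """Percentage drift between the OTel estimate and the provider's bill."""
    if billed_total == 0:
        return 0.0
    return abs(otel_total - billed_total) / billed_total * 100

otel_total = 1224.50    # illustrative: monthly SUM of gen_ai.usage.cost_usd
billed_total = 1300.00  # illustrative: monthly total from the billing API

drift = cost_drift_pct(otel_total, billed_total)
print(f"{drift:.1f}% drift")  # 5.8% drift -> over the 5% line, investigate
```

In this example the estimate undershoots by 5.8 percent, which would typically point at cached-input discounts or reasoning tokens missing from the pricing math.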

If you are new to the OpenTelemetry Collector itself and want the broader context on how data flows from your app to a backend, the walkthrough in our distributed tracing guide covers the fundamentals.

Measuring time to first token for streaming

For chat UIs, users feel time to first token (TTFT), not total duration. If you use streaming responses, capture it:

# streaming_llm.py
import time

# Reuses the tracer and OpenAI client defined in tracked_llm.py above.
from tracked_llm import tracer, client

def stream_with_ttft(messages, model="gpt-4o"):
    with tracer.start_as_current_span("gen_ai.chat") as span:
        span.set_attribute("gen_ai.provider.name", "openai")
        span.set_attribute("gen_ai.request.model", model)
        span.set_attribute("gen_ai.response.streaming", True)

        start = time.perf_counter()
        ttft_ms = None

        stream = client.chat.completions.create(
            model=model,
            messages=messages,
            stream=True,
        )

        chunks = []
        for chunk in stream:
            # Some chunks (such as a final usage chunk) carry no choices.
            if ttft_ms is None and chunk.choices and chunk.choices[0].delta.content:
                ttft_ms = (time.perf_counter() - start) * 1000
                span.set_attribute("gen_ai.latency.ttft_ms", ttft_ms)
            chunks.append(chunk)

        total_ms = (time.perf_counter() - start) * 1000
        span.set_attribute("gen_ai.latency.duration_ms", total_ms)
        return chunks

Now you can alert on TTFT regressions separately from total-duration regressions.

Production checklist

Before shipping this to prod:

  • Retention policy set on your LLM telemetry stream. Full prompt/completion content (if captured) has different retention needs than raw token counts.
  • PII scrubbing pipeline in place if OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true. Do not rely on "we probably don't have PII."
  • Sampling strategy decided. Head-based sampling at 100% for LLM spans is usually fine (they are low-volume and high-value). Do not aggressively sample LLM spans the way you might sample DB queries.
  • Pricing table in source control with a quarterly review reminder.
  • Budget threshold alert and at least one anomaly-based alert configured.
  • Monthly reconciliation against the OpenAI billing API scheduled.

Send your LLM telemetry to OpenObserve

OpenObserve is an open-source observability platform that accepts standard OTLP over HTTP and gRPC. There is no proprietary SDK to adopt and no special instrumentation to learn. Point your OTLP exporter at OpenObserve Cloud or a self-hosted instance, and your LLM spans, logs, and metrics land in the same place as your infrastructure telemetry, ready to query, dashboard, and alert on.

If you want to see this working end to end without setting up a server, spin up a free account at OpenObserve Cloud or read the LLM Observability overview for the product view.

About the Author

Gorakhnath Yadav


Gorakhnath is a passionate developer advocate, working on bridging the gap between developers and the tools they use. He focuses on building communities and creating content that empowers developers to build better software.
