
DataDog vs OpenObserve Part 7: Pipelines

Your log pipeline processed 2.3 billion events last month. But 40% of that volume was noise: debug logs from a chatty microservice, duplicate events from retried requests, and verbose JSON payloads that could have been flattened. You wanted to filter them at ingestion, but Datadog's Observability Pipelines required deploying separate Worker infrastructure, and the per-GB pricing for processed data made filtering economically questionable.

This is the hidden complexity of Datadog's pipeline model: processing power requires separate infrastructure, Grok parsing demands specialized syntax knowledge, and cost optimization becomes its own engineering project. Teams ask "can we afford to process this?" instead of "how should we transform this data?"

This hands-on comparison tests DataDog and OpenObserve for data pipelines, sending identical production-like data to both platforms simultaneously. The results show how these platforms handle log parsing, data transformation, routing, enrichment, and cost structure with the same OpenTelemetry-instrumented workload. OpenObserve transforms the fundamental question from "can we afford to process this?" to "how do we want to transform this data?" The platform provides comprehensive pipeline capabilities without infrastructure overhead or per-GB processing costs.

This is Part 7 in a series comparing DataDog and OpenObserve for observability.

TL;DR: Key Findings

  • Architecture: Datadog splits pipelines across multiple products and paid worker infrastructure, while OpenObserve provides a single, built-in pipeline for logs, metrics, and traces with no extra deployment or per-GB processing cost.
  • Processing Language: Datadog splits processing logic across Grok, UI remappers, and VRL workers, while OpenObserve uses VRL as a single, universal scripting layer for logs, metrics, and traces.
  • Execution Model: Datadog supports only real-time, ingestion-time pipelines, while OpenObserve uses the same engine for real-time streaming and scheduled batch processing on historical data.
  • Destination: Datadog requires external worker infrastructure for multi-destination routing, while OpenObserve supports native, visual fan-out routing during ingestion.

What We Tested

We configured identical pipeline scenarios covering standard data processing needs: filtering debug-level events, routing security logs to separate streams, enriching logs with GeoIP data, and redacting PII from payment service logs, all using the OpenTelemetry Astronomy Shop demo.

All services were instrumented with OpenTelemetry SDKs sending logs, metrics, and traces to the OTel Collector, which exported to both DataDog and OpenObserve simultaneously. Same data, same timestamps, same volumes. We then created equivalent pipelines in both platforms to process identical log streams and measured transformation complexity, processing latency, and operational overhead.
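
To make the scenarios concrete, here is a minimal VRL sketch of the kind of transform we configured in both platforms, combining debug filtering with payment-log redaction. The field names (level, service, message) and the card-number pattern are illustrative assumptions, not the demo's exact schema.

```vrl
# Illustrative VRL transform; field names are assumptions, not the demo's exact schema.

# Drop noisy debug events. In standard VRL, `abort` discards the event
# when the transform is configured to drop aborted events.
if .level == "debug" {
    abort
}

# Redact card-number-like digit runs from payment service logs before storage.
if .service == "paymentservice" && is_string(.message) {
    .message = redact(string!(.message), filters: [r'\d{13,16}'])
}
```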

Pipeline Architecture: Integrated vs. Separate Worker

Processing observability data at scale requires an architecture that can transform, filter, and route events efficiently. Datadog uses separate systems for each telemetry type (Log Pipelines, Metrics Pipelines, and APM Ingestion Controls), while OpenObserve processes logs, metrics, and traces through a single unified pipeline system with zero additional deployment.

Datadog offers distinct pipeline systems for each telemetry type, each with its own configuration interface, processing model, and limitations.

  • Log Pipelines (Post-Ingestion): Process logs after they reach Datadog. They are limited to linear processors and require manual "Facet" configuration for alerting on parsed fields. Source: Datadog Docs - Standard Attributes and Processors
  • Observability Pipelines (Separate Product): Based on the Vector engine, this requires deploying the Observability Pipelines Worker on your own infrastructure, a separate service you must scale, monitor, and maintain. Source: Datadog Docs - Observability Pipelines Worker.
  • Siloed Controls: Metrics and APM have separate ingestion controls (tag filtering and sampling rules) that do not share logic with log pipelines.

OpenObserve simplifies the workflow with a single pipeline system for all telemetry types. Logs, metrics, and traces flow through the same visual canvas with the same VRL functions.

OpenObserve Unified Processing

  • Unified Processing Model: OpenObserve pipelines handle logs, metrics, and traces identically. The same VRL (Vector Remap Language) functions that parse log messages can enrich trace spans or transform metric labels.
  • Automatic Scaling: Pipeline processing is built-in and scales with your OpenObserve nodes. There are no separate worker replicas to manage and no per-telemetry-type configuration sprawl.
  • VRL Power: Unlike Datadog's UI-based log processors, VRL allows for complex logic like if/else, loops, and dynamic enrichment table lookups in a single script (see the sketch after this list).
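
As a rough illustration of that flexibility, the sketch below parses an embedded JSON payload with explicit error handling and derives a severity field with plain if/else; the field names are assumptions for illustration only.

```vrl
# Try to parse an embedded JSON payload and flatten it into the event.
parsed, err = parse_json(.message)
if err == null && is_object(parsed) {
    . = merge(., object!(parsed))
} else {
    # Keep the raw message but mark it so a downstream route can catch it.
    .parse_error = true
}

# Derive a normalized severity with ordinary if/else logic.
.severity = if includes(["error", "fatal"], .level) { "high" } else { "normal" }
```

Because OpenObserve runs the same VRL engine for every telemetry type, the same kind of script could be attached to a trace or metric stream on the canvas without rewriting it.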

Processing Language: Multi-Silo DSLs vs. Unified VRL

Datadog's processing power is distributed across layers that don't always communicate.

  • Log Pipelines (Grok/UI): Most SaaS-side processing is restricted to a linear chain of processors. If you want to perform a "conditional lookup," you are often forced to create multiple parallel pipelines with different filters, which is difficult to audit. Sources: Datadog Log Parser, Datadog Grok Syntax

  • The VRL Silo: While Datadog owns the Vector project (which created VRL), VRL is primarily used in the Observability Pipelines Worker. This means if you write a sophisticated VRL script to scrub PII at the edge, you cannot simply copy-paste that logic into a Datadog "Standard Pipeline" in the SaaS UI; you are stuck with Grok and UI remappers there.

  • Metric/APM Disconnect: Metrics and Traces are largely "black boxes" in terms of transformation. You can sample them or filter them via tags, but you cannot easily "rewrite" a metric name or calculate a new field from a span attribute using a script during ingestion.

OpenObserve treats VRL as the universal "CPU" for its data processing layer.

  • Universal Scripting: Whether data enters via FluentBit, OTel, or Syslog, it passes through the same VRL engine. You can write complex, multi-line logic—including if/else statements, loops, and custom error handling—in one place.
  • Enrichment Tables: OpenObserve supports Native Enrichment Tables. You can upload a CSV (e.g., user_id to email) and perform a high-speed lookup directly inside your VRL script with a single function call (see the sketch after this list). In Datadog, this requires the "Lookup Processor" UI widget, which is less flexible for dynamic logic.
  • Single Learning Curve: A developer writes a script for logs and can immediately apply the same logic to traces or metrics. This eliminates the "context switching" between different proprietary syntaxes.
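
A hypothetical lookup might look like the sketch below, assuming a CSV enrichment table named users keyed on user_id has already been uploaded; the table and column names are illustrative, not fixed OpenObserve names.

```vrl
# Enrich each event with the user's email from an uploaded enrichment table.
# "users", "user_id", and "email" are illustrative names for this sketch.
row, err = get_enrichment_table_record("users", { "user_id": .user_id })
if err == null {
    .user_email = row.email
} else {
    .user_email = "unknown"
}
```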

Execution Model: Real-Time vs. Scheduled

Processing observability data requires a balance between immediate action and long-term analysis. Datadog focuses on real-time stream processing, while OpenObserve provides a unified engine for both real-time streams and scheduled batch jobs.

Datadog: Streaming Only

Datadog is architected for immediate ingestion. Its pipeline logic triggers only at the moment data hits the platform.

  • Real-Time Focus: Ideal for instant tasks like redacting PII or remapping attributes.
  • No Native Scheduled Pipelines: There is no built-in way to "re-process" historical data or run batch jobs on a schedule. To backfill data or apply new logic to old logs, you must use external scripts or ETL tools.
  • Limited Pre-Aggregation: Summarizing data (e.g., turning logs into metrics) requires separate "Distribution Metrics" or "Metric Summary" configurations, which live outside the pipeline UI.

OpenObserve: Unified Stream & Batch

OpenObserve treats "Real-Time" and "Scheduled" as two modes of the same system, determined by the "Source" node on your canvas.

  • Real-Time Pipelines: Process data via VRL as it arrives for sub-second routing and parsing.
  • Scheduled Pipelines (Query Source): Run SQL or PromQL queries at fixed intervals (e.g., every 5 minutes or via Cron).
    • Summarization: Query millions of logs to calculate a "Daily Active User" count and write the result to a Metrics Stream.
    • Reprocessing: Run a one-time job to look at historical data, apply a new VRL transformation, and save the corrected data to a new stream.
  • Operational Flexibility: Native support for Frequency, Period, and Delay settings ensures you can account for late-arriving data in your batch jobs.

Pipeline Destination: Single-Hose vs. Multi-Sink Routing

Modern observability often requires "dual shipping": sending data to a real-time engine for troubleshooting while simultaneously archiving it in low-cost storage for compliance.

Datadog: The Infrastructure Hurdle

In Datadog, routing data to multiple destinations is not a native feature of the primary SaaS platform. It requires a significant architectural and cost addition.

  • Separate Worker Required: You must deploy and maintain the Observability Pipelines Worker (OPW) on your own infrastructure (K8s, EC2).
  • Manual Configuration: Sinks are managed via YAML files. To "dual ship" to Datadog and a third-party SIEM or S3 bucket, you must manually define and manage these connections in code.

OpenObserve: Visual Multi-Destination

OpenObserve treats routing as a core, built-in feature of its visual pipeline canvas, removing the need for external components.

  • Visual Sink Nodes: Simply drag and drop multiple Sink nodes (S3, MinIO, GCS, or remote O2 clusters) onto the canvas.
  • Bifurcation Logic: You can visually "split" a stream. For example, route Critical errors to a high-retention stream while sending Debug logs directly to an S3 archive (a tagging sketch follows this list).
  • Resilient In-Process Routing: Routing happens natively during ingestion. Built-in Persistent Queues ensure that if a destination (like a remote S3 bucket) is slow, data is buffered and retried automatically without data loss.
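
One way to express that split is a small VRL tagging function placed ahead of the sink nodes, as sketched below; the route values and level names are assumptions consumed by hypothetical downstream condition nodes, not built-in OpenObserve semantics.

```vrl
# Tag each event so downstream condition nodes can fan it out:
# critical errors to a high-retention stream, debug logs to an S3 archive.
.route = "default"
if .level == "critical" || .level == "error" {
    .route = "high_retention"
} else if .level == "debug" {
    .route = "s3_archive"
}
```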

Streaming to Multiple Destinations

Quick Comparison: DataDog vs. OpenObserve Pipelines

| Area | Datadog | OpenObserve |
| --- | --- | --- |
| Architecture | Split across multiple products and workers | Single built-in pipeline system |
| Processing Language | Grok/UI; VRL only in external workers | VRL everywhere (logs, metrics, traces) |
| Execution Model | Real-time, ingestion-only | Real-time + scheduled batch pipelines |
| Historical Reprocessing | Not supported natively | Native support |
| Multi-Destination Routing | Requires external worker infrastructure | Native visual fan-out routing |
| Operational Overhead | Extra infra, configs, higher cost | No extra deployment, unified UI |

The Bottom Line

Datadog offers powerful pipeline capabilities, but they are distributed across multiple products, rely on separate worker infrastructure for advanced use cases, and introduce cost and operational friction when teams want to filter, enrich, route, or dual-ship data at scale. If you are already invested in Datadog and comfortable running additional workers—and pipeline processing cost is not a concern—the model works.

But if you’re evaluating observability platforms or open-source Datadog alternatives for data pipelines, OpenObserve delivers a fundamentally simpler and more flexible approach:

  • One unified pipeline for logs, metrics, and traces: a single processing engine instead of siloed log, metric, and APM controls
  • Universal VRL scripting: complex transformations, enrichment, and conditional logic in one language, reusable across all telemetry
  • Real-time and scheduled pipelines: stream processing and batch reprocessing using the same system
  • Native multi-destination routing: fan-out to multiple streams or storage backends without external workers
  • Visual pipeline canvas: no YAML-heavy worker configs or hidden execution order

For platform engineers managing OpenTelemetry-instrumented microservices, these differences are decisive. No hesitation before filtering noisy logs. No duplication of logic across pipelines. No separate infrastructure just to route data or apply conditional enrichment. One mental model, one UI, one processing engine.



Sign up for a free cloud trial or schedule a demo to test OpenObserve's pipelines with your team.

About the Author

Simran Kumari

LinkedIn

Passionate about observability, AI systems, and cloud-native tools. All in on DevOps and improving the developer experience.
