DataDog vs OpenObserve: Part 5 - Alerts, Monitors, and Destinations

Your incident response channel lit up at 3 AM. Checkout service is down. Error rates spiking. But your DataDog alert didn't fire because you disabled it last month - it was triggering on a custom metric, and DataDog charges $5 per 100 custom metrics per month. Multiply that across every service and environment you run, and suddenly you're choosing between comprehensive alerting and budget predictability.

This is the hidden cost of DataDog's alerting model: custom metric pricing transforms operational decisions into financial calculations. Engineers ask "can we afford to alert on this?" instead of "should we monitor this?" Teams disable alerts to control costs. Incidents go undetected.

This hands-on comparison tests DataDog and OpenObserve for alerting and monitoring, sending identical production-like data to both platforms simultaneously. The results show how each platform handles alert creation, composite conditions, notification destinations, and cost structure.

OpenObserve transforms the fundamental question from "can we afford to alert on this?" to "what do we need to monitor?" The platform provides comprehensive alerting without cost-driven compromises.


This is Part 5 in a series comparing DataDog and OpenObserve for observability.

TL;DR: Key Findings

  • Alert Querying: Datadog requires learning proprietary, signal-specific syntax, whereas OpenObserve uses standard SQL and PromQL for all telemetry, eliminating vendor lock-in.
  • Alert Execution: OpenObserve triggers instant stream alerts before storage, while Datadog log alerts suffer from indexing lag and a restrictive 2-day rolling window limit.
  • Alert Destinations: Datadog focuses on human-led governance through Case Management; OpenObserve prioritizes machine-led remediation via native Python Actions and Jinja2 templates.
  • Pricing: Datadog’s tiered "Metric Tax" creates cost anxiety; OpenObserve provides budget predictability with a flat $0.30/GB rate and unlimited alerts.
  • Alert Correlation: DataDog's Watchdog AI provides powerful anomaly detection but requires manual incident declaration or rule-based case and incident creation. OpenObserve automatically correlates related alerts into incidents based on configurable rules, reducing noise from the start.
  • RCA: Datadog Notebooks are built for manual post-mortems; OpenObserve RCA is built for automated discovery.

What We Tested

We configured identical alert scenarios covering standard operational monitoring: high error rates, elevated latency thresholds, resource exhaustion, anomaly detection, and composite multi-service failures using the OpenTelemetry Astronomy Shop demo.

All services were instrumented with OpenTelemetry SDKs sending logs, metrics, and traces to the OTel Collector, which exported to both DataDog and OpenObserve simultaneously. Same data, same timestamps, same volumes. We then created equivalent alerts in both platforms to trigger on identical conditions and measured alert creation complexity, notification delivery, incident correlation, and root cause analysis workflows.

Alert Querying: Proprietary DSL vs. Unified SQL

Monitoring for incidents requires a query language that can accurately isolate failures. Datadog uses a specialized monitoring syntax, while OpenObserve uses the same languages you use for exploration: SQL and PromQL.

Datadog alerts (Monitors) are built using a proprietary tag-based syntax. When you define an alert, you are essentially creating a time-series query that follows a specific function:metric{tags} by {group} structure.

  • Logic Builder: Most users start with the UI dropdowns, which then generate a string like: avg(last_5m):avg:system.cpu.idle{host:web-server} by {host} > 90
  • Log-to-Monitor: For logs, Datadog uses Facets. You must first "index" a field as a facet (a manual administrative step) before you can alert on it or aggregate it.
  • The Learning Curve: Because this syntax is unique to Datadog, your team must learn vendor-specific functions (like .rollup(), .as_count(), or .moving_avg()) to handle common monitoring tasks.

Datadog DSL for Metrics Monitor

Datadog DSL and Query Builder for Log Monitor

OpenObserve simplifies the workflow by using standard languages like SQL/PromQL. If you can find a problem in the search bar, you have already written the alert query.

  • Quick Mode (UI Builder): Similar to Datadog, you can build conditions (e.g., status_code >= 500) using simple dropdowns and boolean logic (AND/OR). No query knowledge is required for 80% of use cases.
  • SQL Mode (Advanced): For complex logic, you can switch to full SQL. This allows for powerful operations that are difficult in proprietary DSLs:
    • Joins: Alert when a frontend error count correlates with a database latency spike.
    • Subqueries: Calculate the percentage of errors relative to total traffic in a single query (see the sketch below).
  • PromQL Mode: If you are migrating from Prometheus, you can copy-paste your existing alerts. OpenObserve is fully PromQL-compatible for all metrics.

OpenObserve PromQL support for Metrics Alert
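
As a hedged illustration of the subquery case above, here is roughly what an error-percentage alert query could look like in SQL Mode. The stream name frontend_logs and the fields status_code and service_name are assumptions for illustration, not fields from the demo:

```sql
-- Sketch only: stream and field names (frontend_logs, status_code, service_name)
-- are assumptions. Flags windows where 5xx responses exceed 5% of checkout traffic.
SELECT
  COUNT(*) AS total_requests,
  SUM(CASE WHEN status_code >= 500 THEN 1 ELSE 0 END) * 100.0
    / NULLIF(COUNT(*), 0) AS error_pct
FROM frontend_logs
WHERE service_name = 'checkout'
HAVING SUM(CASE WHEN status_code >= 500 THEN 1 ELSE 0 END) * 100.0
    / NULLIF(COUNT(*), 0) > 5
```

Depending on how the alert is configured, the threshold could also be left out of the query and applied as the alert's trigger condition instead; either way, no proprietary monitor DSL is involved.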

Scheduled vs. Real-Time Alerts

Monitoring for incidents requires a balance between immediate detection of critical failures and long-term analysis of performance trends.

1. Real-Time Alerts: Stream Evaluation vs. Indexing Costs

  • Datadog: To alert on logs, the data must first be ingested, indexed, and faceted. This process is highly reliable but comes with a cost per million indexed logs. For high-volume environments, you often have to choose which logs to index to keep costs down, potentially creating blind spots in your alerting.
  • OpenObserve: Utilizes Stream Alerting to evaluate data as it arrives. This allows you to trigger critical alerts on the full data stream without needing to index every single log line first, significantly reducing the cost of real-time security and crash monitoring (see the sketch below).
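
For instance, a real-time stream alert condition evaluated against incoming log records might look like the following minimal sketch; the k8s_logs stream and its field names are assumptions for illustration:

```sql
-- Sketch only: the k8s_logs stream and field names are assumptions.
-- Evaluated against data as it arrives, so crash loops and OOM kills surface
-- without first indexing every log line.
SELECT k8s_namespace_name, k8s_pod_name, body
FROM k8s_logs
WHERE level = 'error'
  AND (body LIKE '%OOMKilled%' OR body LIKE '%CrashLoopBackOff%')
```

The alert fires as soon as matching records show up in the stream, rather than after an indexing pipeline catches up.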

2. Analysis & Trends: The Flexibility of SQL

  • Datadog: Best for periodic audits with granular calendar-based scheduling (e.g., "Check every Monday at 9 AM"). While most real-world incidents are captured within its standard rolling windows, performing deeper historical analysis (e.g., comparing today's error rates to a 30-day baseline) often requires converting logs into Custom Metrics, which adds a layer of configuration complexity and a separate billing line. Source: Datadog Log Monitor documentation.

Datadog rolling window limit for Logs Monitor

  • OpenObserve: Built on a high-performance storage architecture that supports full SQL. This allows for sophisticated trend monitoring over any time horizon (7, 30, or 90 days) without reconfiguring your data. You can use SQL joins to calculate complex error rates or compare current performance against historical baselines directly within the alert query, as in the sketch below.

OpenObserve rolling window selection
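
A hedged sketch of that kind of baseline comparison, assuming a hypothetical app_logs stream, a level field, and a _timestamp column that can be compared directly with NOW():

```sql
-- Sketch only: stream and field names are assumptions; adjust the timestamp
-- comparison to the stream's actual timestamp encoding.
-- Compares the last hour's error count with the average hourly error count
-- over the previous 30 days.
SELECT
  SUM(CASE WHEN _timestamp >= NOW() - INTERVAL '1 hour' THEN 1 ELSE 0 END)
    AS current_hour_errors,
  COUNT(*) / (30 * 24.0) AS baseline_hourly_errors
FROM app_logs
WHERE level = 'error'
  AND _timestamp >= NOW() - INTERVAL '30 days'
```

An alert can then trigger when current_hour_errors climbs past some multiple of baseline_hourly_errors, without converting anything into a separate custom metric.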

Alert Destinations: Managed Routing vs. Programmable Actions

Datadog uses a sophisticated "Notification Rules" engine to handle complex organizational structures.

  • Notification Rules: Instead of tagging every monitor with a recipient, you define central rules (e.g., "All Critical logs for the 'Checkout' service go to the #on-call-payments Slack"). This prevents "configuration drift" across thousands of monitors.
  • Case Management: Alerts don't just send a message; they can automatically open a Case. This creates a persistent ticket within Datadog where teams can collaborate, upload graphs, and track the "ownership" of an issue from start to finish. Source: Datadog Automatic Case Creation

Automatic Case Creation: Datadog

OpenObserve treats the alert destination as a programmable "event" rather than just a message, allowing for self-healing infrastructure.

  • Python Actions (Remediation): A standout feature is the ability to trigger Python scripts directly as a destination. When an alert fires, it can execute an "Action" to auto-remediate, such as clearing a full disk, restarting a hung container, or updating a firewall rule. Learn more.

Actions in OpenObserve

  • Custom Templates (Jinja2): OpenObserve uses the Jinja2 templating engine for all destinations. This means you can write logic inside your Slack or Email notification (e.g., "If the error count is > 500, include a 'Panic' button link, otherwise include a 'View Logs' link").

Cost: Alert Proliferation vs. Flat Pricing

When expanding your monitoring, the pricing model often dictates your technical strategy. Here is how the cost of alerting differs between the two.

In Datadog, alerting costs are largely hidden within the Custom Metrics billing. You don't pay "per alert," but you pay for the "right to alert" on non-standard data. Teams often experience "cost anxiety," where engineers hesitate to add a new tag or alert for fear of triggering a new pricing tier.

  • Custom Metric Pricing: Standard plans include a limited allotment. Beyond that, you pay $5.00 per 100 custom metrics per month.
  • The Cardinality Trap: Because Datadog charges per unique combination of tag values (host, container_id, user_id), a single alert on a high-cardinality metric can generate thousands of "custom metrics," leading to massive overage bills. For example, one gauge tagged by 10 hosts each running 20 containers already counts as 200 custom metrics.

Source: Datadog Custom Metrics pricing. A per-metric cost breakdown is visible in the Cost and Usage dashboard shown below.

Datadog Cost and Usage Dashboard

OpenObserve uses a unified pricing model where alerting is a core feature, not an add-on or a hidden metric cost.

  • Flat Ingestion Pricing: You pay a predictable $0.30 per GB for ingestion. This covers logs, metrics, and traces.
  • No "Custom" Distinction: There is no separate category for "custom" metrics. Whether a metric comes from standard OTel instrumentation or a custom business logic script, the price remains $0.30/GB.
  • Unlimited Alerting: You can create 5 alerts or 5,000 alerts on the same data without the price changing by a single cent.

Incident Management: From Alerts to Resolution

When multiple alerts fire simultaneously, the difference between platforms isn't just notification delivery - it's whether alerts automatically group into incidents or require manual correlation and declaration.

Alert Correlation & Incident Grouping

DataDog uses Watchdog AI for anomaly detection and provides incident management as a separate workflow:

  • Watchdog Insights: AI detects anomalies across metrics, logs, and traces
  • Alert Grouping: Related monitors can trigger together, but they remain separate alerts
  • Manual Incident Declaration: Engineers must manually declare an incident to start formal tracking, though declaration can also be rule-based
  • Case Management: Once declared, incidents move into a separate case management workflow

The alert-to-incident flow: Multiple monitors trigger → Engineer sees separate alerts → Declare incident → Incident tracking begins

Watchdog excels at detecting unusual patterns, but connecting related alerts into a unified incident requires human decision-making.

Source: DataDog Incident Management

OpenObserve's Incident Correlation System automatically groups related alerts into incidents:

  • Automatic Grouping: Related alerts merge into incident groups without manual declaration
  • Configurable Rules: Define correlation logic based on timing, services, error patterns, and labels
  • Noise Reduction: 50 alerts for one database failure appear as a single incident

Incident Correlation System

Example: Database connection pool exhausted

  1. Alert 1: checkout service latency > 2000ms (3:15 AM)
  2. Alert 2: payment service errors > 50/min (3:16 AM)
  3. Alert 3: PostgreSQL connections > 95% (3:17 AM)

Correlation engine automatically identifies these as related (same time window, shared database dependency, error pattern match) and creates one incident group instead of three separate alerts.
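
For illustration, the second alert in this example might be defined with a query along these lines, evaluated over a one-minute window; the payment_logs stream and field names are assumptions, not the demo's actual schema:

```sql
-- Sketch only: the payment_logs stream and field names are assumptions.
-- Alert 2 from the example: payment service errors exceeding 50 per minute.
SELECT COUNT(*) AS error_count
FROM payment_logs
WHERE service_name = 'payment'
  AND level = 'error'
HAVING COUNT(*) > 50
```

The correlation engine then folds this alert into the same incident group as the latency and connection-pool alerts instead of paging three separate times.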

In Datadog, you can configure specific monitors to automatically trigger the creation of an "Incident" or "Case" based on severity level. The distinction is that OpenObserve's correlation is more "algorithmic" across different signals, while DataDog's is more "rule-based" per monitor.

Root Cause Analysis: Investigation Workflow

DataDog uses Notebooks for incident investigation and documentation:

  • Manual Timeline Building: Pull metric snapshots, log samples, and APM traces into a notebook
  • Collaborative Documentation: Team members add findings, graphs, and analysis
  • Post-Mortem Focus: Designed for writing detailed incident reports after resolution

Source: DataDog Notebooks

Watchdog RCA: Specifically pinpoints the "Origin Service" of an error. If Service A is slow because Service B's database is locked, Watchdog will point to Service B. Even with Watchdog, the final source of truth in Datadog is a Notebook.

DataDog Notebook Template

OpenObserve generates Root Cause Analysis reports automatically for incident groups; a sketch of the kind of pattern query these reports summarize appears after the screenshot below. Automatic RCA reports include:

  • Initial trigger alert and timeline of related alerts
  • Log pattern analysis showing what changed
  • Query results from all related alerts (trace IDs, error messages, affected users)
  • Service dependency analysis
  • Historical pattern matching

Root Cause Analysis Report: OpenObserve
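
The log pattern analysis in these reports boils down to the kind of frequency query sketched here. This is an approximation for illustration only, with assumed stream and field names, not OpenObserve's internal RCA logic:

```sql
-- Sketch only: approximates the pattern-frequency analysis an RCA report
-- summarizes. Stream and field names are assumptions.
SELECT k8s_container_name, body AS error_message, COUNT(*) AS occurrences
FROM app_logs
WHERE level = 'error'
GROUP BY k8s_container_name, body
ORDER BY occurrences DESC
LIMIT 20
```

The most frequent new error patterns typically point at the failing component faster than scrolling raw logs by hand.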

Quick Comparison: DataDog vs. OpenObserve Alerts

| Feature | DataDog | OpenObserve |
| --- | --- | --- |
| Query Language | Proprietary syntax; requires specialized training for each signal. | Standard SQL/PromQL; works with existing skills, no vendor lock-in. |
| Log Alerting | Live Tail is fast, but most monitors still require indexing; alerts depend on indexing, causing lag and higher costs. | Alerts trigger during ingestion. |
| Time Horizon | Short windows; log alerts often limited to a 2-day rolling window. | Query 7, 30, or 90 days of history with no extra config. |
| Remediation | Human-centric; alerts open tickets (Cases) for manual follow-up. | Machine-centric; native Python scripts auto-remediate issues (self-healing). |
| Pricing | "Metric Tax": $5 per 100 custom metrics; cardinality overages increase costs. | Flat $0.30/GB; one price for all data, unlimited alerts included. |
| Correlation | Manual/rule-based. | Algorithmic; automatically groups related alerts into a single incident. |
| RCA | Manual; engineers build post-mortems in Notebooks based on Watchdog analysis. | Automated; generates root cause reports with log pattern analysis instantly. |

The Bottom Line

DataDog provides mature alerting with extensive integrations, automatic anomaly detection through Watchdog AI, and sophisticated workflow automation. If you're already invested in the DataDog ecosystem and cost isn't a primary concern, the alerting capabilities work well.

But if you're evaluating observability platforms or open-source DataDog alternatives for alerting, OpenObserve delivers comprehensive alerting capabilities with significant operational advantages:

  1. Unified SQL alerts across logs, metrics, and traces: one query language instead of learning proprietary monitor syntax per signal type
  2. Automatic incident correlation: related alerts group into incidents without manual declaration, reducing noise from 50 alerts to one incident
  3. Rich notification context: full query results in notifications including sample logs, trace IDs, and affected users - not just predefined fields
  4. Python-based remediation: auto-healing infrastructure through programmable actions instead of just notifications
  5. Automated RCA with log patterns: identify root cause in seconds through pattern frequency analysis instead of manual log searching
  6. No per-alert or per-custom-metric charges: comprehensive alerting without cost anxiety

For platform engineers managing OpenTelemetry-instrumented microservices, these differences matter. No hesitation before alerting on custom metrics. Complex multi-condition alerts using SQL joins without managing multiple monitors. Incident correlation that automatically connects related failures. Transparent pricing that scales predictably.

The 60-90% cost savings teams achieve with OpenObserve extend to alerting - alert on any metric without incremental charges, enabling the comprehensive monitoring coverage production systems require.



Sign up for a free cloud trial or schedule a demo to test OpenObserve alerting with your observability data.

About the Author

Simran Kumari


LinkedIn

Passionate about observability, AI systems, and cloud-native tools. All in on DevOps and improving the developer experience.
