The Prometheus Cardinality Bomb: How to Prevent It Before It Blows Up

Simran Kumari
March 17, 2026
12 min read


The alert fires at 2:17 AM. Grafana dashboards are blank. Prometheus is OOM-killed and won't stay up. Queries that used to run in milliseconds now time out at 30 seconds or never return at all. The on-call engineer opens the runbook, finds nothing useful, and starts restarting pods.

Three hours later, after a war room, a rollback, and a lot of coffee, someone finds the root cause: a single line of instrumentation code, merged three months ago by a well-meaning developer who added a user_id label to the main request counter.

No alarms went off at the time. The metric itself looked fine. But every time a new user hit the service, Prometheus silently created a new time series. After 90 days and a million users, that one label had generated over five million time series, and the in-memory TSDB had finally buckled under the weight.

This is the cardinality bomb. It doesn't detonate the moment you pull the pin. It waits.

What Is Cardinality? The Math That Matters

To understand why this happens, you need to understand how Prometheus stores data.

Prometheus is a time-series database. It doesn't store a single "counter"; it stores one independent time series per unique combination of label values. Every label you attach to a metric multiplies the number of time series Prometheus must track, index, and hold in memory.

Here's the math in plain English:

http_requests_total{environment="prod", service="checkout", status_code="200"}  → 1 series
http_requests_total{environment="prod", service="checkout", status_code="404"}  → 1 series
http_requests_total{environment="prod", service="payments", status_code="200"}  → 1 series
...and so on

A metric with 3 environments × 5 services × 10 status codes yields 150 time series, which is entirely manageable.

Now add user_id to that same metric with 1 million unique users:

3 environments × 5 services × 10 status_codes × 1,000,000 user_ids = 150,000,000 series

That's 150 million time series, each occupying RAM in Prometheus's TSDB. The process doesn't swap to disk gracefully; it OOMs and dies.

This is cardinality: the number of unique time series for a given metric. High cardinality = high memory pressure = instability. And the relationship is roughly linear: double the unique label values, double the memory usage.
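The multiplication above is worth internalizing; here is the same arithmetic as a few lines of Python (the variable names are just for illustration):

```python
# Each label multiplies the number of potential time series.
environments = 3
services = 5
status_codes = 10

safe_series = environments * services * status_codes
print(safe_series)  # 150: entirely manageable

# Adding one unbounded label (a million user IDs) multiplies everything again.
user_ids = 1_000_000
exploded_series = safe_series * user_ids
print(exploded_series)  # 150000000: enough to OOM the TSDB
```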

For a deeper foundation on how Prometheus stores and scrapes this data, see What You Need to Know About Prometheus Architecture.

The Labels You Should Never Use

Not all labels are created equal. Some labels are bounded: their set of possible values is small and stable (a handful of environments, a known list of HTTP status codes). Others are unbounded: new values arrive continuously, and there's no ceiling on how many can appear.

Unbounded labels are cardinality bombs. Here are the most common offenders:

High-Cardinality Anti-Patterns

  • user_id: One new series per user. At scale, this is millions of series.
  • session_id: Even more volatile; sessions expire, but their time series persist until the retention cutoff.
  • request_id / trace_id: Unique per request by design. A high-traffic API generates thousands per second.
  • ip_address: Unbounded by nature. Especially dangerous in public-facing APIs.
  • url_path (raw): Paths with dynamic segments like /users/12345/orders explode into one series per path permutation.
  • container_id / pod_hash: Container runtimes rotate these constantly. Every new deploy floods Prometheus with fresh series.
  • error_message (raw): Error strings often contain dynamic content (timestamps, IDs, filenames).

The Rule of Thumb

If the number of unique values for a label can grow without a defined ceiling, it is not a label. It is a trace attribute.

A status_code label is fine: HTTP gives you roughly 60 defined codes and you'll realistically see fewer than 15. A user_id label is not fine: it scales with your user base and never stops growing.

When in doubt, ask: "Could this label have 10,000 unique values in production?" If the answer is yes, it belongs in a trace span, not a metric label.
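That question can even live in code review tooling. A minimal sketch, assuming you can estimate a label's unique value count up front (the function name and the 10,000 ceiling are illustrative, not any Prometheus API):

```python
def is_safe_label(estimated_unique_values: int, ceiling: int = 10_000) -> bool:
    """Reject labels whose value set could plausibly exceed the ceiling."""
    return estimated_unique_values < ceiling

print(is_safe_label(15))         # True: status_code stays bounded in practice
print(is_safe_label(1_000_000))  # False: user_id grows with your user base
```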

For more on Prometheus metric types and when to use counters vs. gauges vs. histograms, see Prometheus Metric Types (Counters, Gauges, Histograms, Summaries).

What Should Be a Label vs. a Trace Attribute

The core architectural insight behind preventing cardinality explosions is this: metrics and traces serve fundamentally different purposes, and mixing their data models is the source of most cardinality mistakes.

Metrics Are for Aggregation

Metrics answer questions about system behavior in aggregate:

  • What is the 99th percentile latency of the checkout service?
  • How many 5xx errors did the payments service return in the last 5 minutes?
  • What is the current memory utilization across all pods in the prod namespace?

These questions are answered by bounded, low-cardinality dimensions: a fixed set of services, environments, HTTP status codes, and so on. The power of metrics is that they give you instant, pre-aggregated answers across your entire fleet without scanning raw events.

Traces Are for Investigation

Traces answer questions about specific request instances:

  • What happened to this particular user's checkout request?
  • Which downstream service caused this specific slow transaction?
  • What was the exact SQL query that took 4 seconds for this request_id?

These questions require high-cardinality identifiers (user IDs, request IDs, session IDs, trace IDs) because you're looking at individual events, not aggregations. Trace backends are designed for exactly this: indexing and retrieving individual spans by arbitrary attribute values.

If you need to answer "what happened to user 12345's request," open a trace. If you need to answer "what is the error rate for the checkout service," query a metric. These are different tools built for different jobs, and conflating them breaks both.
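To make the distinction concrete, here is a stdlib-only Python toy (class and method names invented for illustration) that mimics how a TSDB allocates one series per unique label combination:

```python
class ToyTSDB:
    """Toy model of a TSDB: one entry per unique (metric, labels) combination."""

    def __init__(self):
        self.series = {}

    def observe(self, metric, labels, value=1):
        # A series is identified by the metric name plus its full label set.
        key = (metric, tuple(sorted(labels.items())))
        self.series[key] = self.series.get(key, 0) + value


db = ToyTSDB()

# Bounded labels: a thousand requests all reuse the same series.
for _ in range(1000):
    db.observe("http_requests_total", {"service": "checkout", "status_code": "200"})
bounded_count = len(db.series)
print(bounded_count)  # 1

# Unbounded label: every new user mints a brand-new series.
for user in range(1000):
    db.observe("http_requests_total", {"service": "checkout", "user_id": str(user)})
print(len(db.series))  # 1001
```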

For a practical guide to implementing distributed tracing with OpenTelemetry and sending high-cardinality span attributes to a trace backend, see A Comprehensive Guide to Distributed Tracing: From Basics to Beyond.

To understand how logs, metrics, and traces work together as a unified observability system, see Full-Stack Observability: Connecting Logs, Metrics, and Traces.

How to Detect and Fix Existing Cardinality Issues

If you suspect a cardinality problem is already underway, or you want to build a cardinality dashboard before one occurs, here's a step-by-step playbook.

Step 1: Check Your Total Series Count

Start by checking the current number of active time series in your TSDB head:

prometheus_tsdb_head_series

A healthy Prometheus instance for a mid-sized production environment typically sits between 100,000 and 2,000,000 series. If you're north of 5 million, you likely have a cardinality problem. If you're north of 10 million, it's urgent.
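Those thresholds can be encoded into a quick triage helper for a cardinality dashboard; this sketch simply mirrors the bands above (the function name is illustrative):

```python
def series_health(active_head_series: int) -> str:
    """Classify prometheus_tsdb_head_series into rough health bands."""
    if active_head_series <= 2_000_000:
        return "healthy"
    if active_head_series <= 5_000_000:
        return "elevated"
    if active_head_series <= 10_000_000:
        return "likely cardinality problem"
    return "urgent"

print(series_health(800_000))     # healthy
print(series_health(6_500_000))   # likely cardinality problem
print(series_health(12_000_000))  # urgent
```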

Step 2: Find Your Worst Offenders

Use this query to list every metric name sorted by its series count, descending:

# Series count per metric name: your cardinality leaderboard
sort_desc(
  count by (__name__) ({__name__=~".+"})
)

⚠️ Warning: This query is expensive. Run it during off-peak hours or with a short timeout. On a large instance, it may itself cause performance issues.

For a lighter-weight alternative that targets known problem areas:

# Count the total series for a specific metric, useful when you already suspect a culprit
count(http_requests_total)

Step 3: Identify the Exploding Label

Once you've found a high-cardinality metric, identify which label is responsible:

# How many unique values does each label have for this metric?
count(count by (user_id) (http_requests_total))
count(count by (status_code) (http_requests_total))
count(count by (service_name) (http_requests_total))

The label with the largest count is your bomb.
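The same per-label counting can be reproduced offline if you've exported the label sets of a suspect metric; the data below is synthetic, for illustration only:

```python
from collections import defaultdict

# Synthetic label sets for one metric: 5,000 series differing only by user_id.
series_labels = [
    {"service": "checkout", "status_code": "200", "user_id": f"u{n}"}
    for n in range(5000)
]

# Mirror of `count(count by (<label>) (metric))`: unique values per label.
unique_values = defaultdict(set)
for labels in series_labels:
    for name, value in labels.items():
        unique_values[name].add(value)

cardinality = {name: len(values) for name, values in unique_values.items()}
print(cardinality)  # {'service': 1, 'status_code': 1, 'user_id': 5000}
```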

For more on counting unique series and understanding cardinality metrics in Prometheus, see Prometheus Metrics Count Basics.

Step 4: Add a Recording Rule to Pre-Aggregate

If you need to preserve some data from a high-cardinality metric while you plan a proper fix, recording rules let you pre-aggregate the expensive metric into a cheaper derived one. Add this to your rules.yml:

groups:
  - name: cardinality_control
    interval: 1m
    rules:
      # Aggregate away user_id; keep only the dimensions you actually alert on
      - record: http_requests_total:by_service_and_status
        expr: sum by (service_name, status_code, environment) (http_requests_total)

This creates a new metric with manageable cardinality. You can then alert on the recording rule output while you remove the problematic label from your instrumentation.
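What the recording rule does can be sketched in plain Python: sum samples over the dimensions you keep, and user_id vanishes (the sample data is synthetic):

```python
from collections import Counter

# 10,000 raw series, one per user, each holding a single sample of value 1.
raw = [
    ({"service_name": "checkout", "status_code": "200", "user_id": f"u{n}"}, 1)
    for n in range(10_000)
]

# sum by (service_name, status_code): group on the kept labels, drop user_id.
aggregated = Counter()
for labels, value in raw:
    key = (labels["service_name"], labels["status_code"])
    aggregated[key] += value

print(len(raw))                         # 10000 input series
print(len(aggregated))                  # 1 output series
print(aggregated[("checkout", "200")])  # 10000: the total is preserved
```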

Step 5: Drop the Label at the Collector

The Surgical Fix: Target One Metric, One Label

To drop user_id only from the http_requests_total metric without affecting other metrics, use this pattern in your prometheus.yml:

scrape_configs:
  - job_name: "my-service"
    static_configs:
      - targets: ["my-service:8080"]
    
    metric_relabel_configs:
      # Identify the "bomb" metric and the label to drop. Relabeling joins
      # the source_labels with ";" (the default separator), so this regex
      # matches Name;LabelValue for http_requests_total only.
      - source_labels: [__name__, user_id]
        regex: 'http_requests_total;(.+)'
        target_label: user_id
        replacement: ''  # an empty label value removes the label from the series
        action: replace
      # No second rule is needed: in Prometheus's data model, a label with an
      # empty value is equivalent to the label being absent.
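Relabel regexes are fully anchored and applied to the source_labels joined with ";", so the matching logic of the surgical fix can be sanity-checked outside Prometheus. This stdlib sketch (function name invented, sample values illustrative) reproduces it:

```python
import re

# Prometheus anchors relabel regexes, so fullmatch is the right equivalent.
pattern = re.compile(r'http_requests_total;(.+)')

def matches_relabel_rule(metric_name: str, user_id: str) -> bool:
    joined = f"{metric_name};{user_id}"  # source_labels joined by ";"
    return pattern.fullmatch(joined) is not None

print(matches_relabel_rule("http_requests_total", "u12345"))  # True: label dropped
print(matches_relabel_rule("http_errors_total", "u12345"))    # False: untouched
```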

The "Nuclear" Option: Global Label Drop

If you know that user_id provides zero value across your entire job and you want it gone from every single metric to save maximum RAM, use this simpler (but more aggressive) rule:

    metric_relabel_configs:
      - action: labeldrop
        regex: 'user_id' # Removes this label from EVERY metric in this scrape job

⚠️ Warning: labeldrop is irreversible at the ingestion point. Once you drop it here, the data is gone. You cannot "un-drop" it later to see which user caused a specific error. This is why high-cardinality data belongs in Traces (OpenTelemetry) or Columnar Backends (OpenObserve) where it can be stored cheaply.

Step 6: Monitor Cardinality Continuously

Add this alert to your Prometheus alerting rules to catch future cardinality growth before it becomes an outage:

groups:
  - name: cardinality_alerts
    rules:
      - alert: HighCardinalityMetric
        expr: |
          count by (__name__) ({__name__=~".+"}) > 500000
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Metric {{ $labels.__name__ }} has {{ $value }} time series"
          description: >
            A single metric has exceeded 500k time series.
            Investigate label cardinality immediately.

      - alert: PrometheusSeriesCountCritical
        expr: prometheus_tsdb_head_series > 8000000
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Prometheus TSDB series count is critically high ({{ $value }})"

When to Move Beyond Prometheus

Prometheus's in-memory TSDB model is one of its greatest strengths: it makes PromQL blazing fast for recent data across a bounded set of time series. But it is also its fundamental constraint. Every active time series must fit in RAM. There is no overflow, no columnar spill-to-disk, no dynamic sharding. You either fit, or you OOM (Out of Memory).

For teams that have hit this ceiling, or that are designing systems where high-cardinality metrics are unavoidable (SaaS platforms with per-tenant metrics, edge networks with per-node telemetry, platforms tracking thousands of dynamic endpoints), the architectural answer is a backend that doesn't share Prometheus's in-memory constraint.

Why Columnar Backends Handle Cardinality Differently

Prometheus stores each time series as an independent in-memory stream. The moment a new label combination appears, a new stream is allocated. This is optimal for fast range queries over a known, stable set of series, but it makes cardinality a first-class resource problem.

Columnar storage backends (like the one used by OpenObserve) store metric data as columns in compressed files on object storage (S3, GCS, or similar). Rather than allocating a new data structure per unique label combination, data is written in bulk and queried by scanning compressed columns. There is no per-series memory allocation at ingest time.

The practical consequences:

  • No cardinality limit at ingest. You can add user_id to a metric and the backend doesn't OOM; it just writes data. Query performance degrades gracefully with cardinality rather than catastrophically.
  • Storage costs scale with data volume, not series count. Instead of per-time-series billing or per-series memory, you pay for bytes stored, which is typically far cheaper at high cardinality.
  • PromQL still works. OpenObserve supports Prometheus remote write and full PromQL, so your existing queries, dashboards, and alerting rules work without modification.

This doesn't mean you should abandon cardinality discipline. Even in columnar backends, high-cardinality queries scan more data and cost more to execute. But it changes cardinality from an availability problem (Prometheus goes down) into a performance and cost trade-off that you can manage deliberately.

The Practical Migration Path

For most teams, the path forward is not "replace Prometheus"; it's "use Prometheus for what it's good at, and offload everything else."

Prometheus handles real-time scraping and alerting with its familiar local TSDB. OpenObserve receives the same data via remote_write and handles long-term retention, historical queries, cross-signal correlation (logs + metrics + traces), and any metrics where cardinality makes local storage impractical.

# prometheus.yml: add remote_write to OpenObserve
remote_write:
  - url: "https://<your-openobserve-host>/api/<org>/prometheus/api/v1/write"
    queue_config:
      max_samples_per_send: 10000
    basic_auth:
      username: <openobserve_user>
      password: <openobserve_password>

Summary: The Cardinality Checklist

Before adding any label to any metric, run through this checklist:

  • Is this value bounded? Can you enumerate all possible values, and is that list stable? If not, it's not a label.
  • Does this value help you aggregate? Would you ever sum by (this_label) in a PromQL query? If not, it probably belongs in a trace span.
  • Is the unique count below 1,000? A rough upper bound for a single label. If a label has 10,000+ possible values, treat it with extreme caution.
  • Have you accounted for label combinations? Each label multiplies cardinality. Three "safe" labels of 100 values each = 1,000,000 potential series.
  • Is this a per-request identifier? User IDs, session IDs, request IDs, and trace IDs all go in trace spans. No exceptions.

A single label choice made at 2 PM on a Tuesday can bring down your metrics stack at 2 AM on a Sunday. The cardinality bomb doesn't make noise when you arm it. The checklist above is how you defuse it before it's too late.

About the Author

Simran Kumari

Passionate about observability, AI systems, and cloud-native tools. All in on DevOps and improving the developer experience.
View all posts