Prometheus Metrics Count Basics

Prometheus is a powerful monitoring system that excels at collecting and aggregating metrics. To make sense of what it measures, Prometheus uses four main metric types: counters, gauges, histograms, and summaries.
Among these, the Counter is the backbone of "counting" in Prometheus. It's what helps you answer essential operational questions like how many requests your service has handled, how many of them failed, and how many distinct error codes you are seeing.
In this guide, we’ll demystify counting in Prometheus, starting from the basics and working up to practical strategies you can apply in your monitoring setup.
Before writing Prometheus queries, it’s important to know what exactly we’re counting. Prometheus data is built on two concepts: metrics and labels.
Metrics are the actual measurements you collect. Examples include:
- http_requests_total → total number of HTTP requests
- node_cpu_seconds_total → CPU usage time
- db_connections → number of active database connections
Labels are key–value pairs that add context to a metric. They let you slice and filter the data. For example:
- method="GET" or method="POST" for request types
- status="200" or status="500" for response codes
- region="us-east" or region="europe" for deployment location
So if you're tracking http_requests_total, labels could tell you how many requests came from us-east, how many were 500 errors, or how many used the POST method.
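To see how metrics and labels fit together, here is a minimal sketch of what an application might expose on its /metrics endpoint; the values and label sets are purely illustrative:
# Hypothetical exposition output
http_requests_total{method="GET", status="200", region="us-east"} 1027
http_requests_total{method="POST", status="500", region="europe"} 3
Each line is one measurement: the metric name, the labels that describe it, and the current value.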
Understanding Time Series
A time series is a sequence of data points collected over time. In Prometheus, each unique combination of a metric name and its labels creates a separate time series. For example:
- website_visits{country="usa", device="mobile"} is one time series
- website_visits{country="usa", device="desktop"} is another time series
- website_visits{country="canada", device="mobile"} is yet another time series
Counting unique values in Prometheus isn't as straightforward as you might expect. Unlike traditional databases where you can simply use "COUNT DISTINCT", Prometheus is designed for time-series data and optimized for performance over complex queries.
The main challenges are that every unique label combination becomes its own time series, there is no built-in "COUNT DISTINCT" over raw values, and high-cardinality labels can multiply the number of series you have to aggregate.
This is why learning the proper techniques for counting is essential for effective Prometheus usage.
To understand counting better, it's helpful to know how Prometheus works behind the scenes. Prometheus scrapes (collects) metrics from your applications and infrastructure at regular intervals, typically every 15-30 seconds.
Each time it scrapes, it creates data points with timestamps. These data points are stored as time series, and each unique combination of metric name and labels becomes a separate time series. When you run a count query, Prometheus looks at all the relevant time series and performs calculations on them.
This architecture is why Prometheus is so fast at handling large amounts of time-series data, but it also explains why counting unique values requires special techniques.
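As a rough illustration, a scrape configuration like the following tells Prometheus to collect samples every 15 seconds; the job name and target address are placeholders, not part of any setup described here:
global:
  scrape_interval: 15s        # how often Prometheus scrapes each target
scrape_configs:
  - job_name: "web-server"    # hypothetical job
    static_configs:
      - targets: ["web-server:8080"]   # endpoint exposing /metrics
Every scrape adds one timestamped sample to each series the target exposes.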
The core pattern for counting unique label values is the double count:
count(count by (status_code) (http_requests_total))
This query works in two steps. First, count by (status_code) (http_requests_total) groups all HTTP requests by their status code (like 200, 404, 500). Then, the outer count() counts how many unique status codes exist. If your application returns status codes 200, 404, and 500, this query returns 3.
The reason we use this double count approach is that the inner count groups your data, and the outer count gives you the final number. It's like organizing books by genre first, then counting how many genres you have.
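To make the two stages concrete, here is a sketch of what each step might return; the series and numbers are hypothetical:
count by (status_code) (http_requests_total)
# => {status_code="200"} 12
#    {status_code="404"} 3
#    {status_code="500"} 1
count(count by (status_code) (http_requests_total))
# => 3
The inner query yields one row per status code; the outer count simply counts those rows.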
Let's start with the absolute basics of counting in Prometheus. We'll build your understanding step by step.
Syntax:
count(metric_name)
Example:
count(up)
What it does: Counts how many time series exist for the up metric, which tracks whether each scrape target is reachable.
Use case: You want to know "How many services is Prometheus monitoring?" If you have 5 services being monitored, this returns 5. This is the foundation of all counting in Prometheus.
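A common variation, if you care only about targets that are currently reachable rather than every tracked series, is to filter on the value first. This is a small sketch using the standard up metric:
count(up == 1)   # only targets whose last scrape succeeded
count(up == 0)   # targets that are currently down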
Syntax:
count by (label_name) (metric_name)
Example:
count by (job) (up)
What it does: Groups the up metric by the job label, then counts how many instances exist for each job.
Use case: You want to see "How many instances of each service am I monitoring?" This might return:
prometheus-job: 1
web-server: 3
database: 2
This shows you have 1 Prometheus instance, 3 web servers, and 2 database instances.
Syntax:
count(count by (label_name) (metric_name))
Example:
count(count by (status_code) (http_requests_total))
What it does: First groups by status_code, then counts how many unique status codes exist.
Use case: You want to know "How many different HTTP status codes am I seeing?" The inner count by
creates groups for each status code, the outer count
tells you how many groups exist. If you see 200, 404, and 500 status codes, this returns 3
.
Syntax:
count without (label_name) (metric_name)
Example:
count without (instance) (up)
What it does: Counts while ignoring the instance label, effectively grouping by all other labels.
Use case: You want service-level counts instead of instance-level counts. Instead of counting each server separately, you count services as units.
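For example, with the instance label dropped, the result might look like this; the job names and counts are hypothetical:
count without (instance) (up)
# => {job="web-server"} 3
#    {job="database"} 2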
Syntax:
count(metric_name{label="value"})
Example:
count(up{job="web-server"})
What it does: Counts only the time series where job equals "web-server".
Use case: You want to know "How many web server instances am I monitoring?" This filters out all other services and counts only web servers.
Syntax:
count by (label1, label2) (metric_name)
Example:
count by (job, instance) (up)
What it does: Groups by both job AND instance, showing the count for each job-instance combination.
Use case: You want a detailed breakdown showing each specific service instance. This gives you a complete inventory of every monitored service instance.
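Since each target exposes exactly one up series, the result reads like an inventory with one entry per job/instance pair; the addresses below are hypothetical:
count by (job, instance) (up)
# => {job="web-server", instance="10.0.1.5:8080"} 1
#    {job="web-server", instance="10.0.1.6:8080"} 1
#    {job="database", instance="10.0.2.3:5432"} 1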
Syntax:
count(count by (label1, label2) (metric_name))
Example:
count(count by (method, status_code) (http_requests_total))
What it does: Counts how many unique combinations of method AND status code exist.
Use case: Now that you understand the basics, you can count complex combinations. If you have GET/POST/PUT methods across 200/404/500 status codes, this returns the number of unique combinations you're actually seeing (might be less than the theoretical maximum if some combinations don't occur).
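As a sketch with made-up traffic, three methods across three status codes could theoretically produce nine combinations, but the query reports only the ones that actually occur:
count(count by (method, status_code) (http_requests_total))
# => 7   (e.g., PUT requests never returned 404 or 500)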
Sometimes you only want to count data from a specific period, like the last hour or day. That's where range functions like last_over_time() come in.
Syntax:
count(last_over_time(<metric>[<range>]))
Example:
count(last_over_time(user_activity[1d]))
What this actually does:
Prometheus looks at each unique time series of user_activity. A time series is a metric + label combination, like:
- user_type="premium", location="us-east"
- user_type="trial", location="europe"
For each time series, it finds the last recorded value within the past day. It doesn't care about all the earlier samples, only the most recent one.
The count() then counts how many series had at least one data point in that period.
Analogy: Imagine you have several employees submitting daily reports. You only check the last report each employee submitted today. Then you count how many employees submitted reports. That’s exactly what this query does for metrics.
Tip: Swap [1d] for [1h], [7d], or [30m] depending on the period you want to analyze.
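On Prometheus 2.29 or newer, present_over_time() expresses the same intent a little more directly, since it only checks whether any sample existed in the window rather than fetching its value; this is a sketch under that version assumption:
count(present_over_time(user_activity[1d]))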
Syntax:
count_values("label_name", metric_name)
Example:
count_values("response_code", http_status)
What it does: Creates a new time series showing each unique value and its count.
Use case: When you want to see both what the unique values are AND how many times each appears. Instead of just knowing "you have 3 unique status codes," you see "200 appears 1000 times, 404 appears 50 times, 500 appears 10 times."
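Using those same numbers as a hypothetical result, the output is one series per observed value:
count_values("response_code", http_status)
# => {response_code="200"} 1000
#    {response_code="404"} 50
#    {response_code="500"} 10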
Syntax for Active Time Series:
prometheus_tsdb_head_series
Example result: Returns a number like 15420, meaning Prometheus is tracking 15,420 different data streams.
Use case: Monitor Prometheus performance and storage usage. High numbers might indicate cardinality issues.
Syntax for All Metrics:
count({__name__=~'.+'})
Example result: Returns the total number of time series across all metric names at the moment the query runs.
Use case: Get a bird's-eye view of your monitoring scope. Useful for capacity planning and understanding system complexity.
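If that total looks high, a per-metric-name breakdown (a common cardinality-hunting pattern, sketched here) shows where the series are coming from:
count by (__name__) ({__name__=~".+"})                 # one row per metric name with its series count
topk(10, count by (__name__) ({__name__=~".+"}))       # the ten metrics with the most series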
While Prometheus excels at real-time monitoring and counting, it is primarily designed for short-term data storage. Many default setups retain data for around two weeks, though this can be configured. For long-term storage, historical analysis, and advanced visualization, integrating Prometheus with a tool like OpenObserve is highly valuable. This setup allows you to maintain real-time monitoring while also keeping months or even years of historical metrics for deeper insights.
Learn more about ingesting Prometheus metrics into OpenObserve.
To send your Prometheus data to OpenObserve, add this configuration to your Prometheus config file:
remote_write:
  - url: https://<openobserve_host>/api/<org_name>/prometheus/api/v1/write
    queue_config:
      max_samples_per_send: 10000
    basic_auth:
      username: <openobserve_user>
      password: <openobserve_password>
This setup allows Prometheus to focus on collecting data while OpenObserve handles storage and visualization, creating a powerful monitoring combination.
The patterns to remember are the double count, count(count by (label) (metric)), for unique values, and range selectors like [1h] or [1d] for time-bounded counts. Prometheus counting might seem complex at first, but with these basics, you can start monitoring effectively. Focus on simple queries, build expertise over time, and prioritize metrics that matter most. For advanced visualization and long-term storage, integrate with OpenObserve to unlock the full potential of your monitoring data.
For deeper guidance, check out the OpenObserve documentation on advanced setups and custom dashboards.
Ready to put this into practice? Sign up for an OpenObserve cloud account (14 day free trial) or visit our downloads page to self-host OpenObserve.