
What You Need to Know About Prometheus Metrics: Architecture, Collection, and Optimization for Scalable Observability

November 12, 2024 by Chaitanya Sistla

Monitoring and observability have become critical aspects of modern DevOps and SRE practices. Prometheus, one of the most popular open-source monitoring solutions, has proven invaluable in enabling real-time monitoring, alerting, and data visualization. In this guide, we’ll explore the full workflow of Prometheus metrics, from setting up Prometheus to ingesting data, processing it, and visualizing it. By the end of this guide, you’ll have a clear understanding of how to integrate and leverage Prometheus metrics for observability in your system.

1. Prometheus Architecture

To understand Prometheus fully, it’s essential to explore its architecture and how each component contributes to its functionality.

1.1 Core Components

  • Prometheus Server: The main server that scrapes and stores metrics, processes rules, and runs queries.
  • Exporters: Small programs that expose metrics from external sources, like the OS or databases, in a Prometheus-readable format.
  • Alertmanager: Handles alerts generated by Prometheus alerting rules and routes them to receivers such as email and Slack.
  • Service Discovery: Lets Prometheus locate and add monitoring targets dynamically in environments with changing infrastructure, such as Kubernetes (a configuration sketch follows this list).
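
To make the service discovery component concrete, here is a minimal sketch of a scrape job that uses Prometheus’ Kubernetes service discovery to find pods and keep only the ones annotated for scraping. The prometheus.io/scrape annotation is a common convention rather than a requirement, and your cluster may use different annotations:

scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod                  # discover every pod via the Kubernetes API
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep               # keep only pods annotated prometheus.io/scrape: "true"
        regex: "true"

With a job like this, new pods are picked up automatically as they appear, with no manual changes to the target list.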

1.2 Data Flow Overview

Prometheus collects metrics by scraping endpoints at configured intervals. The data is then stored in a time-series database, which supports a variety of operations, including aggregations and mathematical computations. Prometheus uses PromQL to query stored data and analyze system performance.

2. Setting Up Metrics Collection and Exporters

In Prometheus, exporters are used to gather metrics from various sources, such as system hardware, applications, and databases, exposing them in a format that Prometheus can read and scrape.

2.1 Installing Node Exporter for System Metrics

Node Exporter is commonly used to gather system-level metrics like CPU, memory, and disk usage. Here’s how to set it up:

  1. Download and Install Node Exporter:

    To find the available releases, go to https://github.com/prometheus/node_exporter/releases
wget https://github.com/prometheus/node_exporter/releases/download/v1.2.2/node_exporter-1.2.2.linux-amd64.tar.gz
tar -xvf node_exporter-1.2.2.linux-amd64.tar.gz
cd node_exporter-1.2.2.linux-amd64
  2. Run Node Exporter:

./node_exporter

Node Exporter, by default, exposes metrics at localhost:9100/metrics

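If you open that endpoint in a browser or fetch it with curl, you should see plain-text output in the Prometheus exposition format. The lines below are an abridged, illustrative example; the exact metric names and values will differ on your machine:

# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 18233.42
node_cpu_seconds_total{cpu="0",mode="user"} 812.07
# HELP node_memory_MemAvailable_bytes Memory information field MemAvailable_bytes.
# TYPE node_memory_MemAvailable_bytes gauge
node_memory_MemAvailable_bytes 8.123456e+09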

  3. Configure Prometheus to Scrape Node Exporter:

    Create a prometheus.yml file in the same directory and add Node Exporter as a scrape target:
global:
  scrape_interval: 15s  # Set the default scrape interval

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node_exporter'
    static_configs:
      - targets: ['localhost:9100']

This configuration instructs Prometheus to scrape metrics from Node Exporter at localhost:9100.
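
If you later run Node Exporter on several hosts, you can attach labels to each static target so the hosts are easy to tell apart in queries and dashboards. This is a minimal sketch; the second target address and the env label values are hypothetical examples:

scrape_configs:
  - job_name: 'node_exporter'
    static_configs:
      - targets: ['localhost:9100']
        labels:
          env: 'dev'                # example label, attached to every series from this target
      - targets: ['10.0.0.5:9100']  # hypothetical second host
        labels:
          env: 'prod'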

2.2 Custom Application Metrics

To monitor specific aspects of an application’s performance, you can instrument custom metrics within your application. Below is an example using Python.

  1. Install the Prometheus Client Library:

pip install prometheus_client
  2. Create a Simple Python Script to Expose Metrics:

    This script exposes two types of metrics: a counter for the number of requests and a gauge for request latency.
from prometheus_client import start_http_server, Counter, Gauge
import time
import random

REQUEST_COUNTER = Counter('app_request_count', 'Number of requests received')
REQUEST_LATENCY = Gauge('app_request_latency_seconds', 'Latency of requests in seconds')

def process_request():
    REQUEST_COUNTER.inc()
    with REQUEST_LATENCY.time():
        time.sleep(random.uniform(0.1, 1.0))

if __name__ == '__main__':
    start_http_server(8000)  # Start a Prometheus metrics endpoint
    while True:
        process_request()
  3. Run the Script:

python your_script.py
  4. Add the Application as a Scrape Target in Prometheus:

    Update prometheus.yml to include your application as a scrape target:
scrape_configs:
  - job_name: 'my_python_app'
    static_configs:
      - targets: ['localhost:8000']

3. Ingesting Prometheus Metrics into OpenObserve

Why Ingest Prometheus Metrics into OpenObserve?

Prometheus is perfect for quick, real-time metrics, but as systems grow, storing, scaling, and analyzing long-term data becomes more challenging. OpenObserve steps in as a powerful companion, allowing you to keep Prometheus metrics over a longer period and scale effortlessly without complex setups or storage limitations. By sending metrics from Prometheus to OpenObserve, you retain the flexibility of Prometheus for instant monitoring while gaining a scalable backend for deeper, historical insights and advanced analytics.

With OpenObserve, your observability stack is ready to scale alongside your infrastructure, ensuring smooth, reliable performance as your systems grow.


Once your system and application metrics are configured, you can set up Remote Write in Prometheus to send these metrics directly to OpenObserve for centralized visualization and long-term storage.

3.1 Configure Remote Write to OpenObserve


To send data to OpenObserve, add a remote_write section in prometheus.yml:

remote_write:
  - url: https://<openobserve_host>/api/<org_name>/prometheus/api/v1/write
    queue_config:
      max_samples_per_send: 10000
    basic_auth:
      username: <openobserve_user>
      password: <openobserve_password>
  • url: This specifies the OpenObserve endpoint where Prometheus sends metrics data.
  • queue_config: Configures the batch size (max_samples_per_send) of metrics sent to OpenObserve, helping manage throughput.
  • basic_auth: Provides secure access to OpenObserve, ensuring only authorized users send data.
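
Prometheus’ remote_write queue_config supports a few more tuning knobs beyond max_samples_per_send. The values below are illustrative starting points rather than recommendations, so adjust them to your sample volume and network:

remote_write:
  - url: https://<openobserve_host>/api/<org_name>/prometheus/api/v1/write
    basic_auth:
      username: <openobserve_user>
      password: <openobserve_password>
    queue_config:
      capacity: 10000              # samples buffered per shard before blocking
      max_shards: 30               # upper bound on parallel senders
      max_samples_per_send: 10000  # batch size per request
      batch_send_deadline: 5s      # flush a partial batch after this long

Raising max_shards increases send parallelism at the cost of more memory and open connections on both ends.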

3.2 Testing the Remote Write Configuration

  1. Download and install Prometheus so it can scrape the configured endpoints:
wget https://github.com/prometheus/prometheus/releases/download/v2.30.3/prometheus-2.30.3.linux-amd64.tar.gz
tar -xvf prometheus-2.30.3.linux-amd64.tar.gz
cd prometheus-2.30.3.linux-amd64
  2. Start Prometheus to apply the new configuration:
./prometheus --config.file=prometheus.yml
  3. Verify Metrics Ingestion in OpenObserve: Log into OpenObserve’s dashboard and confirm that it’s receiving Prometheus data. You should see metrics from Node Exporter and your custom application populating the OpenObserve dashboard.

4. Visualizing Metrics Directly in OpenObserve

With metrics ingested into OpenObserve, you can now use its visualization tools to create insightful dashboards and analyze data. Here’s a step-by-step guide for setting up and customizing these visualizations.

4.1 Setting Up a New Dashboard

  1. Create a Dashboard:

    • In OpenObserve, navigate to the Dashboards section.
    • Select Create New Dashboard and give it a meaningful name like “System and Application Metrics.”
  2. Add Panels for Key Metrics:

    • You can add various panels for specific metrics (e.g., CPU usage, memory, application request count).

4.2 Example Panels for Common Metrics

Here are a few examples of commonly used panels with corresponding queries.


Total Application Requests:

  • Metric: app_request_count
  • Visualization: Select a line or bar chart to show the count over time.

Average Request Latency:

  • Metric: app_request_latency_seconds
  • Query: an aggregation such as avg_over_time(app_request_latency_seconds[5m]) to show average latency over time.
  • Visualization: Use a gauge or time series chart.

System CPU Usage:

  • Metric: node_cpu_seconds_total
  • Query: avg by (instance) (rate(node_cpu_seconds_total{mode!="idle"}[5m]))
  • Visualization: Use a line chart to show CPU usage trends by instance.

Memory Utilization:

  • Metric: node_memory_MemAvailable_bytes
  • Query: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes
  • Description: Shows the fraction of total memory that is still available; subtract it from 1 (or use the Memory Usage panel below) to chart memory in use.
  • Visualization: Display the trend over time with a line chart.

Disk Usage:

  • Metric: node_filesystem_free_bytes
  • Query: (node_filesystem_size_bytes - node_filesystem_free_bytes) / node_filesystem_size_bytes
  • Description: Monitors disk usage as a percentage of the total available disk space. This is essential for tracking storage capacity and avoiding potential disk saturation.
  • Visualization: Use a line or area chart to track disk usage over time, with critical usage thresholds highlighted.

Network I/O:

  • Metric: node_network_receive_bytes_total and node_network_transmit_bytes_total
  • Query:
    • Receive: rate(node_network_receive_bytes_total[5m])
    • Transmit: rate(node_network_transmit_bytes_total[5m])
  • Description: Tracks network traffic, showing both incoming and outgoing data in bytes per second. This helps detect network bottlenecks and monitor bandwidth usage.
  • Visualization: Use a dual-axis line chart or two separate line charts to distinguish between received and transmitted data.

CPU Load Average:

  • Metric: node_load1, node_load5, node_load15
  • Query: Use node_load1, node_load5, and node_load15 directly to show 1-minute, 5-minute, and 15-minute CPU load averages.
  • Description: Provides insight into CPU load trends over different time frames, helping to identify periods of high CPU usage and assess system load.
  • Visualization: Use a line chart with multiple series for each load metric to compare short-term and long-term CPU load averages.

Memory Usage:

  • Metric: node_memory_MemAvailable_bytes and node_memory_MemTotal_bytes
  • Query: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes
  • Description: Tracks the percentage of memory currently in use. High memory usage over time can indicate the need for additional resources or optimizations.
  • Visualization: Use a line chart or gauge to show memory usage over time, with thresholds for low, moderate, and high memory usage.

System Context Switches:

  • Metric: node_context_switches_total
  • Query: rate(node_context_switches_total[5m])
  • Description: Counts the rate of context switches per second, which can help monitor CPU scheduling. A high rate of context switches may indicate heavy multitasking or performance bottlenecks.
  • Visualization: Use a line chart to track context switches over time, identifying any unusual spikes that could signify performance issues.

System Uptime:

  • Metric: node_time_seconds and node_boot_time_seconds
  • Query: node_time_seconds - node_boot_time_seconds
  • Description: Calculates the node's uptime by subtracting the boot time from the current system time. Useful for tracking system reliability and uptime compliance.
  • Visualization: Use a single-value chart showing total uptime in hours, days, or weeks, depending on the length of operation.

To make it easier to set up, here is an attached JSON file that you can import directly into your OpenObserve dashboard. This file includes pre-configured panels for each of the metrics described above, allowing you to get started with node-level monitoring quickly and efficiently.

How to Import Your Dashboard to OpenObserve:

  1. Download the JSON file to your local system.
  2. In OpenObserve, navigate to Dashboards and select Import.
  3. Upload the JSON file, and OpenObserve will automatically configure the panels and visualizations.

This will set up a complete node-level monitoring dashboard with metrics ready to go.


4.3 Configuring Alerts in OpenObserve

OpenObserve supports alerts, enabling you to set thresholds on critical metrics. For example, you can set an alert for high memory usage:

  1. Create a New Alert:

    • In OpenObserve, go to the Alerts section.
    • Configure a new alert rule based on the node_memory_MemAvailable_bytes metric to monitor available memory.
  2. Define Alert Conditions:

    • Set conditions, such as alerting if memory usage is below a specified threshold for an extended period.
  3. Choose Notification Channels:

    • Configure the destination (for example, a Slack or email destination you have already set up in OpenObserve) to ensure you’re alerted promptly.

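The steps above configure alerting inside OpenObserve. If you prefer to keep some alert logic in Prometheus itself and route notifications through Alertmanager (see section 1.1), a low-available-memory rule might look like the sketch below; the threshold, duration, and labels are assumptions to adapt to your environment:

groups:
  - name: memory-alerts
    rules:
      - alert: LowAvailableMemory
        expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes < 0.10
        for: 10m                       # condition must hold for 10 minutes before firing
        labels:
          severity: warning            # illustrative label for Alertmanager routing
        annotations:
          summary: "Less than 10% memory available on {{ $labels.instance }}"

Save the rule in a file and reference it from the rule_files section of prometheus.yml so Prometheus evaluates it on each rule-evaluation cycle.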

5. Optimizing Prometheus Metrics for Scalable, Insightful Observability


Optimizing your Prometheus setup is essential for efficient metrics management and powerful monitoring of application and infrastructure health. By implementing best practices like tuning scrape intervals, managing label cardinality, and tracking Prometheus health metrics, you ensure scalable and insightful observability. This approach helps maintain system stability and supports proactive improvements, making your Prometheus monitoring both effective and future-ready.

Best Practice | Description | Benefit
--- | --- | ---
Optimize Scrape Intervals | Set appropriate scrape intervals to balance data granularity with system load. Adjust intervals based on the metric’s importance and frequency of change. | Reduces system load, avoids data overload, and maintains relevant metrics without excessive detail.
Manage Label Cardinality | Limit the number of unique label combinations (cardinality) to prevent excessive memory and CPU use. Avoid high-cardinality labels (e.g., UUIDs). | Enhances performance, reduces memory usage, and prevents excessive data ingestion costs.
Use Remote Write for Long-Term Storage | Configure Prometheus to send data to OpenObserve using Remote Write, so older metrics are stored efficiently outside of Prometheus’ local storage. | Extends data retention, reduces local storage pressure, and enables long-term trend analysis.
Implement Recording Rules | Define recording rules for frequently queried metrics to precompute results and avoid redundant calculations at query time (see the sketch after this table). | Speeds up query performance, reduces load on Prometheus, and improves user experience.
Monitor Prometheus Health Metrics | Track Prometheus’s own health metrics (e.g., memory usage, CPU, scrape duration) to proactively manage and scale the Prometheus instance as needed. | Prevents performance bottlenecks, enables proactive troubleshooting, and ensures reliable monitoring.
Centralize Metrics in OpenObserve | Aggregate and visualize metrics in OpenObserve for enhanced analytics, dashboards, and alerting across Prometheus, Node Exporter, and other sources. | Provides a centralized observability platform, improving insights and simplifying management tasks.
Automate Alerts and Notifications | Set up alerts for key metrics and system performance thresholds to catch issues early, preventing downtime or degradation. | Enhances response time, prevents downtime, and supports proactive system management.
Balance Retention Policies with Data Needs | Adjust Prometheus data retention based on operational needs and data utility, ensuring only necessary data is retained. | Optimizes storage costs, maintains data relevancy, and reduces unnecessary data accumulation.
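
As a concrete example of the “Implement Recording Rules” practice, here is a small rule-file sketch. The rule name follows the common level:metric:operations naming convention, and the expression is an assumption you can replace with whatever you query most often:

groups:
  - name: precomputed
    interval: 1m                  # evaluate the rules once a minute
    rules:
      - record: instance:node_cpu_utilisation:rate5m
        expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))

Dashboards and alerts can then reference instance:node_cpu_utilisation:rate5m directly instead of re-evaluating the rate() expression on every panel refresh.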

Get Started with OpenObserve for Effortless Metrics Management

Ready to take your observability to the next level? OpenObserve offers a seamless platform for visualizing and storing Prometheus metrics long-term, all in one place. Start your journey with OpenObserve today to centralize your metrics, streamline data retention, and enhance your monitoring capabilities.

Get started with OpenObserve now and unlock powerful insights into your systems!

Author:


Chaitanya Sistla is a Principal Solutions Architect with 13X certifications across cloud, data, DevOps, and cybersecurity. Leveraging extensive startup experience and a focus on MLOps, Chaitanya excels at designing scalable, innovative solutions that drive operational excellence and business transformation.
