What is OpenObserve?

OpenObserve is an open-source, cloud-native observability platform designed for high-efficiency log management and monitoring. It stores data in the Parquet format for strong compression and fast retrieval through a SQL-based query engine. Unlike traditional log management solutions that rely on costly full-text indexing, OpenObserve provides a highly scalable alternative that can handle massive log volumes with minimal overhead.

In version v0.14.0, OpenObserve introduced an optional inverted index to accelerate queries. This enhancement lets users retrieve logs faster without compromising the system’s efficiency.

Version Information

This blog is based on the following OpenObserve version:

Version: v0.14.5-rc3
Commit Hash: ad7708002439241a04e00af182f0fd22e9a9954f
Build Date: 2025-03-15T06:12:31Z

Introducing Pipelines in OpenObserve

Pipelines in OpenObserve were first introduced in version v0.14.0 and serve as a powerful mechanism for processing and transforming logs before they are stored. Pipelines offer three core functionalities:

  1. Real-time Data Processing (a.k.a. stream processing): Formatting, cleaning, and filtering log data to enhance usability.
  2. Dynamic Data Routing: Routing logs to different data streams based on predefined conditions.
  3. Data Pre-Aggregation (think continuously updated materialized views in an RDBMS, or recording rules in Prometheus): Running SQL-based aggregation queries and storing the results in new data streams.

These capabilities allow users to structure their logs effectively, minimize storage costs, and extract meaningful insights from raw data.

Sample Log Stream for Parsing with Pipelines

To demonstrate the power of OpenObserve Pipelines, let's take a sample raw log record before transformation:

{
  "timestamp": "01/Apr/2021:12:02:31 +0000",
  "message": "172.17.0.1 - alice [01/Apr/2021:12:02:31 +0000] \"POST /not-found HTTP/1.1\" 404 153 \"http://localhost/somewhere\" \"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36\" \"2.75\""
}

The Need for Transformation

Raw logs like this are difficult to analyze efficiently. Using Vector Remap Language (VRL), we can parse the message field and extract structured fields.

Expected Transformed Output

{
  "client": "172.17.0.1",
  "user": "alice",
  "timestamp": "2021-04-01T12:02:31Z",
  "request": "POST /not-found HTTP/1.1",
  "status": 404,
  "size": 153,
  "referer": "http://localhost/somewhere",
  "agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36",
  "compression": "2.75"
}

Creating a Pipeline in OpenObserve

You can quickly test this transformation in the VRL function editor, as shown in the image below:

Pipelines > Functions > Create new function

[Image: VRL function editor]

Now, to build this flow as a pipeline, follow the steps below.

Step 1: Adding a Data Source

To process Nginx logs, start by creating a Pipeline in OpenObserve:

  1. Navigate to the Pipelines section.
  2. Click Add Pipeline.
  3. Drag the Stream icon onto the canvas.
  4. Select the log stream (default_test).
  5. Confirm and proceed.

Step 2: Adding a Transform Function

Now, let’s transform the unstructured log using VRL. Drag the Function icon onto the canvas, click Create new function, and add the following VRL function:

# Parse the Nginx "combined" log format from the message field
msg, err = parse_nginx_log(.message, "combined")
if err == null {
    # On success, replace the raw event with the structured fields
    . = msg
}

This function:

  • Parses the message field as an Nginx combined-format log.
  • Replaces the raw record with the structured fields if parsing succeeds.

Step 3: Connecting the Components

  • Delete the default input-output link.
  • Drag the Destination stream component.
  • Connect the input stream to the transformation function.
  • Connect the transformation function to the output stream.
  • Save the pipeline and give it a meaningful name.

[Image: Completed pipeline connecting source stream, function, and destination stream]

Step 4: Verifying Transformed Logs

Once the pipeline is active, ingest some new Nginx logs and verify the structured output in OpenObserve’s log interface.
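
If you don’t have a live Nginx instance handy, you can push a sample record through OpenObserve’s JSON ingestion endpoint. Below is a minimal sketch, assuming a local instance at localhost:5080, the default organization, the default_test stream, and placeholder demo credentials:

import requests

# Assumptions: local OpenObserve at localhost:5080, the "default" org,
# the "default_test" source stream, and placeholder demo credentials.
url = "http://localhost:5080/api/default/default_test/_json"
auth = ("root@example.com", "Complexpass#123")  # replace with your login

# One raw Nginx "combined" log line, wrapped in the JSON array the
# ingestion endpoint expects.
logs = [{
    "message": (
        '172.17.0.1 - alice [01/Apr/2021:12:02:31 +0000] '
        '"POST /not-found HTTP/1.1" 404 153 '
        '"http://localhost/somewhere" "Mozilla/5.0" "2.75"'
    )
}]

resp = requests.post(url, json=logs, auth=auth)
print(resp.status_code, resp.text)

If the pipeline is wired up correctly, the record should appear in the destination stream already parsed into client, status, size, and the other structured fields.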

Dynamic Data Routing with Pipelines

In some cases, you might want to store logs with different HTTP status codes into separate data streams. For example:

  • Logs with status 200 should go to nginx_success.
  • Logs with status 404 should go to nginx_not_found.
  • Logs with status 500 should go to nginx_server_errors.

We can modify our pipeline to achieve this dynamic routing:

  1. Add a Condition node for each branch that checks the status field (e.g., status = 404).
  2. Point each condition’s branch at its destination stream (nginx_success, nginx_not_found, nginx_server_errors).
  3. Ensure a default data stream is set so unmatched records are not lost.
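
To verify the routing, you can ingest a few synthetic log lines covering each branch. A minimal sketch, again assuming a local instance, the default organization, and placeholder demo credentials:

import requests

# Assumptions: local OpenObserve, "default" org, the pipeline's source
# stream "default_test", and placeholder demo credentials.
url = "http://localhost:5080/api/default/default_test/_json"
auth = ("root@example.com", "Complexpass#123")

# Synthetic Nginx lines covering each routing branch: 200, 404, and 500.
template = (
    '172.17.0.1 - alice [01/Apr/2021:12:02:31 +0000] '
    '"GET /{path} HTTP/1.1" {status} 153 "-" "Mozilla/5.0" "2.75"'
)
logs = [
    {"message": template.format(path="index.html", status=200)},
    {"message": template.format(path="missing", status=404)},
    {"message": template.format(path="boom", status=500)},
]

resp = requests.post(url, json=logs, auth=auth)
print(resp.status_code, resp.text)

After ingestion, each record should land in nginx_success, nginx_not_found, or nginx_server_errors respectively.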

[Image: Pipeline with condition nodes routing to multiple destination streams]

Data Pre-Aggregation in OpenObserve

Rather than performing expensive queries on historical data, we can pre-aggregate statistics and store them in a separate data stream for faster retrieval.

Example Aggregation Query:

SELECT
  COUNT(*) AS total_requests,
  COUNT(CASE WHEN status = 200 THEN 1 END) AS success_count,
  COUNT(CASE WHEN status >= 400 AND status < 500 THEN 1 END) AS client_errors,
  COUNT(CASE WHEN status >= 500 THEN 1 END) AS server_errors
FROM "nginx_pipelines_demo"
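
Each scheduled run appends one record of counters to the destination stream. An illustrative result record (the values here are made up for demonstration):

{
  "total_requests": 1250,
  "success_count": 980,
  "client_errors": 220,
  "server_errors": 50
}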

Steps to Implement:

  1. Drag the Query icon into the Pipeline canvas.
  2. Enter the aggregation SQL.
  3. Set a refresh interval (e.g., every 5 minutes).
  4. Connect it to an output stream (nginx_aggregated).
  5. Save the pipeline.
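
Once the query has run a few times, you can read the precomputed counters back through OpenObserve’s search API instead of re-scanning the raw stream. A minimal sketch, assuming the same local instance and credentials as above, and that the _search endpoint accepts a SQL query with a microsecond time range:

import time

import requests

# Assumptions: local OpenObserve, "default" org, placeholder demo
# credentials, and a populated "nginx_aggregated" stream.
url = "http://localhost:5080/api/default/_search"
auth = ("root@example.com", "Complexpass#123")

now_us = int(time.time() * 1_000_000)  # OpenObserve timestamps are microseconds
payload = {
    "query": {
        "sql": 'SELECT * FROM "nginx_aggregated" ORDER BY _timestamp DESC',
        "start_time": now_us - 3_600_000_000,  # the last hour
        "end_time": now_us,
        "from": 0,
        "size": 10,
    }
}

resp = requests.post(url, json=payload, auth=auth)
for hit in resp.json().get("hits", []):
    print(hit)

A dashboard panel pointed at nginx_aggregated reads these few counter records instead of scanning every raw log line, which is where the performance win comes from.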

[Images: Configuring the scheduled query node, its interval, and the destination stream]

Benefits of Pre-Aggregation

  • Reduces query load for dashboards.
  • Optimizes performance for historical analysis.
  • Enables faster visualizations with precomputed metrics.

Conclusion

OpenObserve Pipelines offer a powerful, flexible, and efficient way to process logs. Whether you need real-time transformations, dynamic routing, or pre-aggregated insights, OpenObserve Pipelines enable you to handle log data at scale with minimal configuration.

Key Takeaways:

  • Parse raw logs into structured fields using VRL functions.
  • Route logs dynamically based on conditions.
  • Pre-aggregate logs to optimize performance.

If you haven’t explored Pipelines in OpenObserve yet, now is the perfect time to get started!

Get Started with OpenObserve Today!

Sign up for a free trial of OpenObserve on our website. Check out our GitHub repository for self-hosting and contribution opportunities.

About the Author

Chaitanya Sistla

Chaitanya Sistla is a Principal Solutions Architect with 16X certifications across Cloud, Data, DevOps, and Cybersecurity. Leveraging extensive startup experience and a focus on MLOps, Chaitanya excels at designing scalable, innovative solutions that drive operational excellence and business transformation.
