Pipelines in OpenObserve
This page provides an overview of pipelines, their types, and how they work.
What are Pipelines?
Pipelines enable seamless data ingestion and transformation using an intuitive drag-and-drop interface.

Types of Pipelines
OpenObserve supports two types of pipelines to cater to different data processing needs:
- Real-time pipelines
- Scheduled pipelines
Real-Time Pipelines
A real-time pipeline processes incoming raw data instantly, enabling immediate transformations and routing without delay.
How they work
- Source: As soon as raw data is ingested into the source stream, the pipeline begins processing it (see the ingestion sketch after this list).
- The supported source stream types are Logs, Metrics, or Traces.
Note: Each source stream can be associated with only one real-time pipeline.
- Transform: The pipeline applies conditions or functions to filter and transform the data in real time.

- Destination: The transformed data is sent to the following destination(s) for further use or storage:
- Stream: The supported destination stream types are Logs, Metrics, Traces, or Enrichment tables.
Note: Enrichment Tables can only be used as destination streams in scheduled pipelines.
- Remote: Select Remote if you wish to send data from the pipeline to external destinations.
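For example, the following is a minimal sketch of pushing a record into a source stream over OpenObserve's JSON ingestion endpoint (/api/{org}/{stream}/_json); the URL, organization, stream name, and credentials below are placeholders. Any real-time pipeline attached to that stream starts processing the record as soon as it is ingested, with no extra call needed to trigger it.

```python
# Minimal sketch: ingest one log record into the "app_logs" source stream.
# URL, organization, stream name, and credentials are placeholders.
import requests

ORG = "default"        # organization name (placeholder)
STREAM = "app_logs"    # source stream of the real-time pipeline (placeholder)
URL = f"http://localhost:5080/api/{ORG}/{STREAM}/_json"

# One log record; a real-time pipeline on "app_logs" processes it immediately.
record = [{"level": "error", "message": "payment service timeout", "service": "checkout"}]

resp = requests.post(URL, json=record, auth=("root@example.com", "Complexpass#123"))
resp.raise_for_status()
print(resp.json())  # ingestion summary returned by OpenObserve
```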
When to use
Use real-time pipelines when you need immediate processing, such as monitoring live data and cleaning logs in real time.
Scheduled Pipelines
A scheduled pipeline automates the processing of historical data from an existing stream at user-defined intervals. This is useful when you need to extract, transform, and load (ETL) data at regular intervals without manual intervention.

Performance
OpenObserve maintains a cache for scheduled pipelines to prevent the alert manager from making unnecessary database calls. This cache becomes particularly beneficial when the number of scheduled pipelines is high. For example, with 500 scheduled pipelines, the cache eliminates 500 separate database queries each time the pipelines are triggered, significantly improving performance.
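The following is an illustrative sketch of the caching idea, not OpenObserve's actual internals: the scheduler looks up each pipeline definition in an in-memory cache and falls back to the database only on a miss, so repeated triggers do not translate into repeated database queries. All names here are hypothetical.

```python
# Illustrative sketch of a pipeline-definition cache (hypothetical names,
# not OpenObserve internals): the database is queried only on a cache miss.
_pipeline_cache: dict[str, dict] = {}

def load_pipeline(pipeline_id: str) -> dict:
    """Return a pipeline definition, hitting the database only on a cache miss."""
    if pipeline_id not in _pipeline_cache:
        _pipeline_cache[pipeline_id] = fetch_pipeline_from_db(pipeline_id)
    return _pipeline_cache[pipeline_id]

def fetch_pipeline_from_db(pipeline_id: str) -> dict:
    # Stand-in for a real database query; with the cache above, 500 scheduled
    # pipelines trigger this once each, not once per trigger.
    return {"id": pipeline_id, "sql": "SELECT * FROM app_logs", "frequency_minutes": 5}
```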
How they work
- Source: To create a scheduled pipeline, you need an existing stream, which serves as the source stream.
- The supported source stream types are Logs, Metrics, or Traces.
- Query: You write a SQL query to fetch historical data from the source stream. When defining the query, you also set the Frequency and Period to control the scheduling (see the sketch after this list).

- Transform: The fetched data is transformed by applying functions or conditions.

- Destination: The transformed data is sent to the following destination(s) for storage or further processing:
- Stream: The supported destination stream types are Logs, Metrics, Traces, or Enrichment tables.
Note: Enrichment Tables can only be used as destination streams in scheduled pipelines.
- Remote: Select Remote if you wish to send data to external destinations.
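To make these moving parts concrete, here is an illustrative sketch of the pieces that define a scheduled pipeline. In OpenObserve these are configured in the pipeline editor rather than written by hand, and every stream and field name below is a placeholder.

```python
# Illustrative sketch of a scheduled pipeline's configuration (placeholders
# throughout; in practice these fields are set in the pipeline editor UI).
scheduled_pipeline = {
    "source_stream": "app_logs",        # existing stream serving as the source
    "query": (
        "SELECT service, COUNT(*) AS error_count "  # SQL fetching historical data
        "FROM app_logs WHERE level = 'error' "
        "GROUP BY service"
    ),
    "frequency_minutes": 5,             # run the query every 5 minutes
    "period_minutes": 5,                # each run fetches the last 5 minutes of data
    "destination_stream": "error_counts",  # stream receiving the transformed output
}
```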
Frequency and Period
The scheduled pipeline runs based on the user-defined Frequency and Period.
- Frequency: Defines how often the query should be executed.
Example: Frequency: 5 minutes
This ensures the query runs every 5 minutes.
- Period: Defines the time window for which the query fetches data.
Note: The period should be the same as the frequency, and both must be greater than 4 minutes.
Example: Period: 5 minutes
This ensures the query fetches the data that was ingested in the last 5 minutes.
- Frequency with Cron: Cron allows you to define custom execution schedules based on specific expressions and timezones. It is ideal for scenarios that require tasks to run at predefined times.
Example: Cron Expression: 0 0 1 * *
Timezone: Asia/Kolkata
This ensures the query runs at 12:00 AM IST (00:00 in 24-hour format) on the first day of each month.
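To sanity-check a cron schedule like the one above, you can resolve its next run times outside OpenObserve. The sketch below uses the third-party croniter package (not part of OpenObserve) purely to illustrate how the expression and timezone combine; it is not how OpenObserve evaluates schedules internally.

```python
# Resolve the next few run times for the cron example above.
# Requires: pip install croniter
from datetime import datetime
from zoneinfo import ZoneInfo
from croniter import croniter

tz = ZoneInfo("Asia/Kolkata")
schedule = croniter("0 0 1 * *", datetime.now(tz))  # midnight IST, day 1 of every month

for _ in range(3):
    print(schedule.get_next(datetime))  # next three monthly run times in IST
```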
When to use
Use scheduled pipelines for tasks that require processing at fixed intervals instead of continuously, such as generating periodic reports and processing historical data in batches.