Federated Search Architecture

This document explains the technical architecture of OpenObserve deployments, how queries execute in normal clusters, and how federated search coordinates queries across clusters in a supercluster.

Availability

This feature is available only in the Enterprise Edition. It is not available in the Open Source or Cloud editions.

Understanding OpenObserve deployments

Before diving into how federated search works, you need to understand how OpenObserve can be deployed. OpenObserve scales from a single machine to a globally distributed infrastructure.

Single node deployment

In the simplest deployment, one instance of OpenObserve runs all functions on one machine. Data is stored locally, and the node processes queries directly. This works well for testing or small deployments.

Single cluster deployment

When you need scale, multiple specialized nodes work together as a cluster. Each node type has a specific role:

  • Router: Entry point that forwards queries to queriers
  • Querier: Processes queries in parallel with other queriers
  • Ingester: Receives and stores data in object storage
  • Compactor: Optimizes files and enforces retention
  • Alertmanager: Executes alerts and sends notifications

A single cluster handles more data and provides higher availability than a single node.

Supercluster deployment

When you need to operate across multiple geographical regions, multiple clusters connect as a supercluster. This is where federated search becomes relevant.

Key point

Each cluster in a supercluster operates independently with its own data storage. Data ingested into one cluster stays in that cluster. However, configuration metadata synchronizes across all clusters, allowing unified management.

Region and cluster hierarchy

In a supercluster, regions organize clusters geographically. A region may contain one or more clusters.

Example:

Region: us-test-3
  ├─ Cluster: dev3
  └─ Cluster: dev3-backup

Region: us-test-4
  └─ Cluster: dev4

Each cluster has independent data storage. Data stays where it was ingested.

How queries execute

Understanding how queries execute inside a single cluster makes it easier to follow how federated search works, whether you query one cluster or several.

Normal cluster query execution

This section explains how any OpenObserve cluster processes queries internally, regardless of whether it is a standalone cluster or part of a supercluster. Understanding this internal process is essential because:

  • This is how standalone clusters work
  • This is what happens when you query your current cluster in a supercluster without federated search coordination
  • During federated search, each individual cluster uses this same internal process to search its own data

When a cluster receives a query:

  1. Router forwards the query to an available querier.
  2. That querier becomes the leader querier.
  3. Leader querier parses SQL, identifies data files, creates execution plan.
  4. Leader querier distributes work among available queriers. These queriers become worker queriers.
  5. All worker queriers search their assigned files in parallel.
  6. Worker queriers send results to the leader querier.
  7. Leader querier merges results and returns final answer.
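
The numbered flow above is a scatter-gather pattern. The following minimal Python sketch illustrates that pattern only; scan, worker_search, and leader_query are hypothetical stand-ins for illustration, not OpenObserve's actual implementation:

from concurrent.futures import ThreadPoolExecutor

def scan(data_file, sql):
    # Hypothetical stand-in: scan one data file for rows matching the query.
    return [f"row from {data_file}"]

def worker_search(file_group, sql):
    # Step 5: a worker querier searches its assigned files.
    return [row for f in file_group for row in scan(f, sql)]

def leader_query(sql, data_files, num_workers=4):
    # Steps 3-4: the leader querier plans the query and splits the file list
    # among the available worker queriers.
    groups = [data_files[i::num_workers] for i in range(num_workers)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = pool.map(lambda g: worker_search(g, sql), groups)
    # Steps 6-7: collect partial results and merge them into the final answer.
    merged = []
    for partial in partials:
        merged.extend(partial)
    return merged

print(leader_query("SELECT * FROM logs", [f"file_{i}.parquet" for i in range(8)]))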

Query execution for your current cluster in a supercluster

Your current cluster is the cluster you are logged into. Selecting your current cluster from the Region dropdown is not federated search.

For example, if you are logged into Cluster A and select Cluster A from the Region dropdown, the query executes using the normal cluster query execution process described above. No cross-cluster communication occurs, and no federated search coordination is needed.

Federated search for one different cluster in a supercluster

When you select a cluster other than the one you are logged into from the Region dropdown, federated search coordination is used:

Step 1: Coordination setup
Your current cluster becomes the leader cluster.

Step 2: Query distribution
Leader cluster sends the query to the selected cluster via gRPC.

Step 3: Query processing
The selected cluster processes the query using its normal cluster query execution process.

Step 4: Result return
The selected cluster sends its results back to the leader cluster.

Step 5: Result presentation
The leader cluster displays the results.
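
In code terms, the single-remote-cluster case is a simple forward-and-return. A minimal sketch, assuming a hypothetical send_query_grpc helper in place of the real gRPC call:

def send_query_grpc(cluster, sql):
    # Hypothetical stand-in for the gRPC call. Internally, the selected
    # cluster runs its normal cluster query execution and aggregates results.
    return [f"rows for {sql!r} from {cluster}"]

def federated_search_one(selected_cluster, sql):
    # Steps 1-2: the current (leader) cluster forwards the query via gRPC.
    results = send_query_grpc(selected_cluster, sql)
    # Steps 3-5: the selected cluster has processed the query; the leader
    # simply receives and presents the results.
    return results

print(federated_search_one("dev4", "SELECT * FROM logs"))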

Federated search for multiple clusters in a supercluster

When you select multiple clusters from the Region dropdown, or select none at all, federated search extends the query across the selected clusters (or across all clusters when none is selected):

Step 1: Coordination setup
Your current cluster becomes the leader cluster. The leader cluster identifies which of the selected clusters (or of all clusters, if none is selected) contain data for the queried stream. These other clusters become worker clusters.

Step 2: Query distribution
The leader cluster sends the query to all worker clusters via gRPC. All clusters now have the same query to execute.

Step 3: Parallel processing
Each cluster processes the query using its normal cluster query execution process. The leader cluster searches its own data if it contains data for that stream. Worker clusters search their own data. All processing happens simultaneously.

Step 4: Result aggregation
Each cluster aggregates its own results internally using its leader querier and worker queriers. Worker clusters send their aggregated results to the leader cluster. The leader cluster merges all results from all clusters and returns the unified response.
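
Conceptually, this is the same scatter-gather pattern as within a single cluster, one level up. A minimal sketch, again using hypothetical stand-ins (send_query_grpc for the gRPC fan-out, search_local for the leader cluster searching its own data), not OpenObserve's actual code:

from concurrent.futures import ThreadPoolExecutor

def send_query_grpc(cluster, sql):
    # Hypothetical stand-in: a worker cluster aggregates its own results
    # internally (leader querier + worker queriers) before returning them.
    return [f"rows for {sql!r} from {cluster}"]

def search_local(sql):
    # Hypothetical stand-in for the leader cluster searching its own data.
    return [f"rows for {sql!r} from the leader cluster"]

def federated_search(worker_clusters, sql):
    # Steps 2-3: fan the query out over gRPC; all clusters work in parallel
    # while the leader cluster searches its own data.
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(send_query_grpc, c, sql) for c in worker_clusters]
        local_results = search_local(sql)
    # Step 4: merge aggregated results from all clusters into one response.
    merged = list(local_results)
    for future in futures:
        merged.extend(future.result())
    return merged

print(federated_search(["dev3-backup", "dev4"], "SELECT * FROM logs"))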

Metadata synchronization

In a supercluster, clusters share configuration and schema information in real time while keeping the actual data separate. This synchronization happens via NATS, a messaging system that coordinates communication between clusters.

While stream schemas are synchronized across all clusters in real time, the actual data for a stream exists only in the cluster or clusters where it was ingested.

Synchronized across clusters:

  • Schema definitions
  • User-defined functions
  • Dashboards and folders
  • Alerts and notifications
  • Scheduled tasks and reports
  • User and organization settings
  • System configurations
  • Job metadata
  • Enrichment metadata

NOT synchronized (stays local):

  • Log data
  • Metric data
  • Trace data
  • Raw ingested data
  • Parquet files and WAL files
  • Search indices

This design maintains data residency compliance while enabling unified configuration management.
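
To make the mechanism concrete, here is a conceptual publish/subscribe sketch using the nats-py client. The subject name and payload shape are illustrative assumptions, not OpenObserve's actual wire format; the point is that only configuration metadata travels this path, never the ingested data itself:

import asyncio
import json
import nats  # nats-py client; assumes a NATS server on localhost

async def main():
    nc = await nats.connect("nats://127.0.0.1:4222")

    async def on_meta(msg):
        # Every cluster applies the metadata changes it receives, keeping
        # schemas, dashboards, alerts, and settings consistent everywhere.
        print("apply metadata change:", json.loads(msg.data))

    # "meta.changes" is an illustrative subject name, not OpenObserve's.
    await nc.subscribe("meta.changes", cb=on_meta)

    # Broadcasting a schema update: only this metadata moves between
    # clusters; the log/metric/trace data behind it stays where it was ingested.
    update = {"type": "schema", "stream": "default/logs", "op": "update"}
    await nc.publish("meta.changes", json.dumps(update).encode())

    await nc.flush()
    await nc.drain()

asyncio.run(main())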

Limitations

No cluster identification in results: Query results do not indicate which cluster provided specific data. To identify the source, query each cluster individually.
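
A practical workaround is to query each cluster's own endpoint and tag the rows yourself. A sketch using the Python requests library; the cluster URLs are hypothetical, and the request shape follows the general form of OpenObserve's /_search API (check your version's API reference for required fields such as the time range):

import requests

CLUSTER_URLS = {  # hypothetical per-cluster endpoints
    "dev3": "https://dev3.example.com",
    "dev4": "https://dev4.example.com",
}

def search_each_cluster(org, sql, auth):
    tagged = []
    for name, base_url in CLUSTER_URLS.items():
        # Hitting each cluster's own endpoint makes the source unambiguous.
        resp = requests.post(
            f"{base_url}/api/{org}/_search",
            json={"query": {"sql": sql}},  # your version may also require a time range
            auth=auth,
            timeout=30,
        )
        resp.raise_for_status()
        for hit in resp.json().get("hits", []):
            hit["_cluster"] = name  # tag each row with its source cluster
            tagged.append(hit)
    return tagged

rows = search_each_cluster("default", "SELECT * FROM logs LIMIT 10", ("user", "pass"))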