Tantivy Index
This document explains Tantivy indexing in OpenObserve, the types of indexes it builds, how to use the correct query patterns, and how to verify and configure indexing.
Tantivy indexing is an open-source feature in OpenObserve.
What is Tantivy?
Tantivy is the inverted index library used in OpenObserve to accelerate searches. An inverted index keeps a map of values or tokens and the row IDs of the records that contain them. When a user searches for a value, the query can use this index to go directly to the matching rows instead of scanning every log record.
Tantivy builds two kinds of indexes in OpenObserve:
Full-text index
For fields such as body or message that contain sentences or long text. The field is split into tokens, and each token is mapped to the records that contain it.
Example log records
- Row 1: body = "POST /api/metrics error"
- Row 2: body = "GET /health ok"
- Row 3: body = "error connecting to database"
The log body "POST /api/metrics error" is stored as the tokens POST, api, metrics, and error. A search for error looks up that token in the index and immediately finds the matching records (Row 1 and Row 3 in this example).
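The token lookup described above can be sketched in a few lines of Python. This is a minimal illustration of the idea, not Tantivy's implementation: the tokenize helper and its rule of splitting on non-alphanumeric characters are assumptions chosen to reproduce the tokens in the example, and a real tokenizer may also normalize case.

```python
import re
from collections import defaultdict

# Example log records from above: row ID -> body text.
ROWS = {
    1: "POST /api/metrics error",
    2: "GET /health ok",
    3: "error connecting to database",
}


def tokenize(text):
    """Split text into tokens (assumed rule: runs of letters and digits)."""
    return re.findall(r"[A-Za-z0-9]+", text)


def build_full_text_index(rows):
    """Map each token to the set of row IDs whose body contains it."""
    index = defaultdict(set)
    for row_id, body in rows.items():
        for token in tokenize(body):
            index[token].add(row_id)
    return dict(index)


def search(index, token):
    """Return matching row IDs with one dictionary lookup, no scan."""
    return sorted(index.get(token, set()))


full_text_index = build_full_text_index(ROWS)
print(tokenize(ROWS[1]))                 # ['POST', 'api', 'metrics', 'error']
print(search(full_text_index, "error"))  # [1, 3]
```

The lookup cost depends on the number of matching rows, not on the total number of records, which is the property the index is built for.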
Secondary index
For fields that represent a single exact value, for example k8s_namespace_name. In this case, the entire field value is treated as one token and indexed.
Example log records
- Row 1: k8s_namespace_name = ingress-nginx
- Row 2: k8s_namespace_name = ziox
- Row 3: k8s_namespace_name = ingress-nginx
- Row 4: k8s_namespace_name = cert-manager
For k8s_namespace_name, the index might look like:
ingress-nginx -> [Row 1, Row 3]
ziox -> [Row 2]
cert-manager -> [Row 4]
A query for k8s_namespace_name = 'ingress-nginx' retrieves those rows directly, without scanning unrelated records. By keeping these indexes, Tantivy avoids full scans across millions or billions of records, so queries can return in milliseconds rather than seconds.
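As a rough sketch of the secondary index, the Python below maps each whole field value to the rows that hold it, using the four example records. The function names are hypothetical and the dictionary only models the concept; it does not reflect how Tantivy stores data on disk.

```python
from collections import defaultdict

# Example records from above: row ID -> k8s_namespace_name value.
NAMESPACES = {
    1: "ingress-nginx",
    2: "ziox",
    3: "ingress-nginx",
    4: "cert-manager",
}


def build_secondary_index(values_by_row):
    """Map each whole field value (treated as one token) to its row IDs."""
    index = defaultdict(list)
    for row_id in sorted(values_by_row):
        index[values_by_row[row_id]].append(row_id)
    return dict(index)


def lookup_equals(index, value):
    """Answer an equality condition from the index alone."""
    return index.get(value, [])


secondary_index = build_secondary_index(NAMESPACES)
print(lookup_equals(secondary_index, "ingress-nginx"))  # [1, 3]
print(lookup_equals(secondary_index, "ziox"))           # [2]
```

Because the value is never split, only an exact match on the full value finds an entry; a partial value such as "ingress" has no key in this index.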
Configure Environment Variable
To enable Tantivy indexing, configure the following environment variable:
Query behavior
Tantivy optimizes queries differently based on whether the field is full-text or secondary. Using the right operator for each field type ensures the query is served from the index instead of scanning logs.
Full-text index scenarios
Correct usage
- Use match_all() for full-text index fields such as body or message, for example match_all('error').
- Use NOT with match_all(), for example NOT match_all('error').
Inefficient usage
- Avoid pattern matching such as body LIKE '%error%' on full-text fields. As the AND and OR examples in this document show, a LIKE condition is not indexable and is resolved in DataFusion.
Secondary index scenarios
Correct usage
- Use = or IN (...) for secondary index fields such as k8s_namespace_name, k8s_pod_name, or k8s_container_name.
- Use NOT with = or IN (...).
Inefficient usage
Mixed scenarios
When a query combines full-text and secondary fields, apply the best operator for each part.
Correct usage
- match_all('error') uses the full-text index.
- k8s_namespace_name = 'ingress-nginx' uses the secondary index.
Incorrect usage
AND and OR operator behavior
AND behavior
- If both sides are indexable, Tantivy intersects the row sets from each index.
- If one side is not indexable, the indexable side is still accelerated by Tantivy, and the other side is resolved in DataFusion.
Examples
-- Fast: both sides indexable
WHERE match_all('error') AND k8s_namespace_name = 'ingress-nginx';
-- Mixed: one side indexable, one not
WHERE match_all('error') AND body LIKE '%error%';
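The two AND cases can be pictured with plain Python sets. This is a conceptual sketch under stated assumptions: the toy rows, the precomputed row sets, and the function names are hypothetical, and the substring test only stands in for the LIKE condition that DataFusion would evaluate.

```python
# Hypothetical toy data: row ID -> (body, k8s_namespace_name).
ROWS = {
    1: ("POST /api/metrics error", "ingress-nginx"),
    2: ("GET /health ok", "ziox"),
    3: ("error connecting to database", "cert-manager"),
}

# Row sets the two indexes would return for this data.
ERROR_ROWS = {1, 3}   # match_all('error')
INGRESS_ROWS = {1}    # k8s_namespace_name = 'ingress-nginx'


def and_both_indexed(left_rows, right_rows):
    """Both sides indexable: intersect the two row sets."""
    return left_rows & right_rows


def and_with_residual(indexed_rows, rows, predicate):
    """One side indexable: narrow with the index first, then evaluate
    the remaining predicate only on the rows that survived."""
    return {row_id for row_id in indexed_rows if predicate(rows[row_id])}


def body_like_error(row):
    """Stand-in for body LIKE '%error%' (a plain substring test)."""
    return "error" in row[0]


print(sorted(and_both_indexed(ERROR_ROWS, INGRESS_ROWS)))            # [1]
print(sorted(and_with_residual(ERROR_ROWS, ROWS, body_like_error)))  # [1, 3]
```

In the second case the non-indexable predicate is still evaluated, but only against the rows the index already selected, so row 2 is never inspected.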
OR behavior
- If all branches of the OR are indexable, Tantivy takes the union of the row sets efficiently.
- If any branch is not indexable, the entire OR is not indexable. The query runs in DataFusion.
Examples
-- Fast: both indexable
WHERE match_all('error') OR k8s_namespace_name = 'ziox';
-- Slower: one branch is not indexable, so the entire OR runs in DataFusion
WHERE match_all('error') OR body LIKE '%error%';
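The OR rule can be sketched the same way. In the hypothetical helper below, a branch is either a set of row IDs that an index returned or None when the branch is not indexable; the names and the None convention are assumptions made for illustration only.

```python
def or_row_sets(branches):
    """Combine the branches of an OR.

    Each branch is a set of row IDs from an index, or None when the
    branch is not indexable. Returns the union when every branch is
    indexable, otherwise None to signal that the index cannot answer
    the OR and the condition must be evaluated on the data itself.
    """
    if any(branch is None for branch in branches):
        return None
    result = set()
    for branch in branches:
        result |= branch
    return result


error_rows = {1, 3}  # match_all('error')
ziox_rows = {2}      # k8s_namespace_name = 'ziox'
like_rows = None     # body LIKE '%error%' has no index-backed row set

print(sorted(or_row_sets([error_rows, ziox_rows])))  # [1, 2, 3]
print(or_row_sets([error_rows, like_rows]))          # None
```

The asymmetry with AND follows from set logic: a non-indexable OR branch could match rows outside every indexed set, so the indexed sets alone cannot bound the result.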
NOT with grouped conditions
-- Exclude when either namespace = ziox OR body contains error
WHERE NOT (k8s_namespace_name = 'ziox' OR match_all('error'));
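In set terms, a NOT over a grouped OR keeps the rows that appear in none of the grouped row sets. The sketch below shows that logic on hypothetical data (four rows, with made-up row sets for each condition); it illustrates the semantics of the query and makes no claim about how Tantivy evaluates it internally.

```python
# Hypothetical data: four rows and the row sets each condition matches.
ALL_ROWS = {1, 2, 3, 4}
ziox_rows = {2}      # k8s_namespace_name = 'ziox'
error_rows = {1, 3}  # match_all('error')


def not_any(all_rows, *row_sets):
    """Rows that match none of the grouped conditions: NOT (a OR b)."""
    excluded = set().union(*row_sets)
    return all_rows - excluded


def not_each(all_rows, *row_sets):
    """Equivalent form by De Morgan's law: (NOT a) AND (NOT b)."""
    result = set(all_rows)
    for rows in row_sets:
        result &= all_rows - rows
    return result


print(sorted(not_any(ALL_ROWS, ziox_rows, error_rows)))   # [4]
print(sorted(not_each(ALL_ROWS, ziox_rows, error_rows)))  # [4]
```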
Verify if a query is using Tantivy
To confirm whether a query used the Tantivy inverted index:
- Open the browser developer tools and go to the Network tab.
- Inspect the query response JSON.
- Under took_detail, check the value of idx_took:
  - If idx_took is greater than 0, the query used the inverted index.
  - If idx_took is 0, the query did not use the inverted index.
- If