Sensitive Data Redaction in OpenObserve: How to Redact, Hash, and Drop PII Data Effectively

Manas Sharma
November 07, 2025
8 min read

Introduction

As organizations scale their observability systems, the volume of logs, metrics, and traces captured across services grows rapidly. These records often contain more than operational data: they may inadvertently include sensitive information such as user emails, IP addresses, access tokens, or payment identifiers.

In regulated industries, this creates an immediate compliance challenge. Frameworks like GDPR, HIPAA, and SOC 2 require organizations to ensure that sensitive data is protected, stored responsibly, and never exposed unintentionally.

Traditional observability stacks rely on developers to filter such data within their applications before it's ingested. In distributed systems this approach doesn't scale, because it's hard to predict where sensitive values will appear.

To address this, OpenObserve Enterprise edition introduces Sensitive Data Redaction (SDR), a native feature that detects sensitive data using user-defined patterns and protects it at ingestion or query time. Teams keep full visibility into their systems while staying compliant and secure.

Why Sensitive Data Appears in Observability Pipelines

Every request, event, or trace emitted by your systems can potentially contain personally identifiable information (PII) or secrets. This can happen unintentionally, for example:

  • Logs that capture entire HTTP payloads.
  • Application errors printing stack traces with user data.
  • Debug traces including session tokens or IPs.

Once such data lands in your observability store, it becomes searchable, appears on dashboards, and is sometimes shared across teams, amplifying privacy risks.

Manually cleaning up or rewriting this data later is difficult. External scrubbing tools often introduce latency, require additional infrastructure, and may miss dynamically generated fields.

OpenObserve Enterprise addresses this with built-in redaction that operates directly in the data pipeline, protecting data from the moment it arrives or is queried.

Understanding Sensitive Data Redaction in OpenObserve

Sensitive Data Redaction (SDR) in OpenObserve works by inspecting data for patterns that match defined regular expressions (regex). Once matched, the system applies a user-defined action: redact, hash, or drop.

Each of these actions has a distinct purpose:

  1. Redact: Replace the matched content with a placeholder like [REDACTED].

  2. Hash: Replace the matched content with a deterministic MD5 hash, shown as [REDACTED:<md5-hash>]. Because the same input always produces the same hash, records stay consistent and can be correlated while the data remains anonymized for display.

  3. Drop: Remove the sensitive field or value entirely.

[Image: Types of actions on PII data]

Redaction rules can run at two different stages:

  • Ingestion time – data is processed before being stored.
  • Query time – data is masked or excluded when fetched.

The SDR engine is powered by Intel Hyperscan, a high-performance regex library designed to match many patterns simultaneously in a single pass over the data. This is intended to keep added latency low and throughput near real time, even at enterprise-scale ingestion rates.
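Hyperscan itself is a C library with its own API. As a rough analogy only (this uses Python's re module as a stand-in, not Hyperscan and not OpenObserve's code), the sketch below combines several patterns into one compiled regex with named groups, so a single scan reports which rule matched and where. The pattern names and regexes are illustrative assumptions.

```python
import re

# Illustrative patterns (assumptions, not OpenObserve defaults).
PATTERNS = {
    "email": r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}",
    "ipv4": r"\b(?:\d{1,3}\.){3}\d{1,3}\b",
    "card": r"\b(?:\d{4}-){3}\d{4}\b",
}

# Combine every pattern into one alternation with named groups, so the text
# is scanned once rather than once per pattern (a loose analogy to
# Hyperscan's multi-pattern database).
COMBINED = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in PATTERNS.items()))


def scan(text: str) -> list[tuple[str, str]]:
    """Return (pattern_name, matched_text) pairs in order of appearance."""
    return [(m.lastgroup, m.group()) for m in COMBINED.finditer(text)]


if __name__ == "__main__":
    line = "user=jane.doe@example.com ip=10.0.0.7 card=4111-1111-1111-1111"
    for name, value in scan(line):
        print(name, value)
```

Real multi-pattern engines go further (for example, reporting overlapping matches from different patterns), which a single Python alternation does not do.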

Ingestion-Time vs Query-Time Protection

Redaction can happen either before data is written (ingestion-time) or when data is read (query-time). Each mode serves a different operational goal.

Ingestion-Time Protection

When applied at ingestion, sensitive values are redacted, hashed, or dropped before being persisted. This means:

  • The original data never enters the storage layer.
  • It cannot be recovered later, which supports data-minimization requirements.
  • This suits organizations bound by GDPR, PCI DSS, or HIPAA requirements.

For example, an email like jane.doe@example.com becomes [REDACTED] before reaching storage. Even administrators with direct access to the data cannot retrieve the original value.

Query-Time Protection

Query-time protection operates differently. Sensitive data is still stored but masked dynamically when queried.

  • This allows retaining raw data for forensic or audit purposes.
  • It’s useful when you need flexibility for retrospective analysis but want to ensure sensitive fields are hidden from dashboards or reports.

For instance, a credit card number may remain stored internally but displayed as [REDACTED] when accessed through the query interface.

Choosing the Right Mode

  • Ingestion-time redaction → irreversible protection, best for compliance and minimal data exposure.
  • Query-time redaction → flexible access control, suitable for internal security teams or auditing workflows.
  • Both modes can be combined: ingestion-time for critical PII fields and query-time for operational metadata.
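To make the difference between the two modes concrete, here is a minimal Python sketch, assuming a toy in-memory list as the "storage layer" and a hypothetical rule with an apply_at setting of "ingestion", "query", or "both". The names Rule, apply_rules, ingest, and query are illustrative, not OpenObserve APIs.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical rule model: a regex plus the stage(s) where it applies."""
    pattern: re.Pattern
    apply_at: str  # "ingestion", "query", or "both"


def apply_rules(text: str, rules: list[Rule], stage: str) -> str:
    """Replace matches of every rule active at `stage` with [REDACTED]."""
    for rule in rules:
        if rule.apply_at in (stage, "both"):
            text = rule.pattern.sub("[REDACTED]", text)
    return text


def ingest(store: list[str], line: str, rules: list[Rule]) -> None:
    # Ingestion-time rules run before the line is persisted, so the
    # original value never reaches `store`.
    store.append(apply_rules(line, rules, "ingestion"))


def query(store: list[str], rules: list[Rule]) -> list[str]:
    # Query-time rules mask values on the way out; `store` keeps the raw data.
    return [apply_rules(line, rules, "query") for line in store]


EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

if __name__ == "__main__":
    for mode in ("ingestion", "query"):
        store: list[str] = []
        rules = [Rule(EMAIL, mode)]
        ingest(store, "login jane.doe@example.com", rules)
        print(mode, "stored:", store, "returned:", query(store, rules))
```

With mode "ingestion" both the stored and returned lines are masked; with mode "query" the raw email remains in the store and only the returned line is masked.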

The Three Actions: Redact, Hash, and Drop

Each redaction action serves a unique role in managing data exposure risk.

Redact

Replaces sensitive content with a placeholder (e.g., [REDACTED]) while preserving context. This is useful when you want to retain log readability — such as debugging messages — without storing private values.

Example:

Before: "User john@example.com logged in"
After:  "User [REDACTED] logged in"
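A minimal Python sketch of the Redact action, assuming a simple illustrative email regex (not a pattern shipped with OpenObserve):

```python
import re

# Illustrative email pattern; production-grade email matching is more involved.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact(text: str, pattern: re.Pattern = EMAIL, placeholder: str = "[REDACTED]") -> str:
    """Replace every match of `pattern` with a fixed placeholder."""
    return pattern.sub(placeholder, text)


print(redact("User john@example.com logged in"))  # User [REDACTED] logged in
```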

Hash

The matched value is replaced with a deterministic MD5 hash, shown as [REDACTED:<md5-hash>].
This makes the value unreadable but still searchable using its hashed equivalent. It allows security and operations teams to trace repeated occurrences without accessing the original sensitive information.

Example:

Before: "Credit Card:4111-1111-1111-1111"
After:  "Credit Card:[REDACTED:907fe4882def....]"
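The Hash action can be approximated with Python's hashlib. This sketch assumes the MD5 is computed over the exact matched text and formatted as [REDACTED:<hex digest>], following the example above. The card regex is illustrative, and OpenObserve's exact hashing input and output format may differ.

```python
import hashlib
import re

# Illustrative pattern for dash-separated 16-digit card numbers.
CARD = re.compile(r"\b(?:\d{4}-){3}\d{4}\b")
# Matches the hashed token this sketch produces.
TOKEN = re.compile(r"\[REDACTED:[0-9a-f]{32}\]")


def hash_match(match: re.Match) -> str:
    # Deterministic: the same matched text always yields the same digest.
    digest = hashlib.md5(match.group().encode("utf-8")).hexdigest()
    return f"[REDACTED:{digest}]"


def hash_sensitive(text: str, pattern: re.Pattern = CARD) -> str:
    """Replace each match with a bracketed MD5 digest of the matched text."""
    return pattern.sub(hash_match, text)


a = hash_sensitive("Credit Card:4111-1111-1111-1111")
b = hash_sensitive("refund to 4111-1111-1111-1111")
print(a)
# Both records carry the same token, so they can be correlated.
print(TOKEN.findall(a) == TOKEN.findall(b))  # True
```

Because the digest is unsalted and card numbers have limited entropy, treat hashed values as pseudonymized rather than fully anonymous.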

Drop

Completely removes the field or value before it’s stored or displayed. Use this when you want to ensure certain data (like passwords or tokens) never persists in your observability store.
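Drop can be sketched on a structured record. This version assumes a record is a dict of string fields and offers both behaviors the description mentions: removing the whole field, or deleting only the matched value. Which one OpenObserve applies in a given setup is not shown here, and the bearer-token pattern and field names are hypothetical.

```python
import re

# Hypothetical pattern for bearer tokens.
BEARER = re.compile(r"Bearer\s+[A-Za-z0-9._-]+")


def drop_sensitive(record: dict[str, str], field: str, pattern: re.Pattern = BEARER,
                   whole_field: bool = True) -> dict[str, str]:
    """Return a copy of `record` with sensitive data dropped from `field`.

    whole_field=True removes the field entirely when it contains a match;
    whole_field=False deletes only the matched text.
    """
    out = dict(record)  # never mutate the caller's record
    value = out.get(field)
    if value is None or not pattern.search(value):
        return out
    if whole_field:
        del out[field]
    else:
        out[field] = pattern.sub("", value)
    return out


rec = {"path": "/api/orders", "auth": "Bearer eyJhbGciOi.abc-123"}
print(drop_sensitive(rec, "auth"))                     # {'path': '/api/orders'}
print(drop_sensitive(rec, "auth", whole_field=False))  # {'path': '/api/orders', 'auth': ''}
```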

Configuration Guide: Defining and Applying Regex Patterns

OpenObserve's redaction system is fully configurable through its management UI. Follow these steps:

  • Navigate to Management → Sensitive Data Redaction.

[Image: SDR tab in Management Settings]

  • Click Create Pattern to define a new regex rule.

  • Enter a regular expression that identifies the target data (e.g., email addresses, IPs).

[Image: Testing a regex against an input string]

  • Test your regex directly in the interface to ensure accurate matches.

[Image: Redacted output produced by the regex pattern]

  • Apply the pattern to a stream field. (Only UTF8 field types are supported.)

[Image: Applying a pattern to a UTF-8 field]

  • Select an Action — Redact, Hash, or Drop.

[Image: Selecting an action in Stream Details]

  • Choose When to Apply — Ingestion, Query, or Both.

[Image: Applying the action at ingestion, query, or both]

  • Add the pattern and click Update Changes to save the configuration.

You can attach multiple patterns to the same stream. This makes it easy to maintain different policies for various data types (e.g., emails, tokens, IPs).
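Conceptually, a stream's redaction setup maps each field to a list of attached patterns, each with its own action. The sketch below models that with hypothetical names (FieldRule, STREAM_RULES, process_record) and a hypothetical message field; it is not OpenObserve's configuration schema. It applies every rule attached to a field, in list order.

```python
import hashlib
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldRule:
    """Hypothetical model of one pattern attached to a stream field."""
    name: str
    pattern: re.Pattern
    action: str  # "redact" or "hash" (drop omitted for brevity)


def _hash(m: re.Match) -> str:
    # Deterministic MD5 token for the matched text.
    return "[REDACTED:" + hashlib.md5(m.group().encode()).hexdigest() + "]"


# Several patterns attached to the same field of one stream.
STREAM_RULES = {
    "message": [
        FieldRule("email", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "redact"),
        FieldRule("ipv4", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "hash"),
    ],
}


def process_record(record: dict[str, str]) -> dict[str, str]:
    """Apply every rule configured for each field, in list order."""
    out = dict(record)
    for field, rules in STREAM_RULES.items():
        if field not in out:
            continue
        for rule in rules:
            repl = "[REDACTED]" if rule.action == "redact" else _hash
            out[field] = rule.pattern.sub(repl, out[field])
    return out


print(process_record({"message": "login bob@example.com from 10.1.2.3", "level": "info"}))
```

Because rules run one after another on the same text, order can matter when patterns overlap; testing each combination on sample data helps catch surprises.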

Role-Based Access Control (RBAC)

Sensitive data management must be restricted to authorized users. OpenObserve Enterprise integrates SDR directly with its IAM and RBAC system to maintain security boundaries.

  • Regexp Patterns module: controls who can create, modify, or delete regex patterns.

[Image: Permissions module for Regexp Patterns]

  • Streams module: controls who can attach or detach redaction rules from data streams.
  • Root users always retain full access.

This keeps pattern creation and rule enforcement separate, a key principle for compliance audits and internal governance.
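The separation of duties can be pictured as two independent permission checks. The user model and permission strings below (such as "regexp_patterns:create" and "streams:update") are hypothetical and only illustrate the idea; actual permissions are managed through OpenObserve's IAM settings.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    name: str
    is_root: bool = False
    # Hypothetical permission strings such as "regexp_patterns:create".
    permissions: set[str] = field(default_factory=set)


def can(user: User, permission: str) -> bool:
    """Root users always pass; everyone else needs the explicit permission."""
    return user.is_root or permission in user.permissions


pattern_author = User("alice", permissions={"regexp_patterns:create", "regexp_patterns:update"})
stream_owner = User("bob", permissions={"streams:update"})
admin = User("root", is_root=True)

# The author can define patterns but not attach them to streams, and vice versa.
print(can(pattern_author, "regexp_patterns:create"), can(pattern_author, "streams:update"))  # True False
print(can(stream_owner, "regexp_patterns:create"), can(stream_owner, "streams:update"))      # False True
print(can(admin, "streams:update"))                                                          # True
```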

Searching Hashed Data with match_all_hash()

When using the Hash action, OpenObserve lets you search hashed values by supplying the original string, without the stored records ever displaying it.

The function match_all_hash() converts your query term into its hash internally and matches it against stored values.

Example:

match_all_hash('4111-1111-1111-1111')

[Image: Searching hashed data with match_all_hash()]

This will return all records that contain the hashed form of that card number, even though the actual value is not stored.
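The idea behind searching by the original value is to hash the search term and look for the resulting token. The Python sketch below imitates that over in-memory records; search_by_original is a hypothetical helper, it assumes the [REDACTED:<md5>] token format described earlier, and it does not reflect how OpenObserve's full-text index works internally.

```python
import hashlib


def search_by_original(records: list[str], term: str) -> list[str]:
    """Return records containing the hashed token for `term`."""
    token = f"[REDACTED:{hashlib.md5(term.encode('utf-8')).hexdigest()}]"
    return [r for r in records if token in r]


card = "4111-1111-1111-1111"
digest = hashlib.md5(card.encode("utf-8")).hexdigest()
records = [
    f"Credit Card:[REDACTED:{digest}]",
    "Credit Card:[REDACTED:00000000000000000000000000000000]",
    "User [REDACTED] logged in",
]
print(search_by_original(records, card))  # only the first record
```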

Note: match_all_hash() only works on fields where full-text search has been enabled. It is recommended to turn on full-text search for any field that uses Sensitive Data Redaction with hashing.

This ensures that searches for sensitive values (via their hashed equivalent) are fast and accurate.

If full-text search is not enabled for a field, you can still search using the hash directly. To generate the hash, calculate the MD5 hash of your input value using any tool.

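For example, on macOS you can run:

echo -n "openobserve" | md5

On Linux, use md5sum instead, which prints the hash followed by " -". Keep the -n flag: without it, echo appends a newline and the resulting hash will not match.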

Then filter the field with LIKE '%<hash>%', substituting the hash you generated, to match the hashed token.
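If you prefer to compute the hash in code, Python's hashlib produces the same hex digest as the echo -n command above. The sketch also assembles the LIKE condition; the stream and field names ("default", message) are hypothetical placeholders and the exact SQL shape is an assumption, so adjust it to your own query editor.

```python
import hashlib


def md5_hex(value: str) -> str:
    """Same digest as `echo -n "<value>" | md5` (no trailing newline)."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def like_query(stream: str, field: str, value: str) -> str:
    # The digest is hex-only, so interpolating it cannot break the quoting.
    return f"SELECT * FROM \"{stream}\" WHERE {field} LIKE '%{md5_hex(value)}%'"


print(md5_hex("openobserve"))
print(like_query("default", "message", "openobserve"))
```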

Performance and Limitations

Sensitive Data Redaction is built for production-scale observability environments. Its Intel Hyperscan integration evaluates many regex patterns in a single pass, which keeps added ingestion latency low.

However, there are a few practical notes:

  • Regex-based rules apply only to UTF8 fields.
  • Streams must already contain data before fields can be configured for redaction.
  • Complex regex patterns may marginally impact ingestion throughput.
  • SDR operates independently of query filters, so protection still applies when queries bypass the UI.

Compared to transformation pipelines or external filters, SDR is designed to deliver higher performance with less operational complexity.

Best Practices

  • Start by auditing what qualifies as sensitive in your environment — credentials, emails, tokens, or IDs.
  • Test regex patterns on a sample dataset before applying them globally.
  • Use ingestion-time redaction for strict compliance fields.
  • Apply query-time protection for operational flexibility.
  • Keep regex patterns simple, since complex patterns can slow evaluation.
  • Review and update your regex library regularly as data formats evolve.
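To put the testing advice into practice, a small harness like the one below can catch both misses and false positives before a pattern is attached to a stream. The samples and the IPv4 pattern are illustrative. Python's re dialect differs from the PCRE-style syntax Hyperscan accepts, so re-test the final pattern in the OpenObserve UI as well.

```python
import re

# Illustrative IPv4 pattern that rejects octets above 255.
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
IPV4 = re.compile(rf"\b{OCTET}(?:\.{OCTET}){{3}}\b")

# (sample, should_match) pairs, including tricky negatives.
SAMPLES = [
    ("client 192.168.1.20 connected", True),
    ("gateway 10.0.0.1", True),
    ("version 1.2.3", False),  # only three parts
    ("bad 999.1.1.1", False),  # octet out of range
]


def check(pattern: re.Pattern, samples: list[tuple[str, bool]]) -> list[str]:
    """Return a description of every sample where the pattern misbehaves."""
    failures = []
    for text, expected in samples:
        found = pattern.search(text) is not None
        if found != expected:
            kind = "missed" if expected else "false positive"
            failures.append(f"{kind}: {text!r}")
    return failures


print(check(IPV4, SAMPLES) or "all samples passed")
```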

Conclusion

Sensitive Data Redaction in OpenObserve Enterprise helps teams balance visibility with security. It provides a native, automated way to identify and protect sensitive information — without additional tools or manual intervention.

With flexible redaction modes, role-based access, and high-performance regex matching, OpenObserve gives organizations the confidence to scale observability safely and compliantly.

This feature is available in OpenObserve Cloud and Enterprise Self-Hosted Edition only. It is not part of the Open Source edition.

If you’re using the open-source version, check out this guide: ➡ How to Redact Sensitive PII Data Using VRL Transformations


Try OpenObserve Cloud free for 14 days or deploy the Enterprise Edition self-hosted. Enterprise is free up to 200 GB/day ingestion, and includes additional features like enhanced performance, advanced user management, and built-in security capabilities.

About the Author

Manas Sharma

Manas is a passionate Dev and Cloud Advocate with a strong focus on cloud-native technologies, including observability, cloud, Kubernetes, and open source, and on building bridges between tech and community.
