How Jidu Achieved 100% Tracing Fidelity and Optimized Storage by 70% with OpenObserve

Jidu, the technology company behind Jiyue’s autonomous driving systems, faced growing challenges in maintaining observability across its distributed systems. As telemetry data from its vehicles grew exponentially, the existing Elasticsearch-based observability stack was no longer sufficient.
The team identified three critical issues:
1. Incomplete Observability
Elasticsearch’s scalability limitations forced Jidu to sample only 10% of traces, leaving 90% of application behavior hidden. This lack of visibility made it difficult to identify performance bottlenecks and diagnose issues effectively.
2. Performance Bottlenecks
Despite allocating 24 CPU cores, 96GB of RAM, and 12TB of storage, Elasticsearch struggled with long-term queries and statistical analysis. Engineers frequently encountered memory errors and timeouts, slowing down debugging efforts.
3. High Operational Costs
Storing 1TB/day of sampled data resulted in significant infrastructure costs. Scaling to handle full-fidelity tracing would have been prohibitively expensive.
For Jidu, these challenges weren’t just technical—they directly impacted the stability and reliability of their autonomous driving systems, posing risks to customer satisfaction and operational efficiency.
The Solution: Why Jidu Chose OpenObserve
To address these challenges, Jidu migrated to OpenObserve, an open-source observability platform designed for high-performance monitoring at scale. OpenObserve offered a unified solution for logs, metrics, distributed tracing, and front-end monitoring—all while significantly reducing resource consumption and costs.
Key Benefits of OpenObserve for Jidu
- Full-Fidelity Tracing: OpenObserve enabled the collection of 100% of traces, providing complete visibility into application behavior without sampling (see the sampler sketch after this list).
- Efficient Resource Usage: OpenObserve’s architecture reduced CPU usage to less than 1 core per node and kept memory usage below 6GB—even during peak loads.
- Cost Savings Through Compression: Daily storage requirements dropped from 1TB to 0.3TB—despite a 10x increase in ingested data—thanks to advanced data compression techniques.
- Real-Time Insights: Engineers could now perform long-term queries and statistical analysis without timeouts or memory errors.
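On the instrumentation side, moving from 10% sampling to full fidelity is essentially a one-line sampler change. Below is a minimal sketch, assuming Python services instrumented with the OpenTelemetry SDK; the organization name and OTLP endpoint are illustrative placeholders, not Jidu’s configuration:

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk.trace.sampling import ALWAYS_ON, TraceIdRatioBased
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

# Before: head-based sampling keeps only ~10% of traces.
sampled_provider = TracerProvider(sampler=TraceIdRatioBased(0.10))

# After: every trace is kept and exported to OpenObserve.
provider = TracerProvider(sampler=ALWAYS_ON)
provider.add_span_processor(
    BatchSpanProcessor(
        OTLPSpanExporter(endpoint="https://o2.example.com/api/default/v1/traces")
    )
)
trace.set_tracer_provider(provider)
```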
Jidu’s Observability Transformation with O2
Jidu’s transition to OpenObserve (O2) marked a significant shift in how the company managed observability for its autonomous driving systems. By addressing long-standing challenges and introducing innovative solutions, O2 transformed Jidu’s operations, making their systems more efficient, reliable, and scalable. Below is a detailed breakdown of the transformation.
Challenges Before O2 Adoption
Before adopting OpenObserve, Jidu faced several critical challenges that hindered their ability to maintain complete observability and optimize performance:
| Challenge | Details |
|---|---|
| Trace Metric Statistics on VMs | Jidu relied on virtual machines (VMs) for trace metric statistics. However, shared resources meant that a 2GB memory agent couldn’t handle the collection of hundreds of thousands of metrics dynamically generated by spans. |
| High Query Concurrency | Automated scripts queried data every five minutes to monitor real-time errors in trace links (call chains), creating high query concurrency. This caused frequent timeouts for ordinary users’ queries, slowing down debugging and analysis efforts. |
| Limited Trace Querying | Developers could only query trace IDs within a specific time range. Without knowing the start and end times of traces, pinpointing issues was cumbersome and often required guesswork. |
| Fragmented Debugging Workflows | Debugging required jumping between trace detail pages and business logs manually. This disrupted workflows, slowed collaboration between teams, and made it harder to resolve issues quickly. |
| Lack of Metric Correlation | Engineers couldn’t correlate trace spans with container or node resource metrics directly from the trace detail page, making alarm responses inefficient and time-consuming. |
| Limited Message Queue Insights | Trace links lacked detailed message queue information, making it difficult to analyze message content during debugging. |
These limitations increased operational complexity and posed risks to system stability and customer satisfaction.
Improvements After O2 Implementation
The adoption of OpenObserve introduced transformative changes across multiple dimensions of Jidu’s observability strategy. Here’s how O2 addressed each challenge:
1. Automated Trace Metric Statistics
With OpenObserve, metrics were written automatically in real time as each operation executed. This eliminated the need for manual collection on VMs and ensured that all trace metrics were captured efficiently without resource contention.
2. Resource Grouping for Queries
O2 introduced a grouping feature that isolated resources between UI queries and automated tasks. This separation ensured that automated queries—such as those used for monitoring real-time errors—no longer affected ordinary users’ UI queries or caused timeouts.
3. Enhanced Trace Querying with Proxy Services
To streamline trace querying, Jidu implemented an external proxy service that recorded start and end times for trace IDs. Before querying O2, the proxy service retrieved these timestamps from the trace ID index service, enabling precise and efficient queries without relying on guesswork.
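The proxy service itself is Jidu’s own component and its internals aren’t described here, but the core idea is a trace ID → time window index consulted before each query. A minimal sketch of that idea, with the index kept in memory purely for illustration (Jidu’s actual index service and the O2 query call are stand-ins):

```python
from dataclasses import dataclass

@dataclass
class TraceWindow:
    start_us: int  # earliest span start seen for this trace (microseconds)
    end_us: int    # latest span end seen for this trace

# In-memory stand-in for Jidu's trace ID index service.
_trace_index: dict[str, TraceWindow] = {}

def record_span(trace_id: str, start_us: int, end_us: int) -> None:
    """Widen the known time window for a trace as its spans are ingested."""
    w = _trace_index.get(trace_id)
    if w is None:
        _trace_index[trace_id] = TraceWindow(start_us, end_us)
    else:
        w.start_us = min(w.start_us, start_us)
        w.end_us = max(w.end_us, end_us)

def query_trace(trace_id: str) -> dict:
    """Resolve the window first, then query O2 with a precise time range."""
    w = _trace_index[trace_id]
    # Placeholder for the real O2 query, now scoped to [start_us, end_us]
    # instead of a guessed time range.
    return {"trace_id": trace_id, "start_time": w.start_us, "end_time": w.end_us}
```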
4. In-Line Log Display
Logs were integrated directly into the O2 trace detail page for each service span. Engineers could now view logs alongside trace details without navigating away from the interface, significantly improving debugging workflows.
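One common way to make logs joinable to spans like this is to stamp each log record with the active trace and span IDs at the source. A sketch assuming Python services and the opentelemetry-instrumentation-logging package; the service and message names are made up:

```python
import logging

from opentelemetry import trace
from opentelemetry.instrumentation.logging import LoggingInstrumentor

# Adds otelTraceID / otelSpanID fields to the standard logging format,
# so the backend can join each log line to the span that emitted it.
LoggingInstrumentor().instrument(set_logging_format=True)

tracer = trace.get_tracer("checkout-service")  # hypothetical service name
log = logging.getLogger(__name__)

with tracer.start_as_current_span("charge-card"):
    log.warning("payment gateway latency above threshold")  # carries the trace ID
```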
5. Metric Correlation on Trace Pages
By adopting OTEL standards for collecting container IPs and node IPs in trace spans, Jidu enabled direct correlation between traces and resource metrics. Clicking on container or node tags within a trace span now displayed relevant resource usage metrics—such as CPU or memory usage—allowing engineers to respond to alarms faster.
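In OpenTelemetry terms, this means attaching container and node identity as resource attributes when the tracer is initialized. A sketch assuming the pod and host IPs are injected through the Kubernetes downward API; the IP attribute keys here are illustrative rather than mandated by the OTEL spec:

```python
import os

from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider

# POD_IP and NODE_IP are assumed to be populated in the pod spec via the
# Kubernetes downward API (status.podIP / status.hostIP).
resource = Resource.create({
    "service.name": "order-service",  # hypothetical service name
    "k8s.pod.ip": os.environ.get("POD_IP", "unknown"),
    "k8s.node.ip": os.environ.get("NODE_IP", "unknown"),
})

# Every span from this provider now carries container/node identity,
# which is what makes click-through metric correlation possible.
provider = TracerProvider(resource=resource)
```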
6. Message Queue Details Integration
O2 enhanced message queue visibility by including message queue IDs and cluster names in trace spans. Clicking on these fields displayed detailed message content, enabling teams to analyze message queues quickly during debugging sessions.
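The same span-attribute mechanism covers this case: producers and consumers record the message ID and cluster alongside each span. A sketch using OpenTelemetry messaging semantic conventions where they exist; the cluster-name key and all values are illustrative:

```python
from opentelemetry import trace

tracer = trace.get_tracer("order-consumer")  # hypothetical consumer service

with tracer.start_as_current_span("order-events process") as span:
    # Standard messaging semantic-convention attributes.
    span.set_attribute("messaging.system", "kafka")
    span.set_attribute("messaging.destination.name", "order-events")
    span.set_attribute("messaging.message.id", "msg-0001")  # hypothetical ID
    # Custom attribute for the cluster name, as described above.
    span.set_attribute("messaging.cluster.name", "kafka-prod-1")
```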
The following table highlights the improvements delivered by OpenObserve across key areas:
| Challenge | Before O2 Transformation | After O2 Transformation |
|---|---|---|
| Trace Metric Statistics | Manual collection on VMs; frequent failures due to resource constraints | Automated real-time metric collection with no resource contention |
| Query Concurrency Management | High concurrency caused timeouts for ordinary users | Resource grouping ensured automated queries didn’t affect UI queries |
| Trace Querying Efficiency | Limited to querying trace IDs within guessed time ranges | Proxy service enabled precise querying using start and end times |
| Debugging Workflow | Fragmented; required jumping between tools to view logs | Logs integrated directly into O2’s trace detail page |
| Metric Correlation with Traces | No direct correlation between traces and container/node metrics | Clicking on tags within traces displayed relevant resource metrics |
| Message Queue Insights | Lacked visibility into message queue details | Message queue fields allowed quick access to detailed message content |
OpenObserve Implementation Highlights
Jidu’s migration to OpenObserve was executed in three well-planned phases:
1. Deployment on Kubernetes
OpenObserve was deployed in shared Kubernetes clusters with a high-availability (HA) configuration to ensure reliability and scalability across Jidu’s distributed systems.
2. Data Migration
Existing telemetry data was migrated seamlessly using OpenObserve’s Elasticsearch-compatible APIs, ensuring continuity without data loss or downtime.
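A migration like this can be driven by reading documents out of Elasticsearch and re-shipping them through OpenObserve’s Elasticsearch-compatible _bulk endpoint. A minimal sketch of the shipping side; the URL, organization, stream name, and credentials are placeholders:

```python
import json

import requests

O2_BULK_URL = "https://o2.example.com/api/default/_bulk"  # org "default"; illustrative
AUTH = ("root@example.com", "change-me")                  # placeholder credentials

def bulk_ship(stream: str, docs: list[dict]) -> None:
    """Re-ship documents using Elasticsearch-style _bulk NDJSON."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": stream}}))
        lines.append(json.dumps(doc))
    body = "\n".join(lines) + "\n"
    resp = requests.post(
        O2_BULK_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        auth=AUTH,
        timeout=30,
    )
    resp.raise_for_status()

bulk_ship("app_logs", [{"level": "info", "message": "migrated record"}])
```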
3. Pipeline Optimization
Data ingestion pipelines were reconfigured to handle higher throughput while leveraging OpenObserve’s efficient compression and indexing capabilities.
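On the client side, one lever for higher throughput is batching span export more aggressively. A sketch of tuned BatchSpanProcessor settings in the Python OpenTelemetry SDK; the specific numbers and endpoint are illustrative, not Jidu’s values:

```python
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(
        OTLPSpanExporter(endpoint="https://o2.example.com/api/default/v1/traces"),
        max_queue_size=8192,         # absorb ingestion bursts without dropping spans
        schedule_delay_millis=2000,  # flush every 2 seconds
        max_export_batch_size=1024,  # larger batches mean fewer round trips
    )
)
```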
The Results
Switching to OpenObserve delivered transformative results for Jidu’s engineering team and overall operations:
| Metric | Before OpenObserve | After OpenObserve | Improvement |
|---|---|---|---|
| Trace Coverage | 10% | 100% | 10x |
| Storage Requirements | 1TB/day | 0.3TB/day | ~70% reduction |
| Query Response Time | Frequent timeouts | Sub-second | Resolved |
| Debugging Time | Hours per issue | Minutes per issue | ~8x faster |
Key Business Outcomes
- 100% Trace Fidelity: Full visibility into application behavior allowed engineers to proactively identify and resolve potential issues before they impacted customers.
- 70% Lower Storage Requirements: Significant reductions in data storage freed up resources for other critical projects.
- Improved Engineering Efficiency: Faster debugging times enabled engineers to focus on innovation rather than troubleshooting.
- Enhanced Customer Experience: More stable applications increased reliability and satisfaction among end users.
What Other Companies Can Learn from Jidu's Transformation
Jidu’s success story offers valuable lessons for companies managing high-throughput telemetry data in mission-critical environments like autonomous vehicles or IoT:
- Full Observability Is Affordable: Modern platforms like OpenObserve make it possible to capture all your trace data without breaking the bank.
- Efficient Resource Usage Saves Costs: Advanced compression techniques drastically reduce storage requirements while improving system performance.
- Unified Platforms Simplify Operations: Combining logs, metrics, traces, and front-end monitoring into one platform reduces complexity and improves workflows.
What Jidu Says About OpenObserve
> “OpenObserve gave us complete visibility into our systems while cutting our costs by more than half. It’s a no-brainer for any company with growing observability needs.”
>
> — Zhao Wei, APM Expert at Jidu
Ready to Transform Your Observability?
Whether you’re managing autonomous systems, IoT devices, or cloud-native applications, OpenObserve can help you achieve full-fidelity tracing without the high costs of legacy solutions. Here’s how you can get started:
- Calculate Your Potential Savings → Contact us for a custom quote and an assessment of your potential cost reductions.
- Learn More About Our Platform → Explore how OpenObserve works for high-throughput environments.
- Request a Demo or Try OpenObserve → See the platform in action with your own data.