A Comprehensive Guide to Distributed Tracing: From Basics to Beyond
It's Cyber Monday, your biggest sales day of the year, and your e-commerce platform's checkout system grinds to a screeching halt. Orders are failing, customers are abandoning their carts, and your team is drowning in endless logs from dozens of microservices. The root cause could be anywhere: Is it the payment processor? The inventory system? The user authentication service? The clock is ticking, and every passing second costs you thousands in lost revenue. Without a way to track an order request across all of these interconnected services, finding and fixing the issue is nearly impossible.
Enter distributed tracing. Unlike traditional monitoring solutions that provide isolated views of individual services, distributed tracing reveals the complete journey of every request through your system. It’s like having a map that shows exactly where each request goes and where it gets stuck.
In this guide, we’ll build a real distributed system with two microservices—an order service and an inventory service—to demonstrate how distributed tracing works in practice.
Let’s dive in!
What is Distributed Tracing?
Think of distributed tracing as a GPS system for your requests. Just like a GPS tracks your car's journey from start to finish—including every turn, stop, and delay—distributed tracing tracks a request's journey through your application's various services.
For example, when a customer places an order on an e-commerce site, their request might:
- Start at the web interface
- Move to the authentication service to verify their identity
- Pass through the inventory service to check item availability
- Hit the payment service to process their credit card
- Finally reach the order service to confirm their purchase
In a modern application, a single user request might touch dozens of different services. When something goes wrong—like a slow checkout process or a failed payment—distributed tracing helps you:
- Pinpoint precisely where the problem occurred
- Understand which services are affected
- See how one service's issues impact others
- Identify performance bottlenecks
- Track the flow of requests through your system
This detailed visibility is crucial for maintaining healthy, high-performing applications.
Key Components of Distributed Tracing
Imagine a customer placing an order on an e-commerce site. Distributed tracing helps you see every step of this order’s journey, from the moment they click "buy" to the final confirmation. Here’s how the key parts of distributed tracing work together:
- Spans: Think of spans as snapshots of each step, like “validate payment” or “check inventory.” Each span shows when a step started, when it ended, and if anything went wrong.
- Traces: A trace is the full story of the order, made up of multiple spans. It follows the order’s path—from “buy” to payment, to inventory checks, and finally to confirmation. Each trace has a unique ID that ties it all together.
- Context: Context is kind of like a tracking number for the order. It holds important information that links each span to the same journey, ensuring nothing gets lost along the way.
- Context Propagation: Context propagation is what keeps everything connected. It passes the order’s “tracking info” from one service to the next—from the e-commerce site to payment and inventory—so each step is clearly linked.
Distributed tracing gives you the power to understand and troubleshoot the customer experience at every stage, keeping your systems smooth and your users happy.
This sequence diagram shows how these pieces work together to enhance visibility into the entire order journey. With distributed tracing, you gain a clear, step-by-step view of your entire system, making it easier to spot and fix issues quickly.
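If you prefer to see these concepts in code before we build the full demo, here is a minimal sketch using the OpenTelemetry Python SDK with a console exporter (so there is no external backend to run). It creates a parent span, injects the trace context into a carrier the way one service would add headers to an outgoing request, and starts a child span from the extracted context:
# pip install opentelemetry-sdk
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter
from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator

# Print spans to stdout so you can see the shared trace ID
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)
propagator = TraceContextTextMapPropagator()

with tracer.start_as_current_span("create_order"):  # one step of the journey = one span
    carrier = {}
    propagator.inject(carrier)  # context propagation: the "tracking number"
    print("traceparent header:", carrier["traceparent"])

    # A downstream service would extract the context from the incoming headers...
    ctx = propagator.extract(carrier)
    # ...and start its own span linked to the same trace
    with tracer.start_as_current_span("check_inventory", context=ctx) as child:
        child.set_attribute("inventory.in_stock", True)
Both spans end up with the same trace ID, which is exactly what lets a tracing backend stitch them into a single end-to-end view.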
Challenges of Modern Architecture
Modern applications are no longer monolithic. They have evolved into intricate ecosystems of microservices, each playing a unique role in delivering seamless user experiences. While this architecture offers flexibility and scalability, it also introduces complexity. A single user request might touch dozens of services before completing.
According to the 2023 Cloud Native Computing Foundation (CNCF) Annual Survey, 84% of organizations are either using or evaluating Kubernetes, with 66% running it in production. This widespread adoption of containerization and microservices creates increasingly complex systems where a single request can interact with dozens or even hundreds of interconnected services.
Traditional monitoring tools like logs and metrics certainly offer valuable insights but often fail to capture the relationships between different services:
- Logs: Offer detailed information about individual components but lack context on how these components interact.
- Metrics: Provide a high-level overview of system health but don’t reveal the specific paths taken by each request.
Distributed tracing fills this gap by stitching together the entire journey of a request, from its inception to completion, to give you a complete picture of your system’s behavior.
Why Distributed Tracing Matters
For developers, SREs, and DevOps engineers managing complex systems, choosing a monitoring solution with robust distributed tracing capabilities offers several key advantages:
A. Enhanced Visibility
As illustrated above, distributed tracing offers an end-to-end view of how requests flow through your system.
B. Faster Troubleshooting
When issues arise—such as slow response times or failed transactions—how do you know which part of your system is acting up? Distributed tracing allows your teams to quickly pinpoint the root cause by simply following the affected request's path.
C. Improved Collaboration
Distributed tracing provides a shared understanding of system behavior across teams—whether it's development, operations, or product management. This shared visibility promotes better collaboration when diagnosing issues or planning optimizations.
D. Increased Efficiency
By identifying performance bottlenecks and inefficiencies in your system’s architecture (e.g., slow database queries or unnecessary API calls), distributed tracing helps optimize resource utilization and reduce operational costs. This is incredibly impactful in dynamic cloud environments where every millisecond counts.
Distributed Tracing Standards: OpenTelemetry & Other Frameworks
Not long ago, distributed tracing was offered by vendors through proprietary agents and data formats that locked users into their ecosystems. OpenTelemetry, a unified standard backed by the Cloud Native Computing Foundation (CNCF), changed this by offering vendor-neutral APIs and SDKs: you instrument your code once and send the resulting tracing data to whichever backend you choose.
While several distributed tracing frameworks are available today, OpenTelemetry has emerged as the industry standard because it provides a unified framework for collecting traces, metrics, and logs from applications running across different environments (cloud-native or on-premises). It also integrates seamlessly with popular observability platforms like OpenObserve, Datadog, Dynatrace, and Jaeger.
Other Frameworks
- Jaeger: Originally developed by Uber for monitoring microservices-based architectures, Jaeger focuses specifically on distributed tracing but can be used as an OpenTelemetry backend for storing trace data.
- Zipkin: Another popular open-source tool for distributed tracing that integrates well with OpenTelemetry but lacks some advanced features like metrics collection.
You can choose among these tools depending on your needs; however, OpenTelemetry not only supports tracing but also integrates seamlessly with metrics and logs collection—making it a one-stop solution for modern observability.
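One practical consequence of this vendor neutrality is that switching backends usually means changing only the exporter configuration, not your instrumentation. A sketch under the assumption that both backends accept OTLP over HTTP (OpenObserve's endpoint is the one used later in this guide; port 4318 is the OTLP/HTTP port recent Jaeger releases expose):
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

# Same exporter class for both backends; only the endpoint and headers change
openobserve_exporter = OTLPSpanExporter(
    endpoint="http://localhost:5080/api/default/v1/traces",
    headers={"Authorization": "Basic <AUTH_TOKEN>"},
)
jaeger_exporter = OTLPSpanExporter(
    endpoint="http://localhost:4318/v1/traces",
)
The rest of your tracing setup (providers, processors, spans) stays identical either way.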
Practical Example: Implementing Distributed Tracing with OpenObserve
Now that we’ve discussed distributed tracing and its importance in modern observability, let's actually implement it by building a simple FastAPI application consisting of two microservices: an order service and an inventory service.
We'll track how an order flows through both services using OpenTelemetry and visualize the traces in OpenObserve.
Step 0: Set up OpenObserve
- Start OpenObserve using Docker:
docker run -d \
--name openobserve \
-p 5080:5080 \
-e ZO_ROOT_USER_EMAIL=root@example.com \
-e ZO_ROOT_USER_PASSWORD=Complexpass#123 \
public.ecr.aws/zinclabs/openobserve:latest
- Verify OpenObserve is running:
curl http://localhost:5080/healthz
# Should return {"status":"ok"}
- Find and note your authentication token in the OpenObserve UI; you'll need it for the trace exporter configuration in the services below.
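If you prefer to build the token yourself rather than copy it from the UI, the Basic value is typically the base64-encoded email:password pair. The snippet below assumes the default credentials from Step 0; if your OpenObserve UI shows a dedicated ingestion token for your user, encode that instead:
import base64

# Assumption: Basic auth token = base64("email:password") from Step 0 credentials
credentials = "root@example.com:Complexpass#123"
print("Basic " + base64.b64encode(credentials.encode()).decode())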
Step 1: Prepare project environment
- Create a project directory:
mkdir distributed-tracing-demo && cd distributed-tracing-demo
- Create directories for both services:
mkdir -p {order-service,inventory-service}
- Create Python virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install required packages:
pip install fastapi==0.104.1 \
uvicorn==0.24.0 \
opentelemetry-api==1.21.0 \
opentelemetry-sdk==1.21.0 \
opentelemetry-instrumentation-fastapi==0.42b0 \
opentelemetry-exporter-otlp-proto-http==1.21.0 \
opentelemetry-instrumentation-requests==0.42b0 \
requests==2.31.0 \
httpx==0.25.1 \
urllib3==1.26.18
Step 2: Create Inventory Service
# Navigate to inventory service directory
cd inventory-service
# Create main.py file for Inventory Service
cat > main.py << 'EOF'
from fastapi import FastAPI, Request
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator
from pydantic import BaseModel
import random
app = FastAPI()
class InventoryResponse(BaseModel):
"""Response model for inventory check endpoint."""
order_id: str
in_stock: bool # Indicates if item is available in inventory
# Initialize OpenTelemetry tracer with inventory service name
provider = TracerProvider(
resource=Resource.create({"service.name": "inventory-service"})
)
trace.set_tracer_provider(provider)
# Configure trace export to OpenObserve
provider.add_span_processor(BatchSpanProcessor(
OTLPSpanExporter(
endpoint="http://localhost:5080/api/default/v1/traces",
headers={"Authorization": "Basic <AUTH_TOKEN>"}
)
))
# Initialize tracer and context propagator for distributed tracing
tracer = trace.get_tracer(__name__)
trace_propagator = TraceContextTextMapPropagator()
@app.get("/inventory/check/{order_id}")
async def check_inventory(order_id: str, request: Request) -> InventoryResponse:
"""Check inventory availability for a given order.
Extracts trace context from the incoming request to maintain
the distributed trace chain from the order service.
"""
print(f"[DEBUG] Received headers: {dict(request.headers)}")
# Extract trace context from incoming request
context = trace_propagator.extract(request.headers)
# Create span with extracted context to link with parent trace
with tracer.start_as_current_span(
"check_inventory",
context=context,
kind=trace.SpanKind.SERVER,
attributes={
"service.name": "inventory-service",
"order.id": order_id
}
) as span:
try:
# Simulate inventory check with random availability
in_stock = random.choice([True, True, False])
span.set_attribute("inventory.in_stock", in_stock)
response = InventoryResponse(order_id=order_id, in_stock=in_stock)
print(f"[DEBUG] Sending response: {response.dict()}")
return response
except Exception as e:
print(f"[ERROR] {type(e).__name__}: {str(e)}")
span.set_attribute("error", True)
raise
@app.get("/health")
async def health_check():
"""Simple health check endpoint."""
return {"status": "healthy"}
EOF
Note: Replace <AUTH_TOKEN> with your own authentication token.
Step 3: Create Order Service
# Navigate to order service directory
cd ../order-service
# Create main.py file for Order Service
cat > main.py << 'EOF'
from fastapi import FastAPI
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator
from pydantic import BaseModel
import httpx
app = FastAPI()
class OrderResponse(BaseModel):
"""Response model for order creation endpoint."""
status: str # success, out_of_stock, or error
order_id: str
# Initialize OpenTelemetry tracer with order service name
provider = TracerProvider(
resource=Resource.create({"service.name": "order-service"})
)
trace.set_tracer_provider(provider)
# Configure trace export to OpenObserve
provider.add_span_processor(BatchSpanProcessor(
OTLPSpanExporter(
endpoint="http://localhost:5080/api/default/v1/traces",
headers={"Authorization": "Basic <AUTH_TOKEN>"}
)
))
# Initialize tracer and context propagator for distributed tracing
tracer = trace.get_tracer(__name__)
trace_propagator = TraceContextTextMapPropagator()
@app.post("/orders/create")
async def create_order(order_id: str) -> OrderResponse:
"""Create a new order and check inventory availability.
Creates a parent span for the order creation and a child span
for the inventory service call to enable distributed tracing.
"""
# Start parent span for overall order creation
with tracer.start_as_current_span(
"create_order",
kind=trace.SpanKind.SERVER
) as span:
try:
# Create child span for inventory service call
with tracer.start_as_current_span(
"inventory_request",
kind=trace.SpanKind.CLIENT
) as client_span:
# Propagate trace context to inventory service
carrier = {}
trace_propagator.inject(carrier)
headers = {
**carrier,
"Accept": "application/json",
"Content-Type": "application/json"
}
print(f"[DEBUG] Sending request with trace headers: {carrier}")
# Make HTTP request to inventory service
async with httpx.AsyncClient() as client:
response = await client.get(
f"http://localhost:8001/inventory/check/{order_id}",
headers=headers,
timeout=5.0
)
response.raise_for_status()
inventory_data = response.json()
print(f"[DEBUG] Received response: {inventory_data}")
# Return success or out_of_stock based on inventory check
return OrderResponse(
status="success" if inventory_data["in_stock"] else "out_of_stock",
order_id=order_id
)
except Exception as e:
print(f"[ERROR] {type(e).__name__}: {str(e)}")
span.set_attribute("error", True)
return OrderResponse(status="error", order_id=order_id)
@app.get("/health")
async def health_check():
"""Simple health check endpoint."""
return {"status": "healthy"}
EOF
Note: Replace <AUTH_TOKEN> with your own authentication token here as well.
Step 4: Start Both Services
- Terminal 1 (Inventory Service):
cd ~/distributed-tracing-demo/inventory-service
# Ensure virtual environment is activated
source ../venv/bin/activate
# Start inventory service
python -m uvicorn main:app --reload --host 0.0.0.0 --port 8001
- Terminal 2 (Order Service):
cd ~/distributed-tracing-demo/order-service
# Ensure virtual environment is activated
source ../venv/bin/activate
# Start order service
python -m uvicorn main:app --reload --host 0.0.0.0 --port 8000
Step 5: Verify Services are Running
# Check if both services are healthy
curl http://localhost:8000/health
curl http://localhost:8001/health
# Both should return {"status": "healthy"}
Step 6: Generate Some Traces
# Send multiple test requests
for i in {1..10}; do
curl -X POST "http://localhost:8000/orders/create?order_id=ORDER$i"
echo ""
sleep 1
done
You should see responses like:
{"status":"success","order_id":"ORDER1"}
{"status":"success","order_id":"ORDER2"}
{"status":"success","order_id":"ORDER3"}
{"status":"success","order_id":"ORDER4"}
{"status":"success","order_id":"ORDER5"}
{"status":"out_of_stock","order_id":"ORDER6"}
{"status":"success","order_id":"ORDER7"}
{"status":"out_of_stock","order_id":"ORDER8"}
{"status":"out_of_stock","order_id":"ORDER9"}
{"status":"success","order_id":"ORDER10"}
Now, you are ready to view the traces within OpenObserve.
Step 7: View Traces in OpenObserve
- Open http://localhost:5080 in your browser
- Login with:
- Email: <your email address>
- Password: <your password>
- Navigate to the Traces section: http://localhost:5080/web/traces?org_identifier=default
Troubleshooting
If you still don’t see traces within the OpenObserve dashboard, you can try the following:
- Verify OpenObserve credentials:
curl -H "Authorization: Basic cm9vdEBleGFtcGxlLmNvbTo4ajdUT0dnTGdtMzhROGVO" \
http://localhost:5080/api/default/traces
- Check OpenObserve logs:
docker logs openobserve
- If your OpenObserve instance is served over HTTPS, update the exporter endpoint accordingly:
otlp_exporter = OTLPSpanExporter(
    endpoint="https://localhost:5080/api/default/v1/traces",
    headers={"Authorization": "Basic <AUTH_TOKEN>"}
)
- Don’t forget to restart the services after any configuration changes.
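Another handy debugging step is to temporarily add a console exporter next to the OTLP exporter in either service, so you can confirm that spans are actually being created before worrying about whether they reach OpenObserve. A minimal addition to the existing main.py (it reuses the provider variable defined there):
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Print every span to stdout in addition to exporting it; remove once tracing works
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))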
By following these steps, you should now be able to see traces flowing into OpenObserve from both your order and inventory services! You can add additional services as needed for further exploration and visualization. And if you’re interested, you can also refer to the GitHub repository for this guide to quickly get up and running with this example and iterate upon it.
Effectively Analyzing Trace Data
Once you've set up distributed tracing across your services and started collecting trace data in OpenObserve (or another platform), it's time to analyze that data effectively:
A. Service Dependency Mapping
Trace data allows you to visualize relationships between your services. For example, in our order processing system, you can:
- Identify Critical Paths: Map the complete flow from order creation through inventory checks, showing how services depend on each other.
- Detect Hidden Dependencies: Discover unexpected service interactions that might not be documented.
- Analyze Request Patterns: Understand how often services communicate and which paths are most frequently used.
B. Performance Bottlenecks
Look for spans with unusually long durations—these often indicate areas ripe for optimization:
- Service Latency: Identify which services take the longest to respond
- Database Operations: Spot slow queries or frequent database calls
- External Service Calls: Monitor third-party API response times
- Resource Utilization: Correlate slow spans with CPU/memory usage
For instance, if your checkout process shows delays in inventory checks during peak traffic times, you may need to scale up resources or optimize database queries.
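In code, surfacing a suspect operation usually comes down to wrapping it in its own child span with a few descriptive attributes. Here is an illustrative sketch (get_stock_level and its query are hypothetical and not part of the demo services; the sleep stands in for a real database call):
import time
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

def get_stock_level(item_id: str) -> int:
    # Wrap the lookup in its own span so slow queries show up as long bars in the trace view
    with tracer.start_as_current_span("db.query.stock_level") as span:
        span.set_attribute("db.system", "postgresql")
        span.set_attribute("db.statement", "SELECT qty FROM stock WHERE item_id = ?")
        span.set_attribute("item.id", item_id)
        time.sleep(0.2)  # simulated slow query
        return 42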
C. Error Analysis
Traces containing error tags can help you quickly identify which service is failing under what conditions—whether it's due to external API failures (e.g., Stripe’s payment gateway) or internal bugs.
Use trace data to identify failure points quickly:
- Error Patterns: Group similar errors to identify systemic issues
- Failure Impact: See how errors in one service affect dependent services
- Recovery Paths: Analyze how your system handles and recovers from failures
- Error Rates: Track error frequencies across different services and endpoints
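For this kind of analysis to work, errors have to be recorded on spans in the first place. The OpenTelemetry API supports this directly via record_exception and set_status; the surrounding function below is illustrative, not part of the demo services:
from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode

tracer = trace.get_tracer(__name__)

def charge_card(order_id: str) -> None:
    with tracer.start_as_current_span("charge_card") as span:
        span.set_attribute("order.id", order_id)
        try:
            raise RuntimeError("payment gateway timeout")  # simulated failure
        except Exception as exc:
            span.record_exception(exc)  # attaches the exception as a span event
            span.set_status(Status(StatusCode.ERROR, str(exc)))  # marks the span as failed
            raise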
By collecting and analyzing trace data, you can better plan capacity, prioritize optimization efforts, improve system reliability, and make more data-driven architectural decisions.
Best Practices for Distributed Tracing
To get the most out of your distributed tracing setup, consider the following:
- Consistent Naming Conventions: Use clear span names (e.g., order-service.create_order) to identify critical operations quickly and easily.
- Meaningful Attributes: Add relevant attributes such as user_id, order_value, or feature_flag to provide context.
- Security Considerations: Be mindful of sensitive data in traces; scrub or mask personal identifiers where necessary.
- Continuous Monitoring: Regularly review trace data trends—don’t wait until something breaks!
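As a concrete illustration of the first three points, here is a small sketch of a well-named span with meaningful, scrubbed attributes (mask_email is a hypothetical helper, not a library function):
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

def mask_email(email: str) -> str:
    # Hypothetical scrubber: keep the domain, hide most of the local part
    local, _, domain = email.partition("@")
    return f"{local[:1]}***@{domain}"

with tracer.start_as_current_span("order-service.create_order") as span:  # consistent <service>.<operation> naming
    span.set_attribute("order.value", 49.99)  # meaningful business context
    span.set_attribute("user.email", mask_email("jane.doe@example.com"))  # scrubbed before it leaves the process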
Common Pitfalls to Avoid
While powerful when used correctly, there are some common pitfalls related to distributed tracing that you should watch out for:
- Over-instrumentation: Adding too many spans can increase overhead without adding much value.
- Ignoring Context Propagation: Ensure trace context is properly propagated between services; otherwise, traces will appear disconnected.
- Neglecting Asynchronous Operations: Asynchronous tasks (e.g., background jobs) can be tricky—ensure you link all of the spans correctly across async boundaries.
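The asynchronous case in particular deserves a sketch: depending on how work is scheduled (thread pools, job queues, tasks that outlive the request), the active trace context may not follow it automatically. A safe pattern is to capture the context explicitly and start the background span against it; the example below uses asyncio, and send_confirmation_email is purely illustrative:
import asyncio
from opentelemetry import context, trace

tracer = trace.get_tracer(__name__)

async def send_confirmation_email(order_id: str, parent_ctx) -> None:
    # Start the background span against the captured context so it joins the same trace
    with tracer.start_as_current_span("send_confirmation_email", context=parent_ctx):
        await asyncio.sleep(0.1)  # stand-in for the real email call

async def create_order(order_id: str) -> None:
    with tracer.start_as_current_span("create_order"):
        ctx = context.get_current()  # capture the context before handing off the work
        task = asyncio.create_task(send_confirmation_email(order_id, ctx))
    await task  # in real code the task may outlive the parent span entirely

asyncio.run(create_order("ORDER1"))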
From Chaos to Clarity: Make Distributed Tracing Work for You
When you’re dealing with complex systems, flying blind is not a sustainable option. Distributed tracing is no longer a “nice-to-have” feature; it’s an essential survival tool for modern engineering teams. Thankfully, with tools like OpenTelemetry and modern observability platforms like OpenObserve, implementing distributed tracing is more accessible than ever.
Start small, instrument your critical paths, and celebrate as the fog of uncertainty lifts from your distributed systems. Your future self—probably debugging a production issue at 3 AM—will thank you for it.
Ready to implement distributed tracing but have some lingering questions? Join our Slack community or reach out directly—we're here to help!