OpenObserve Operator - Troubleshooting Guide
This guide provides kubectl commands and procedures for managing OpenObserve operator resources when they become stuck or require manual intervention.
Table of Contents
- Resource Types
- Listing Resources
- Viewing Resource Details
- Patching Resources
- Deleting Stuck Resources
- Common Troubleshooting Scenarios
- Best Practices
Resource Types
The OpenObserve operator manages the following Custom Resource Definitions (CRDs):
| Short Name | Full Name | Purpose |
|---|---|---|
| o2config | openobserveconfig | Connection configuration to OpenObserve instance |
| o2alert | openobservealert | Alert definitions |
| o2pipeline | openobservepipeline | Data pipeline configurations |
| o2function | openobservefunction | VRL/JavaScript functions |
| o2alerttemplate | openobservealerttemplate | Alert notification templates |
| o2dest | openobservedestination | Alert and pipeline destinations |
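For orientation, the sketch below shows roughly what a manifest for one of these resources might look like. The apiVersion group and kind are placeholders, and the spec fields are simply the ones referenced elsewhere in this guide; check the installed CRDs (for example with kubectl explain openobservedestination.spec) for the authoritative schema.
# Hypothetical manifest for an alert destination; apiVersion and kind are placeholders
apiVersion: openobserve.example.com/v1alpha1   # replace with the group/version from your CRDs
kind: OpenObserveDestination
metadata:
  name: my-dest
  namespace: o2operator
spec:
  type: webhook                      # destination type (surfaced as .spec.type in the custom-columns example below)
  url: "https://my-webhook-url.com"  # endpoint that alerts or pipelines post to
  skipTlsVerify: false               # same field patched in the examples later in this guide
  template: my-template              # alert template this destination references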
Listing Resources
List all resources of a specific type
# List in current namespace
kubectl get openobservealerts
kubectl get o2alerts # Using short name
# List in specific namespace
kubectl get openobservealerts -n o2operator
# List across all namespaces
kubectl get openobservealerts -A
kubectl get openobservealerts --all-namespaces
# List multiple resource types at once
kubectl get o2alerts,o2pipelines,o2functions -n o2operator
# List with more details
kubectl get openobservealerts -o wide
# List with custom columns
kubectl get openobservedestinations \
-o custom-columns=NAME:.metadata.name,TYPE:.spec.type,CATEGORY:.status.destinationCategory,TEMPLATE:.spec.template
Filter resources
# Using label selectors
kubectl get openobservealerts -l environment=production
# Using field selectors
kubectl get openobservealerts --field-selector metadata.name=my-alert
# Using grep for quick filtering
kubectl get openobservealerts | grep critical
# List resources in specific states
kubectl get openobservealerts -o json | jq '.items[] | select(.status.conditions[]? | select(.type=="Ready" and .status=="False")) | .metadata.name'
Viewing Resource Details
Get detailed information about a resource
# Describe a resource (human-readable format)
kubectl describe openobservealert my-alert -n o2operator
# Get full YAML output
kubectl get openobservealert my-alert -n o2operator -o yaml
# Get specific fields using JSONPath
kubectl get openobservealert my-alert -o jsonpath='{.status.conditions[?(@.type=="Ready")].message}'
# Get JSON output and filter with jq
kubectl get openobservealert my-alert -o json | jq '.status'
# View events related to a resource
kubectl get events --field-selector involvedObject.name=my-alert -n o2operator
Check resource status
# Check if resource is ready
kubectl get openobservealert my-alert -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
# View last error
kubectl get openobservefunction my-function -o jsonpath='{.status.lastError}'
# Check deletion timestamp (for stuck resources)
kubectl get openobservedestination my-dest -o jsonpath='{.metadata.deletionTimestamp}'
# View finalizers
kubectl get openobservealert my-alert -o jsonpath='{.metadata.finalizers[*]}'
Patching Resources
Update resource specifications
# Patch using strategic merge
kubectl patch openobservealert my-alert -n o2operator \
--type='merge' -p '{"spec":{"enabled":false}}'
# Patch using JSON patch
kubectl patch openobservealert my-alert -n o2operator \
--type='json' -p='[{"op": "replace", "path": "/spec/enabled", "value": false}]'
# Update multiple fields
kubectl patch openobservedestination my-dest -n o2operator \
--type='merge' -p '{
"spec": {
"url": "https://new-webhook-url.com",
"skipTlsVerify": true
}
}'
# Add labels
kubectl label openobservealert my-alert environment=staging
# Add annotations
kubectl annotate openobservealert my-alert description="Critical CPU alert"
Remove finalizers (for stuck resources)
# Remove all finalizers
kubectl patch openobservedestination stuck-dest -n o2operator \
-p '{"metadata":{"finalizers":[]}}' --type=merge
# Remove specific finalizer
kubectl patch openobservealert stuck-alert -n o2operator \
--type='json' -p='[{"op": "remove", "path": "/metadata/finalizers/0"}]'
# Alternative method using kubectl edit
kubectl edit openobservealert stuck-alert -n o2operator
# Then manually remove the finalizers section and save
Deleting Stuck Resources
Standard deletion
# Delete a single resource
kubectl delete openobservealert my-alert -n o2operator
# Delete using a manifest file
kubectl delete -f alert-definition.yaml
# Delete multiple resources
kubectl delete openobservealerts alert1 alert2 alert3 -n o2operator
# Delete all resources of a type
kubectl delete openobservealerts --all -n o2operator
# Delete with grace period
kubectl delete openobservealert my-alert --grace-period=30
# Force immediate deletion (use with caution)
kubectl delete openobservealert my-alert --grace-period=0 --force
Handling stuck resources
When a resource is stuck in the "Terminating" state:
# 1. Check why it's stuck
kubectl describe openobservedestination stuck-dest -n o2operator
# 2. Check for finalizers
kubectl get openobservedestination stuck-dest -n o2operator -o yaml | grep -A 5 finalizers
# 3. Check operator logs for errors
kubectl logs -n o2operator -l app=openobserve-operator --tail=50 | grep stuck-dest
# 4. Remove finalizers if the external resource is already deleted
kubectl patch openobservedestination stuck-dest -n o2operator \
-p '{"metadata":{"finalizers":null}}' --type=merge
# 5. If still stuck, edit and remove finalizers manually
kubectl edit openobservedestination stuck-dest -n o2operator
Common Troubleshooting Scenarios
Scenario 1: Resource stuck in deletion due to "in use" error
# Example: Alert template can't be deleted because it's used by a destination
# 1. Identify what's using the resource
kubectl get openobservedestinations -A -o json | \
jq '.items[] | select(.spec.template=="my-template") | {name: .metadata.name, namespace: .metadata.namespace}'
# 2. Delete the dependent resources first
kubectl delete openobservedestination dependent-dest -n o2operator
# 3. If the dependent resource is also stuck, remove its finalizer
kubectl patch openobservedestination dependent-dest -n o2operator \
-p '{"metadata":{"finalizers":[]}}' --type=merge
# 4. Now delete the original resource
kubectl delete openobservealerttemplate my-template -n o2operator
Scenario 2: Resource out of sync with OpenObserve
# When the Kubernetes resource exists but the corresponding OpenObserve resource doesn't
# 1. Check resource status
kubectl get openobservealert my-alert -n o2operator -o jsonpath='{.status}'
# 2. Force reconciliation by updating a label
kubectl label openobservealert my-alert reconcile=true --overwrite
# 3. If reconciliation fails, check operator logs
kubectl logs -n o2operator deployment/openobserve-operator --tail=100 | grep my-alert
# 4. As last resort, delete and recreate
kubectl delete openobservealert my-alert -n o2operator
kubectl apply -f my-alert.yaml
Scenario 3: Webhook validation blocking operations
# When webhook prevents deletion/updates
# 1. Check webhook configuration
kubectl get validatingwebhookconfigurations | grep openobserve
# 2. Temporarily disable webhook (emergency only!)
kubectl delete validatingwebhookconfiguration openobserve-webhook-config
# 3. Perform the operation
kubectl delete openobservealerttemplate stuck-template -n o2operator
# 4. Reinstall webhook
kubectl apply -f manifests/04-webhook.yaml
Scenario 4: Mass cleanup of failed resources
# Delete all resources in a failed (Ready=False) state
# 1. List failed resources
kubectl get openobservealerts -n o2operator -o json | \
jq '.items[] | select(.status.conditions[]? | select(.type=="Ready" and .status=="False")) | .metadata.name'
# 2. Delete them
kubectl get openobservealerts -n o2operator -o json | \
jq -r '.items[] | select(.status.conditions[]? | select(.type=="Ready" and .status=="False")) | .metadata.name' | \
xargs -I {} kubectl delete openobservealert {} -n o2operator
# 3. Clean up stuck resources with finalizers
for resource in $(kubectl get openobservealerts -n o2operator -o json | \
  jq -r '.items[] | select(.metadata.deletionTimestamp != null) | .metadata.name'); do
  kubectl patch openobservealert "$resource" -n o2operator -p '{"metadata":{"finalizers":[]}}' --type=merge
done
Scenario 5: Debugging reconciliation failures
# 1. Check resource events
kubectl describe openobservealert my-alert -n o2operator | tail -20
# 2. Follow operator logs in real time
kubectl logs -n o2operator deployment/openobserve-operator --tail=100 -f
# 3. Check resource generation vs observed generation
kubectl get openobservealert my-alert -n o2operator \
-o jsonpath='{.metadata.generation} vs {.status.observedGeneration}'
# 4. Check retry count for deletion
kubectl get openobservealert my-alert -n o2operator \
-o jsonpath='{.status.deletionRetryCount}'
# 5. Force requeue by changing something trivial
kubectl annotate openobservealert my-alert force-sync="$(date)" --overwrite
Best Practices
DO's:
- Always use kubectl delete to remove resources instead of deleting them manually from the OpenObserve UI
- Check dependencies before deleting resources (e.g., templates used by destinations)
- Monitor operator logs when performing operations to understand any failures
- Use dry-run to preview changes before applying them (see the example after this list)
- Back up resources before major changes (see the example after this list)
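For example, a quick sketch of both practices (the file and backup names here are placeholders):
# Preview a change without persisting it (server-side dry run validates against the API server)
kubectl apply -f my-alert.yaml --dry-run=server
# Back up existing resources before making changes
kubectl get openobservealerts -n o2operator -o yaml > alerts-backup.yaml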
DON'Ts:
- Don't manually delete resources from the OpenObserve UI - This causes sync issues
- Don't remove finalizers unless you understand the consequences - Data loss may occur
- Don't force delete without checking logs first - You might miss important errors
- Don't disable webhooks in production - They provide important validation
- Don't ignore resource dependencies - Check what uses a resource before deleting it
Getting Help
View operator logs
# Current logs
kubectl logs -n o2operator deployment/openobserve-operator
# Follow logs
kubectl logs -n o2operator deployment/openobserve-operator -f
# Previous container logs (if crashed)
kubectl logs -n o2operator deployment/openobserve-operator --previous
# Logs with timestamps
kubectl logs -n o2operator deployment/openobserve-operator --timestamps
Check operator status
# Operator pod status
kubectl get pods -n o2operator -l app=openobserve-operator
# Operator deployment status
kubectl rollout status deployment/openobserve-operator -n o2operator
# Operator resource usage
kubectl top pods -n o2operator -l app=openobserve-operator
Export resources for support
# Export all OpenObserve resources
for crd in alerts pipelines functions alerttemplates destinations configs; do
kubectl get openobserve${crd} -A -o yaml > openobserve-${crd}-export.yaml
done
# Create a support bundle
kubectl cluster-info dump --namespaces o2operator --output-directory ./support-bundle
Emergency Recovery
If the operator is completely broken and resources need emergency cleanup:
#!/bin/bash
# Emergency cleanup script - USE WITH EXTREME CAUTION
NAMESPACE="o2operator"
# Remove all finalizers from all OpenObserve resources
for resource_type in openobservealerts openobservepipelines openobservefunctions openobservealerttemplates openobservedestinations; do
echo "Cleaning up $resource_type..."
kubectl get $resource_type -n $NAMESPACE -o name | while read -r resource; do
echo " Removing finalizers from $resource"
kubectl patch "$resource" -n "$NAMESPACE" -p '{"metadata":{"finalizers":[]}}' --type=merge
done
done
echo "Emergency cleanup complete. Resources should now delete normally."
Note: This guide assumes you have appropriate RBAC permissions to perform these operations. Always test commands in a development environment before running in production.
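If you are unsure whether your account has the required permissions, kubectl auth can-i can confirm them before you start, for example:
# Check that the current user can delete and patch OpenObserve resources in the operator namespace
kubectl auth can-i delete openobservealerts -n o2operator
kubectl auth can-i patch openobservedestinations -n o2operator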