Understanding Elasticsearch Cluster Health Status

Introduction

Is your Elasticsearch cluster a black box? You write flawless code, but searches feel like a guessing game. Optimizing performance starts with understanding cluster health.

Elasticsearch cluster health is a summary metric that reports the overall state of shard allocation in your cluster. The health status of an Elasticsearch cluster can be green, yellow, or red.

Health Status Colors: Green, Yellow, and Red

Green Status: All primary and replica shards are allocated. The cluster is fully operational.

Yellow Status: All primary shards are allocated, but one or more replica shards are not. All data remains available for search and indexing, but redundancy is reduced, so losing another node could make data unavailable.

Red Status: One or more primary shards are not allocated. Some data is unavailable: searches may return partial results, and indexing into the affected shards fails.
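The color rules above can be written down as a small function. This is a simplified sketch for illustration, not how Elasticsearch computes the status internally; the function name and its two parameters are assumptions made for this example.

```python
def classify_health(unassigned_primaries: int, unassigned_replicas: int) -> str:
    """Map counts of unallocated shards to the status color described above."""
    if unassigned_primaries > 0:
        return "red"      # at least one primary shard is not allocated
    if unassigned_replicas > 0:
        return "yellow"   # all primaries allocated, some replicas are not
    return "green"        # every primary and replica shard is allocated
```

For example, a single-node cluster whose indices each ask for one replica would typically land in the second branch, since a replica cannot share a node with its primary.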

Let’s look at the role of the _cluster/health endpoint.

Role of the Elasticsearch _cluster/health Endpoint

The _cluster/health endpoint reports the health status of an Elasticsearch cluster. It returns a simple status for the whole cluster and can also report the health of specific data streams and indices.

Example:

bash
GET /_cluster/health

This command returns the health status of the entire cluster. You can also specify a target data stream or index to get the health status of a specific part of the cluster.

Example:

bash
GET /_cluster/health/my-index-000001

This command returns the health status of the my-index-000001 index.

Cluster health is a critical metric to monitor, and the status colors tell you how to read it. Now, let’s look at how to check cluster health with cURL.

About Us | Open Source Observability Platform

Checking Cluster Health with cURL

The _cluster/health endpoint provides an overview of the health status of an Elasticsearch cluster. The examples below call it with cURL against a node listening on localhost:9200.

Checking Overall Cluster Health

To check the overall health of the cluster, use the following cURL command:

bash
curl -X GET "http://localhost:9200/_cluster/health?pretty"

This command will return a JSON object with the cluster health status and other metrics.

Checking Index Health

To check the health of a specific index, use the _cluster/health/{index} endpoint:

bash
curl -X GET "http://localhost:9200/_cluster/health/my_index?pretty"

Replace my_index with the name of the index you want to check.
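The same two requests can be made from a script. The sketch below uses only the Python standard library and assumes an unsecured cluster at http://localhost:9200, as in the cURL examples; the helper names health_url and fetch_health are made up for this illustration. Clusters with security enabled would also need authentication and HTTPS, which are left out here.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def health_url(base_url, index=None, **params):
    """Build a _cluster/health URL, optionally scoped to an index or data stream."""
    path = "/_cluster/health"
    if index:
        # Keep commas and wildcards so multi-target expressions still work.
        path += "/" + quote(index, safe=",*")
    query = urlencode(params)
    return base_url.rstrip("/") + path + ("?" + query if query else "")


def fetch_health(base_url, index=None, **params):
    """Call the endpoint and return the parsed JSON body as a dict."""
    with urlopen(health_url(base_url, index, **params), timeout=10) as resp:
        return json.load(resp)
```

Calling fetch_health("http://localhost:9200") corresponds to the first cURL command, and fetch_health("http://localhost:9200", "my_index") to the second. Query parameters the endpoint accepts, such as wait_for_status and timeout, can be passed as keyword arguments.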

Checking Node Health

To check the health of individual nodes in the cluster, use the _cat/nodes endpoint:

bash
curl -X GET "http://localhost:9200/_cat/nodes?v&h=name,role,heap.percent,cpu,load_1m,master"

This command will return a tabular output with information about each node, including its role, heap usage, CPU usage, and more. Now, let’s move on to interpreting the response.
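Before that, here is a short sketch of turning the tabular _cat/nodes output into something a script can work with. It assumes the output was requested with ?v, so the first line is a header, and that no cell contains spaces or is empty. The sample text is made up for illustration and is not real cluster output.

```python
def parse_cat_table(text):
    """Parse whitespace-separated cat output (with a header row) into dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    # Every value stays a string; convert where a number is needed.
    return [dict(zip(header, line.split())) for line in lines[1:]]


# Illustrative sample only, shaped like a three-column-plus-name request.
sample = """\
name   heap.percent cpu load_1m
node-1 42           7   0.35
node-2 58           12  0.80
"""

nodes = parse_cat_table(sample)
busiest = max(nodes, key=lambda node: int(node["heap.percent"]))
```

For anything beyond quick inspection, asking the cat API for format=json avoids this kind of text parsing altogether.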

Interpreting the Health Check Response

The JSON response from the _cluster/health endpoint contains the following key fields:

status: The overall health status of the cluster: "green", "yellow", or "red".
timed_out: Indicates whether the request timed out.
number_of_nodes: The number of nodes in the cluster.
number_of_data_nodes: The number of data nodes in the cluster.
active_primary_shards: The number of active primary shards.
active_shards: The total number of active shards (primary and replica).
relocating_shards: The number of shards that are being moved from one node to another.
initializing_shards: The number of shards that are being initialized.
unassigned_shards: The number of shards that are not allocated to any node.

By regularly checking the health of your Elasticsearch cluster using the _cluster/health endpoint, you can monitor its status, detect potential issues, and take corrective actions.
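As a rough illustration of reading these fields in a script, the sketch below turns a parsed health response into a list of findings. The function name and the wording of the findings are assumptions for this example, and the sample dictionary is invented to resemble a single-node cluster with unassigned replicas.

```python
def describe_health(health: dict) -> list:
    """Return human-readable findings for a parsed _cluster/health response."""
    findings = []
    if health.get("timed_out"):
        findings.append("request timed out")
    status = health.get("status")
    if status != "green":
        findings.append(f"status is {status}")
    # Non-zero counts here usually explain a yellow or red status.
    for field in ("unassigned_shards", "initializing_shards", "relocating_shards"):
        count = health.get(field, 0)
        if count:
            findings.append(f"{field} = {count}")
    return findings


# Invented sample values, for illustration only.
sample = {
    "status": "yellow",
    "timed_out": False,
    "number_of_nodes": 1,
    "number_of_data_nodes": 1,
    "active_primary_shards": 5,
    "active_shards": 5,
    "relocating_shards": 0,
    "initializing_shards": 0,
    "unassigned_shards": 5,
}

findings = describe_health(sample)
```

An empty list means the response showed a green status with no shards in transition.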

Get started for FREE with OpenObserve

OpenObserve serves as a replacement for Elasticsearch for users who ingest data using APIs and perform searches. OpenObserve comes with its own user interface, eliminating the need for a separate installation.

Here are the results from pushing logs from a production Kubernetes cluster to both Elasticsearch and OpenObserve using Fluent Bit.

Let’s move on to advanced monitoring with the cat health API.

Advanced Monitoring with the Cat Health API

Here are the key points about using the cat health API in Elasticsearch:

Cat Health API

Purpose: The cat health API provides an overview of the health status of an Elasticsearch cluster.

Output format: It returns a human-readable text output with columns for various health metrics.

Intended use: The cat health API is designed for human consumption, such as in the Kibana console or command line.

Permissions Required

Cluster privileges: To use the cat health API, you need the monitor cluster privilege.

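Index privileges: The cat health API reports on the whole cluster and does not take an index target. To check the health of specific indices with the cat indices API instead, you also need the monitor index privilege for those indices.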

Examples of using the cat health API

Checking overall cluster health:

bash
GET /_cat/health?v

This returns the overall health status of the cluster, along with metrics like the number of nodes, the number of shards in each state, and pending tasks.

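Checking health of specific indices

The cat health API reports on the cluster as a whole and does not accept an index name. For a per-index view in the same tabular style, use the cat indices API:

bash
GET /_cat/indices/my-index-000001?v&h=health,status,index,pri,rep

Replace my-index-000001 with the name of the index you want to check.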

Customizing output

Use the ?h= parameter to specify which columns to include, e.g., ?h=status,node.total,node.data,shards.
Use the ?s= parameter to sort the output by specific columns, e.g., ?s=status:desc.
Use the ?format= parameter to change the output format to JSON, YAML, or CBOR.

Getting help

Use the ?help parameter to see the available columns for each cat API endpoint.
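The query parameters described above can be combined in one request. The sketch below only builds the request path, so it can be pasted into a script or checked without a running cluster; the helper name cat_health_path and its arguments are assumptions made for this example.

```python
from urllib.parse import urlencode


def cat_health_path(columns=None, sort=None, fmt=None, verbose=False):
    """Build a /_cat/health path from the v, h, s and format parameters."""
    params = {}
    if verbose:
        params["v"] = "true"             # include the header row
    if columns:
        params["h"] = ",".join(columns)  # which columns to show
    if sort:
        params["s"] = sort               # e.g. "status:desc"
    if fmt:
        params["format"] = fmt           # e.g. "json" or "yaml"
    # Leave commas and colons unescaped so the path stays readable.
    query = urlencode(params, safe=",:")
    return "/_cat/health" + ("?" + query if query else "")
```

For instance, cat_health_path(columns=["status", "node.total", "node.data", "shards"], verbose=True) yields /_cat/health?v=true&h=status,node.total,node.data,shards.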

By using the cat health API, you can quickly check the overall health of your Elasticsearch cluster and identify any issues that need attention. Next, let’s look at how to diagnose and fix yellow and red statuses.

Diagnosing and Fixing Yellow and Red Health Statuses

Here are the key points about diagnosing and fixing yellow and red health statuses in Elasticsearch:

Common Reasons for Yellow and Red Statuses

Unassigned Shards: Shards not allocated to any node. This can occur due to node failures, insufficient resources, or configuration issues.
Initializing Shards: Shards currently being initialized, either during cluster startup or after a node failure.
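Relocating Shards: Shards being moved from one node to another, typically to balance the load across the cluster. Relocation on its own does not normally change the status color, but it often accompanies the node or disk events that do.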
Disk Space Issues: Lack of disk space on nodes can cause shards to become unassigned.
Node Failures: Node failures can cause shards to become unassigned.

Steps to Diagnose Cluster Issues

Check Cluster Health: Use the _cluster/health API to check the overall health status of the cluster.
View Unassigned Shards: Use the _cat/shards API to view unassigned shards; its unassigned.reason column shows why each one is unassigned.
Check Node Allocation: Use the _cluster/allocation/explain API to check node allocation decisions and identify potential issues.
Check Disk Space: Check disk space on nodes to ensure there is sufficient space for shards.
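The second step can be scripted once the _cat/shards output is requested as JSON, for example with ?format=json&h=index,shard,prirep,state,unassigned.reason. The sketch below filters such rows down to the unassigned shards. The function name is hypothetical and the sample rows are invented for illustration.

```python
def unassigned_shards(rows):
    """Pick the unassigned shards out of _cat/shards rows (format=json)."""
    return [
        {
            "index": row["index"],
            "shard": row["shard"],
            "kind": "primary" if row["prirep"] == "p" else "replica",
            "reason": row.get("unassigned.reason"),
        }
        for row in rows
        if row.get("state") == "UNASSIGNED"
    ]


# Invented rows, shaped like the JSON form of the cat shards output.
sample_rows = [
    {"index": "my-index-000001", "shard": "0", "prirep": "p",
     "state": "STARTED", "unassigned.reason": None},
    {"index": "my-index-000001", "shard": "0", "prirep": "r",
     "state": "UNASSIGNED", "unassigned.reason": "NODE_LEFT"},
]

problem_shards = unassigned_shards(sample_rows)
```

An unassigned primary in the result points to a red status, while unassigned replicas alone point to yellow. Either can then be passed to the allocation explain API for a detailed answer.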

Specific Fixes for Common Problems

Unassigned Shards:
- Check node allocation decisions and ensure nodes have sufficient resources.
- Check disk space on nodes and ensure there is sufficient space for shards.
Initializing Shards:
- Check cluster startup and ensure all nodes are properly initialized.
- Check node failures and ensure nodes are properly restarted.
Relocating Shards:
- Check shard balancing and ensure shards are properly distributed across the cluster.
- Check node failures and ensure nodes are properly restarted.
Disk Space Issues:
- Check disk space on nodes and ensure there is sufficient space for shards.
- Consider increasing disk space or using a different storage solution.
Node Failures:
- Check node failures and ensure nodes are properly restarted.
- Consider using a different node or cluster configuration.
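For the disk space checks, one possible approach is to read per-node disk usage from the cat allocation API (requested with ?format=json) and flag nodes near the disk watermark. The sketch below assumes the default low watermark of 85 percent, which a cluster may override; the function name and the sample rows are made up for this example.

```python
def nodes_over_watermark(rows, threshold_percent=85):
    """Return names of nodes whose disk usage meets or exceeds the threshold."""
    flagged = []
    for row in rows:
        percent = row.get("disk.percent")
        if percent is None:
            # The summary row for unassigned shards carries no disk figures.
            continue
        if int(percent) >= threshold_percent:
            flagged.append(row["node"])
    return flagged


# Invented rows, shaped like the JSON form of the cat allocation output.
sample_rows = [
    {"node": "node-1", "disk.percent": "62"},
    {"node": "node-2", "disk.percent": "91"},
    {"node": "UNASSIGNED", "disk.percent": None},
]

full_nodes = nodes_over_watermark(sample_rows)
```

Nodes flagged this way are candidates for freeing space or adding capacity before shards become unassigned.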

Best Practices for Maintaining Cluster Health

Here are the best practices for maintaining cluster health:

Regularly check cluster health: Use the _cluster/health API to monitor the overall health of your Elasticsearch cluster.
Monitor node-level metrics: Use the _nodes/stats API to monitor node-level metrics such as CPU usage, memory usage, and disk I/O.
Monitor index-level metrics: Use the index stats API (_stats) or the _cat/indices API to monitor index-level metrics such as document count, store size, and indexing activity.
Ensure sufficient replicas: Configure a replica count that your nodes can actually host. A replica is never allocated on the same node as its primary, so an index with replicas on a single-node cluster stays yellow.
Adjust replica settings: Adjust replica settings based on your specific use case and resource availability.
Monitor disk space: Monitor disk space on nodes to ensure there is sufficient space for shards.
Monitor memory usage: Monitor memory usage on nodes to ensure there is sufficient memory for shards.
Adjust shard allocation: Adjust shard allocation settings based on disk space and memory availability.
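The replica guidance above comes down to simple arithmetic: each copy of a shard needs its own data node. The sketch below checks that, using hypothetical function names. It ignores allocation filtering, awareness settings, and disk watermarks, all of which can reduce the number of usable nodes further.

```python
def replicas_can_be_allocated(number_of_replicas: int, number_of_data_nodes: int) -> bool:
    """True if every copy (one primary plus its replicas) can sit on its own node."""
    return number_of_replicas + 1 <= number_of_data_nodes


def max_useful_replicas(number_of_data_nodes: int) -> int:
    """Largest replica count that can be fully allocated on the given nodes."""
    return max(number_of_data_nodes - 1, 0)
```

With one data node and one replica configured, the check returns False, which matches the yellow status such a cluster reports.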

By following these best practices, you can maintain the health and performance of your Elasticsearch cluster. Let’s conclude with a quick overview of what the article covered.

Conclusion

Elasticsearch cluster health provides vital insights into your cluster's functionality. Use the _cluster/health endpoint to monitor its status (green, yellow, or red) and identify potential issues.

Green indicates optimal health (all shards allocated).
Yellow suggests reduced redundancy (all primary shards allocated, but some replica shards missing).
Red signifies critical issues (unallocated primary shards).

Regular cluster health checks and understanding the reasons behind yellow/red statuses are crucial for maintaining a healthy cluster. Here is a brief summary to help you.

Key Tools for Monitoring and Troubleshooting:

_cluster/health API: Provides overall health status and metrics.
_cat/health API: Offers a quick readable health overview.
_cat/shards API: Lists unassigned shards and reasons for unassignment.
_cluster/allocation/explain API: Analyzes node allocation decisions.

Common Causes of Yellow/Red Statuses:

Unassigned Shards: Node failures, insufficient resources, or configuration issues.
Initializing Shards: Shards being set up during startup or after node failures.
Relocating Shards: Shards being moved for load balancing.
Disk Space Issues: Lack of disk space on nodes.
Node Failures: Nodes malfunctioning, causing shard unavailability.

Resolving these issues involves:

Checking node allocation and disk space.
Monitoring node startup and failures.
Analyzing shard balancing and node restarts.
Increasing disk space or using alternative storage solutions.

Best Practices for Maintaining Cluster Health:

Regularly monitor cluster health using _cluster/health.
Track node-level metrics (CPU, memory, disk I/O) with _nodes/stats.
Monitor index-level metrics (document count, size, indexing activity) with the index stats API (_stats) or _cat/indices.
Ensure sufficient replicas for data redundancy and availability.
Adjust replica settings based on needs and resource constraints.
Monitor and adjust shard allocation based on disk space and memory usage.

By following these practices, you can maintain a healthy and performant Elasticsearch cluster.

Book a free demo on OpenObserve Today

Author:

The OpenObserve Team comprises dedicated professionals committed to revolutionizing system observability through their platform, OpenObserve. The team is dedicated to streamlining data observation and system monitoring, offering high-performance and cost-effective solutions for diverse use cases.
