Monitoring for Azure Kubernetes Service Cluster

July 18, 2024 by OpenObserve Team

Introduction to AKS Monitoring

Running containerized applications on Azure Kubernetes Service (AKS) offers agility and scalability, but keeping them running smoothly requires vigilance. This is where AKS monitoring comes in: a comprehensive approach to overseeing your AKS clusters.

AKS monitoring is the process of collecting, analyzing, and acting on telemetry data from Azure Kubernetes Service (AKS) clusters to ensure their performance, security, and reliability.

The process typically includes vulnerability scanning, performance monitoring, and troubleshooting to identify and address potential issues before they impact applications.

Importance of AKS Cluster Monitoring

  • Performance Insights: Monitoring gives you visibility into the performance of AKS clusters, including resource utilization, node performance, and pod health, so you can identify and address performance bottlenecks.
  • Resource Optimization: Monitoring helps identify underutilized resources and optimize node scaling, reducing costs and improving resource efficiency.
  • Troubleshooting: Monitoring provides detailed logs and metrics, making it easier to troubleshoot issues, diagnose errors, and ensure the overall health of applications running on AKS.
  • Security and Compliance: Monitoring helps detect and respond to security threats and compliance violations by providing insights into activities and changes within AKS clusters.

Get started for FREE with OpenObserve

Overview of Tools and Services Used for AKS Monitoring

  • Azure Monitor: Azure Monitor is a comprehensive monitoring solution that provides insights into the performance, health, and activity of AKS clusters. It includes features like metric data collection, log analysis, and alerting.
  • Container Insights: Container Insights is a feature of Azure Monitor that collects container logs, inventory, and performance data (such as CPU and memory usage) from AKS nodes and pods.
  • Azure Log Analytics: Log Analytics is the Azure Monitor tool for querying log data stored in a Log Analytics workspace, such as the data Container Insights collects, using the Kusto Query Language (KQL). It helps you monitor and troubleshoot AKS clusters.

AKS monitoring is crucial for ensuring the smooth operation and performance of AKS clusters. It involves collecting, analyzing, and acting on telemetry data from AKS clusters to identify and address potential issues. Azure Monitor, Container Insights, and Azure Log Analytics are key tools and services used for AKS monitoring.

In the following section, you will learn how to set up monitoring for AKS.

Setting Up Monitoring for AKS

Monitoring is a crucial aspect of managing and maintaining your Azure Kubernetes Service (AKS) clusters. Azure provides robust monitoring capabilities through Azure Monitor and Azure Log Analytics, allowing you to gain insights into the performance, health, and activity of your AKS clusters. In this section, we will explore the essential steps for setting up monitoring for AKS clusters.

Join OpenObserve - GitHub

Types of Supported Clusters

Azure Monitor supports two types of clusters:

  • AKS Clusters: These are the standard AKS clusters that are managed by Azure.
  • Arc-enabled Kubernetes Clusters: These are Kubernetes clusters running outside Azure (on-premises or in other clouds) that are connected to Azure through Azure Arc, so they can be managed and monitored from Azure.

Prerequisites for Enabling Monitoring

Before enabling monitoring for your AKS clusters, you need to ensure that you have the necessary permissions and workspaces set up.

  • Permissions: You need to have at least Contributor access to the cluster for onboarding.
  • Required Workspaces: You need a Log Analytics workspace to store Container Insights logs and, if you enable Prometheus metrics, an Azure Monitor workspace to store those metrics.

Steps to Enable Prometheus Metrics Scraping and Grafana for Visualizations

To enable Prometheus metrics scraping and Grafana for visualizations, follow these steps:

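  • Enable Prometheus Metrics Scraping: Use the Azure CLI command
    az aks update --enable-azure-monitor-metrics --name <cluster-name> --resource-group <cluster-resource-group-name>
    

  • Enable Grafana: Add the resource ID of an Azure Managed Grafana instance (and, optionally, of an existing Azure Monitor workspace) to the same command
    az aks update --enable-azure-monitor-metrics --name <cluster-name> --resource-group <cluster-resource-group-name> --azure-monitor-workspace-resource-id <azure-monitor-workspace-resource-id> --grafana-resource-id <grafana-workspace-name-resource-id>
    

  • Enable Container Insights (logs): Use the Azure CLI command
    az aks enable-addons --addon monitoring --name <cluster-name> --resource-group <cluster-resource-group-name>
    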
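Enabling monitoring on several clusters can be scripted. The following Python sketch only builds the argument lists for az aks update --enable-azure-monitor-metrics and az aks enable-addons --addon monitoring and prints them; it executes them only when dry_run=False, which assumes the Azure CLI is installed and you are logged in. The helper names and the cluster and resource group names are hypothetical.

```python
import shlex
import subprocess
from typing import List, Optional


def prometheus_enable_cmd(cluster: str, resource_group: str,
                          monitor_workspace_id: Optional[str] = None,
                          grafana_id: Optional[str] = None) -> List[str]:
    """Arguments for `az aks update` that enable Managed Prometheus metrics.

    The optional resource IDs link an existing Azure Monitor workspace and an
    Azure Managed Grafana instance.
    """
    cmd = ["az", "aks", "update", "--enable-azure-monitor-metrics",
           "--name", cluster, "--resource-group", resource_group]
    if monitor_workspace_id:
        cmd += ["--azure-monitor-workspace-resource-id", monitor_workspace_id]
    if grafana_id:
        cmd += ["--grafana-resource-id", grafana_id]
    return cmd


def container_insights_enable_cmd(cluster: str, resource_group: str,
                                  log_analytics_id: Optional[str] = None) -> List[str]:
    """Arguments for `az aks enable-addons --addon monitoring` (Container Insights)."""
    cmd = ["az", "aks", "enable-addons", "--addon", "monitoring",
           "--name", cluster, "--resource-group", resource_group]
    if log_analytics_id:
        cmd += ["--workspace-resource-id", log_analytics_id]
    return cmd


def run(cmd: List[str], dry_run: bool = True) -> None:
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        # Requires the Azure CLI on PATH and an active `az login` session.
        subprocess.run(cmd, check=True)


# Hypothetical cluster and resource group names:
# for name in ["aks-dev", "aks-prod"]:
#     run(prometheus_enable_cmd(name, "rg-aks"))
#     run(container_insights_enable_cmd(name, "rg-aks"))
```

Keeping dry_run=True as the default lets you review the exact commands before anything changes in your subscription.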

Full Monitoring through the Azure Portal

To enable full monitoring through the Azure portal for new and existing AKS clusters, follow these steps:

  • New Clusters: Use the Azure portal to create a new AKS cluster and enable monitoring during the creation process.
  • Existing Clusters: In the Azure portal, open the existing AKS cluster, select Insights under the Monitoring section of the cluster menu, and then select Configure monitoring.

Monitoring Approaches for Windows vs. Linux Clusters

Most monitoring data is collected the same way for Windows and Linux node pools; the main difference is how node-level metrics are gathered:

  • Node Performance (CPU, memory, disk, and network): On Linux nodes, Azure Monitor managed service for Prometheus scrapes node-exporter by default, which exposes series such as node_cpu_seconds_total and node_filesystem_avail_bytes. On Windows nodes, you must deploy windows-exporter and enable Windows metric collection; it exposes equivalent series with a windows_ prefix, such as windows_cpu_time_total and windows_logical_disk_free_bytes.
  • Logs and Inventory: Container Insights runs a separate agent DaemonSet on Windows nodes (ama-logs-windows), but container logs and pod and node inventory from both operating systems are written to the same Log Analytics tables.
  • Cluster Health, Security, and Events: Node and pod status, Kubernetes events, and control plane logs such as API server audit logs do not depend on the node operating system, so the same queries and alerts apply to both.
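To chart node CPU usage across mixed node pools, you can generate one PromQL expression per operating system from the idle-CPU counter each exporter exposes. This Python sketch assumes node-exporter on Linux nodes and windows-exporter on Windows nodes with their default metric names (verify them against the exporter versions you deploy); the helper name is hypothetical.

```python
# Idle-CPU counters by node OS: node_cpu_seconds_total comes from node-exporter
# (Linux), windows_cpu_time_total from windows-exporter (Windows). Both carry a
# "mode" label whose "idle" value counts idle CPU seconds per core.
CPU_COUNTER = {
    "linux": "node_cpu_seconds_total",
    "windows": "windows_cpu_time_total",
}


def node_cpu_usage_promql(os_type: str, window: str = "5m") -> str:
    """PromQL for per-node CPU usage in percent for the given node OS."""
    counter = CPU_COUNTER[os_type.lower()]
    # Average idle fraction across cores, subtracted from 1, scaled to percent.
    return (
        f'100 * (1 - avg by (instance) '
        f'(rate({counter}{{mode="idle"}}[{window}])))'
    )


for os_type in ("linux", "windows"):
    print(os_type, node_cpu_usage_promql(os_type))
```

Because both counters share the same mode="idle" convention, one template covers both operating systems; only the metric name changes.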

By following the steps outlined in this section, you can enable full monitoring for your AKS clusters and gain valuable insights into their performance and health.

In the next section, you will learn about key monitoring data and metrics.

Key Monitoring Data and Metrics

Monitoring data and understanding metrics play a crucial role in managing and optimizing Azure Kubernetes Service (AKS) clusters. Here are some key aspects to consider:

Role of Monitoring Data

  • Monitoring data provides insights into the performance, health, and utilization of AKS clusters.
  • Monitoring data enables proactive issue detection and facilitates troubleshooting efforts.

Utilization of Default Platform Metrics

  • Azure Monitor collects platform metrics for AKS automatically, with no configuration required. These default metrics cover node performance, pod performance, and cluster health.

Understanding Key Metrics

  • The Kubernetes control plane consists of several critical components, including the API server, scheduler, and controller manager.
  • Metrics such as API server request latency, scheduler pending pods, and controller manager workqueue depth provide insights into control plane operations.
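API server request latency is exported as a Prometheus histogram (apiserver_request_duration_seconds_bucket), and percentiles are usually computed in PromQL with an expression such as histogram_quantile(0.99, sum by (le) (rate(apiserver_request_duration_seconds_bucket[5m]))). To show what that number means, the Python sketch below re-implements the core of histogram_quantile for one set of cumulative buckets: it finds the bucket containing the target rank and interpolates linearly inside it. It is a simplified illustration (Prometheus handles further edge cases), and the bucket counts used with it are made up.

```python
import math
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Buckets must be sorted by upper bound, with +Inf as the last bound,
    mirroring Prometheus `_bucket` series and their `le` label.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0  # the first bucket is assumed to start at 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: return the highest finite bound.
                return prev_bound
            if count == prev_count:
                return bound
            # Linear interpolation inside the bucket that contains the rank.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical request counts per latency bucket (seconds).
latency = [(0.1, 50), (0.25, 90), (0.5, 98), (1.0, 100), (math.inf, 100)]
print(histogram_quantile(0.99, latency))
```

With these made-up counts, the 99th percentile lands between 0.5 s and 1.0 s, so the reported value depends on the bucket layout; finer buckets around your latency targets give more precise percentiles.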

Importance of Logs

  • Logs provide valuable information for troubleshooting and auditing purposes in AKS.
  • Control plane logs contain information related to the Kubernetes control plane components, such as the API server and scheduler.
  • Data plane logs contain information related to the applications running on the AKS cluster, such as container logs and application-specific logs.
  • Analyzing both control plane and data plane logs helps identify issues and gain a comprehensive understanding of the cluster's behavior.
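Data plane logs are collected by Container Insights, whereas AKS control plane logs are sent to a destination only after you create a diagnostic setting for the cluster. This Python sketch builds the arguments for az monitor diagnostic-settings create with a selection of control plane log categories; the setting name and the category selection are assumptions to adapt to your needs.

```python
import json
from typing import List, Sequence

# AKS control plane log categories. kube-audit-admin is a lower-volume subset
# of kube-audit that leaves out get and list audit events.
DEFAULT_CATEGORIES = (
    "kube-apiserver",
    "kube-audit-admin",
    "kube-controller-manager",
    "kube-scheduler",
    "cluster-autoscaler",
)


def logs_argument(categories: Sequence[str] = DEFAULT_CATEGORIES) -> str:
    """JSON value for the --logs parameter: one enabled entry per category."""
    return json.dumps([{"category": c, "enabled": True} for c in categories])


def diagnostic_settings_cmd(cluster_resource_id: str, workspace_resource_id: str,
                            name: str = "aks-control-plane-logs",  # hypothetical name
                            categories: Sequence[str] = DEFAULT_CATEGORIES) -> List[str]:
    """Arguments that route the selected control plane logs to a Log Analytics workspace."""
    return [
        "az", "monitor", "diagnostic-settings", "create",
        "--name", name,
        "--resource", cluster_resource_id,
        "--workspace", workspace_resource_id,
        "--logs", logs_argument(categories),
    ]
```

Choosing kube-audit-admin instead of kube-audit is a common way to keep audit coverage for write operations while reducing log volume and cost.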

Monitoring data and metrics are essential for managing and optimizing AKS clusters. Together, they enable proactive issue detection and effective troubleshooting.

In the following section, we take a closer look at the tools and integrations used for monitoring.

Tools and Integration for Enhanced Monitoring

OpenObserve for AKS Monitoring

OpenObserve is an innovative open-source observability platform designed to streamline the monitoring of logs, metrics, and traces. Here are some key features and benefits of using OpenObserve for AKS monitoring:

OpenObserve Features

  • Logs: OpenObserve provides an advanced embedded GUI for fast log searching, with features such as top 10, search around, SQL queries, and custom VRL functions.
  • Metrics: OpenObserve offers long-term storage for Prometheus metrics in S3 with support for SQL and PromQL.
  • Traces: OpenObserve provides distributed tracing with OpenTelemetry, allowing users to identify performance problems within a microservice or across a distributed architecture with precision.
  • Alerts: OpenObserve offers scheduled and real-time alerts that allow users to promptly address critical issues and dispatch alerts to multiple platforms using templates.
  • Dashboards: OpenObserve provides dashboards that present real-time data from logs, metrics, and traces in a visually appealing and efficient manner.

OpenObserve Benefits

  • Easier: OpenObserve is designed to be easy to use and deploy, with a simple installation process and a user-friendly interface.
  • Lower Storage Cost: OpenObserve can store data on local disk, S3, MinIO, GCS, or Azure Blob Storage, and OpenObserve reports about 140x lower storage cost compared to Elasticsearch.
  • Scale: OpenObserve is highly scalable and can handle large volumes of data without performance degradation.
  • Setup Time: OpenObserve can be set up quickly; for single-node deployments, it runs as a single binary.

OpenObserve Integration with AKS

  • Ingestion: OpenObserve can ingest logs and metrics from AKS clusters using agents such as Fluentd, Fluent Bit, and Prometheus.
  • Querying: OpenObserve provides a powerful query engine that allows users to query logs and metrics using SQL and PromQL.
  • Visualization: OpenObserve provides dashboards that can be used to visualize logs, metrics, and traces from AKS clusters.
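For custom shippers or quick tests, OpenObserve also exposes an HTTP API. This Python sketch builds (but does not send) two requests with HTTP basic authentication: a log ingestion POST to /api/{org}/{stream}/_json and an SQL search POST to /api/{org}/_search, whose time range is given in microseconds since the Unix epoch. The base URL (OpenObserve's default port 5080), the default organization, the aks_logs stream, and the credentials are assumptions; check the OpenObserve API documentation for your version.

```python
import base64
import json
import urllib.request


def _headers(user: str, password: str) -> dict:
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return {"Authorization": f"Basic {token}", "Content-Type": "application/json"}


def _post(url: str, body, user: str, password: str) -> urllib.request.Request:
    return urllib.request.Request(url, data=json.dumps(body).encode(),
                                  headers=_headers(user, password), method="POST")


def ingest_request(base_url: str, org: str, stream: str, records: list,
                   user: str, password: str) -> urllib.request.Request:
    """POST a list of JSON log records to /api/{org}/{stream}/_json."""
    return _post(f"{base_url.rstrip('/')}/api/{org}/{stream}/_json", records, user, password)


def search_request(base_url: str, org: str, sql: str, start_us: int, end_us: int,
                   user: str, password: str, size: int = 100) -> urllib.request.Request:
    """POST an SQL query to /api/{org}/_search; times are microseconds since the epoch."""
    body = {"query": {"sql": sql, "start_time": start_us, "end_time": end_us,
                      "from": 0, "size": size}}
    return _post(f"{base_url.rstrip('/')}/api/{org}/_search", body, user, password)


# Sending a request requires a running OpenObserve instance (hypothetical values):
# req = ingest_request("http://localhost:5080", "default", "aks_logs",
#                      [{"namespace": "default", "message": "hello"}],
#                      "root@example.com", "password")
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```

Separating request construction from sending keeps the helpers easy to test and lets you swap in any HTTP client.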

OpenObserve is a powerful tool for monitoring AKS clusters, providing comprehensive visibility into logs, metrics, and traces. Its ease of use, scalability, and lower storage cost make it an attractive option for organizations looking to streamline their monitoring capabilities.

In the next section, you will learn about advanced monitoring techniques.

Advanced Monitoring Techniques

Advanced monitoring techniques are essential for ensuring the performance of Azure Kubernetes Service (AKS) clusters. Here are some advanced monitoring techniques for AKS:

  • Cluster Performance: Monitor cluster performance using metrics such as CPU usage, memory usage, and disk usage to identify bottlenecks and optimize resource allocation.
  • Network Traffic: Monitor network traffic using metrics such as packet loss, latency, and throughput to identify network issues and optimize network configuration.
  • Resource Utilization: Monitor resource utilization using metrics such as CPU usage, memory usage, and disk usage to identify resource bottlenecks and optimize resource allocation.
  • Alert Rules: Create and manage alert rules to receive prompt notifications of issues in your AKS cluster.
  • Alert Conditions: Define alert conditions based on specific metrics, such as CPU usage, memory usage, or disk usage.
  • Alert Actions: Define alert actions, such as sending notifications to a Slack channel or triggering a script to resolve the issue.
  • Network Observability: Enable network observability to gain detailed insights into cluster networking.
  • Network Metrics: Collect network metrics, such as packet loss, latency, and throughput, to identify network issues and optimize network configuration.
  • Network Visualization: Visualize network metrics using tools such as OpenObserve, Grafana or Kibana to gain a better understanding of network behavior.

By using the above techniques, you can gain detailed insights into your AKS cluster and ensure prompt notifications of issues.
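Alert rules generally combine a condition (a metric crossing a threshold) with a duration (how long it must stay there) and an action. The Python sketch below models that pattern: it fires only when the most recent N samples all exceed the threshold, similar to the "for" clause in Prometheus alerting rules, and builds a message body in the format Slack incoming webhooks accept (a JSON object with a "text" field). The AlertRule type, helper names, and example values are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class AlertRule:
    """Hypothetical rule: fire when a metric stays above a threshold."""
    name: str
    threshold: float
    for_samples: int  # consecutive samples that must breach the threshold

    def __post_init__(self) -> None:
        if self.for_samples < 1:
            raise ValueError("for_samples must be at least 1")


def should_fire(rule: AlertRule, samples: Sequence[float]) -> bool:
    """True only if the last `for_samples` samples all exceed the threshold."""
    recent = list(samples)[-rule.for_samples:]
    return len(recent) == rule.for_samples and all(v > rule.threshold for v in recent)


def slack_payload(rule: AlertRule, cluster: str, latest: float) -> dict:
    """Body for a Slack incoming webhook, which accepts JSON with a "text" field."""
    return {"text": f"{rule.name} on {cluster}: {latest:.1f} exceeds {rule.threshold:.1f}"}


cpu_rule = AlertRule("High node CPU", threshold=80.0, for_samples=3)
node_cpu = [50.0, 85.0, 90.0, 95.0]  # hypothetical percent samples, oldest first
if should_fire(cpu_rule, node_cpu):
    print(slack_payload(cpu_rule, "aks-prod", node_cpu[-1]))
```

Requiring several consecutive breaches filters out short spikes, which reduces noisy notifications at the cost of slightly later alerts.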

In the next section, you will see the best practices used in AKS Monitoring.

Best Practices for AKS Monitoring

Here are some key best practices for monitoring Azure Kubernetes Service (AKS):

  • Enable Prometheus metrics for your cluster to gain insights into performance and health.
  • Enable Container Insights to collect logs and performance data from your cluster.
  • Create diagnostic settings to collect control plane logs for AKS clusters.
  • Enable recommended Prometheus alerts to proactively notify you of issues.
  • Ensure the availability of the Log Analytics workspace supporting Container Insights.
  • Monitor all layers of your Kubernetes environment, including network, cluster, and application layers.
  • Use Azure Arc-enabled Kubernetes to monitor clusters running in other clouds.
  • Use Azure managed services for cloud native tools like Prometheus and Grafana.
  • Integrate AKS clusters into your existing monitoring tools.
  • Use Azure Policy to enable data collection for enabling Prometheus metrics, Container Insights, and diagnostic settings.

These best practices provide a comprehensive approach to monitoring AKS clusters and help ensure the performance of your Kubernetes environment.

Troubleshooting and Optimizing AKS Monitoring

Troubleshooting and optimizing AKS monitoring involves identifying and addressing common issues in your monitoring setup and tuning monitoring configurations for better performance and lower costs.

Troubleshooting Common Issues in AKS Monitoring Setups

  • Check Cluster Configuration: Ensure that the AKS cluster is properly configured for monitoring. Verify that the necessary components, such as Azure Monitor and Azure Log Analytics, are enabled and configured correctly.
  • Review Monitoring Data: Analyze monitoring data to identify potential issues. Use tools like Azure Monitor and Azure Log Analytics to review metrics, logs, and events, and identify patterns or anomalies that may indicate problems.
  • Check Network Connectivity: Verify that network connectivity is stable and functioning correctly. Ensure that the AKS cluster can communicate with Azure Monitor and Azure Log Analytics without any issues.
  • Check Resource Utilization: Monitor resource utilization to identify potential bottlenecks. Use tools like Azure Monitor and Azure Log Analytics to track CPU usage, memory usage, and disk usage, and identify areas where resources can be optimized.
  • Check for Misconfigured Alerts: Review alert configurations to ensure that they are set up correctly. Verify that alerts are triggered by the correct conditions and that notifications are sent to the correct recipients.
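A quick first check when monitoring data is missing is whether the agents themselves are healthy. Container Insights and Azure Monitor managed service for Prometheus run their agents as pods in the kube-system namespace whose names start with ama-logs and ama-metrics. This Python sketch reads the output of kubectl get pods -n kube-system -o json and lists agent pods that are not Running with all containers ready; it assumes kubectl is configured for the cluster, and the helper names are hypothetical.

```python
import json
import subprocess
from typing import List

AGENT_PREFIXES = ("ama-logs", "ama-metrics")


def load_kube_system_pods() -> dict:
    """Parse `kubectl get pods -n kube-system -o json` (needs a working kubeconfig)."""
    out = subprocess.run(
        ["kubectl", "get", "pods", "-n", "kube-system", "-o", "json"],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)


def unhealthy_agent_pods(pod_list: dict) -> List[str]:
    """Names of monitoring agent pods that are not Running with every container ready."""
    bad = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        if not name.startswith(AGENT_PREFIXES):
            continue  # not a monitoring agent pod
        status = pod.get("status", {})
        containers = status.get("containerStatuses", [])
        ready = bool(containers) and all(c.get("ready") for c in containers)
        if status.get("phase") != "Running" or not ready:
            bad.append(name)
    return bad


# Usage against a live cluster:
# print(unhealthy_agent_pods(load_kube_system_pods()))
```

If an agent pod is listed, its logs (kubectl logs) are usually the next place to look, for example for authentication or network errors when sending data to the workspace.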

Optimizing Monitoring Configurations

  • Use Azure Monitor to collect and analyze metrics, logs, and events from your AKS cluster. This can help you identify performance issues and optimize resource utilization.
  • Use Azure Log Analytics to collect and analyze logs from your AKS cluster. This can help you identify security threats and troubleshoot issues.
  • Use OpenObserve to monitor logs, metrics, and traces from your AKS cluster.
  • Use Prometheus to collect and analyze metrics from your AKS cluster.
  • Use Grafana to visualize monitoring data from your AKS cluster.
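One lever for lowering metrics cost is the scrape interval. Prometheus-style backends store one sample per active series per scrape, and Azure Monitor managed service for Prometheus bills ingestion by the number of samples, so daily volume is roughly active series × 86,400 / scrape interval in seconds. This Python sketch applies that formula; the series count in the example is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def samples_per_day(active_series: int, scrape_interval_s: float) -> float:
    """Samples ingested per day if every active series is scraped once per interval."""
    if scrape_interval_s <= 0:
        raise ValueError("scrape interval must be positive")
    return active_series * SECONDS_PER_DAY / scrape_interval_s


def reduction(active_series: int, old_interval_s: float, new_interval_s: float) -> float:
    """Fraction of daily samples saved by moving from old_interval_s to new_interval_s."""
    before = samples_per_day(active_series, old_interval_s)
    after = samples_per_day(active_series, new_interval_s)
    return 1 - after / before


# Hypothetical cluster with 50,000 active series:
print(samples_per_day(50_000, 30))  # 30 s interval
print(samples_per_day(50_000, 60))  # 60 s interval
print(reduction(50_000, 30, 60))    # fraction saved
```

For 50,000 series, a 30-second interval produces 144,000,000 samples per day, and moving to 60 seconds halves that; dropping unused series reduces volume in the same proportion.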

By following these strategies and using the right tools, you can ensure that your AKS cluster is properly monitored and optimized for performance and security.

Conclusion

AKS monitoring is critical for maintaining healthy and performant AKS clusters. By collecting and analyzing data on resource utilization, node performance, and pod health, it helps you identify and address potential issues before they impact applications.

This includes vulnerability scanning, performance monitoring, and troubleshooting. Monitoring also helps optimize resource allocation, simplify troubleshooting, and strengthen security and compliance.

OpenObserve is an open-source observability platform that simplifies AKS monitoring of logs, metrics, and traces. It offers a user-friendly interface, significantly lower storage costs, and superior scalability compared to other solutions.

OpenObserve integrates seamlessly with AKS for data ingestion, advanced querying of logs and metrics, and effective data visualization through intuitive dashboards. With OpenObserve, you gain comprehensive visibility into your AKS clusters while keeping costs and complexity down.

Author:

The OpenObserve Team comprises dedicated professionals committed to revolutionizing system observability through their innovative platform, OpenObserve. The team is dedicated to streamlining data observation and system monitoring, offering high-performance and cost-effective solutions for diverse use cases.

OpenObserve Inc. © 2024