Understanding CloudWatch Metrics for AWS Lambda and API Gateway

In serverless architectures, particularly those built with AWS Lambda and API Gateway, robust monitoring is crucial. Amazon CloudWatch provides the tools to track the performance, availability, and operational health of these services. This module covers the key metrics to monitor for both Lambda functions and API Gateway APIs.

CloudWatch Metrics for AWS Lambda

AWS Lambda functions generate a wealth of metrics that help you understand their execution. Monitoring these metrics allows you to identify performance bottlenecks, errors, and cost-saving opportunities.

Lambda's core metrics provide insights into invocation, duration, and errors.

Key Lambda metrics include Invocations (how often your function runs), Duration (how long it takes to run), and Errors (how many invocations failed).

Invocations: The number of times your function code is invoked, counting both successful invocations and those that end in a function error. Requests that are throttled are not counted here. An unexpected rise can indicate increased traffic, a retry storm, or a misconfigured event source.

Duration: The time your function code spends processing an event, in milliseconds. Tracking the average and maximum (or a high percentile such as p99) helps catch performance regressions and functions that are approaching their timeout. Because Lambda bills by execution time, duration affects cost as well as user experience.

Errors: The number of invocations that result in a function error. This covers exceptions thrown by your code as well as errors raised by the Lambda runtime, such as timeouts and configuration problems. Dividing Errors by Invocations gives the error rate, which is usually a better reliability signal than the raw count.

What are the three most fundamental CloudWatch metrics for AWS Lambda?

Invocations, Duration, and Errors.

Beyond these core metrics, other important ones include Throttles (invocations that were rejected due to concurrency limits) and ConcurrentExecutions (the number of function instances processing events simultaneously).
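
As a rough illustration of how the Lambda metrics above could be pulled together, the sketch below builds the query list for the CloudWatch GetMetricData API and adds a metric math expression for the error rate. The function names, the 300-second period, and the choice of statistic per metric are assumptions made for this example, not recommendations from AWS; the boto3 call is kept in a separate function so the query builder can be inspected without AWS credentials.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List

# Assumed statistic per metric: counters are summed, Duration is averaged,
# and ConcurrentExecutions is read as a peak value.
LAMBDA_METRIC_STATS = {
    "Invocations": "Sum",
    "Errors": "Sum",
    "Throttles": "Sum",
    "Duration": "Average",
    "ConcurrentExecutions": "Maximum",
}


def build_lambda_queries(function_name: str, period: int = 300) -> List[Dict[str, Any]]:
    """Build MetricDataQueries for one function (hypothetical helper)."""
    queries: List[Dict[str, Any]] = []
    for metric_name, stat in LAMBDA_METRIC_STATS.items():
        queries.append({
            "Id": metric_name.lower(),  # query ids must start with a lowercase letter
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/Lambda",
                    "MetricName": metric_name,
                    "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
                },
                "Period": period,
                "Stat": stat,
            },
            "ReturnData": True,
        })
    # Metric math: percentage of invocations that ended in a function error.
    queries.append({
        "Id": "error_rate",
        "Expression": "100 * errors / invocations",
        "Label": "Error rate (%)",
        "ReturnData": True,
    })
    return queries


def fetch_lambda_metrics(function_name: str, hours: int = 1) -> Dict[str, Any]:
    """Run the queries with boto3 (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without it

    end = datetime.now(timezone.utc)
    client = boto3.client("cloudwatch")
    return client.get_metric_data(
        MetricDataQueries=build_lambda_queries(function_name),
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
    )
```

Calling build_lambda_queries("checkout-handler") (a made-up function name) returns six queries: one per metric plus the error-rate expression.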

CloudWatch Metrics for API Gateway

API Gateway acts as the front door to your serverless applications. Monitoring its metrics is essential for understanding API traffic, latency, and the success rate of requests.

API Gateway metrics track request volume, latency, and error types.

Essential API Gateway metrics include Count (total requests), Latency (time to process requests), and various error counts like 4XXError and 5XXError.

Count: The total number of API requests received by API Gateway in a given period. It is the primary indicator of API usage.

Latency: The time between API Gateway receiving a request from a client and returning a response to that client, in milliseconds. It includes both API Gateway's own overhead and the time taken by the integrated backend (such as Lambda).

4XXError: The number of client-side errors, such as bad requests or unauthorized access.

5XXError: The number of server-side errors, which often originate from the backend integration (for example, Lambda function errors).

What does the 'Latency' metric in API Gateway measure?

The time taken by API Gateway to process a request and return a response, including backend processing time.

Other vital metrics for API Gateway include CacheHitCount, CacheMissCount (if caching is enabled), and IntegrationLatency (the time spent by the backend integration itself).
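
The relationship between these metrics can be made concrete with a little arithmetic. The sketch below assumes you have already retrieved the metric values for one time window (averages for the latency metrics, sums for the counters); the helper names are invented for this example.

```python
from typing import Dict, Optional


def gateway_overhead_ms(latency_ms: float, integration_latency_ms: float) -> float:
    """Time attributable to API Gateway itself: Latency minus IntegrationLatency."""
    return max(latency_ms - integration_latency_ms, 0.0)


def cache_hit_ratio(cache_hits: float, cache_misses: float) -> Optional[float]:
    """Fraction of cacheable requests served from the cache, or None if there were none."""
    total = cache_hits + cache_misses
    if total == 0:
        return None
    return cache_hits / total


def error_rates(count: float, errors_4xx: float, errors_5xx: float) -> Dict[str, float]:
    """Share of requests that ended in a 4XX or 5XX response."""
    if count == 0:
        return {"4xx": 0.0, "5xx": 0.0}
    return {"4xx": errors_4xx / count, "5xx": errors_5xx / count}
```

A large gap between Latency and IntegrationLatency points at API Gateway overhead, while a small gap means most of the time is being spent in the backend.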

Correlating Lambda and API Gateway Metrics

The true power of observability comes from correlating metrics across services. For instance, a spike in API Gateway's 5XXError metric often corresponds to an increase in Lambda's Errors or Duration metrics. By examining these metrics together, you can quickly pinpoint the root cause of issues in your serverless architecture.

When troubleshooting, start with API Gateway metrics to identify if the issue is client-side (4XX) or server-side (5XX). If it's a 5XX error, then dive into Lambda metrics (Errors, Duration, Throttles) to find the source of the problem.
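
The troubleshooting order described above can be sketched as a small triage function. This is an illustration only: the input dictionaries, the key names MaxDurationMs and TimeoutMs, the 90% timeout margin, and the finding labels are all assumptions made for the example, and real incidents usually need logs and traces as well.

```python
from typing import Dict, List


def triage(api: Dict[str, float], fn: Dict[str, float]) -> List[str]:
    """Suggest where to look first, given metric values for the same time window.

    api: API Gateway sums, e.g. {"4XXError": 3, "5XXError": 12}
    fn:  Lambda values, e.g. {"Errors": 12, "Throttles": 0,
                               "MaxDurationMs": 2950, "TimeoutMs": 3000}
    """
    findings: List[str] = []

    # Step 1: client-side problems show up as 4XX responses.
    if api.get("4XXError", 0) > 0:
        findings.append("client-errors")

    # Step 2: for server-side errors, look at the Lambda metrics.
    if api.get("5XXError", 0) > 0:
        causes: List[str] = []
        if fn.get("Throttles", 0) > 0:
            causes.append("lambda-throttles")
        if fn.get("Errors", 0) > 0:
            causes.append("lambda-errors")
        timeout_ms = fn.get("TimeoutMs", 0)
        if timeout_ms and fn.get("MaxDurationMs", 0) >= 0.9 * timeout_ms:
            causes.append("lambda-near-timeout")
        # No Lambda signal at all: the integration setup itself is the next suspect.
        findings.extend(causes or ["check-integration-config"])

    return findings or ["healthy"]
```

For example, triage({"5XXError": 12}, {"Errors": 12}) returns ["lambda-errors"], pointing at the function code rather than at concurrency or timeouts.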

Following a request from API Gateway to Lambda and back shows how the metrics relate across the request lifecycle: API Gateway's Count reflects incoming requests, Latency includes both API Gateway's own processing and the Lambda Duration, and 5XXError often maps to Lambda Errors.

Setting Up Alarms and Dashboards

To proactively manage your serverless applications, it's essential to set up CloudWatch Alarms on critical metrics. For example, an alarm could trigger if Lambda errors exceed a certain threshold or if API Gateway latency increases significantly. Creating custom CloudWatch Dashboards that display these key metrics in one place provides a consolidated view of your application's health.
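
A minimal sketch of what such an alarm and dashboard might look like when defined with boto3 follows. The thresholds, periods, alarm names, and the function, API, and stage names passed in are placeholders chosen for illustration, and the API Gateway dimensions shown (ApiName, Stage) assume a REST API. The parameter dictionaries are built separately from the boto3 calls so they can be reviewed before anything is created.

```python
import json
from typing import Any, Dict


def lambda_error_alarm(function_name: str, threshold: float = 5) -> Dict[str, Any]:
    """Keyword arguments for put_metric_alarm: too many errors per minute."""
    return {
        "AlarmName": f"{function_name}-errors",
        "Namespace": "AWS/Lambda",
        "MetricName": "Errors",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": 60,
        "EvaluationPeriods": 5,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # no invocations means no errors
    }


def api_latency_alarm(api_name: str, stage: str, threshold_ms: float = 1000) -> Dict[str, Any]:
    """Keyword arguments for put_metric_alarm: p95 latency above a threshold."""
    return {
        "AlarmName": f"{api_name}-{stage}-latency-p95",
        "Namespace": "AWS/ApiGateway",
        "MetricName": "Latency",
        "Dimensions": [
            {"Name": "ApiName", "Value": api_name},
            {"Name": "Stage", "Value": stage},
        ],
        "ExtendedStatistic": "p95",
        "Period": 300,
        "EvaluationPeriods": 3,
        "Threshold": threshold_ms,
        "ComparisonOperator": "GreaterThanThreshold",
    }


def dashboard_body(function_name: str, api_name: str, region: str) -> str:
    """JSON body for put_dashboard with one Lambda and one API Gateway widget."""
    def widget(title: str, x: int, metrics: list) -> Dict[str, Any]:
        return {
            "type": "metric", "x": x, "y": 0, "width": 12, "height": 6,
            "properties": {"title": title, "region": region, "stat": "Sum",
                           "period": 300, "metrics": metrics},
        }

    widgets = [
        widget("Lambda errors and throttles", 0, [
            ["AWS/Lambda", "Errors", "FunctionName", function_name],
            ["AWS/Lambda", "Throttles", "FunctionName", function_name],
        ]),
        widget("API Gateway errors", 12, [
            ["AWS/ApiGateway", "4XXError", "ApiName", api_name],
            ["AWS/ApiGateway", "5XXError", "ApiName", api_name],
        ]),
    ]
    return json.dumps({"widgets": widgets})


def apply_monitoring(function_name: str, api_name: str, stage: str, region: str) -> None:
    """Create the alarms and dashboard (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builders above work without it

    client = boto3.client("cloudwatch", region_name=region)
    client.put_metric_alarm(**lambda_error_alarm(function_name))
    client.put_metric_alarm(**api_latency_alarm(api_name, stage))
    client.put_dashboard(
        DashboardName=f"{api_name}-overview",
        DashboardBody=dashboard_body(function_name, api_name, region),
    )
```

Alarming on a percentile of Latency rather than the average is a design choice in this sketch: a p95 threshold reacts to a slow tail of requests that an average can hide.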
