
Amazon CloudWatch adds custom metrics support

01 Oct 18

The Amazon CloudWatch agent can now publish custom StatsD and collectd metrics to CloudWatch.

Businesses can use these custom metrics to create alarms that trigger notifications and auto-scaling actions, or add them to dashboards for quick viewing in CloudWatch.

StatsD and collectd are popular, open-source solutions that gather system statistics for a wide variety of applications. CloudWatch Agent enables companies to publish and store custom StatsD and collectd metrics for up to 15 months in CloudWatch.

Businesses can also choose to publish these custom metrics to an account other than the resource account where the agent is collecting metrics, such as a central monitoring account.

They can get started with the CloudWatch agent by downloading it directly from the AWS Systems Manager (SSM) console, or via the CLI from an AWS S3 bucket for standalone installs. To learn more, see the CloudWatch agent user guide for StatsD and collectd.
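As a rough illustration of what enabling StatsD collection might look like, the sketch below builds an agent configuration fragment in Python and prints it as JSON. The key names (`metrics_collected`, `statsd`, `service_address`, `metrics_collection_interval`, `metrics_aggregation_interval`) and the values used are assumptions based on the agent's documented configuration format and should be checked against the user guide; the helper `build_statsd_config` is hypothetical.

```python
import json


def build_statsd_config(port: int = 8125,
                        collection_interval: int = 10,
                        aggregation_interval: int = 60) -> str:
    """Return a JSON fragment that turns on the agent's StatsD listener.

    Key names are assumptions to verify against the CloudWatch agent
    user guide; intervals are in seconds.
    """
    config = {
        "metrics": {
            "metrics_collected": {
                "statsd": {
                    # Address the agent listens on for StatsD traffic (assumed)
                    "service_address": f":{port}",
                    "metrics_collection_interval": collection_interval,
                    "metrics_aggregation_interval": aggregation_interval,
                }
            }
        }
    }
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    print(build_statsd_config())
```

The printed JSON would be saved as, or merged into, the agent's configuration file before the agent is started.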

The CloudWatch agent is available in all AWS public regions, including AWS GovCloud. 

Collectd is a daemon which collects system and application performance metrics periodically and provides mechanisms to store the values in a variety of ways, for example in RRD files.

Collectd gathers metrics from various sources, for example, the operating system, applications, log files and external devices, and stores this information or makes it available over the network. 

Those statistics can be used to monitor systems, find performance bottlenecks (i.e. performance analysis) and predict future system load (i.e. capacity planning).

StatsD is a network daemon that runs on the Node.js platform. It listens for statistics, such as counters and timers, sent over UDP or TCP, and sends aggregates to one or more pluggable backend services (e.g. Graphite).
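To make the UDP path concrete, here is a minimal Python sketch of a client emitting one statistic in the StatsD line format `<bucket>:<value>|<type>`, where `c` marks a counter and `ms` a timer. The helper names `format_stat` and `send_stat`, the bucket name, and the listener address `127.0.0.1:8125` are assumptions for illustration only.

```python
import socket


def format_stat(bucket: str, value: int, kind: str = "c") -> str:
    """Build one StatsD line: <bucket>:<value>|<type>."""
    return f"{bucket}:{value}|{kind}"


def send_stat(line: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send a single metric line over UDP (fire-and-forget, no reply)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


if __name__ == "__main__":
    # Increment a hypothetical counter bucket by 1.
    send_stat(format_stat("checkout.requests", 1))
```

Because UDP is connectionless, the send succeeds even if nothing is listening, so an application is not blocked when the daemon is down.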

StatsD was inspired by the project (of the same name) at Flickr.

StatsD Overview: 

1. Each stat is in its own "bucket". They are not predefined anywhere. Buckets can be named anything that will translate to Graphite.

2. Each stat will have a value. How it is interpreted depends on modifiers. In general, values should be integers.

3. After the flush interval timeout (defined by config.flushInterval, default 10 seconds), stats are aggregated and sent to an upstream backend service.
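The three points above can be sketched as a toy aggregator in Python. This is a simplified illustration, not StatsD's actual implementation: the class `FlushAggregator` and its methods are hypothetical, only counters (`c`) and timers (`ms`) are handled, sample rates are ignored, and the caller is assumed to invoke `flush()` once per flush interval.

```python
from collections import defaultdict


class FlushAggregator:
    """Collects stats in buckets and summarizes them on each flush."""

    def __init__(self) -> None:
        # Buckets are created on first use; none are predefined.
        self.counters = defaultdict(int)
        self.timers = defaultdict(list)

    def record(self, line: str) -> None:
        """Parse '<bucket>:<value>|<type>' and store the integer value."""
        bucket, rest = line.split(":", 1)
        value, kind = rest.split("|", 1)
        if kind == "c":
            self.counters[bucket] += int(value)
        elif kind == "ms":
            self.timers[bucket].append(int(value))
        # Any other type or modifier is ignored in this sketch.

    def flush(self) -> dict:
        """Return the aggregates for this interval and reset all buckets."""
        snapshot = {
            "counters": dict(self.counters),
            "timers": {
                bucket: {"count": len(vals), "mean": sum(vals) / len(vals)}
                for bucket, vals in self.timers.items()
            },
        }
        self.counters.clear()
        self.timers.clear()
        return snapshot
```

A real daemon would call the equivalent of `flush()` on a timer and hand the snapshot to each configured backend.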