CloudWatch networking alarms
A summary of current alarms
- nacl-changes
- network-gateway-changes
- NoVPCAttachmentTraffic
- route-table-changes
- vpc-changes
nacl-changes
This alarm triggers when an API call is made to create, delete, or update a network access control list (NACL). Our NACLs are fairly static, but can be amended when customers feed in values for east/west traffic, or private access to other locations such as PSN endpoints.
Should this alarm trigger you should confirm the NACL in question and find the cause of the alarm. You should check our Github Actions to see if any recent actions correspond to the triggered alarms. You can also investigate the CloudTrail logging for the relevant account.
As we make all durable changes through code, this should show you the cause of the nacl-changes
alert. It ought to be
legitimate in all cases, but can sometimes be related to alarms such as NoVPCAttachmentTraffic
where a NACL blocks
traffic from exiting a VPC.
It is unlikely to cause traffic to be blocked from entering a VPC as the transit gateway endpoint subnets have fixed permissive NACLs.
Outputs to #modernisation-platform-low-priority-alarms.
network-gateway-changes
This alarm triggers when an API call is made to create, delete, or attach an Internet Gateway or Customer Gateway. As with how static our VPC configuration is, it is unlikely that this alarm will trigger unintentionally.
Should this alarm trigger you should confirm the gateway in question and find the cause of the alarm. You should check our Github Actions to see if any recent actions correspond to the triggered alarms. You can also investigate the CloudTrail logging for the relevant account.
As we make all durable changes through code, this should show you the cause of the network-gateway-changes
alert. As
with the nacl-changes
it ought to be legitimate in all cases. It is likely to affect customer traffic coming into a
service from the internet in the case of an Internet Gateway. In the case of a Customer Gateway this will be seen
alongside a VPN tunnel moving to a down
state.
Outputs to #modernisation-platform-low-priority-alarms.
NpVPCAttachmentTraffic
This alarm triggers when no traffic has been seen traversing a VPC Transit Gateway attachment and exists on a per-VPC basis. The intent for this alarm was to alert us when a change or event occurred that caused a cessation of traffic from a VPC to the Modernisation Platform Transit Gateway.
Should this alarm trigger you should confirm the VPC in question and assess the behaviour of traffic over the alarm period, extending past this alarm to observe what normal behaviour should look like. It is possible for this alarm to trigger during ordinary operation when the polling window is short; no traffic over five minutes is not uncommon.
In the event that this alarm triggers, and it has been deemed legitimate, your next step should be to confirm that there are no outstanding issues with the AWS VPC service, and that no changes have been made to the relevant VPC through the core-vpc Github actions.
It is possible that a change to route tables, network access control lists, or transit gateway attachment can cause a legitimate triggering of this alarm. In short, anything that is involved in traffic leaving the VPC via the Transit Gateway is in scope for investigation.
An example of an illegitimate alert would be a dip in traffic over the alarm period that resolves itself without any action from either the Modernisation Platform team, or a customer team - taking a service down for maintenance, for example, and causing a legitimate cessation of traffic could cause this.
An example of a legitimate alert would be a dip in traffic over the alarm period that does not resolve itself, and can be traced back to an unanticipated consequence of an action; restricting traffic through an access control list, for example.
Outputs to #modernisation-platform-high-priority-alarms.
route-table-changes
This alarm triggers when an API call is made to carry out a route-table related action; creating, replacing, or deleting a route table or a route will match the filter for this alarm. It is likely that this alarm will be seen in conjunction with a valid pull request.
Should this alarm trigger you should confirm the gateway in question and find the cause of the alarm. You should check our Github Actions to see if any recent actions correspond to the triggered alarms. You can also investigate the CloudTrail logging for the relevant account.
Changes to route tables will affect the flow of traffic. As routes are selected on a most-specific-route basis, it is possible that the impact of a route change will have limited effects.
Outputs to #modernisation-platform-low-priority-alarms.
vpc-changes
This alarm triggers when an API call is made to create, delete, or update a VPC. As we have existing VPCs created on a per-business-unit and per-environment basis, we do not expect to see this alarm trigger.
Should this alarm trigger you should confirm the VPC in question and find the cause of the alarm. You should check our Github Actions to see if any recent actions correspond to the triggered alarms.
As we make all durable changes through code, this should show you the cause of the network-gateway-changes
alert.
Outputs to #modernisation-platform-low-priority-alarms.
Viewing metric filters
The metrics that inform some of these alarms are based on CloudTrail filters. You can find them here.