Modernisation Platform Alarms
Introduction
A suite of security alarms, recommended as part of the CIS AWS Foundations Benchmark v1.2.0, is configured in the modernisation-platform-secure-baselines module. We have adapted these slightly for our purposes: for instance, we have tweaked some of the metric filter patterns to ignore changes made by known automation roles, and we have created new alarms, such as one that monitors use of the AdministratorAccess SSO role.
We also use some additional alarms to monitor the services that underpin the Modernisation Platform.
All of these CloudWatch alarms publish to SNS topics, which in turn send to PagerDuty HTTPS service endpoints and raise incidents. These PagerDuty services are associated with a number of different Slack channels. In a few cases we bypass PagerDuty and instead use the AWS Chatbot service to send alerts directly from SNS topics to Slack.
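The routing described above can be pictured as a lookup from SNS topic to its downstream targets. The following is a minimal sketch only: the topic, service, and channel names are taken from the tables on this page, while the `Route` type, the `ROUTES` mapping, and the `describe` helper are hypothetical names introduced for illustration and do not reflect how the platform actually stores this configuration.

```python
from typing import NamedTuple, Optional


class Route(NamedTuple):
    # None means PagerDuty is bypassed and AWS Chatbot posts straight to Slack.
    pagerduty_service: Optional[str]
    slack_channel: str


# Illustrative mapping built from rows in the tables on this page.
ROUTES = {
    "securityhub-alarms": Route(
        "Core Alerts - Modernisation Platform",
        "modernisation-platform-low-priority-alarms",
    ),
    "high-priority-alarms-topic": Route(
        "High Priority Alerts - Modernisation Platform",
        "modernisation-platform-high-priority-alarms",
    ),
    "subnet-utilisation-alerts": Route(
        None,
        "modernisation-platform-low-priority-alarms",
    ),
}


def describe(topic: str) -> str:
    """Return a one-line summary of where alerts on an SNS topic end up."""
    route = ROUTES[topic]
    via = route.pagerduty_service or "AWS Chatbot"
    return f"{topic} -> {via} -> {route.slack_channel}"


for topic in ROUTES:
    print(describe(topic))
```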
Baseline Security Alarms Overview
The following table provides an overview of the configuration of these alarms.
| Alarm Name | Description | Threshold / Period | Account Scope | Slack Channel | PagerDuty Service | SNS Topic |
|---|---|---|---|---|---|---|
| unauthorised-api-calls | Monitors for unauthorised API calls. Filters out known automation roles and CortexXDRCloudApp. (CIS 3.1) | ≥10 / 180s | MP-owned workspaces only | modernisation-platform-low-priority-alarms | Core Alerts - Modernisation Platform | securityhub-alarms |
| sign-in-without-mfa | Monitors for AWS Console sign-in without MFA by IAM users. (CIS 3.2) | ≥1 / 300s | All accounts | modernisation-platform-low-priority-alarms | Core Alerts - Modernisation Platform | securityhub-alarms |
| root-account-usage | Monitors for root account usage. (CIS 3.3 / 1.1) | ≥1 / 300s | All accounts | modernisation-platform-high-priority-alarms | High Priority Alerts - Modernisation Platform | high-priority-alarms-topic |
| iam-policy-changes | Monitors for IAM policy changes (attach/detach/create/delete) made outside of approved automation roles. (CIS 3.4) | ≥1 / 300s | All accounts* | modernisation-platform-low-priority-alarms | Core Alerts - Modernisation Platform | securityhub-alarms |
| cloudtrail-configuration-changes | Monitors for CloudTrail configuration changes (create/update/delete/start/stop logging). (CIS 3.5) | ≥1 / 300s | All accounts* | modernisation-platform-high-priority-alarms | High Priority Alerts - Modernisation Platform | high-priority-alarms-topic |
| sign-in-failures | Monitors for AWS Console sign-in authentication failures. (CIS 3.6) | ≥5 / 60s | All accounts | modernisation-platform-low-priority-alarms | Core Alerts - Modernisation Platform | securityhub-alarms |
| cmk-removal | Monitors for KMS customer-managed key (CMK) disable or scheduled deletion. (CIS 3.7) | ≥1 / 300s | MP-owned workspaces only | modernisation-platform-low-priority-alarms | Core Alerts - Modernisation Platform | securityhub-alarms |
| s3-bucket-policy-changes | Monitors for S3 bucket policy, ACL, lifecycle, CORS, and replication changes made outside of approved automation roles. (CIS 3.8) | ≥1 / 300s | MP-owned workspaces only | modernisation-platform-low-priority-alarms | Core Alerts - Modernisation Platform | securityhub-alarms |
| config-configuration-changes | Monitors for AWS Config recorder and delivery channel changes made outside of approved automation roles. (CIS 3.9) | ≥1 / 300s | All accounts* | modernisation-platform-high-priority-alarms | High Priority Alerts - Modernisation Platform | high-priority-alarms-topic |
| security-group-changes | Monitors for EC2 Security Group rule and lifecycle changes made outside of approved automation roles. (CIS 3.10) | ≥1 / 300s | All accounts* | modernisation-platform-low-priority-alarms | Core Alerts - Modernisation Platform | securityhub-alarms |
| nacl-changes | Monitors for Network ACL create, delete, and entry changes made outside of approved automation roles. (CIS 3.11) | ≥1 / 300s | All accounts* | modernisation-platform-high-priority-alarms | High Priority Alerts - Modernisation Platform | high-priority-alarms-topic |
| network-gateway-changes | Monitors for internet and customer gateway create, attach, and delete changes made outside of approved automation roles. (CIS 3.12) | ≥1 / 300s | All accounts* | modernisation-platform-high-priority-alarms | High Priority Alerts - Modernisation Platform | high-priority-alarms-topic |
| route-table-changes | Monitors for route table create, replace, delete, and association changes made outside of approved automation roles. (CIS 3.13) | ≥1 / 300s | All accounts* | modernisation-platform-high-priority-alarms | High Priority Alerts - Modernisation Platform | high-priority-alarms-topic |
| vpc-changes | Monitors for VPC create/delete/modify and peering connection changes made outside of approved automation roles. (CIS 3.14) | ≥1 / 300s | All accounts* | modernisation-platform-high-priority-alarms | High Priority Alerts - Modernisation Platform | high-priority-alarms-topic |
| PrivateLink-NewFlowCount-AllEndpoints | Monitors the total sum of new flows across all VPC endpoints. | >100 / 60s (3 eval periods) | All accounts | modernisation-platform-low-priority-alarms | Core Alerts - Modernisation Platform | securityhub-alarms |
| PrivateLink-ActiveFlowCount-AllEndpoints | Monitors the total average of active flows across all VPC endpoints. | >1000 / 60s (3 eval periods) | All accounts | modernisation-platform-low-priority-alarms | Core Alerts - Modernisation Platform | securityhub-alarms |
| PrivateLink-Service-NewConnectionCount-AllServices | Monitors the total sum of new connections across all VPC Endpoint Services. | >100 / 60s (3 eval periods) | All accounts | modernisation-platform-low-priority-alarms | Core Alerts - Modernisation Platform | securityhub-alarms |
| PrivateLink-Service-ActiveConnectionCount-AllServices | Monitors the total average of active connections across all VPC Endpoint Services. | >1000 / 60s (3 eval periods) | All accounts | modernisation-platform-low-priority-alarms | Core Alerts - Modernisation Platform | securityhub-alarms |
| admin-role-usage | Monitors for use of the AdministratorAccess SSO role by modernisation-platform-engineers GitHub team. | ≥1 / 300s | All accounts* | modernisation-platform-low-priority-alarms | Core Alerts - Modernisation Platform | securityhub-alarms |
| orgaccess-role-usage | Monitors for human assumption of the OrganizationAccountAccessRole by justice.gov.uk users. | ≥1 / 300s | All accounts | modernisation-platform-low-priority-alarms | Core Alerts - Modernisation Platform | securityhub-alarms |
* All accounts except suppressed accounts: alerting is disabled for these alarms when they are deployed into member-unrestricted or sandbox accounts.
Additional Monitoring Alarms Overview
The following table provides an overview of additional monitoring alarms that are configured for the platform.
| Alarm Name | Description | Threshold / Period | Account Scope | Slack Channel | PagerDuty Service | SNS Topic |
|---|---|---|---|---|---|---|
| nat_packets_drop_count | Monitors packet drop count per NAT gateway. One alarm per gateway across all environments. | >100 / 60s (5 eval periods) | core-network-services | modernisation-platform-low-priority-alarms | Core Alerts - Modernisation Platform | securityhub-alarms |
| nat_gateway_error_port_allocation | Detects when a NAT Gateway is unable to allocate ports to new connections. | >0 / 300s | core-network-services | modernisation-platform-low-priority-alarms | Core Alerts - Modernisation Platform | securityhub-alarms |
| modernisation-platform-access-trust-policy-changed | Fires on any change to the trust relationship (assume role policy) of the ModernisationPlatformAccess role. | ≥1 / 60s | core-network-services | modernisation-platform-high-priority-alarms | High Priority Alerts - Modernisation Platform | high-priority-alarms-topic |
| unauthorized-tgw-change | Fires when any Transit Gateway change is made outside of the ModernisationPlatformAccess automation role. | ≥1 / 60s | core-network-services | modernisation-platform-high-priority-alarms | High Priority Alerts - Modernisation Platform | high-priority-alarms-topic |
| subnet-utilisation (Lambda notification) | Lambda runs daily and publishes subnet IP utilisation metrics for all core-vpc accounts, alerting when utilisation is high. Not a CloudWatch alarm — Lambda publishes directly to SNS. | N/A — Lambda-driven (daily 10:00 UTC) | core-logging | modernisation-platform-low-priority-alarms | N/A (AWS Chatbot) | subnet-utilisation-alerts |
| <bucket>-ApproximateAgeOfOldestMessage | Monitors age of oldest message in each Cortex XDR logging SQS queue. One alarm per Cortex logging bucket. Also posted to xsiam-alerts. | ≥3600s / 300s | core-logging | modernisation-platform-low-priority-alarms | N/A (AWS Chatbot) | cortex-sqs-sns topics |
| r53-dns-firewall-matches | Monitors Route53 DNS Firewall for BLOCK or ALERT rule matches, indicating potentially malicious DNS queries. | ≥1 / 60s | core-logging | modernisation-platform-low-priority-alarms | Core Alerts - Modernisation Platform | r53-dns-firewall-sns-topic |
| instance-scheduler-run-with-errors | Monitors the instance-scheduler Lambda for failed invocations (at least 1 error during execution). | ≥1 / 300s | core-shared-services | modernisation-platform-low-priority-alarms | Core Alerts - Modernisation Platform | instance-scheduler-on-failure |
| instance-scheduler-was-throttled | Monitors the instance-scheduler Lambda for throttled invocations (Lambda fails to be invoked). | ≥1 / 300s | core-shared-services | modernisation-platform-low-priority-alarms | Core Alerts - Modernisation Platform | instance-scheduler-on-failure |
Reducing False Positives
Over time we have made changes to reduce the number of false positives and make these security alerts more meaningful. This is an evolving process, but some of the ways we have achieved this are:
Changing the thresholds
We’ve increased the threshold of alarms such as unauthorised-api-calls from ≥1 / 180s to ≥10 / 180s.
Filtering based on the user identity associated with the CloudTrail event
For some alarms that cover a broad list of events, such as security group or IAM policy changes, we filter out any occurrences where the action was performed by a known IAM automation role, such as ModernisationPlatformAccess or MemberInfrastructureAccess.
Limiting the accounts the alarms are deployed into
For certain alarms, such as s3-bucket-policy-changes, we have decided to deploy alarms only into the “MP-owned” set of accounts. We cannot be expected to manage this on behalf of application infrastructure teams, who know best how to configure the bucket policies for their environments.
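The three techniques above can be sketched together in a few lines of Python. This is an illustration of the logic only, not the module's implementation: the real alarms use CloudWatch metric filters and alarm thresholds. The role names and the MP-owned alarm names come from this page; the event names in `WATCHED_EVENTS`, the `make_event` helper, and all function names are assumptions made for the example. The `userIdentity.sessionContext.sessionIssuer.userName` path is the CloudTrail field that carries the assumed role's name.

```python
# Automation roles named in the text above.
AUTOMATION_ROLES = {"ModernisationPlatformAccess", "MemberInfrastructureAccess"}

# Hypothetical subset of event names for a security-group style alarm.
WATCHED_EVENTS = {"AuthorizeSecurityGroupIngress", "RevokeSecurityGroupIngress"}

# Alarms listed on this page as "MP-owned workspaces only".
MP_OWNED_ONLY = {"unauthorised-api-calls", "cmk-removal", "s3-bucket-policy-changes"}


def make_event(event_name: str, role: str) -> dict:
    """Build a minimal CloudTrail-style record for the example."""
    return {
        "eventName": event_name,
        "userIdentity": {"sessionContext": {"sessionIssuer": {"userName": role}}},
    }


def issuer_role(event: dict) -> str:
    """Return the assumed role name from the record, or '' if absent."""
    return (
        event.get("userIdentity", {})
        .get("sessionContext", {})
        .get("sessionIssuer", {})
        .get("userName", "")
    )


def matches_filter(event: dict) -> bool:
    """Count an event only if it is watched and not made by an automation role."""
    return (
        event.get("eventName") in WATCHED_EVENTS
        and issuer_role(event) not in AUTOMATION_ROLES
    )


def alarm_fires(events: list, threshold: int) -> bool:
    """Compare the matching events seen in one period against the threshold."""
    return sum(1 for e in events if matches_filter(e)) >= threshold


def alarm_deployed(alarm_name: str, account_is_mp_owned: bool) -> bool:
    """MP-owned-only alarms are skipped in accounts the platform does not own."""
    return account_is_mp_owned or alarm_name not in MP_OWNED_ONLY


period = [
    make_event("AuthorizeSecurityGroupIngress", "ModernisationPlatformAccess"),
    make_event("AuthorizeSecurityGroupIngress", "some-developer-role"),
]
print(alarm_fires(period, threshold=1))   # True: one change outside automation
print(alarm_fires(period, threshold=10))  # False: below the raised threshold
print(alarm_deployed("s3-bucket-policy-changes", account_is_mp_owned=False))  # False
```

In this sketch, raising the threshold and excluding automation roles both reduce how often `alarm_fires` returns true for routine activity, while `alarm_deployed` removes the alarm entirely from accounts where the platform team does not own the resource.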
Links
Slack
modernisation-platform-low-priority-alarms
modernisation-platform-high-priority-alarms
PagerDuty
Core Alerts - Modernisation Platform
High Priority Alerts - Modernisation Platform