Modernisation Platform Alarms
Introduction
A suite of security alarms, recommended as part of the CIS AWS Foundations Benchmark v1.2.0, is configured in the modernisation-platform-secure-baselines module. We have adapted these slightly for our purposes: for instance, we have tweaked some of the metric filter patterns to ignore changes made by known automation roles, and we have created new alarms, such as one that monitors use of the AdministratorAccess SSO role. The module is split by concern across several files: general_alerts.tf, cloudtrail_alerts.tf, config_alerts.tf, iam_alerts.tf, network_alerts.tf, and securityhub_alerts.tf.
We also maintain additional alarms that monitor the services underpinning the Modernisation Platform.
All of these CloudWatch alarms publish to SNS topics, which in turn send to PagerDuty HTTPS service endpoints and raise incidents. These PagerDuty services are associated with a number of different Slack channels. In a few cases we bypass PagerDuty and instead use the AWS Chatbot service to send alerts directly from SNS topics to Slack.
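The routing described above can be summarised as a lookup from SNS topic to delivery path. The sketch below is illustrative only: the topic, PagerDuty service, and Slack channel names are taken from the tables later in this document, while the `Route` type and `describe_route` helper are hypothetical and not part of the platform code.

```python
from typing import NamedTuple, Optional


class Route(NamedTuple):
    """Delivery path for an SNS topic (hypothetical type, for illustration)."""
    pagerduty_service: Optional[str]  # None means AWS Chatbot posts to Slack directly
    slack_channel: str


# Topic names and destinations as listed in the alarm tables in this document.
ROUTES = {
    "securityhub-alarms": Route(
        "Core Alerts - Modernisation Platform",
        "modernisation-platform-low-priority-alarms",
    ),
    "high-priority-alarms-topic": Route(
        "High Priority Alerts - Modernisation Platform",
        "modernisation-platform-high-priority-alarms",
    ),
    "subnet-utilisation-alerts": Route(
        None,
        "modernisation-platform-low-priority-alarms",
    ),
}


def describe_route(topic: str) -> str:
    """Render the path an alert takes from SNS topic to Slack channel."""
    route = ROUTES[topic]
    via = route.pagerduty_service or "AWS Chatbot"
    return f"{topic} -> {via} -> #{route.slack_channel}"


print(describe_route("high-priority-alarms-topic"))
```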
Baseline Security Alarms Overview
The following tables provide an overview of the configuration of these alarms, grouped by the Terraform source file they are defined in.
* All accounts except suppressed accounts: alerting is disabled for these alarms when they are deployed into member-unrestricted or sandbox accounts.
General Alerts
| Alarm Name | Description | Threshold / Period | Account Scope | Slack Channel | PagerDuty Service | SNS Topic |
|---|---|---|---|---|---|---|
| unauthorised-api-calls | Monitors for unauthorised API calls. Filters out known automation roles and CortexXDRCloudApp. (CIS 3.1) | ≥10 / 180s | MP-owned workspaces only | modernisation-platform-low-priority-alarms | Core Alerts - Modernisation Platform | securityhub-alarms |
| sign-in-without-mfa | Monitors for AWS Console sign-in without MFA by IAM users. (CIS 3.2) | ≥1 / 300s | All accounts | modernisation-platform-low-priority-alarms | Core Alerts - Modernisation Platform | securityhub-alarms |
| sign-in-failures | Monitors for AWS Console sign-in authentication failures. (CIS 3.6) | ≥5 / 60s | All accounts | modernisation-platform-low-priority-alarms | Core Alerts - Modernisation Platform | securityhub-alarms |
| cmk-removal | Monitors for KMS customer-managed key (CMK) disable or scheduled deletion. (CIS 3.7) | ≥1 / 300s | MP-owned workspaces only | modernisation-platform-low-priority-alarms | Core Alerts - Modernisation Platform | securityhub-alarms |
| s3-bucket-policy-changes | Monitors for S3 bucket policy, ACL, lifecycle, CORS, and replication changes made outside of approved automation roles. (CIS 3.8) | ≥1 / 300s | MP-owned workspaces only | modernisation-platform-low-priority-alarms | Core Alerts - Modernisation Platform | securityhub-alarms |
| disable-alarms-actions-events | Monitors for CloudWatch alarm actions being disabled (DisableAlarmActions) outside of approved automation roles. | ≥1 / 300s | All accounts | modernisation-platform-low-priority-alarms | Core Alerts - Modernisation Platform | securityhub-alarms |
| secrets-manager-events-core-account-non-mp-team | Monitors for Secrets Manager write events in MP core accounts not performed by the modernisation-platform-engineers team or approved automation roles. | ≥1 / 300s | MP core accounts | modernisation-platform-low-priority-alarms | Core Alerts - Modernisation Platform | securityhub-alarms |
| s3-object-deletions-excluding-tf-lock-files | Monitors for S3 object deletions (excluding Terraform state lock files) in MP core accounts. | ≥1 / 300s | MP core accounts | modernisation-platform-low-priority-alarms | Core Alerts - Modernisation Platform | securityhub-alarms |
| ec2-termination-in-core-shared-services | Monitors for termination of EC2 instances in core-shared-services outside of approved automation roles. | ≥1 / 300s | core-shared-services | modernisation-platform-low-priority-alarms | Core Alerts - Modernisation Platform | securityhub-alarms |
CloudTrail Alerts
| Alarm Name | Description | Threshold / Period | Account Scope | Slack Channel | PagerDuty Service | SNS Topic |
|---|---|---|---|---|---|---|
| cloudtrail-configuration-changes | Monitors for CloudTrail configuration changes (create/update/delete/start/stop logging). (CIS 3.5) | ≥1 / 300s | All accounts* | modernisation-platform-high-priority-alarms | High Priority Alerts - Modernisation Platform | high-priority-alarms-topic |
AWS Config Alerts
| Alarm Name | Description | Threshold / Period | Account Scope | Slack Channel | PagerDuty Service | SNS Topic |
|---|---|---|---|---|---|---|
| config-configuration-changes | Monitors for AWS Config recorder and delivery channel changes made outside of approved automation roles. (CIS 3.9) | ≥1 / 300s | All accounts* | modernisation-platform-high-priority-alarms | High Priority Alerts - Modernisation Platform | high-priority-alarms-topic |
IAM & Privileged Access Alerts
| Alarm Name | Description | Threshold / Period | Account Scope | Slack Channel | PagerDuty Service | SNS Topic |
|---|---|---|---|---|---|---|
| root-account-usage | Monitors for root account usage. (CIS 3.3 / 1.1) | ≥1 / 300s | All accounts | modernisation-platform-high-priority-alarms | High Priority Alerts - Modernisation Platform | high-priority-alarms-topic |
| iam-policy-changes | Monitors for IAM policy changes (attach/detach/create/delete) made outside of approved automation roles. (CIS 3.4) | ≥1 / 300s | All accounts* | modernisation-platform-low-priority-alarms | Core Alerts - Modernisation Platform | securityhub-alarms |
| critical-role-trust-relationship-changes | Monitors for trust relationship changes to MemberInfrastructureAccess or ModernisationPlatformAccess made outside of approved automation roles. | ≥1 / 300s | All accounts | modernisation-platform-low-priority-alarms | Core Alerts - Modernisation Platform | securityhub-alarms |
| admin-role-usage | Monitors for any use of the AdministratorAccess SSO role. | ≥1 / 300s | All accounts* | modernisation-platform-low-priority-alarms | Core Alerts - Modernisation Platform | securityhub-alarms |
| admin-role-usage-non-mp-team | Monitors for use of the AdministratorAccess SSO role by principals outside the modernisation-platform-engineers team (computed via metric math: all usage minus MP team usage). | ≥1 / 300s | All accounts | modernisation-platform-low-priority-alarms | Core Alerts - Modernisation Platform | securityhub-alarms |
| admin-role-usage-all-usage-outside-on-call-hours | Monitors for use of the AdministratorAccess SSO role outside of core business and on-call hours (22:00–06:59 UTC). | ≥1 / 300s | All accounts | modernisation-platform-low-priority-alarms | Core Alerts - Modernisation Platform | securityhub-alarms |
| orgaccess-role-usage | Monitors for human assumption of the OrganizationAccountAccessRole by justice.gov.uk users. | ≥1 / 300s | All accounts | modernisation-platform-low-priority-alarms | Core Alerts - Modernisation Platform | securityhub-alarms |
| iam-user-deletion-by-untrusted-role | Monitors for deletion of IAM users performed outside of approved automation roles. | ≥1 / 300s | All accounts | modernisation-platform-low-priority-alarms | Core Alerts - Modernisation Platform | securityhub-alarms |
| modernisation-platform-superadmin-role-usage | Monitors for use of the SuperAdmin role in the modernisation-platform account. | ≥1 / 300s | MP account only | modernisation-platform-low-priority-alarms | Core Alerts - Modernisation Platform | securityhub-alarms |
| modernisation-platform-superadmin-user-deletion | Monitors for manual deletion of IAM users with the -superadmin suffix in the modernisation-platform account outside of approved automation roles. | ≥1 / 300s | MP account only | modernisation-platform-low-priority-alarms | Core Alerts - Modernisation Platform | securityhub-alarms |
| modernisation-platform-superadmin-user-access-key-creation | Monitors for creation of access keys for IAM users with the -superadmin suffix in the modernisation-platform account outside of approved automation roles. | ≥1 / 300s | MP account only | modernisation-platform-low-priority-alarms | Core Alerts - Modernisation Platform | securityhub-alarms |
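
The metric math behind admin-role-usage-non-mp-team, described in the table above as all usage minus MP team usage, can be sketched as a per-period subtraction. This is an illustration rather than the module's actual expression: the function names and the sample counts are hypothetical.

```python
def non_mp_team_usage(all_usage, mp_team_usage):
    """Subtract MP team usage from all usage, one value per 300s period."""
    return [total - mp for total, mp in zip(all_usage, mp_team_usage)]


def periods_in_alarm(values, threshold=1):
    """Indexes of periods whose value meets the >= threshold condition."""
    return [i for i, value in enumerate(values) if value >= threshold]


# Hypothetical per-period counts of AdministratorAccess usage.
all_usage = [3, 2, 0, 1]
mp_team_usage = [3, 1, 0, 0]

remaining = non_mp_team_usage(all_usage, mp_team_usage)
print(remaining)                    # [0, 1, 0, 1]
print(periods_in_alarm(remaining))  # [1, 3]
```

Periods where all usage is accounted for by the MP team yield zero and stay below the ≥1 threshold.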
Network Alerts
| Alarm Name | Description | Threshold / Period | Account Scope | Slack Channel | PagerDuty Service | SNS Topic |
|---|---|---|---|---|---|---|
| security-group-changes | Monitors for EC2 Security Group rule and lifecycle changes made outside of approved automation roles. (CIS 3.10) | ≥1 / 300s | All accounts* | modernisation-platform-low-priority-alarms | Core Alerts - Modernisation Platform | securityhub-alarms |
| nacl-changes | Monitors for Network ACL create, delete, and entry changes made outside of approved automation roles. (CIS 3.11) | ≥1 / 300s | All accounts* | modernisation-platform-high-priority-alarms | High Priority Alerts - Modernisation Platform | high-priority-alarms-topic |
| network-gateway-changes | Monitors for internet and customer gateway create, attach, and delete changes made outside of approved automation roles. (CIS 3.12) | ≥1 / 300s | All accounts* | modernisation-platform-high-priority-alarms | High Priority Alerts - Modernisation Platform | high-priority-alarms-topic |
| route-table-changes | Monitors for route table create, replace, delete, and association changes made outside of approved automation roles. (CIS 3.13) | ≥1 / 300s | All accounts* | modernisation-platform-high-priority-alarms | High Priority Alerts - Modernisation Platform | high-priority-alarms-topic |
| vpc-changes | Monitors for VPC create/delete/modify and peering connection changes made outside of approved automation roles. (CIS 3.14) | ≥1 / 300s | All accounts* | modernisation-platform-high-priority-alarms | High Priority Alerts - Modernisation Platform | high-priority-alarms-topic |
| transit-gateway-changes | Monitors for Transit Gateway changes made outside of approved automation roles. | ≥1 / 300s | All accounts | modernisation-platform-low-priority-alarms | Core Alerts - Modernisation Platform | securityhub-alarms |
| vpn-changes | Monitors for VPN connection, gateway, and customer gateway changes made outside of approved automation roles. | ≥1 / 300s | All accounts | modernisation-platform-low-priority-alarms | Core Alerts - Modernisation Platform | securityhub-alarms |
| network-firewall-changes | Monitors for changes to Network Firewalls in core-network-services made outside of approved automation roles. | ≥1 / 300s | core-network-services | modernisation-platform-low-priority-alarms | Core Alerts - Modernisation Platform | securityhub-alarms |
| PrivateLink-NewFlowCount-AllEndpoints | Monitors the total sum of new flows across all VPC endpoints. | >100 / 60s (3 eval periods) | All accounts | modernisation-platform-low-priority-alarms | Core Alerts - Modernisation Platform | securityhub-alarms |
| PrivateLink-ActiveFlowCount-AllEndpoints | Monitors the total average of active flows across all VPC endpoints. | >1000 / 60s (3 eval periods) | All accounts | modernisation-platform-low-priority-alarms | Core Alerts - Modernisation Platform | securityhub-alarms |
| PrivateLink-Service-NewConnectionCount-AllServices | Monitors the total sum of new connections across all VPC Endpoint Services. | >100 / 60s (3 eval periods) | All accounts | modernisation-platform-low-priority-alarms | Core Alerts - Modernisation Platform | securityhub-alarms |
| PrivateLink-Service-ActiveConnectionCount-AllServices | Monitors the total average of active connections across all VPC Endpoint Services. | >1000 / 60s (3 eval periods) | All accounts | modernisation-platform-low-priority-alarms | Core Alerts - Modernisation Platform | securityhub-alarms |
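
The PrivateLink alarms above are evaluated over three periods rather than one. As a rough sketch of what that means, the function below assumes every datapoint in the window must breach the threshold, which is how CloudWatch behaves when no separate datapoints-to-alarm value is set; this document does not state which setting the alarms use. Missing-data handling is ignored, and the function name is hypothetical.

```python
def alarm_fires(datapoints, threshold, evaluation_periods):
    """True if the most recent `evaluation_periods` datapoints all exceed threshold."""
    if len(datapoints) < evaluation_periods:
        return False  # not enough data to evaluate yet
    recent = datapoints[-evaluation_periods:]
    return all(value > threshold for value in recent)


# Hypothetical 60s datapoints for new flows, checked against ">100 / 60s (3 eval periods)".
print(alarm_fires([50, 120, 130, 140], threshold=100, evaluation_periods=3))  # True
print(alarm_fires([120, 130, 90], threshold=100, evaluation_periods=3))       # False
```

Under this assumption a single spike does not raise an incident; the metric has to stay above the threshold for three consecutive minutes.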
Security Hub & GuardDuty Alerts
| Alarm Name | Description | Threshold / Period | Account Scope | Slack Channel | PagerDuty Service | SNS Topic |
|---|---|---|---|---|---|---|
| securityhub-events-alerting | Monitors for Security Hub being disabled (DisableSecurityHub) and GuardDuty being disabled or materially changed (DeleteDetector, UpdateDetector) outside of automation. | ≥1 / 300s | All accounts | modernisation-platform-low-priority-alarms | Core Alerts - Modernisation Platform | securityhub-alarms |
Additional Monitoring Alarms Overview
The following table provides an overview of additional monitoring alarms that are configured for the platform outside of the baselines module.
| Alarm Name | Description | Threshold / Period | Account Scope | Slack Channel | PagerDuty Service | SNS Topic |
|---|---|---|---|---|---|---|
| nat_packets_drop_count | Monitors packet drop count per NAT gateway. One alarm per gateway across all environments. | >100 / 60s (5 eval periods) | core-network-services | modernisation-platform-low-priority-alarms | Core Alerts - Modernisation Platform | securityhub-alarms |
| nat_gateway_error_port_allocation | Detects when a NAT Gateway is unable to allocate ports to new connections. | >0 / 300s | core-network-services | modernisation-platform-low-priority-alarms | Core Alerts - Modernisation Platform | securityhub-alarms |
| subnet-utilisation (Lambda notification) | Lambda runs daily and publishes subnet IP utilisation metrics for all core-vpc accounts, alerting when utilisation is high. Not a CloudWatch alarm — Lambda publishes directly to SNS. | N/A — Lambda-driven (daily 10:00 UTC) | core-logging | modernisation-platform-low-priority-alarms | N/A (AWS Chatbot) | subnet-utilisation-alerts |
| <bucket>-ApproximateAgeOfOldestMessage | Monitors age of oldest message in each Cortex XDR logging SQS queue. One alarm per Cortex logging bucket. Also posted to xsiam-alerts. | ≥3600s / 300s | core-logging | modernisation-platform-low-priority-alarms | N/A (AWS Chatbot) | cortex-sqs-sns topics |
| r53-dns-firewall-matches | Monitors Route53 DNS Firewall for BLOCK or ALERT rule matches, indicating potentially malicious DNS queries. | ≥1 / 60s | core-logging | modernisation-platform-low-priority-alarms | Core Alerts - Modernisation Platform | r53-dns-firewall-sns-topic |
| instance-scheduler-run-with-errors | Monitors the instance-scheduler Lambda for failed invocations (at least 1 error during execution). | ≥1 / 300s | core-shared-services | modernisation-platform-low-priority-alarms | Core Alerts - Modernisation Platform | instance-scheduler-on-failure |
| instance-scheduler-was-throttled | Monitors the instance-scheduler Lambda for throttled invocations (Lambda fails to be invoked). | ≥1 / 300s | core-shared-services | modernisation-platform-low-priority-alarms | Core Alerts - Modernisation Platform | instance-scheduler-on-failure |
Reducing False Positives
Over time we have made changes to reduce the number of false positives and make these security alerts more meaningful. This is an evolving process, but so far we have achieved it in the following ways:
Changing the thresholds
We have raised the threshold of alarms such as unauthorised-api-calls from ≥1 / 180s to ≥10 / 180s.
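
To see the effect of the threshold change, the sketch below buckets event timestamps into 180s periods and compares the two thresholds. The timestamps and function names are hypothetical, and fixed-width bucketing is a simplification of how CloudWatch aggregates a metric.

```python
from collections import Counter

PERIOD_SECONDS = 180


def events_per_period(event_times, period=PERIOD_SECONDS):
    """Count events in each fixed-width period, keyed by period start in seconds."""
    return Counter((t // period) * period for t in event_times)


def breaching_periods(event_times, threshold, period=PERIOD_SECONDS):
    """Start times of periods whose event count is >= threshold."""
    counts = events_per_period(event_times, period)
    return sorted(start for start, n in counts.items() if n >= threshold)


# Hypothetical unauthorised API call timestamps (seconds): two small blips, then a burst.
event_times = [5, 20, 200] + list(range(360, 372))

print(breaching_periods(event_times, threshold=1))   # [0, 180, 360]
print(breaching_periods(event_times, threshold=10))  # [360]
```

With the old threshold every period containing a single denied call would alarm; with the new one only the burst does.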
Filtering based on the user identity associated with the CloudTrail event
For alarms that cover a broad list of events, such as security group or IAM policy changes, we filter out any occurrence where the action was performed by a known IAM automation role, e.g. ModernisationPlatformAccess or MemberInfrastructureAccess.
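
The identity-based filtering can be sketched in Python against a CloudTrail event record. In the module this logic lives in metric filter patterns, which are not reproduced here; the field used below (`userIdentity.sessionContext.sessionIssuer.userName`) is a standard CloudTrail field, but whether the real patterns match on it is an assumption, and the function names are hypothetical.

```python
# Automation roles named in this document.
AUTOMATION_ROLES = {"ModernisationPlatformAccess", "MemberInfrastructureAccess"}


def issuing_role(event):
    """Return the role name behind an assumed-role CloudTrail event, or None."""
    identity = event.get("userIdentity", {})
    issuer = identity.get("sessionContext", {}).get("sessionIssuer", {})
    return issuer.get("userName")


def counts_towards_alarm(event, watched_events):
    """True if the event is one we watch and was not made by an automation role."""
    if event.get("eventName") not in watched_events:
        return False
    return issuing_role(event) not in AUTOMATION_ROLES


watched = {"AuthorizeSecurityGroupIngress", "RevokeSecurityGroupIngress"}
automation_event = {
    "eventName": "AuthorizeSecurityGroupIngress",
    "userIdentity": {
        "type": "AssumedRole",
        "sessionContext": {"sessionIssuer": {"userName": "MemberInfrastructureAccess"}},
    },
}
print(counts_towards_alarm(automation_event, watched))  # False
```

An event with no session issuer, such as one made by an IAM user, is not treated as automation and still counts.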
Limiting the accounts the alarms are deployed into
For certain alarms, such as s3-bucket-policy-changes, we deploy only into the “MP-owned” set of accounts. We cannot be expected to manage these on behalf of application infrastructure teams, who know best how to configure the bucket policies for their environments.
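
The scoping decision amounts to a simple gate per alarm and account. The sketch below is an illustration only: the alarm names are those marked "MP-owned workspaces only" in the General Alerts table, while the `should_deploy` helper and its `account_is_mp_owned` flag are hypothetical, and how the module actually identifies MP-owned accounts is not shown in this document.

```python
# Alarms listed as "MP-owned workspaces only" in the General Alerts table.
MP_OWNED_ONLY = {
    "unauthorised-api-calls",
    "cmk-removal",
    "s3-bucket-policy-changes",
}


def should_deploy(alarm_name, account_is_mp_owned):
    """Deploy scoped alarms only into MP-owned accounts; deploy the rest everywhere."""
    return alarm_name not in MP_OWNED_ONLY or account_is_mp_owned


print(should_deploy("s3-bucket-policy-changes", account_is_mp_owned=False))  # False
print(should_deploy("sign-in-failures", account_is_mp_owned=False))          # True
```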
Links
Slack
modernisation-platform-low-priority-alarms
modernisation-platform-high-priority-alarms
PagerDuty
Core Alerts - Modernisation Platform
High Priority Alerts - Modernisation Platform