AWS Security - 7 Simple Rules
Security of your AWS infrastructure is ultimately up to you. As the largest cloud services provider, AWS invests heavily to ensure its cloud environment is secure. Yet, much of AWS security is still left to the customer, especially with regard to managing identities and access.
In this blog, I will share a set of seven simple rules to improve AWS security and reduce the likelihood of unwanted incidents. These rules will also get you on an effective path to better optimizing your AWS infrastructure and its maintenance processes.
When working with AWS infrastructure, common causes of security incidents include:
- Mistakes related to infrastructure maintenance and support (deletion, or incorrect modification or configuration, of resources.)
- Mistakes when creating applications (hardcoded configurations, lack of correct error handling, etc.)
- Application hacks and infrastructure attacks that target inherent vulnerabilities
Quite often, several incidents occur simultaneously or sequentially to exploit the cloud. Most people find it challenging to change something in the infrastructure or application while simultaneously fighting off an active attack and restoring system health.
An Example of an AWS Infrastructure Security Breach
One sunny day, at a large company, the SysOps (Systems Operator) engineer responsible for updating the mobile application's backend accidentally deleted the main Hosted Zone in Route 53. As a result, several million installed applications lost the ability to interact with the backend. The TTL (Time to Live) for the DNS (Domain Name System) records was set to 24 hours, and the application's configuration could not be updated since it was hardcoded. This was a mic drop moment.
News about these problems quickly spread on the Internet and attracted professional hackers. In a short time, malefactors started to attack company resources based on this misconfiguration. Several attacks were ultimately successful.
One of the attacks, a classic SQL injection attack, was carried out by a famous hacker group. They immediately shared this news on a dark web forum, exposing the weakness.
Soon, someone else examined the mobile application code and found the access key ID and secret access key of an IAM user with administrator rights. It is hard to overstate how serious this was and how vulnerable the application had become.
Fortunately, by this time, the support team was alerted. They fixed the issue and quickly eliminated the attack vector. However, the mobile application lost the ability to work with some backend services since the main AWS IAM User of this application was compromised.
Naturally, after such events, people start thinking about a strategy to mitigate these types of problems as quickly as possible, as well as to prevent similar incidents in the future.
AWS infrastructure security hardening algorithm
To proactively secure your AWS infrastructure and address potential problem areas before they become full-blown issues, I propose the following AWS security hardening algorithm. This ‘algorithm’ consists of seven simple and relatively fast steps:
- Check and tag resources
- Check IAM policies and permissions
- Enable logging and monitoring
- Run a security assessment
- Mitigate AWS security vulnerabilities and other security risks
- Check the current state of the databases
- Check the disaster recovery (DR) procedures
Let’s now walk through each step.
1) Check and tag AWS resources
Some readers may wonder why resource validation and tagging come first in such a difficult situation. The fact is, many organizations still lack a solid resource tagging scheme. You can come across resources named: DO_NOT_DELETE_UNTIL_12/02.
It’s impossible to prioritize vulnerabilities, set access rights, and create the procedure for resolving incidents if you do not know which systems specific resources belong to, why they are needed, and how critical they are. Reading the logs also becomes extremely difficult.
Here is a simple resource tagging scheme that will work, in most cases:
- Resource name format: uid_function_system.
- Resource owner (department or specialist).
- The environment to which this resource belongs.
- The level of importance of this resource.
- DR procedures applicable to a specific resource (backup, auto-scaling, replication, etc.)
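The scheme above can be sketched as a small helper that produces AWS-style tag sets. The tag keys (Owner, Environment, Criticality, DRProcedure) are illustrative assumptions following the list above, not an AWS requirement; use whatever keys your organization standardizes on.

```python
# Sketch: build a tag set following the scheme above. The tag keys and the
# uid_function_system name format are illustrative assumptions.

def build_tags(uid, function, system, owner, environment, criticality, dr_procedure):
    """Return a list of AWS-style {'Key': ..., 'Value': ...} tag dicts."""
    return [
        {"Key": "Name", "Value": f"{uid}_{function}_{system}"},
        {"Key": "Owner", "Value": owner},
        {"Key": "Environment", "Value": environment},
        {"Key": "Criticality", "Value": criticality},
        {"Key": "DRProcedure", "Value": dr_procedure},
    ]

tags = build_tags("i042", "api", "billing", "platform-team",
                  "prod", "high", "backup+replication")
```

With boto3, such a list could be passed to a call like `ec2.create_tags(Resources=[instance_id], Tags=tags)`; other services accept the same `Key`/`Value` shape in their own tagging APIs.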
Applying tags to different AWS resource groups is done differently. More information can be found here:
- Elastic Beanstalk environments
- Auto Scaling groups
- Various resources using the AWS Management Console
Once the tagging scheme is applied and all resources are tagged, you can reverse engineer and generate the architectural diagram using various tools such as Lucidchart, Hava, Draw.io, and others.
2) Check AWS IAM policies and permissions
The importance of checking and adjusting IAM policies, permissions, and entitlements cannot be overestimated. In my breach example earlier where the first incident occurred due to the mistake of a particular engineer, resources were restored based on what he remembered. The lack of a proper access rights policy puts added pressure on processes and authorized staff. It also increases the risk of system hacks or disruption due to mistakes of maintenance personnel.
Naturally, when users with administrative rights are "hardcoded" into the application code, this also complicates the life of developers and creates added and unnecessary risk.
You can check and adjust AWS resource access policies using the following process:
- Based on the results of resource tagging and, with the help of architectural diagrams, document critical resources.
- For each type of critical resource, compile a list of potentially dangerous actions. Such actions may include:
- Deletion of computing resources (RDS, EC2, Redshift, ElastiCache.)
- Deletion of other resources (DynamoDB tables, S3 buckets, SNS topics, SQS queues, SES lists, VPN Gateways, Lambda functions, VPC Routes.)
- Policy modification (Bucket policies, IAM Policies, Security Groups rules, NACL rules.)
- For combinations of potentially dangerous actions, create and test separate policies, distinguishing those needed for normal operations from those that are never utilized. The list of resources can be limited (Condition: Tag). Additionally, you can require an MFA code to perform potentially dangerous actions that are not part of a normal workflow, or that are sensitive in nature.
- To perform potentially dangerous actions, special roles should always be created. Implement separation of duties and separation of privileges to help ensure no single account amasses too much power. For Users/User Groups, an Assume Role Policy is created that allows the role to be applied to the IAM user.
- Once roles and policies are created and assigned, and after testing the applicability of these roles, limit access policies where you want to prohibit performing potentially dangerous actions.
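A policy combining the tag condition and MFA requirement described above might look like the following sketch. The action list and the "Criticality" tag key are illustrative assumptions; also note that support for the `aws:ResourceTag` condition key varies by service, so check each service's documentation before relying on it.

```python
import json

# Sketch of an IAM policy denying deletion of critical, tagged resources
# unless the caller authenticated with MFA. The action list and the
# "Criticality" tag key are illustrative assumptions.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyDeleteOfCriticalResourcesWithoutMFA",
            "Effect": "Deny",
            "Action": [
                "ec2:TerminateInstances",
                "rds:DeleteDBInstance",
            ],
            "Resource": "*",
            "Condition": {
                # Only resources tagged as critical are covered.
                "StringEquals": {"aws:ResourceTag/Criticality": "high"},
                # BoolIfExists also denies requests where the MFA key is absent.
                "BoolIfExists": {"aws:MultiFactorAuthPresent": "false"},
            },
        }
    ],
}
print(json.dumps(policy, indent=2))
```

Because this is a Deny statement, it overrides any Allow the user may otherwise have, which is why it pairs well with the role-based separation of duties described above.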
More information on right-sizing AWS access can be found in these resources:
- IAM roles for Amazon EC2
- Secure Token Service
- AWS Cognito for authenticating and authorizing mobile app users
- MFA-protected API access
3) Enable logging and monitoring
Once all resources are known and access rights are correctly configured, move on to logging and monitoring. Both processes play a vital role for any SysOps professional. A substantial number of tools help ensure the correct collection of metrics and logs, and their accurate representation for analysis.
There are many posts and books devoted to this topic. Below is a brief overview of what can be done using existing logging and monitoring services natively in AWS.
AWS has many tools for collecting a wide variety of logs. First, you need to pay attention to the logging of user and application/script activity against the AWS API. As a best practice, AWS CloudTrail should almost always be used for this purpose.
AWS environments should keep logs for one year (this is a long time, and secure storage capacity should be provisioned for this volume if they are offloaded). The data obtained using AWS CloudTrail is excellent material for analyzing how users and applications interact with the AWS infrastructure.
Next, you can use various tools to analyze CloudTrail logs. Most often, we are talking about CloudWatch Logs. A large number of third-party applications can also analyze CloudTrail logs. A list of these tools is provided here.
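Even without a dedicated analysis tool, CloudTrail records are plain JSON and easy to scan for the potentially dangerous actions catalogued in step 2. The watch list below is an illustrative assumption; the `eventName`, `eventTime`, and `userIdentity` fields are standard CloudTrail record fields.

```python
# Sketch: flag potentially dangerous API calls in CloudTrail records.
# The watch list is illustrative; extend it from your own step-2 inventory.
DANGEROUS_EVENTS = {"DeleteHostedZone", "DeleteDBInstance", "PutBucketPolicy"}

def flag_dangerous(records):
    """Yield (who, what, when) for records matching the watch list."""
    for r in records:
        if r.get("eventName") in DANGEROUS_EVENTS:
            yield (r.get("userIdentity", {}).get("userName", "unknown"),
                   r["eventName"],
                   r.get("eventTime"))

sample = [
    {"eventName": "DescribeInstances", "eventTime": "2023-05-01T10:00:00Z"},
    {"eventName": "DeleteHostedZone",
     "userIdentity": {"userName": "sysops1"},
     "eventTime": "2023-05-01T10:05:00Z"},
]
hits = list(flag_dangerous(sample))
```

A check like this, run against CloudTrail deliveries, would have surfaced the hosted-zone deletion from the breach example within minutes rather than after customer impact.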
Amazon CloudWatch Logs is another valuable part of the logging picture. It provides basic functionality for collecting and analyzing AWS logs; however, you must use regular expressions to parse the logs, which can be cumbersome for complex queries.
Another handy feature is SNS integration and the ability to generate and send alerts when specific issues occur to help identify attacks and operational anomalies.
One more popular option for collecting and analyzing logs is ELK (Elasticsearch / Logstash / Kibana), or its variations, including replacing Logstash with CloudWatch Logs and using the Amazon Elasticsearch Service. Such a scheme allows you to customize the log analysis process without having to focus on resource allocation and management.
Amazon CloudWatch is the primary native AWS solution for monitoring. CloudWatch provides many opportunities for monitoring AWS services, as well as other solutions hosted on the platform. The tool provides many predefined metrics that allow you to quickly and efficiently set up resource monitoring. In addition, CloudWatch allows users to create and use custom metrics to unify the process of monitoring applications and supporting cloud infrastructure. Of course, you can also implement third-party solutions to augment monitoring for specific use cases and hybrid applications.
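A custom metric is ultimately just a structured payload. The sketch below builds the dict shape that boto3's `cloudwatch.put_metric_data(**payload)` expects; the namespace, metric name, and dimension values are illustrative assumptions.

```python
# Sketch: the payload shape for a custom CloudWatch metric. Namespace,
# metric name, and dimensions are illustrative; with boto3 this dict would
# be unpacked into cloudwatch.put_metric_data(**payload).
def custom_metric(namespace, name, value, unit="Count", dimensions=None):
    return {
        "Namespace": namespace,
        "MetricData": [{
            "MetricName": name,
            "Value": value,
            "Unit": unit,
            "Dimensions": dimensions or [],
        }],
    }

payload = custom_metric("MyApp/Backend", "FailedLogins", 3,
                        dimensions=[{"Name": "Environment", "Value": "prod"}])
```

Pairing such a metric with a CloudWatch alarm and an SNS topic closes the loop described earlier: an anomaly becomes an alert rather than a log line no one reads.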
4) Run an AWS security assessment
Security scanning is a crucial step that is best performed after all resources are known, documented, access rights are assigned, and all monitoring and logging tools are configured for test and production. This will allow you to accurately assess the scan results and identify specific features of applications and infrastructure that need security mitigations or remediation.
As an important footnote, when launching any security assessment, you should notify AWS of the planned actions, so they know your assessment is a legitimate security risk assessment rather than an attack. However, if you use tools validated by AWS and approved for scanning, you do not need to send a notification; you should still log the action in your own change control to avoid triggering alerts from your own tooling.
An AWS security assessment can consist of several basic steps:
- ASV scanning (VPC and network.) You can use providers certified by the PCI Security Standards Council.
- Vulnerability scanning using Amazon Inspector or third-party tools.
- Penetration testing (if necessary or required by regulators, for example, due to PCI DSS requirements).
- Side-scanning vulnerability and attack assessments using a Cloud Workload Protection Platform (CWPP)
5) Mitigate AWS security vulnerabilities and other risks
Based on the AWS security scan results, it is necessary to take steps to address the identified risks, vulnerabilities, and misconfigurations. Steps for securing AWS may include the following:
- Where possible, implement zero trust controls to ensure secure AWS infrastructure access.
- Correct and update traffic filtering rules (Security Groups and Network Access Control Lists.)
- Implement a Web Application Firewall and integrate it with your applications.
- Update software and operating systems with published security patches.
- Apply strong password policies / passwords used to access third-party software (databases, application servers, servlet containers, etc.)
- Change authentication and authorization procedures (for example, switching to one-time tokens instead of IAM User Credentials.)
- Implement privileged access management (PAM) to ensure AWS secrets and accounts are onboarded and managed, and that least privilege is applied at both a broad and granular level.
- Provide integrity control at the level of operating systems, configuration files, third-party applications, and so on (for example, using OSSEC.)
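As a concrete example of the traffic-filtering item above, the following sketch flags security-group ingress rules open to the whole Internet. The rule dicts mirror the shape returned by boto3's `describe_security_groups`, but the check itself runs on plain data and makes no AWS calls; the group ID and ports are illustrative.

```python
# Sketch: flag security-group ingress rules open to the whole Internet.
# The dict shape mirrors boto3's describe_security_groups output.
def open_to_world(security_group):
    """Return (port_range, cidr) pairs for 0.0.0.0/0 ingress rules."""
    findings = []
    for perm in security_group.get("IpPermissions", []):
        for ip_range in perm.get("IpRanges", []):
            if ip_range.get("CidrIp") == "0.0.0.0/0":
                findings.append(((perm.get("FromPort"), perm.get("ToPort")),
                                 ip_range["CidrIp"]))
    return findings

sg = {"GroupId": "sg-123", "IpPermissions": [
    {"FromPort": 22, "ToPort": 22, "IpProtocol": "tcp",
     "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
]}
```

An SSH port exposed to 0.0.0.0/0, as in this sample group, is exactly the kind of finding a scan from step 4 should surface and this step should fix.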
Naturally, the steps above are only a fraction of the possible, and often necessary, controls to harden your AWS infrastructure and ensure robust security. Ideally, your organization builds information security assessment into the release cycle and runs this procedure regularly to ensure workloads and products are not vulnerable to attack.
6) Check the current state of the AWS databases
Most information systems store structured and semi-structured data in databases. So, one of the most important aspects of reducing the impact and likelihood of a severe incident is to provide disaster recovery at the database level. This is in addition to any hardening and security that might be considered.
For this step, you need to consider such things as Recovery Point Objective and Recovery Time Objective. In addition, you should not forget that the presence of replication and various high availability options do not eliminate the need to create backups. This is relevant not only for RDBMS, but also for various NoSQL databases.
Therefore, when dealing with the database, it is advisable to:
- Check the availability of backups and procedures for their automatic creation based on the RPO and RTO.
- Check the specifics of how databases are used, such as whether important information is stored in a cache (Memcached, Redis, etc.) or search engine (Elasticsearch, Solr) running in a single instance, or whether ETL procedures are launched on the master database, etc.
- Check the specifics of the database installation and determine data flows (including using ETL procedures, specialized applications, etc.), determine the need / availability of implementation of HA and DR procedures for these components.
- If possible, ensure the “rotation” of the database access parameters used by applications and users.
- Provide database-specific monitoring and logging using native tools or a third-party SIEM.
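The first item in the list above, checking backups against the RPO, reduces to a simple freshness test. The sketch below is a minimal illustration; in practice `last_backup` would come from real snapshot metadata (for example, an RDS snapshot's creation time), and the timestamps here are invented.

```python
from datetime import datetime, timedelta, timezone

# Sketch: check whether the latest backup satisfies the RPO.
# Timestamps are illustrative; in practice they would come from
# snapshot metadata such as an RDS snapshot's creation time.
def rpo_satisfied(last_backup, rpo_hours, now=None):
    now = now or datetime.now(timezone.utc)
    return now - last_backup <= timedelta(hours=rpo_hours)

now = datetime(2023, 5, 1, 12, 0, tzinfo=timezone.utc)
# Backup from 4 hours ago against a 6-hour RPO: acceptable.
ok = rpo_satisfied(datetime(2023, 5, 1, 8, 0, tzinfo=timezone.utc),
                   rpo_hours=6, now=now)
# Backup from 36 hours ago against a 6-hour RPO: stale.
stale = rpo_satisfied(datetime(2023, 4, 30, 0, 0, tzinfo=timezone.utc),
                      rpo_hours=6, now=now)
```

Running a check like this on a schedule, and alerting when it fails, turns the backup review from a one-off audit into a continuous control.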
7) Check the disaster recovery (DR) procedures
Our final AWS security recommendation is to perform a component-by-component disaster recovery check of your system. Ideally, you carry out the first iteration of such a check on a test environment, and then transfer the results to a production system.
Here are several areas to check across backups and high-availability functions:
- Infrastructure-as-a-Code configuration files (CloudFormation, Terraform, Elastic Beanstalk, etc.)
- EBS snapshots and AMIs
- Databases/file storages
- Application and service configurations
High Availability functions:
- IP address scheme (public/private/static, and IP failover.)
- Internal and external DNS scheme.
- Addressing scheme representation in applications + addressing scheme failover at the application level
- Application behavior when invalidating caches / index rebuilding procedures
- Database failover (switching to a replica, selecting a new master, or other procedures).
- DNS-based routing and health checks
- ELB-based routing and target health checks
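The routing and health-check items above share one underlying behavior: traffic should flow to the first healthy endpoint in an ordered preference list. The sketch below mimics that selection logic; the endpoint names and health states are illustrative assumptions.

```python
# Sketch: pick the first healthy endpoint from an ordered failover list,
# mimicking what DNS- or ELB-based health-checked routing does for you.
# Endpoint names and health states are illustrative.
def select_endpoint(endpoints, health):
    """endpoints: ordered preference list; health: name -> healthy bool."""
    for ep in endpoints:
        if health.get(ep, False):
            return ep
    return None  # nothing healthy: a full outage, not a failover

chosen = select_endpoint(
    ["api-primary.example.com", "api-standby.example.com"],
    {"api-primary.example.com": False, "api-standby.example.com": True},
)
```

A DR check should exercise exactly this path: take the primary down in the test environment and confirm that DNS or the load balancer actually shifts traffic to the standby within the expected time, TTLs included.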
More resources on securing AWS Infrastructure
This blog explored a set of seven simple and quick steps to help you significantly reduce the likelihood and impact of security incidents on your AWS infrastructure and apps. Here are some other resources on AWS security and improving operational performance that you may find helpful.
AWS Well-Architected Framework (AWS website)
Alex Vakulov, Guest Blogger
Alex Vakulov is a cybersecurity researcher with over 20 years of experience in virus analysis. Alex has strong malware removal skills. He writes for numerous security-related publications, sharing his security experience.