Amazon Web Services (AWS) Cloud Security Best Practices

What’s the difference between traditional IT security and cloud security?

At a top level, the fundamentals of securing cloud infrastructure are similar to the fundamentals of securing an on-premises network. For example, the NIST cybersecurity framework still applies: you are trying to identify what you need to secure, protect those assets, detect malicious activity, respond to security events, and recover afterward. However, the unique aspects of a cloud environment mean that the way in which you accomplish these functions can be quite different.

Perhaps the most obvious yet misunderstood difference is that in a cloud environment, the cloud service provider (CSP) has responsibility for some elements of securing that environment. The misunderstandings occur over exactly what the CSP is responsible for. CSPs like AWS have created shared responsibility models to help clarify things. This shared responsibility model says AWS is responsible for securing the underlying infrastructure of its cloud. That means it is responsible for things like maintaining and updating hardware, as well as providing physical security for that hardware. The customer using AWS infrastructure is responsible for securing anything they put in AWS. That means they are responsible for things like updating and patching operating systems, properly configuring the AWS services they use, and controlling who has access to those services.

Should someone in your organization incorrectly assume that your cloud provider is taking care of some aspect of security, it could result in improperly secured cloud assets. Therefore, it’s very important that before anyone is given the ability to modify a cloud environment, they are first taught what the shared responsibility model is and which portions of security your organization is on the hook for.

Another unique aspect of a cloud environment is the ease with which new assets can be created and deployed. In an on-premises network, the IT and security teams have control over all new infrastructure that gets added. In a cloud network, new infrastructure can be instantly added by any person or system with the right credentials. This makes it far easier to modify a cloud network, but unless the right guardrails and monitoring are in place, it also increases the chances that new infrastructure isn’t configured securely and thus vulnerable.

At the same time, cloud environments change very quickly. Technologies like autoscaling and serverless computing mean that assets in a cloud network are constantly appearing and disappearing. Traditional security measures like vulnerability scanning are no longer enough because a vulnerable asset might only exist for a few minutes or hours, which means the asset won’t be picked up by a weekly or even daily scan. You might think that also means the asset doesn’t live for long enough to present a risk, but data from our Project Heisenberg global honeypot network shows that new assets are scanned by malicious sources within a few hours of being spun up.
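
To make the scanning gap concrete, here is a rough back-of-the-envelope sketch. It assumes (as a simplification of ours, not a measured result) that scans run at a fixed interval, that each scan is instantaneous, and that an asset's launch time is uniformly random relative to the scan schedule. The function name is purely illustrative.

```python
def chance_asset_is_scanned(lifetime_hours: float, scan_interval_hours: float) -> float:
    """Probability that at least one periodic scan lands inside the asset's lifetime.

    Assumes instantaneous scans at a fixed interval and a uniformly random
    launch time (simplifying assumptions for illustration only).
    """
    if lifetime_hours < 0 or scan_interval_hours <= 0:
        raise ValueError("lifetime must be >= 0 and scan interval must be > 0")
    return min(1.0, lifetime_hours / scan_interval_hours)


if __name__ == "__main__":
    for lifetime in (0.25, 2, 12):
        daily = chance_asset_is_scanned(lifetime, 24)
        weekly = chance_asset_is_scanned(lifetime, 24 * 7)
        print(f"{lifetime:>5} h lifetime: daily scan {daily:.1%}, weekly scan {weekly:.1%}")
```

Under these assumptions, an instance that lives for two hours has roughly a 1-in-12 chance of ever being seen by a daily scan, and far less with a weekly one.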

The ease of deployment and high rate of change make it very difficult for security teams to maintain a complete picture of their cloud environment. This is made worse in hybrid environments (IT environments that include both on-premises and cloud networks) and multi-cloud environments (IT environments that include cloud networks from multiple cloud providers), where different information is stored in different systems, often protected by different security tools. In these situations, the security team has to bounce back and forth between various systems to manage their security efforts. The lack of unified data makes it difficult (if not impossible) to get an accurate sense of the organization’s overall security posture or track a malicious actor that is moving between cloud and on-premises networks.

 

How secure is AWS?

Just like an on-premises network, AWS is as secure as you make it. Above, we discussed the shared responsibility model and how AWS is responsible for security of the underlying infrastructure. The fact of the matter is AWS devotes more resources to their portion of the shared responsibility model than the majority of organizations with on-premises networks. For these organizations, moving to AWS can enhance portions of their security posture. 

However, moving to AWS or any other public cloud provider also introduces new risks. As we mentioned above, cloud environments have unique challenges. You can’t simply take your existing security tactics and apply them to the cloud. Having said that, if you understand the unique aspects of cloud security and apply best practices, AWS can be as secure as (or even more secure than) an on-premises network.

The importance of strong AWS cloud security

Strong AWS security is important for the same reasons why cybersecurity in general is important. You need to protect your organization and your customers from malicious actors. For many organizations, the importance of having strong AWS security is increasing as they move more valuable workloads and more sensitive data to the cloud.

Strong AWS security also matters because of the reputational damage an avoidable incident can cause. Gartner predicts that through the end of 2020, 95% of cloud security failures will be the customer’s fault. Should customers learn that an organization was compromised due to an easily avoidable error, it could shake their confidence to the point that they move their business elsewhere.

AWS cloud security best practices

One quick thing before we get started: As you can tell from their name, AWS loves acronyms. This can create some confusion on what various AWS services do, so here’s a quick breakdown:

  • S3 (Simple Storage Service) = Object Storage.
  • EC2 (Elastic Compute Cloud) instance = Virtual machine/server.
  • AMI (Amazon Machine Image) = A machine image that contains an operating system and sometimes additional software that’s run on an EC2 instance.
  • VPC (Virtual Private Cloud) = Virtual network that closely resembles the network of a traditional data center. All modern EC2 instances run inside a VPC.

We discuss additional AWS services in more depth below, but we wanted to make sure you were familiar with the basics. Now then, onto best practices:

1. Plan ahead
Ideally, you should start thinking through how you will secure your AWS environment before you begin adopting it. If that ship has already sailed, no worries—it just might require a bit more effort to implement some best practices.

2. Embrace the cloud
When approaching a cloud environment for the first time, some security teams try to make the cloud mimic the on-premises environments they are used to protecting by doing things like prohibiting developers from making infrastructure changes. In almost all cases, the end result is the team gets relieved of responsibility for cloud security or engineers find ways to bypass the restrictions (see best practice No. 9 for why this is bad).

Security teams need to recognize that potentially risky aspects of the cloud, such as the rapid rate of change and the ease of deployment, are also some of the biggest benefits to using cloud infrastructure. To be successful, security teams must endeavor to be seen as enablers of the cloud. They must find ways to keep cloud infrastructure secure without overly stifling those aspects that make the cloud beneficial to the organization. This starts by adopting an open mind and recognizing that successfully managing risk in a cloud environment will require new tactics and processes.

3. Define a security baseline for your AWS environment
Your security and DevOps teams should work together to define what your AWS environment should look like from a security perspective. The baseline should clearly describe everything from how assets must be configured to an incident response plan. The teams should consider using resources like the AWS Well-Architected Framework and the CIS Benchmarks for AWS as starting points. They might also want to ask for assistance from an AWS Solutions Architect, who is a technical expert skilled in helping customers construct their AWS environment.

Make sure your baseline is applied to your production environment as well as any test and pre-production environments. Reevaluate your baseline at least every six months to incorporate things like new threats and changes in your environment.

4. Enforce your baseline
Once your security and DevOps teams have defined what your AWS security baseline looks like, you need to enforce it. Make it easy for developers to adhere to your baseline by providing them with infrastructure templates that have already been properly configured. You can do this using AWS CloudFormation or a third-party infrastructure-as-code tool like Terraform.

You also need a monitoring solution in place to detect when something is out of compliance with the baseline (either because it was deployed with a misconfiguration or because a change was made after deployment). To do this, one option is to use AWS Security Hub, but several third-party vulnerability management (VM) solutions include built-in monitoring for cloud misconfigurations. There are two benefits to using a VM solution with built-in misconfiguration detection. First, it consolidates two types of risk monitoring (asset vulnerabilities and cloud infrastructure misconfigurations) into one tool. Second, with most vulnerability management solutions, all the misconfiguration rules and detections are managed for you by the vendor, whereas with AWS Security Hub you need to set up and manage the rules yourself.

Another option for enforcing your security baseline is a Cloud Security Posture Management (CSPM) solution. A quality CSPM will have the ability to monitor accounts from multiple cloud providers for misconfigurations. This is a big deal, as it allows your organization to set one security baseline for all your cloud providers, then enforce it using a single tool. Beyond being able to monitor cloud accounts for misconfigurations, you should look for a CSPM with the ability to automatically fix misconfigurations as soon as they are detected. This will greatly reduce the burden on your security team and ensure that nothing slips through the cracks. 

Other capabilities to look for in a CSPM include the ability to flag issues in infrastructure as code before anything is deployed, IAM governance (see the next section for more on IAM), and compliance auditing. CSPMs tend to be a bit pricey, but for organizations that use multiple cloud providers or who have a large number of accounts with a single provider, a CSPM is the way to turn the chaos of managing all those accounts into order.
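
To illustrate what flagging issues in infrastructure as code before deployment can look like, here is a minimal sketch that inspects a CloudFormation-style template (already parsed into a Python dict) for two example baseline rules. The rules themselves, the list of admin ports, and the function name are assumptions made for illustration; a real baseline would cover far more, and commercial tools do this much more thoroughly.

```python
from __future__ import annotations

from typing import Any

OPEN_CIDR = "0.0.0.0/0"
ADMIN_PORTS = {22, 3389}  # SSH and RDP; an assumed baseline choice


def check_template(template: dict[str, Any]) -> list[str]:
    """Return human-readable baseline violations found in a template dict."""
    violations = []
    for name, resource in template.get("Resources", {}).items():
        rtype = resource.get("Type")
        props = resource.get("Properties", {})
        if rtype == "AWS::S3::Bucket":
            # Example rule 1: every bucket must block all forms of public access.
            block = props.get("PublicAccessBlockConfiguration", {})
            required = ("BlockPublicAcls", "BlockPublicPolicy",
                        "IgnorePublicAcls", "RestrictPublicBuckets")
            if not all(block.get(flag) is True for flag in required):
                violations.append(f"{name}: S3 public access is not fully blocked")
        elif rtype == "AWS::EC2::SecurityGroup":
            # Example rule 2: admin ports must not be open to the whole internet.
            for rule in props.get("SecurityGroupIngress", []):
                if rule.get("CidrIp") != OPEN_CIDR:
                    continue
                if str(rule.get("IpProtocol")) == "-1":  # "-1" means all traffic
                    low, high = 0, 65535
                else:
                    low, high = rule.get("FromPort", 0), rule.get("ToPort", 65535)
                exposed = sorted(p for p in ADMIN_PORTS if low <= p <= high)
                if exposed:
                    violations.append(f"{name}: ports {exposed} open to {OPEN_CIDR}")
    return violations
```

A check like this could run in a CI pipeline and fail the build when the returned list is not empty, so a misconfigured template never reaches an AWS account.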

5. Limit access
Few things are more important to creating a secure AWS environment than restricting access to just those users and systems that need it. This is accomplished using AWS Identity and Access Management (IAM). IAM consists of the following components:

  • Users: These represent individual people or systems that need to interact with AWS. A user consists of a name and credentials.
  • Credentials: The ways that a user can access AWS. Credentials include console passwords, access keys, SSH keys, and server certificates.
  • Groups: A collection of users. With groups, you can manage permissions for all users in the group at once, rather than having to change the permissions for each user individually.
  • Roles: These are similar to users, but don’t have long-term credentials like a password or access key. A role can be assumed by a user or service. When a role is assumed, it provides temporary credentials for the session. Only users, roles, accounts, and services that you specify can assume a role. Roles let you do things like give a user access to multiple AWS accounts or give an application access to AWS services without having to store long-term credentials inside the app.
  • Policies: These are JSON docs that give permission to perform an action or actions in specific AWS services. In order to give a user, group, or role the ability to do something in AWS, you have to attach a policy. AWS provides several hundred predefined “AWS Managed Policies” to choose from, or you can build your own.
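
To show what a policy document looks like in practice, the sketch below builds a small identity-based policy in Python and serializes it to JSON. It grants read-only access to a single S3 bucket. The bucket name and the function name are hypothetical; the structure (Version, Statement, Effect, Action, Resource) follows the standard IAM policy grammar.

```python
import json


def read_only_bucket_policy(bucket_name: str) -> dict:
    """Build an identity-based policy granting read-only access to one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket_name}"],
            },
            {
                # Reading applies to the objects inside the bucket.
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket_name}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # "example-reports-bucket" is a made-up name for illustration.
    print(json.dumps(read_only_bucket_policy("example-reports-bucket"), indent=2))
```

Scoping both the actions and the resources this tightly is the point: a policy attached to a group or role should allow only what its members actually need.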

Now that you have a basic understanding of the components that make up IAM, let’s talk about best practices. AWS has a list of IAM best practices that you should read through. Similar practices are mentioned in the CIS Benchmarks for AWS. All of these best practices are important, but in the interest of brevity, we’ll call out a few of the most vital (and commonly broken) guidelines:

  • Don’t use the root user: The root user is the user that is associated with the email address used to create an AWS account. The root user can do things even a full admin cannot. If a malicious actor gets their hands on root user credentials, massive damage can be done. Make sure you use a very complex password on your root user, enable MFA (ideally using hardware MFA), and lock away the MFA device in a safe. Yes, literally lock the MFA device away. You should also delete any access keys that have been created for the root user. Only use the root user in those very rare circumstances where it’s required.
  • Manage users through federated SSO: It’s a security best practice to use federated SSO to manage employee access to resources, and that includes AWS. You should take advantage of IAM’s Identity Provider functionality so that you can centrally manage individual access to AWS through your existing SSO solution.
  • Don’t attach policies to individual users: Instead, apply them to groups and roles. This makes it far easier to maintain visibility into who can access what and minimizes the chances that an individual passes under the radar with access to more than what they need.
  • Require a strong password: You should configure IAM to require a strong password. CIS recommends you set IAM to require a password of at least 14 characters with at least one uppercase character, one lowercase character, one number, and one symbol. CIS also recommends that passwords expire at least every 90 days and that previous passwords cannot be reused.
  • Require MFA: Along with a strong password, you should ensure that all users have enabled MFA.
  • Delete unused credentials: IAM can generate a credentials report that shows when credentials for each user were last used. You should regularly go into this report and disable or delete credentials that haven’t been used in the past 90 days.
  • Regularly rotate access keys: In many cases, you can (and should) use IAM roles instead of access keys for programmatic access to AWS. In those situations where you have to use access keys, you should make sure they are rotated at least every 90 days. The IAM credentials report shows when access keys were last rotated. Use this report to ensure any overdue access keys are changed.
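
The credential checks in the last two items lend themselves to a small script. The sketch below parses an IAM credential report (a CSV file) and lists passwords and access keys that are unused or overdue for rotation. It uses only a subset of the report's columns, and the sample rows, the user names, and the helper names are made up for illustration. Treating never-used credentials as stale is our assumption; you may prefer a grace period for new users.

```python
from __future__ import annotations

import csv
import io
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(days=90)
NO_DATE = {"N/A", "no_information", "not_supported", ""}

# Illustrative sample with only the columns this sketch reads.
SAMPLE_REPORT = """\
user,password_enabled,password_last_used,access_key_1_active,access_key_1_last_rotated,access_key_1_last_used_date,access_key_2_active,access_key_2_last_rotated,access_key_2_last_used_date
alice,true,2024-05-30T09:00:00+00:00,false,N/A,N/A,false,N/A,N/A
build-bot,false,N/A,true,2023-11-01T00:00:00+00:00,2024-05-31T00:00:00+00:00,false,N/A,N/A
"""


def _parse(value: str) -> datetime | None:
    """Return a timezone-aware datetime, or None when the report has no date."""
    if value in NO_DATE:
        return None
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def _is_stale(value: str, now: datetime) -> bool:
    """True when the date is missing or older than MAX_AGE."""
    parsed = _parse(value)
    return parsed is None or now - parsed > MAX_AGE


def stale_credentials(report_csv: str, now: datetime) -> list[tuple[str, str]]:
    """Return (user, finding) pairs for credentials past the 90-day mark."""
    findings = []
    for row in csv.DictReader(io.StringIO(report_csv)):
        user = row["user"]
        if row["password_enabled"] == "true" and _is_stale(row["password_last_used"], now):
            findings.append((user, "password unused for 90+ days"))
        for n in ("1", "2"):
            if row[f"access_key_{n}_active"] != "true":
                continue
            if _is_stale(row[f"access_key_{n}_last_rotated"], now):
                findings.append((user, f"access key {n} not rotated in 90+ days"))
            if _is_stale(row[f"access_key_{n}_last_used_date"], now):
                findings.append((user, f"access key {n} unused for 90+ days"))
    return findings
```

Running a check like this on a schedule turns the manual habit of reviewing the report into a list of specific credentials for someone to act on.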

6. Watch for vulnerabilities
A lot of people don’t realize that even in the cloud, unpatched vulnerabilities still present a threat. To detect vulnerabilities in EC2 instances, you can use AWS Inspector or a third-party vulnerability management solution. Using a vulnerability management solution allows you to better prioritize your work, improve your reporting capabilities, facilitate communication with infrastructure owners, and help everyone monitor progress toward reducing risk. In addition, security teams that are dealing with a hybrid or multi-cloud environment often prefer to use a third-party solution because it allows them to oversee vulnerability and risk management for all their environments in one place (more on that in best practice No. 9).

Although vulnerability management should be familiar to most cybersecurity professionals, there are a few unique aspects of VM in a cloud environment like AWS that you should be aware of. As we mentioned earlier, a cloud environment can quickly change. Assets appear and disappear minute-by-minute. In such a dynamic world, weekly or even daily scans aren’t enough to get an accurate understanding of vulnerabilities and your risk exposure. It’s important to have some way to make sure you have a complete picture of which EC2 instances exist, as well as a way to continuously monitor the instances throughout their lifetime. To ensure you have a complete picture of your EC2 instances, invest in a vulnerability management solution with dynamic asset discovery, which automatically detects new instances as they are deployed. A similar capability can be achieved with AWS Inspector by using CloudWatch Events, although setup is a little more manual.

When vulnerabilities are detected in an EC2 instance, they can be addressed in several ways. One option is to use the Patch Manager in AWS Systems Manager. This approach is the most similar to how you traditionally manage vulnerabilities in an on-premises network. However, many cloud environments are designed to be immutable. In other words, assets like EC2 instances should not be changed once they’re deployed. Instead, when a change needs to be made, the existing asset is terminated and replaced with a new one that incorporates the change. 

So, in immutable environments, you don’t deploy patches, but rather deploy new instances that include the patches. One way to do this is to create and maintain a base AMI that gets regularly updated to run the most recent version of whatever operating systems you’re using. With this approach, when a vulnerability is detected, you can create a new baseline AMI that incorporates patches for the vulnerability. This will eliminate the vulnerability from any future EC2 instance you deploy, but you’ll need to make sure you also redeploy any currently running EC2 instances. 

Another option is to use an infrastructure automation tool like Chef or Puppet to update and redeploy AMIs. This approach makes sense if you are already using one of these tools to maintain your EC2 instances.
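
The redeployment step in an immutable environment is easy to overlook, so here is a minimal sketch of the bookkeeping involved: given an inventory of running instances and the ID of the current baseline AMI, list the instances that still run an older image. The Instance class, the function name, and the IDs are hypothetical; in practice the inventory would come from your asset discovery tooling.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """The two facts this sketch needs about a running EC2 instance."""
    instance_id: str
    image_id: str


def instances_to_redeploy(instances: list[Instance], baseline_ami_id: str) -> list[str]:
    """Return IDs of instances not launched from the current baseline AMI."""
    return [i.instance_id for i in instances if i.image_id != baseline_ami_id]


if __name__ == "__main__":
    # Made-up IDs for illustration only.
    fleet = [
        Instance("i-0aaa", "ami-old"),
        Instance("i-0bbb", "ami-new"),
        Instance("i-0ccc", "ami-old"),
    ]
    print(instances_to_redeploy(fleet, "ami-new"))
```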

7. Collect and protect logs
Just like any other system, you should log all activity that occurs in your AWS environment. Not only are logs important for monitoring and compliance, but thinking back to the NIST cybersecurity framework, they are a critical part of detecting malicious activity (especially when they are fed into a modern SIEM), responding to a security event, and recovering afterward.

In AWS, most logs are captured using CloudTrail. This service automatically captures and stores AWS API activity as what AWS calls Management Events in your AWS account for no charge (although you will need to pay the cost of storage). CloudTrail captures tens of thousands of events, including critical security information like logins and configuration changes to AWS services. For a fee, you can also create “trails” in CloudTrail, which allows you to do things like capture additional activity and send your logs to S3 for long-term storage and/or export. Here are some best practices for setting up CloudTrail in your AWS account:

  • Create a trail for all regions: Although it costs money, you should create a trail in CloudTrail so you can send all your logs to an S3 bucket. This will allow you to store your logs indefinitely (CIS recommends keeping them for at least 365 days). When creating your trail, you should make sure the option Apply trail to all regions is enabled. This will allow your trail to show you activity from every AWS region. If you don’t enable this option, your trail will only collect logs for activity occurring in whatever AWS region you are using when you create the trail. It’s important to capture data from all regions so that you have visibility in case something suspicious happens in a region you don’t normally use. If you use multiple AWS accounts, you also might want to use one bucket to store logs for all your accounts.
  • Protect the S3 bucket holding your logs: Since your logs are a key part of detecting and remediating an incident, the S3 bucket where you store your logs is a prime target for an attacker. Therefore, you should make sure you do everything possible to protect it. Make sure the bucket isn’t publicly accessible and restrict access to only those users who absolutely need it. Log all access to the bucket, and store those access logs in a separate S3 bucket that is only accessible by users who can’t access the CloudTrail log bucket. You should also consider requiring MFA in order to delete your log buckets.
  • Encrypt log files with SSE-KMS: Although CloudTrail logs are encrypted by default, you can add an additional level of defense by enabling server-side encryption with AWS KMS. With this option, a user will not only need permission to access the S3 bucket holding your log files, but they will also need access to a customer master key (CMK) to decrypt said files. It’s a great way to ensure only a select few can access your logs. When you create your CMK, make sure you also enable automatic key rotation.
  • Use log validation: CloudTrail can automatically create validation files that are used to detect if a CloudTrail log has been tampered with. Since manipulating log files is a great way for an attacker to cover their tracks, you should make sure log validation is enabled for your trail.
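
The trail settings above can be verified with a short script. The sketch below takes one trail description as a dict, using the field names that the CloudTrail DescribeTrails API returns (IsMultiRegionTrail, LogFileValidationEnabled, KmsKeyId, S3BucketName), and reports which of the practices are not in place. The function name and the sample values are hypothetical, and fetching the description from AWS is left out so the example stays self-contained.

```python
from __future__ import annotations


def audit_trail(trail: dict) -> list[str]:
    """Return the CloudTrail best practices that this trail does not meet."""
    problems = []
    if not trail.get("IsMultiRegionTrail", False):
        problems.append("trail does not cover all regions")
    if not trail.get("S3BucketName"):
        problems.append("no S3 bucket configured for long-term storage")
    if not trail.get("KmsKeyId"):
        problems.append("logs are not encrypted with SSE-KMS")
    if not trail.get("LogFileValidationEnabled", False):
        problems.append("log file validation is disabled")
    return problems


if __name__ == "__main__":
    # Made-up trail description for illustration.
    example = {
        "Name": "org-trail",
        "S3BucketName": "example-cloudtrail-logs",
        "IsMultiRegionTrail": True,
        "LogFileValidationEnabled": False,
    }
    for problem in audit_trail(example):
        print(f"{example['Name']}: {problem}")
```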

Although most logs are collected in CloudTrail, there are a few other logs you should make sure you capture. VPC Flow Logs show data on the IP traffic going to and from the network interfaces in your virtual private cloud (VPC). They can help you identify intra-VPC port scanning, network traffic anomalies, and known malicious IP addresses. If you use AWS Route 53 as your DNS, you should also log DNS queries. You can use these logs to match against threat intelligence and identify known-bad or quickly spreading threats. Keep in mind that you will need to use AWS CloudWatch to view your DNS logs.
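
As a rough illustration of how VPC Flow Logs can surface port scanning, the sketch below parses records in the default flow log format (version 2) and reports source/destination pairs where one source was rejected on many distinct destination ports. Counting only REJECT records and the threshold of 10 ports are assumptions chosen for the example, not recommended values, and the function name is ours.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable

# Field order of the default (version 2) flow log record format.
FIELDS = (
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
)


def find_port_scanners(lines: Iterable[str], min_ports: int = 10) -> dict[tuple[str, str], int]:
    """Map (source, destination) to the number of distinct rejected ports.

    Only pairs with at least `min_ports` distinct destination ports are returned.
    """
    ports_seen: dict[tuple[str, str], set[int]] = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) != len(FIELDS):
            continue  # skip headers, blank lines, and custom-format records
        record = dict(zip(FIELDS, parts))
        # NODATA and SKIPDATA records carry no traffic details.
        if record["log_status"] != "OK" or record["action"] != "REJECT":
            continue
        pair = (record["srcaddr"], record["dstaddr"])
        ports_seen[pair].add(int(record["dstport"]))
    return {pair: len(ports) for pair, ports in ports_seen.items() if len(ports) >= min_ports}
```

A real detection would also consider time windows and accepted traffic, which is part of why many teams hand this job to a managed detection service or a SIEM.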

8. Monitor, detect, and react
Now that you know how to use logs to obtain visibility into the activity in your AWS environment, the next question is how to leverage this visibility. One (very manual) option is to use AWS CloudWatch alarms. With this approach, you build alarms for various suspicious actions such as unauthorized API calls, VPC changes, etc. A list of recommended alarms is included in the CIS Benchmarks for AWS. The challenge with this approach is that each alarm must be manually built and maintained.
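
To give a feel for what an alarm on unauthorized API calls is looking for, here is a minimal sketch that applies similar logic to CloudTrail event records in Python. It counts events whose errorCode starts with "AccessDenied" or ends with "UnauthorizedOperation", grouped by the caller's ARN. The threshold of 3, the function names, and the sample ARNs in the usage note are hypothetical; this is not a substitute for a real alarm, just an illustration of the matching.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable


def count_unauthorized_calls(events: Iterable[dict]) -> Counter:
    """Count denied API calls per calling principal (by ARN)."""
    counts: Counter = Counter()
    for event in events:
        code = event.get("errorCode", "")
        if code.startswith("AccessDenied") or code.endswith("UnauthorizedOperation"):
            arn = event.get("userIdentity", {}).get("arn", "unknown")
            counts[arn] += 1
    return counts


def principals_over_threshold(events: Iterable[dict], threshold: int = 3) -> list[str]:
    """Return principals with at least `threshold` denied calls, sorted by ARN."""
    counts = count_unauthorized_calls(events)
    return sorted(arn for arn, n in counts.items() if n >= threshold)
```

For example, if a principal such as arn:aws:iam::111122223333:user/mallory (a made-up ARN) racks up repeated AccessDenied errors across several services, it would appear in the returned list, which is the kind of pattern worth investigating.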

Another option is to use AWS GuardDuty. GuardDuty uses CloudTrail, VPC Flow Logs, and DNS logs to detect and alert on suspicious behavior. The nice thing about GuardDuty is that it is powered by an AWS-managed list of findings (aka potential security issues), as well as machine learning. That means no manual setup or maintenance is needed to receive alerts about suspicious activity. However, detecting suspicious activity is just the first step in responding to an incident. Your security team will need to pull relevant log files and other data to verify that an incident has occurred, then determine the best way to respond and recover. If the team needs to search multiple different data sources to find this information, it can dramatically lengthen the time needed to conduct an investigation. This challenge is exacerbated in a hybrid or multi-cloud environment.

Having all relevant data automatically centralized during an investigation is just one of the reasons why many security teams decide to use a modern SIEM and incident detection tool. A good SIEM solution will have a CloudTrail integration and let you store all logs from AWS alongside logs from on-prem networks and other cloud providers like Azure and Google Cloud Platform (GCP). This ability to centralize all data can be massively helpful in speeding up investigations, especially when you need to track a malicious actor who has moved across your environments. 

A good SIEM will also provide a host of other features to enhance your security team’s ability to detect, confirm, and respond to an attack. For example, the most advanced SIEMs use multiple techniques to detect suspicious behavior. Other features to look for include the ability to create custom alerts, deception technology (things like pre-built honeypots and honey users that will trigger alerts when accessed) and File Integrity Monitoring (FIM). All these capabilities provide additional layers of detection. You should also look for a SIEM that provides visualization capabilities like customizable dashboards and investigation timelines, which make your centralized data more usable. In addition, make sure any SIEM you’re considering has built-in automation, as this can dramatically reduce reaction times when an incident occurs. Finally, many teams like to use both AWS and third-party tools to secure their AWS environment, in which case it’s important to find a SIEM that includes a GuardDuty integration.

9. Unify AWS with on-premises and other cloud security
One very common mistake is to approach AWS security in a silo, separate from efforts to secure existing IT infrastructure. This creates holes that can be exploited by a malicious actor. For example, we’ve seen situations where an organization’s on-premises and AWS security were designed to address different potential threats. The resulting gaps left both networks vulnerable.

Having a single team responsible for securing all IT infrastructure ensures that no assumptions are made about what the “other” security team is or isn’t doing. Instead, there is one team that knows it is accountable for all aspects of your organization’s cybersecurity posture. Unifying your security efforts under one team can also be extremely important during an incident. The team has immediate access to far more data. It’s also much easier to maintain clarity around each team member’s area of responsibility.

Not only is it important to unify responsibility for security under one team, but it’s also important to unify all your security data in one set of tools. The vast majority of organizations are not just using AWS. At a minimum they have on-premises networks and employee endpoints to secure. In many cases, organizations also utilize multiple cloud providers. If you use different security solutions for each environment, it increases the likelihood of there being blind spots. In addition, the more tools your security team uses, the higher their workload, as they are forced to constantly bounce between tools in order to manually piece together a complete picture of the organization’s current cybersecurity posture.

10. Automate
With so many best practices for securing AWS, it’s not reasonable to expect everyone to remember them all. Even if they did, mistakes happen. To ensure your AWS environment continuously adheres to your security baseline, you should turn to automation. For example, you can use a combination of CloudFormation and Lambda, a tool like Terraform, or one of the more advanced CSPMs to automate deployment of new AWS infrastructure and ensure that everything complies with the baseline you’ve established. You can also have these tools automatically flag or terminate infrastructure that is not in compliance.
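
As a small illustration of automated flagging versus automated remediation, the sketch below looks at security groups in the shape returned by the EC2 DescribeSecurityGroups API and decides, for each group that exposes SSH to the whole internet, whether to flag it for a human or mark it for automatic revocation. The "protected" list, the choice of port 22, and the function names are assumptions for the example. It only produces a plan and makes no AWS calls, and it ignores IPv6 ranges for brevity.

```python
from __future__ import annotations

WORLD = "0.0.0.0/0"


def world_open_rules(group: dict, port: int = 22) -> list[dict]:
    """Return the ingress rules in one security group that expose `port` to the internet."""
    risky = []
    for perm in group.get("IpPermissions", []):
        all_traffic = perm.get("IpProtocol") == "-1"  # "-1" means every protocol and port
        in_range = perm.get("FromPort", -1) <= port <= perm.get("ToPort", -1)
        if not (all_traffic or in_range):
            continue
        if any(r.get("CidrIp") == WORLD for r in perm.get("IpRanges", [])):
            risky.append(perm)
    return risky


def plan_remediation(groups: list[dict], protected_ids: frozenset[str] = frozenset()) -> dict[str, str]:
    """Decide 'flag' or 'revoke' for each non-compliant group.

    Groups listed in `protected_ids` are only flagged, so a person reviews
    them before anything changes.
    """
    plan = {}
    for group in groups:
        if world_open_rules(group):
            group_id = group["GroupId"]
            plan[group_id] = "flag" if group_id in protected_ids else "revoke"
    return plan
```

In a real setup, a scheduled Lambda function or similar job could act on the plan, for instance by revoking the offending ingress rules, while sending the flagged items to the security team.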

Another benefit of using automation is the capacity it frees up within your security team. The ongoing shortage of security professionals means teams are overtaxed. That issue is only exacerbated when an organization starts migrating to the cloud, which dramatically expands the infrastructure footprint the team has to secure. To give your team a fighting chance, consider turning to a Security Orchestration, Automation, and Response (SOAR) tool. SOAR can allow you to easily pass data between on-premises and cloud services, facilitating a unified view of the organization’s entire IT infrastructure. A SOAR can also alleviate busy-work like onboarding and offboarding, as well as labor-intensive processes like aggregating data during the initial stages of an investigation. Using a SOAR reduces the workload on the security team and helps them work more efficiently. With a SOAR, your security team has more time to focus on high-value work and reduce the time it takes to investigate incidents.