In part one of this two-part series on the cloud and cloud security for security professionals, we dove into everything you’ve ever wanted to know about the cloud (but were afraid to ask). Now that you have a better understanding of what the cloud actually is and how it works, let’s dive into how to secure cloud infrastructure. Namely, we’re going to talk about the top security controls that should be used to help ensure your environment is set up securely.
The challenges of cloud security
First, let’s quickly cover what’s unique about securing a cloud environment and what challenges exist. As we discussed in part one of this series, the biggest benefits of cloud infrastructure are speed and ease of deployment. In a cloud environment, developers and DevOps teams can deploy new infrastructure with a few clicks. This greatly speeds innovation, but the downside is that infrastructure is now being created by employees who are relatively unfamiliar with network configuration and security.
Therefore, one of the biggest risks inherent in cloud infrastructure is misconfigurations that can be exploited to gain access to your environment. A lot of the best practices for cloud security revolve around minimizing the chances of misconfigurations occurring and quickly remediating them when they do occur.
The challenge for security professionals is you have to find a way to keep your cloud environment secure without impacting your developers’ ability to move quickly. That generally means implementing guardrails and automation, which make it easy for developers to configure assets properly without slowing them down. The importance of finding a way to stay secure without impeding the speed and ease of deployment can’t be overstated: If you put up roadblocks, developers will find ways to circumvent them, and your organization will be left far more exposed than if you find a way to work with your developers and stay secure while accommodating their needs.
Another big challenge is the rate of change found in cloud infrastructure. The ease and speed of deployment in the cloud means things change far more quickly than in an on-prem environment. In addition, a key feature used in many cloud environments is auto-scaling, where the number of servers running a workload is automatically scaled up or down based on demand. This constant change makes it very difficult for security teams to stay on top of keeping the environment secure. The answer is automation. That can mean anything from automatically failing a deployment of a misconfigured asset to automatically isolating an asset that appears to have been compromised.
Now that you have a basic understanding of the challenges that come with securing a cloud environment, let’s get to the good stuff and talk about best practices.
Creating a baseline
Before you do much else, you want to create a baseline and then enforce it. The baseline lays out precisely what your cloud environment should look like from a security perspective: which services are and are not authorized for use, how those services should be configured, who gets access to which parts of the cloud infrastructure, who can make changes to that infrastructure, and so on. The baseline serves as a document that everyone can reference. This is crucial in the cloud because it’s no longer just the IT and security teams running the infrastructure; it’s also the DevOps team and the developers themselves. The baseline ensures everyone is aligned. It should also lay out processes such as how the organization responds to incidents. An incident response plan is critical here, as it clearly lays out who is responsible for what when a security issue arises.
What you’re probably asking now is how to create the baseline in the first place. We recommend starting with established best-practice recommendations like the CIS Benchmarks, which are available for AWS, Azure, and Google Cloud Platform (GCP). They provide a wide range of best practices for securely configuring cloud infrastructure. Each cloud provider also publishes its own best practices you can draw on. They are all a great place to begin, and from there, you can decide which recommendations do and don’t work for your organization.
Enforcing the baseline
We all know that a document or policy without enforcement is just a piece of paper. So, how do you take the baseline you create and enforce it so people actually follow it? And how do you enforce it without blocking developers who need to move quickly?
There are a few ways to do this. One is with cloud security posture management (CSPM) solutions like DivvyCloud, which help you to create and enforce the baseline. This can give you visibility into misconfigurations and policy compliance so you can remediate in a timely manner. DivvyCloud is a great way to create and enforce a baseline that works across accounts and across multiple cloud providers. Another great capability of DivvyCloud is that when something is broken or misconfigured, it can automatically fix the offending asset to comply with the baseline policies.
CSPMs are a must-have at the enterprise level. If you’re running multiple cloud accounts across multiple cloud providers, security can get complex really fast. But if you’re not at the enterprise level or can’t invest in a CSPM yet, the best approach is to use an infrastructure-as-code solution, which lets you create templates for cloud infrastructure in which everything is properly configured according to your baseline. By having your developers use those templates to create new infrastructure, you not only make it fast and easy for a developer to deploy everything with the right configuration, you also significantly reduce the possibility of human error. One of the most popular infrastructure-as-code solutions is Terraform, largely because it works across multiple cloud providers. It’s important to note, however, that creating infrastructure-as-code templates alone isn’t enough. You will still need a way to monitor everything: infrastructure that is deployed properly can still be changed down the road. You can monitor for misconfigurations using a tool like the Cloud Configuration Assessment (CCA) feature found in our InsightVM product. Using a tool like CCA is great because you can monitor for misconfigurations using the same tool you use to monitor for other forms of risk, like software vulnerabilities. AWS, Azure, and GCP also have their own platform-specific monitoring, but the point is to use some form of monitoring.
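To make the idea of monitoring against a baseline concrete, here is a minimal sketch in Python. It is not how any particular product works: the setting names in `BASELINE`, the resource IDs, and the helper functions are all hypothetical, and a real check would read configurations from your cloud provider or your infrastructure-as-code state.

```python
from dataclasses import dataclass

# Hypothetical baseline: setting name -> required value.
BASELINE = {
    "encryption_enabled": True,
    "public_access": False,
    "logging_enabled": True,
}


@dataclass
class Finding:
    resource_id: str
    setting: str
    expected: object
    actual: object


def check_resource(resource_id, config, baseline=BASELINE):
    """Return one Finding for each setting that deviates from the baseline."""
    findings = []
    for setting, expected in baseline.items():
        actual = config.get(setting)  # a missing setting counts as a deviation
        if actual != expected:
            findings.append(Finding(resource_id, setting, expected, actual))
    return findings


def deployment_allowed(resources, baseline=BASELINE):
    """True only if every resource in the deployment matches the baseline."""
    return all(not check_resource(rid, cfg, baseline) for rid, cfg in resources.items())
```

The same check can serve two purposes: run it in a deployment pipeline to fail a misconfigured change before it ships, and run it on a schedule against live resources to catch changes made after deployment.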
You should also limit who has access to your cloud infrastructure. First, make sure users are accessing your cloud accounts using single sign-on (SSO) tools. You don’t want cloud logins stored and managed separately from other logins, because when a person leaves the company, you won’t be able to easily remove their access to cloud infrastructure along with every other tool. You also want to consider assigning permissions at the group or team level, not the individual level. This ensures consistency and avoids the confusion of mismatched user permissions. For example, with AWS IAM (Identity and Access Management) you can create groups for each team in the organization, giving all members of that group the same permissions. Changing permissions at the group level rather than the individual level also reduces the chances that someone sneaks under the radar with access they shouldn’t have.
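The group-level approach can be illustrated with a small Python model. This is only a sketch of the idea, not the AWS IAM API: the group names, user names, and permission strings are made up for illustration.

```python
# Hypothetical groups, members, and permission names, for illustration only.
GROUP_PERMISSIONS = {
    "developers": {"deploy:staging", "logs:read"},
    "security": {"logs:read", "iam:audit"},
}
GROUP_MEMBERS = {
    "developers": {"alice", "bob"},
    "security": {"carol"},
}


def effective_permissions(user):
    """A user's permissions are the union of the permissions of their groups."""
    perms = set()
    for group, members in GROUP_MEMBERS.items():
        if user in members:
            perms |= GROUP_PERMISSIONS.get(group, set())
    return perms


def offboard(user):
    """Remove a user from every group. With no individual grants, all access goes with it."""
    for members in GROUP_MEMBERS.values():
        members.discard(user)
```

Because no permission is attached to an individual, there is exactly one place to review what a team can do and one operation to remove a departing employee’s access.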
Another important best practice is to never use the root user unless you absolutely have to. The root user is the user created when you establish a cloud account. It has access and permissions that no other user has, not even an administrator, which makes it incredibly powerful. If this user were compromised, some pretty serious damage could be done. To that end, make sure your root user is only used when absolutely necessary. The rest of the time, physically lock away the root user’s credentials and require multi-factor authentication (MFA) whenever they are used (and frankly, you should require MFA for all users). Most cloud platforms can create a credential report that shows which credentials each user has and when those credentials were last used. These reports can help identify whether the root user is being used. If it is, in almost all situations you can create new users with more limited permissions to take over those tasks.
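As a rough illustration of how a credential report can be reviewed automatically, the Python sketch below scans a CSV report for root usage and missing MFA. The column names and the `<root_account>` row follow the AWS IAM credential report format as we understand it, reduced to a few columns; the sample rows are invented, and other providers use different formats.

```python
import csv
import io

# Invented sample data using a subset of the credential report columns.
SAMPLE_REPORT = """user,password_last_used,mfa_active,access_key_1_active
<root_account>,2024-05-01T12:00:00+00:00,false,true
alice,2024-05-02T09:30:00+00:00,true,false
bob,no_information,false,false
"""


def audit_credential_report(report_csv):
    """Return human-readable findings about MFA gaps and root user usage."""
    findings = []
    for row in csv.DictReader(io.StringIO(report_csv)):
        user = row["user"]
        is_root = user == "<root_account>"
        if row["mfa_active"] != "true":
            findings.append(f"{user}: MFA is not enabled")
        if is_root and row["access_key_1_active"] == "true":
            findings.append(f"{user}: root has an active access key")
        if is_root and row["password_last_used"] not in ("no_information", "N/A"):
            findings.append(f"{user}: root password last used {row['password_last_used']}")
    return findings
```

Running `audit_credential_report(SAMPLE_REPORT)` on the sample above flags the root account three times and `bob` once, while `alice` produces no findings.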
Set up vulnerability monitoring
Even in the cloud, you’re still using virtual machines that can have software vulnerabilities. Like an on-premises network, they need to be monitored and patched. Instances can be spun up and down minute-by-minute, so just relying on a weekly scan report won’t give you enough insight into what your real risk score is in the moment. This is why you should utilize an agent like Rapid7’s Insight Agent on your instances to give you up-to-date information on what’s happening and what vulnerabilities exist, so that you can make informed decisions about remediation.
When a vulnerability is detected, all cloud providers have patch managers to deploy patches, but many cloud environments are immutable, meaning they’re designed to prohibit any changes to the infrastructure once it’s deployed. In those cases, to remediate a vulnerability you create a new image that uses an updated OS version with the patches installed. You then create a new virtual machine using the new image and then terminate the old instance.
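The replace-rather-than-patch workflow can be sketched as a simple planning function in Python. The `Instance` type, the IDs, and the step names are hypothetical; in practice each step would be carried out by your cloud provider’s tooling or your deployment pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    instance_id: str
    image_id: str


def plan_replacements(instances, patched_image_id):
    """Return ordered (action, target) steps for instances not on the patched image.

    Each stale instance gets a replacement launched from the patched image
    before the old instance is terminated.
    """
    steps = []
    for inst in instances:
        if inst.image_id != patched_image_id:
            steps.append(("launch", patched_image_id))
            steps.append(("terminate", inst.instance_id))
    return steps
```

Launching the replacement before terminating the old instance keeps capacity available during the rollout; instances already running the patched image are left alone.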
Logs are your best friend when it comes to conducting security investigations. Without them, you’re shooting in the dark. All cloud providers offer logging capabilities. In AWS, it’s CloudTrail, in Azure it’s Monitor, and in GCP it’s Cloud Logging. It’s important to log all activity for all regions and services, even the ones you’re not currently using. As we explained in part one, someone could spin up a new service in a region you’re not actively using or monitoring, which is why you should be monitoring all regions and all services; these logs will tell you what’s happening, as well as whether there was any unauthorized access.
Once you are collecting logs from your cloud provider(s), protect them so they can’t be manipulated. Ensure that next to nobody has access to them and that they are encrypted. Most cloud providers also offer validation files that can detect if something has been changed.
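The principle behind validation files can be shown with a short Python sketch: record a hash of each log file when it is written, then recompute and compare later. This is a simplification under our own assumptions. The file names and the manifest structure are hypothetical, and real provider validation files are typically also signed so that the manifest itself can’t be quietly rewritten.

```python
import hashlib


def sha256_hex(data):
    """Hex-encoded SHA-256 digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(log_files):
    """Map each log file name to the hash of its current contents."""
    return {name: sha256_hex(content) for name, content in log_files.items()}


def find_tampered(log_files, manifest):
    """Return names of logs that are missing or whose contents no longer match."""
    tampered = []
    for name, expected in manifest.items():
        content = log_files.get(name)
        if content is None or sha256_hex(content) != expected:
            tampered.append(name)
    return sorted(tampered)
```

Store the manifest somewhere the people and systems that write logs cannot modify it; otherwise an attacker who alters a log can simply regenerate the matching hash.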
You’ll then need a way to react to suspicious activities in the logs by way of alerting. One option is to use the native log monitoring service offered by your cloud provider to alert you of any suspicious actions. In AWS this alerting service is called CloudWatch, in Azure it’s Monitor, and in GCP, it’s Cloud Monitoring. However, the issue with this approach is you must build and maintain every alert, and many false alarms tend to occur.
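To give a sense of what building an alert involves, here is a minimal Python sketch that evaluates a single log event against two rules. The field names (`userIdentity`, `awsRegion`, `eventName`) follow the CloudTrail event format as we understand it; the approved region list and the rules themselves are assumptions for illustration, and a native alerting service would express them in its own configuration.

```python
# Hypothetical list of regions the organization actually uses.
APPROVED_REGIONS = {"us-east-1", "eu-west-1"}


def alerts_for(event):
    """Return a list of alert messages raised by one log event."""
    alerts = []
    if event.get("userIdentity", {}).get("type") == "Root":
        alerts.append("root user activity")
    if event.get("awsRegion") not in APPROVED_REGIONS:
        alerts.append("activity in an unapproved region")
    return alerts


# Invented example event: the root user launching an instance in an unused region.
suspicious_event = {
    "eventName": "RunInstances",
    "awsRegion": "ap-south-1",
    "userIdentity": {"type": "Root"},
}
```

Even two rules like these need tuning. For example, a legitimate expansion into a new region would trigger the second rule until the approved list is updated, which is the kind of upkeep that makes hand-built alerts costly.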
Another option is to use a threat detection service from your cloud provider. In AWS, it’s GuardDuty; in Azure, it’s Advanced Threat Protection; and in GCP, it’s Event Threat Detection. However, pretty much every organization also has on-premises networks and remote employees to secure, and these services can’t monitor those environments. In addition, as security teams grow more sophisticated, they often find that they need capabilities not found in the cloud providers’ offerings.
This is why many companies use a third-party SIEM with threat detection capabilities, like InsightIDR. This type of tool allows you to monitor your cloud environment and all other environments in one place. This is especially key if you have an attacker that’s moving laterally, such as between a compromised laptop and your cloud environment—you need the data all in one place to run a thorough investigation, assess the impact, find out where the attacker is now, and where you need to focus remediation efforts.
Consolidate your team
Keeping things streamlined is the key to staying on top of your security objectives and ensuring policies and procedures are properly assigned and followed. Having separate teams for on-prem and cloud security is begging for silos to be born and unclear roles to arise, which can cause mayhem during an incident. Keep it simple with one unified team overseeing security, as well as one tool monitoring all environments. Of course, you can have one team with separate sub-teams for handling the cloud, on-premises, etc., but you do need to have clear accountability and responsibilities for who is on the hook for securing each element of your environment. Otherwise, tasks can be passed off like a hot potato and fingers start to be pointed.
Automation is the key to scalability and efficiency today, especially in the cloud. Things in the cloud move quickly, and humans alone can’t keep pace, but automation can. You can automate the way you configure infrastructure, onboard new employees, and even update EC2 instances. One way to do this is by using a security automation and orchestration solution like InsightConnect, which can connect all of your tools, streamline repetitive tasks that are prone to human error, and drive efficiencies in communication and response that simply aren’t possible for humans working manually. The more you automate, the fewer opportunities there are for human error.
This cloud security primer is a wrap!
We hope this overview is helpful to you in navigating cloud-specific security policies so you can get up to speed quickly and ensure you have the bases covered. We’d love to hear if there are additional cloud concepts you’d like to learn about, so send us a tweet @Rapid7!