Mar 27, 2013

Unsecured Public Information in Amazon S3 Buckets - Are Your Buckets Leaking Data?

In today's Whiteboard Wednesday, Will Vandevanter talks about the Amazon S3 service and how a lot of people are misconfiguring their Amazon S3 buckets, leaving personal information exposed. Amazon S3 provides the ability to store and serve static content from Amazon's cloud. Files within S3 are then put into "buckets" which are accessible through a predictable URL.

Will and his team led a research project to see how many people are leaving their data in public buckets that are accessible to anybody. What did Will find? Watch this quick video to find out.

Check out Will's blog post to learn more on this topic!

Read Video Transcript

Hi. My name is Will. And today's Whiteboard Wednesday is on some research that H.D. Moore and Marcus Carey and I did on Amazon S3 public buckets.

So, to kind of start with the dry stuff, if you haven't heard of it, S3 is a simple storage service. And essentially it allows you to store data in the Amazon cloud. So we have our clouds here and you can upload data.

And if you've never used it before, the basic process is, you go onto Amazon's site, you sign up for S3, and you name your bucket. And you can name your bucket whatever you want so long as it's unique. So, in this case we have 'Bucket' as the name of our bucket. And then from there you begin to upload data.

There's two types of permissions to talk about. One is file permissions. These are actual permissions of the file inside the bucket. And so those can be readable, writable and you can specify who can access them.

The other is the actual bucket permissions themselves. And it's typically registered as public or private. So with a private bucket, only certain people are allowed to upload to the bucket. But with a public bucket, anybody can read the bucket itself.

And it's important, in this case we'll just talk about read-only buckets. So if you visit the actual location of the bucket, it will output all of the files for the bucket itself. And so that's a public bucket and we have the file listing here.

The other thing to think about is, it's a predictable URL. So when you first create the bucket, it gets a URL based on the bucket name. And if you actually put that in your browser and you go to it, if it's a public bucket you get this XML output. It will dump out all the files. If it's a private bucket you'll get access denied. And if it doesn't exist it will say "does not exist".
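The three outcomes described above can be sketched in Python. This is a minimal sketch, not the tooling used in the research: the URL form `https://<bucket>.s3.amazonaws.com/` and the function names are assumptions, and the mapping of HTTP status codes 200, 403, and 404 to "public", "private", and "does not exist" reflects how S3 commonly answers an anonymous request for a bucket listing.

```python
import urllib.error
import urllib.request


def bucket_url(name: str) -> str:
    # Assumed URL form; the transcript does not preserve the exact URL.
    return f"https://{name}.s3.amazonaws.com/"


def classify_status(status: int) -> str:
    """Map the HTTP status of an anonymous GET on the bucket URL to an outcome."""
    if status == 200:
        return "public"          # the file listing was returned
    if status == 403:
        return "private"         # access denied
    if status == 404:
        return "does not exist"  # no bucket with that name
    return "unknown"


def check_bucket(name: str, timeout: float = 10.0) -> str:
    """Request the bucket URL without credentials and classify the response."""
    try:
        with urllib.request.urlopen(bucket_url(name), timeout=timeout) as resp:
            return classify_status(resp.status)
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the status code is on the exception.
        return classify_status(err.code)
```

Calling `check_bucket("bucket")` makes one network request and returns one of the four strings; `classify_status` is kept separate so the mapping can be checked without touching the network.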

So, essentially the core of the research, and that's what the background was, one, how many buckets can we identify, what percentage of those are public, and what sorts of files are stored out there for public buckets?

So we did it using three techniques. One was guessing. Seems really obvious, right? It's predictable. So we took basically a list of companies, like Fortune 1000 companies, Alexa 100,000 websites. So, really, things that are out there on the internet. For the companies we would do a little bit of changes to it. So if it's "bucket", it would become variations like "bucket media"…just trying to brute force what people would actually name their bucket.
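The guessing step can be illustrated with a short Python sketch that turns a company name into candidate bucket names. Only the "media" variation comes from the talk; the other suffixes, the separators, and the function name are hypothetical choices made for illustration, not the list the researchers used.

```python
from typing import Iterable, List

# Hypothetical suffixes; only "media" is mentioned in the talk.
SUFFIXES = ["media", "static", "backup"]
# Hypothetical separators between the base name and the suffix.
SEPARATORS = ["", "-", "."]


def candidate_names(company: str, suffixes: Iterable[str] = SUFFIXES) -> List[str]:
    """Return the bare name plus every base/separator/suffix combination."""
    base = company.strip().lower().replace(" ", "")
    names = [base]
    for suffix in suffixes:
        for sep in SEPARATORS:
            names.append(f"{base}{sep}{suffix}")
    return names
```

For example, `candidate_names("Bucket", ["media"])` yields `bucket`, `bucketmedia`, `bucket-media`, and `bucket.media`.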

And the second one was Critical.IO, which was HD's massive internet scanning project. So it was actually taking buckets that are already out there, ones that people have on websites, and visiting those and seeing if they're public or not.

And then finally the Bing API. So, the Bing API essentially allows you to search the Bing resource in a scriptable way. So we would get XML or CSV output for buckets in the Bing API.

So what did we find? We found about 12,000 buckets. These are both public and private. And these were for major companies essentially. And then 16% of those were actually public. So again, the public means we can view all of the files that are inside the bucket.

And the risk is really similar to a directory browsing risk. Once you can actually see what's inside the bucket then you might be able to gain access to files that you wouldn't even have known existed if it were private.

And so using that, we basically spot-checked. If we saw a file named 'Passwords.txt' or something along those lines, we would see if we could pull it down. And what we actually found was kind of alarming. We found user names, passwords, bank account information, and lots of source code.
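A spot-check like this can be sketched in Python by reading the file names out of a public bucket's listing and flagging the ones that look sensitive. This is a minimal sketch under stated assumptions: it assumes the listing is S3's XML `ListBucketResult` document with `Key` elements in the `http://s3.amazonaws.com/doc/2006-03-01/` namespace, and the keyword list, the function names, and the sample listing are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List

# Namespace assumed for the S3 listing document.
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"
# Hypothetical keywords; the talk only names 'Passwords.txt' as an example.
KEYWORDS = ("password", "credential", "backup")


def listed_keys(listing_xml: str) -> List[str]:
    """Extract every file name (Key element) from a bucket listing."""
    root = ET.fromstring(listing_xml)
    return [el.text or "" for el in root.iter(f"{S3_NS}Key")]


def suspicious_keys(keys: Iterable[str], keywords: Iterable[str] = KEYWORDS) -> List[str]:
    """Keep file names that contain any keyword, ignoring case."""
    words = tuple(keywords)
    return [k for k in keys if any(w in k.lower() for w in words)]


# Made-up listing used only to exercise the two functions.
SAMPLE = """<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>bucket</Name>
  <Contents><Key>logo.png</Key></Contents>
  <Contents><Key>Passwords.txt</Key></Contents>
</ListBucketResult>"""
```

With the sample listing, `suspicious_keys(listed_keys(SAMPLE))` keeps only `Passwords.txt`, which would then be the file to review by hand.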

So it is a fairly major risk. And it's important to check your bucket and identify if it's public or private for the permissions. The best way to do that is to enter the bucket's URL in your browser. So you would want to ask around the company and see if you are actually using S3. If you are, then take the URL for your bucket and put that in your browser. If you get a list of files, then you need to modify the permissions.

And if you check out the blog post, there's actually Amazon's recommendation. There's links in there about how to secure your bucket and sort of the best practices. So I'm Will. That's it, and see you next Wednesday.
