I woke up this morning to find Reddit abuzz with the latest password dump, this time from Gawker and related properties. The splashy headline is usually something like "1.3 million Gawker passwords leaked." I wanted to write a couple of words here since the areas of credential management, password complexity, and attack mitigation are all near and dear to my heart.
Firstly, the "1.3 million passwords" figure is a bit misleading. There are a bunch of files floating around the torrent sites, one of which is, indeed, a "full" database dump of usernames, encrypted passwords, and e-mail addresses. That file is 1,247,894 lines. Trouble is, the raw data isn't normalized at all, so there are actually right around a half million e-mail addresses, and something close to 200k complete username/password/e-mail credentials. That all said, the data most people are actually looking at today is 188,281 credentials strong, which is the pre-cracked list of credentials distributed with the drop (one exception is the guys at Duo Security, who are cracking the DES-encrypted passwords independently).
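To make the gap between "lines in the file" and "usable credentials" concrete, here is a minimal sketch of the kind of normalization pass I mean. The real dump's layout isn't described here, so the tab-separated `username`, `password hash`, `e-mail` column order, the `summarize_dump` name, and the sample rows are all assumptions for illustration only.

```python
def summarize_dump(lines, delimiter="\t"):
    """Count raw rows, distinct e-mail addresses, and distinct complete credentials.

    Assumes (hypothetically) one record per line with three delimited fields:
    username, password hash, e-mail address.
    """
    raw_rows = 0
    emails = set()
    complete = set()
    for line in lines:
        raw_rows += 1
        fields = line.rstrip("\r\n").split(delimiter)
        # Pad short rows so ragged data never breaks the unpacking below.
        fields = (fields + ["", "", ""])[:3]
        username, pw_hash, email = (f.strip() for f in fields)
        email = email.lower()  # treat addresses as case-insensitive
        if email:
            emails.add(email)
        if username and pw_hash and email:
            complete.add((username, pw_hash, email))
    return {
        "raw_rows": raw_rows,
        "unique_emails": len(emails),
        "complete_credentials": len(complete),
    }


# Made-up sample rows: a duplicate, a row with no hash, a row with only an e-mail.
sample = [
    "alice\tabc123hash\talice@example.com\n",
    "alice\tabc123hash\tALICE@example.com\n",
    "bob\t\tbob@example.com\n",
    "\t\tcarol@example.com\n",
]
print(summarize_dump(sample))
```

On the made-up sample, four raw rows collapse to three distinct e-mail addresses and a single complete credential, which is the same shape of shrinkage as going from 1.2 million lines to roughly 200k credentials.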
Secondly, these passwords, in the main, are not very high value, which is assuredly one reason why they were released. In jurisdictions with modern privacy laws, like California and the EU, the leak of e-mail addresses is much more serious. These passwords are just not that big of a deal, since they're used by people to comment on celebrity gossip, and these kinds of throwaway credentials are pretty common for public blogs.
This reminds me of something that a pen-test friend once said: "password" and "123456" are pretty common tokens on the Internet (just look at the SkullSecurity lists), but you find them a whole lot less on intranets, since your company's administrator is probably enforcing some kind of complexity and rotation policy. For internal networks, you find dates and days of the week a lot more often as passwords, since something like "Dec-13-2010" meets most complexity requirements and is really easy to rotate on a schedule.
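A quick sketch shows why a date sails through a typical policy. The rule below (at least 8 characters, at least 3 of 4 character classes) is an assumed stand-in for "most complexity requirements," not any particular product's policy, and the `meets_complexity` and `looks_like_date` helpers and the date formats are hypothetical. Month-name parsing also assumes an English locale.

```python
import re
from datetime import datetime

# Hypothetical date layouts worth flagging; extend as needed.
DATE_FORMATS = ("%b-%d-%Y", "%d-%b-%Y", "%B%Y")


def meets_complexity(password, min_length=8, min_classes=3):
    """Assumed policy: minimum length plus N of 4 character classes."""
    classes = (
        re.search(r"[a-z]", password),
        re.search(r"[A-Z]", password),
        re.search(r"[0-9]", password),
        re.search(r"[^a-zA-Z0-9]", password),
    )
    present = sum(1 for match in classes if match)
    return len(password) >= min_length and present >= min_classes


def looks_like_date(password):
    """True if the whole password parses as one of the assumed date formats."""
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(password, fmt)
            return True
        except ValueError:
            continue
    return False


for candidate in ("password", "123456", "Dec-13-2010"):
    print(candidate, meets_complexity(candidate), looks_like_date(candidate))
```

Under these assumptions, "password" and "123456" both fail the policy, while "Dec-13-2010" passes it and is also flagged as a date. A policy that only counts character classes can't tell that apart from a genuinely strong password, which is why an auditor would want something like the second check.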
Of course, some of these are (were) legit passwords that will (did) work against Twitter, Facebook, and e-mail accounts with the same username, but I wouldn't get all apoplectic over them. Rest assured, the e-mail accounts for which these passwords also work (worked) have almost certainly already been compromised. Two hundred thousand credentials is not all that hard to churn through with even college-kid resources.
Finally, the password dump itself is, while headline-grabbing, less interesting to incident response and computer forensics dorks than the clues in the collateral files as to how the attackers got access in the first place. It looks like it's a pretty typical PHP attack vector, and, as Egyp7 once quipped, "PHP is a virtual machine for shellcode." Clearly, some level of source code security auditing would have gone a long way to help Gawker avoid these headlines today. In addition, there's the whole secondary story that the attackers also gained access to Gawker's content management system (CMS). This is a huge deal -- like most purely online businesses, Gawker takes their code's secrecy pretty seriously.
At any rate, public dumps of actual passwords like these are always interesting from a research perspective -- it's nice to have the opportunity to check in on the current state of throwaway accounts. While this all sucks for Gawker, the security community benefits from large-ish datasets like this, since papers get written and there are renewed pushes for proper encryption of stored passwords and passwordless authentication schemes. Hopefully, the overall security posture of the Internet ends up improved.