Last updated at Wed, 30 Aug 2017 01:10:31 GMT

Sometimes a little sunshine can bring a smile to a cloudy day; encouraging thoughts come from entirely unexpected places.

One of our favorite Internet darlings is having a rough go. Someone posted an alleged sample of its user database, which the online marketplace (pretty quickly) refuted.

The ever-vigilant and curious Rapid7 Labs team tore into the sample data and found a diamond in the rough that I want to share with you.

We all know that passwords should be hashed. (There is no real reason anyone should ever need to know a user's password.)

Persnickety note: this isn't 'encrypted' per se; it is hashed, which is a one-way operation that can't simply be reversed to recover the original password.

Without publicly shaming anyone: most prior password breaches have exposed troves of hashed but unsalted passwords. And we all know that, given a long enough timeline, every hashed password can (and likely will) be recovered.
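To make the salting point concrete, here is a minimal Python sketch using only the standard library's `hashlib` and `os`. The password `hunter2` and the 16-byte salt length are illustrative assumptions and are not details from the sample. The sketch shows why unsalted hashes are such a gift to attackers. Two users with the same password get the same digest, so a single crack (or a single precomputed lookup table) covers both of them.

```python
import hashlib
import os


def unsalted_sha256(password: str) -> str:
    # One unsalted SHA-256: the same password always produces the same digest.
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def salted_sha256(password: str, salt: bytes) -> str:
    # Prepending a random per-user salt makes identical passwords hash differently.
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()


# Two hypothetical users who happen to pick the same password.
password = "hunter2"

# True: cracking one digest reveals every user with that password.
print(unsalted_sha256(password) == unsalted_sha256(password))

salt_a, salt_b = os.urandom(16), os.urandom(16)
# False unless the two random 16-byte salts happen to collide.
print(salted_sha256(password, salt_a) == salted_sha256(password, salt_b))
```

Salting alone doesn't slow anything down, though. A single SHA-256 is still very fast to compute, and that is why iteration counts matter too.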

That's the shiny part.

In analyzing the purportedly fraudulent database sample, we came to a conclusion that was very saliently articulated by @SamBowne:

"There is a level of encryption that renders lost data != a breach; there should be a level of hashing that does the same."

We do not know how our Favorite Online Marketplace is handling passwords (or what exactly was meant by 'proprietary') ... but this makes for a great discussion.

Break it Down

This is where we wind up our propeller hats and dive into the technical details. From the dataset, a sample from the password column looks like this:


Accounting for the $ delimiters, that looks more like:

So for humans, we break this:

pbkdf2_sha256 12000 zhMKabMgayvK iniviUCcX9y2PYJcm0AoB3MhybRA1z2Cec1DZnLWxWc=

Into this:

  • Hash specification: pbkdf2_sha256
  • Hashing rounds: 12000
  • Salt: zhMKabMgayvK
  • Result of hashing the salted password over 12000 iterations (Base64-encoded): iniviUCcX9y2PYJcm0AoB3MhybRA1z2Cec1DZnLWxWc=

(This is the format that Django uses.)
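As a rough illustration, the Django-style string can be pulled apart programmatically. This is a minimal Python sketch. `DjangoPasswordHash` and `parse_django_hash` are hypothetical names. The sample string is reassembled from the fields above on the assumption that they are joined by `$` separators.

```python
from typing import NamedTuple


class DjangoPasswordHash(NamedTuple):
    algorithm: str   # e.g. "pbkdf2_sha256"
    iterations: int  # number of PBKDF2 rounds
    salt: str        # per-user salt, stored in the clear
    hash_b64: str    # Base64-encoded derived key


def parse_django_hash(encoded: str) -> DjangoPasswordHash:
    # Layout: "<algorithm>$<iterations>$<salt>$<base64 hash>".
    # maxsplit=3 guarantees at most four fields; fewer raises ValueError.
    algorithm, iterations, salt, hash_b64 = encoded.split("$", 3)
    return DjangoPasswordHash(algorithm, int(iterations), salt, hash_b64)


sample = "pbkdf2_sha256$12000$zhMKabMgayvK$iniviUCcX9y2PYJcm0AoB3MhybRA1z2Cec1DZnLWxWc="
print(parse_django_hash(sample))
```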


Finally, let's talk about the choice of PBKDF2 (Password-Based Key Derivation Function 2) as the hashing algorithm. PBKDF2 was designed to make cracking more expensive (read: difficult and slow) by salting each password individually and applying a high number of hash iterations. This highlights a high-value tradeoff. Every extra iteration makes the defender's password verification a little slower (users wait a tick longer), but it multiplies the cost of every guess an attacker makes.
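Here is a hedged sketch of what storing and checking a password in this format might look like, using Python's standard-library `hashlib.pbkdf2_hmac`. The helper names `make_pbkdf2_sha256` and `verify_pbkdf2_sha256` are hypothetical, and the 12-character random salt is an assumption. The sketch mirrors the layout of the sample but is not Django's actual code. In a real Django app you would use `django.contrib.auth.hashers.make_password` and `check_password` instead.

```python
import base64
import hashlib
import hmac
import secrets
from typing import Optional


def make_pbkdf2_sha256(password: str, iterations: int = 12000, salt: Optional[str] = None) -> str:
    """Encode a password as 'pbkdf2_sha256$<iterations>$<salt>$<base64 hash>'."""
    if salt is None:
        # Assumed choice: 9 random bytes -> 12 URL-safe characters (never contains '$').
        salt = secrets.token_urlsafe(9)
    # PBKDF2 applies HMAC-SHA256 `iterations` times to derive a 32-byte key.
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt.encode("utf-8"), iterations)
    hash_b64 = base64.b64encode(derived).decode("ascii")
    return f"pbkdf2_sha256${iterations}${salt}${hash_b64}"


def verify_pbkdf2_sha256(password: str, encoded: str) -> bool:
    """Recompute the hash with the stored salt and iteration count, then compare."""
    algorithm, iterations, salt, _ = encoded.split("$", 3)
    if algorithm != "pbkdf2_sha256":
        raise ValueError(f"unsupported algorithm: {algorithm}")
    candidate = make_pbkdf2_sha256(password, int(iterations), salt)
    # Constant-time comparison so timing doesn't reveal how much of the hash matched.
    return hmac.compare_digest(candidate.encode("ascii"), encoded.encode("ascii"))


stored = make_pbkdf2_sha256("correct horse battery staple")
print(stored)
print(verify_pbkdf2_sha256("correct horse battery staple", stored))  # True
print(verify_pbkdf2_sha256("Tr0ub4dor&3", stored))  # False
```

The tradeoff shows up in the `iterations` argument. A legitimate login pays for 12000 HMAC-SHA256 iterations once, while an attacker pays roughly that same cost for every single guess.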

This probably can't be better articulated or explored than by our lovely friends at OWASP; go check out their Password Storage Cheat Sheet.

Simple action item?

Go talk to your DBA or Application Architect and learn more about how your teams are handling passwords.


Special thanks to Roy Hodgman for this analysis. I have mad respect for his command-line-fu.