Brute Force Attacks Using US Census Bureau Data

Currently one of the most successful methods for compromising an organization is via password-guessing attacks. To gain access to an organization using brute force attack methods, there are a minimum of three things a malicious actor needs: A username, a password, and a target. Often the targets are easy to discover, and typically turn out to be email systems such as Outlook Web Access (OWA) or VPN solutions that are exposed to the Internet.

Once a malicious actor has a target, they next need a list of usernames fitting the context of the attack. In this case, we can often turn to the Internet and search Google or similar search engines for email addresses related to the organization. A commonly utilized alternative is to make use of business-centric social media (such as LinkedIn) to discover names, which can be converted into usernames.

Besides gathering usernames from the Internet, a malicious actor also needs to identify the username syntax in use. The most common syntax encountered for usernames is first initial lastname—for example, the user John Doe's username would be jdoe. Using resources such as Google and Linkedin for gathering usernames to target is a very simple process, but inevitably the amount of names gathered this way can often come up short and represents a significant effort on the part of the malicious actor.

Of course, there are several other attack methods, which can be used to enumerate valid accounts, for example RCPT TO requests against an exposed SMTP service or the OWA timing attack by Nate Power can also be used to great effect. Although if those attack vectors have been mitigated, what next? Well, personally I like turn to the United States Census Bureau for help.

It turns out the US Census Bureau published a very valuable piece of information for genealogy researchers—and of course this data lends itself well to be leveraged to conduct brute force password attacks. So how is this data useful you ask? Well, let us explore it and see how to effectively leverage this data.

The data I am referring to is available at the US Census Bureau and consists of surnames, which have an occurrence greater then 100 from the 2000 census. This data is in descending order from most common to least common as shown in Figure 1:

Due to the ordering of the data it is possible to increase the odds of guessing a users lastname with reasonably high probability of success by selecting the top most common surnames from the list. As previously stated, first Initial last name is the most common username syntax encountered, so to build a list of valid usernames using this data we first need to figure out what initials to use.

Of course we could use the full alphabet of 26 characters and combine that with the full list of names from the 2000 census, but if we do that we end up with 3,943,472 possibilities. Firing that against an Internet facing service to brute force accounts is not practical unless you have a lot of time on your hands. The solution is to figure out what first initials and surnames to use that will have the highest probability of resulting in an active account within the targeted service.

To determine the most effective list of first initials we can turn to the 1990 census data, which contains a list of the most common first names from the 1990 census. In my case I took the first initial from top 100 female name and top 100 male names, which gave me the following 20 initials: ABCDEFGHIJKLMNPRSTVW.

Next, I took the top 1000 surnames and combined these with the initials. You may be asking, why not top 2000 or 3000 surnames? Well, using this data I conducted a number of real world tests using various combinations of this data and determined that using the top 20 initials combined with top 1000 surnames successfully matched between 25-35% of an organizations employee usernames. I also determined that using more initials and surnames had diminishing returns as we increase the count and greatly increased the time to conduct the brute force attacks.

So now we have a list of 20,000 potential usernames we need to figure out what passwords to use. Password guessing is another interesting subject on its own, and I would suggest taking a look at The Attacker's Dictionary, which is an outstanding piece of work on the subject. If I was only going to pick a single password to try, I would have to admit my favorite password for brute force attacks is constructed of season combined with year, and has the first initial capitalized such as “Winter16”. It seems this form of password has a very high success rate. I remember one summer while conducting external penetration tests I compromised 10 organizations in a row using this method.

The most common reason for this is that it meets Windows password complexity requirements (Figure 2).

Furthermore, password policies are often configured requiring users to change their password every 90 days. Guess what else changes every 90 days: Seasons. And this style of password is very easy for an employee to remember (Winter16, Spring16, Summer16, Fall2016).

So if we combine all of these together—a username of first initial lastname, Windows password complexity enabled in addition to a 90-day password rotation—it can lead to the perfect storm, which can allow a malicious actor to potentially gain access to your critical exposed system over the Internet.

Finally, what steps can be taken to mitigate these issues? I like to recommend the following items, which I have found to seriously impact my ability to compromise an organization through brute force attacks when correctly implemented.

Email address and username account should not match. Example: if email address is [email protected] username should not be jdoe
Employee awareness training about the risk of using seasons (Summer, Winter, Fall, Spring) in the password. This should also include company names and any dictionary word related to your industry. Also regular password audits should be conducted to ensure employees are following these rules.
Two factor authentication of employee-based services exposed to the internet (OWA, VPN, Extranet)