Information security researchers like me often tell users not to run a program when they don't know where it came from. However, ordinary users and security researchers alike regularly follow search results to sites we have never heard of before.
Why would we willingly point a web browser at an unknown site and still feel safe? The browser is a computing platform, and an incredibly powerful one at that. It stores some very sensitive information. It's how we control our money through online banking and brokerage houses. It’s how we communicate both professionally and personally: it holds our email, social networks, and line-of-business websites. It even has our browsing history, information so sensitive that many of us would not wish it examined even by our family and closest friends. Do you have a friend whom you have designated to clear your browser history in the event of your untimely demise? If you need to open a new tab and appoint that friend now, I will wait. This post will be here when you get back.
Same Origin Policy
The Same Origin Policy is a simple yet powerful permission system. A site can only access the resources of another site if the two come from the same origin, where an origin is the combination of scheme, host, and port. This means your bank cannot post to your social networks. Your work email can’t read your personal email. The blog you are reading now can't read all your stored passwords.
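To make the rule concrete, here is a minimal sketch of an origin comparison, assuming an origin is the (scheme, host, port) triple and that a missing port falls back to the scheme's default. The helper names `origin` and `same_origin` and the `.example` hosts are made up for illustration; real browsers implement considerably more detail than this.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Assumption for this sketch: only http and https need default ports.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> Tuple[str, str, Optional[int]]:
    """Reduce a URL to its (scheme, host, port) triple."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    # An explicit port wins; otherwise use the scheme's default, if known.
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def same_origin(a: str, b: str) -> bool:
    """True when both URLs share scheme, host, and port."""
    return origin(a) == origin(b)


if __name__ == "__main__":
    # Same scheme and host, and 443 is the https default: same origin.
    print(same_origin("https://bank.example/login",
                      "https://bank.example:443/accounts"))
    # Different host: the bank and the social network stay apart.
    print(same_origin("https://bank.example/", "https://social.example/"))
    # Same host but different scheme: still a different origin.
    print(same_origin("https://bank.example/", "http://bank.example/"))
```

Note that the path plays no part in the comparison: two pages on the same site share one origin no matter where they live on it.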
If this were the only thing the browser did, it probably would not have become the global platform that it has, because the browser also mixes content from around the web. My blog can embed a video from YouTube, Facebook like buttons fill the web’s pages, and ad networks fund most of the web’s content. This utility doesn’t come without risk. Each time a page reaches out to the ad network, the network holds an auction and the winner gets to put their ad on your computer. This is code you don’t control, code the site doesn’t control, code the ad network doesn’t control, but code that the advertiser controls. We allow this code, from the customer of the ad network of the site from the search result, to put a running program on our computers.
For a normal application, this would be terrifyingly insecure, but this is no ordinary environment: this is the web browser. The site can access its resources but not those of the ad network. Similarly, the network can see its resources but not those of the ad itself. The advertiser, ad network, and site are all present in the same page you are looking at, but each is in its own secure frame where it need not fear the interference of the others. This is the environment we demand of our browsers in order to keep safe.
So, what happens when our model of the same origin begins to break down? What happens when a script that an origin never intended for you to run is run, and is run from that origin? This is known as Cross Site Scripting (XSS), a term coined by Microsoft way back in 1999. When such an event occurs, the security assumptions made by the web application developers no longer hold. If the site stored a secret in one of the HTML tags, in local storage, or simply in document.cookie, the foreign script could send it off to a distant server. Once a script is running in the context of someone else’s origin, it has all the powers of that site.
Let’s take a look at one way that a site might end up with untrusted execution. Assume you have a search engine and that, for clarity and ease of use, it has a search box on the results page. This makes it simple to edit your search and to see the exact wording of the search that generated the current result set. The second feature we would like is the query as an HTTP GET parameter. This makes it easy to share searches and bookmark them, and it simplifies the design of the search page. For example, in https://duckduckgo.com/?q=cross+site+scripting the q= sets the query for the results page and, as you can see in the screenshot, the text appears in the page.
This is where the danger comes in. Let’s say you wanted to search for a common XSS payload, so you put
into the search box we get a URL of
Using the Chrome DevTools, we see that DuckDuckGo has escaped the search string so that it will not execute as an inline script. Now, let’s mess with the value so we can see what can go wrong…
Let’s try that with some escaping…
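The difference escaping makes can be sketched in a few lines. This is only an illustration of the general technique, not DuckDuckGo's code: the host `search.example`, the payload, and the helpers `query_from_url`, `render_unsafe`, and `render_escaped` are all invented for the example, and the page is reduced to a single search box.

```python
from html import escape
from urllib.parse import parse_qs, urlsplit


def query_from_url(url: str) -> str:
    """Pull the q= parameter out of a results-page URL (empty if absent)."""
    values = parse_qs(urlsplit(url).query).get("q", [""])
    return values[0]


def render_unsafe(query: str) -> str:
    """Hypothetical vulnerable template: the query is pasted in verbatim."""
    return '<input name="q" value="' + query + '">'


def render_escaped(query: str) -> str:
    """Same template, but HTML metacharacters are escaped first."""
    return '<input name="q" value="' + escape(query, quote=True) + '">'


if __name__ == "__main__":
    # Illustrative payload: close the attribute and tag, then open a script.
    url = "https://search.example/?q=%22%3E%3Cscript%3Ealert(1)%3C%2Fscript%3E"
    query = query_from_url(url)
    print(render_unsafe(query))   # the script tag lands in the markup
    print(render_escaped(query))  # the payload stays inside the value attribute
```

In the unsafe version the quote and angle bracket in the query end the attribute and the tag, so whatever follows is parsed as markup from the site's own origin. In the escaped version the same characters arrive as `&quot;`, `&gt;`, and `&lt;`, so the browser treats them as text.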
DuckDuckGo got this correct in that they protected themselves from HTML injection but, as of this writing, the escaping is not perfect. We can see this by querying for things like the Unicode [RIGHT-TO-LEFT OVERRIDE](http://www.fileformat.info/info/unicode/char/202e/index.htm) character, U+202E.
You can see the effect by looking at the page title when searching for strings with and without the RTLO character.
I point this out not to embarrass DuckDuckGo; quite the opposite. I think that they have done an exemplary job of building a system that combines text from around the web with a query that came from the user or the URL. Despite this cacophony of user-generated content and a number of different widgets that deeply combine that content with DuckDuckGo templates, they have built a system that protects the user from HTML injection. However, even they have still allowed a user's Unicode string to leak properties into their content. This stuff is very simple to get right once, but very hard to get right every time.
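HTML escaping does not help here, because U+202E is not an HTML metacharacter. One possible mitigation, sketched below, is to drop the Unicode bidirectional control characters before the text reaches the template. This is an assumption about how a site could handle it, not a description of what DuckDuckGo does; the names `BIDI_CONTROLS`, `strip_bidi_controls`, and `safe_title`, and the title format, are hypothetical.

```python
from html import escape

# Explicit bidi embedding/override characters (U+202A..U+202E) and the
# isolate characters (U+2066..U+2069).
BIDI_CONTROLS = {
    chr(cp) for cp in (*range(0x202A, 0x202F), *range(0x2066, 0x206A))
}


def strip_bidi_controls(text: str) -> str:
    """Remove characters that can change the display order of later text."""
    return "".join(ch for ch in text if ch not in BIDI_CONTROLS)


def safe_title(query: str) -> str:
    """Hypothetical title template: strip bidi controls, then HTML-escape."""
    return "<title>" + escape(strip_bidi_controls(query)) + " at Search</title>"


if __name__ == "__main__":
    rtlo_query = "\u202exss"
    # html.escape alone leaves the override character in place.
    print(escape(rtlo_query) == rtlo_query)
    # After stripping, the site's own text can no longer be reversed.
    print(safe_title(rtlo_query))
```

Stripping is the bluntest option. A site that needs to display right-to-left queries faithfully might prefer to isolate the user's text instead, which is a design decision this sketch does not try to settle.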
So how do we mitigate risk and prevent this from happening? My second post will cover just that. You can read it here, “How to Prevent XSS Attacks.”
Aaron David Goldman, Senior Security Researcher