Last updated at Wed, 07 Apr 2021 18:31:15 GMT

Merry HaXmas to you! Each year we mark the 12 Days of HaXmas with 12 blog posts on hacking-related topics and roundups from the year. This year, we're highlighting some of the “gifts” we want to give back to the community. And while these gifts may not come wrapped with a bow, we hope you enjoy them.

Towards the end of November, the Tor community was shaken up by the revelation of an previously unknown vulnerability being actively exploited against pedo^H^H^H^H Tor Browser users. Some further drama unfolded regarding who the source for the exploit may be, and I received questions from several reporters who wanted every single detail I could give them. While I did not participate in commenting at the time, I'll say everything I will ever say about it now:

  • Yes, I'm aware of a very similar exploit which targeted Firefox
  • No, I didn't write it

Largely lost among all the noise are the nuances of the vulnerability and the exploit itself, which I know the author put his heart and soul into. If anonymous entrants are ever retroactively awarded Pwnies, I'd like to put his unsaid name into the hat. In this part of the 12 Days of HaXmas, I wanted to offer a high level overview to some of the more interesting parts of both the vulnerability—which in my opinion doesn't fit cleanly into any classic category—and the exploit. I'm not going to dive into all of the gory details for a couple of reasons. Firstly, timing. Had this been leaked earlier in the year, I might have been able to do the analysis part some justice. Second, while verbose technical expositions certainly have their place, a blog is not the right spot. The content might take take another 12 days to cover, and for those seeking to learn from it, I feel your own analysis of the exploit coupled with lots of dirty work in a debugger would be your best option. In that case, hopefully this can offer you some direction along the way.

The Discovery

It would be remiss of me if I didn't begin by pointing out that no fuzzer was used in the discovery of this vulnerability. The only tools employed were the Woboq Code Browser (Woboq Code Browser — Explore C code on the web), WinDBG, a sharp mind, and exhaustive effort. The era of low-hanging fruit is largely over in my opinion. Don't be the gorilla, be the lemur, climb that tree.

The Vulnerability

In the following snippet from nsSMILTimeContainer.cpp, the pointer p is initialized to the beginning of the mMilestoneEntries array.

void  
nsSMILTimeContainer::NotifyTimeChange()  
{  
  // Called when the container time is changed with respect to the document  
  // time. When this happens time dependencies in other time containers need to  
  // re-resolve their times because begin and end times are stored in container  
  // time.  
  //  
  // To get the list of timed elements with dependencies we simply re-use the  
  // milestone elements. This is because any timed element with dependents and  
  // with significant transitions yet to fire should have their next milestone  
  // registered. Other timed elements don't matter.  
  const MilestoneEntry* p = mMilestoneEntries.Elements();  
#if DEBUG  
  uint32_t queueLength = mMilestoneEntries.Length();  
#endif  
  while (p < mMilestoneEntries.Elements() + mMilestoneEntries.Length()) {  
    mozilla::dom::SVGAnimationElement* elem = p->mTimebase.get();  
    elem->TimedElement().HandleContainerTimeChange();  
    MOZ_ASSERT(queueLength == mMilestoneEntries.Length(),  
               "Call to HandleContainerTimeChange resulted in a change to the "  
               "queue of milestones");  
    ++p;  
  }  
}  

Now, consider the following two examples:

Exhibit One

<html>  
<head>  
  <title>  
  Exhibit One  
  </title>  
</head>  
<body>  
    <svg id='foo'>  
        <animate id='A' begin='1s' end='10s' />  
        <animate begin='A.end + 5s' dur='15s' />  
    </svg>  
</body>  
</html>  

Exhibit Two

<html>  
<head>  
  <title>  
  Exhibit Two  
  </title>  
</head>  
<body>  
    <svg id='foo'>  
        <animate id='A' begin='1s' end='10s' />  
    </svg>  
    <svg id='bar'>  
        <animate begin='A.end + 5s' dur='15s' />   
    </svg>  
</body>  
</html>  

In these examples, for each <svg> element that uses <animate>, an nsSMILTimeContainer object is assigned to it in order to perform time book keeping for the animations (<animateTransform> or <animateMotion> will also have the same behavior).  The epoch of each container is the time since the creation of the <svg> element it is assigned to relative to the creation of the page.  The nsSMILTimeContainer organizes each singular event in an animation with an entry for each in the mMilestoneEntries member array. See: nsSMILTimeContainer.h — DXR

In Exhibit One, the mMilestoneEntries array will contain four entries: one for both the beginning and ending of 'A', in addition to another two, one being relative to A's completion (A.end + 5s), and the other demarcating the end of the animation, in this case 30 seconds (A.end + 5s + 15s).

In Exhibit Two we see two independent <svg> elements.  In this example, two separate nsSMILTimeContainer objects will be created, each of course having it's own mMilestoneEntries array.

The exploit makes a single call to the function pauseAnimation(), which in turn triggers entry into the NotifyTimeChange() method.  nsSMILTimeContainer::NotifyTimeChange() proceeds to iterate through all entries in the mMilestoneEntries array, retrieving each individual entries nsSMILTimedElement object, after which it calls the object's HandleContainerTimeChange() method.  After some time, this method will end up making a call to the NotifyChangedInterval() method of of the nsSMILTimedElement object.  In NotifyChangedInterval(), HandleChangedInterval() will be entered if the animation being processed has a milestone relative to another animation.  In Exhibit Two, bar's beginning is relative to the element A belonging to foo, so HandleChangedInterval() will be called.

Within HandleChangedInterval(), a call to nsSMILTimeValueSpec::HandleChangedInstanceTime() will inevitably be made.  This method determines if the current animation element and the one it has a dependency on are contained within the same nsSMILTimeContainer object.  If so, as is the case with Exibit One, the pauseAnimations() function basically lives up to it's name and pauses them.  In Exhibit Two, the animations do not share the same nsSMILTimeContainer object, so additional bookkeeping is required in order to maintain synchronization.  This occurs, with subsequent calls to nsSMILTimedElement::UpdateInstanceTime() and nsSMILTimedElement::UpdateCurrentInterval() being made, and nothing too interesting is to be seen, though we will be revisiting it very shortly.

Deeper down the rabbit hole ...

What about the case of three or more animation elements with relative dependencies? Looking at the exploit, we see four animations split unequally among two containers.  We can modify Exhibit Two using details gleaned from the exploit to arrive at the following example.

Exhibit Three

<html>  
<head>  
  <title>  
  Exhibit Three  
  </title>  
</head>  
<body>  
  <script>  
     var foo = document.getElementById('foo');  
     foo.pauseAnimations();  
  </script>  
    <svg id='foo'>  
        <animate id='A' begin='1s' end='5s' />  
        <animate id='B' begin='10s' end='C.end' dur='5s' />  
    </svg>  
    <svg id='bar'>  
        <animate id='C' begin='0s' end='A.end/>   
    </svg>  
</body>  
</html>  

In this example, C's ending is relative to A's end, so we end up in nsSMILTimedElement::UpdateCurrentInterval() again, except that a different branch is followed based on the example's milestones:

if (mElementState == STATE_ACTIVE) {  
  // The interval is active so we can't just delete it, instead trim it so  
  // that begin==end.  
  if (!mCurrentInterval->End()->SameTimeAndBase(*mCurrentInterval->Begin()))  
  {  
    mCurrentInterval->SetEnd(*mCurrentInterval->Begin());  
    NotifyChangedInterval(mCurrentInterval, false, true);  
  }  
  // The transition to the postactive state will take place on the next  
  // sample (along with firing end events, clearing intervals etc.)  
  RegisterMilestone();  

NotifyChangedInterval() is called to resolve any milestones relative to other animations for C.  Within foo, B has milestones relative to C in bar.  This results in a recursive branch along the same code path which ultimately hits UpdateCurrentInterval(), which in turn sets the state of the nsSMILTimedElement.  mElementState can be one of four possible values:

  • STATE_STARTUP
  • STATE_WAITING
  • STATE_ACTIVE
  • STATE_POSTACTIVE

all of which perfectly describes their own respective meanings.  In Exhibit Three, B's beginning is set to occur after it's ending is set (C.end == A.end == 5s).  Since it will never start, the code marks it as STATUS_POSTACTIVE.  This results in the following code within the UpdateCurrentInterval() method creating a new interval and setting it as current.

if (GetNextInterval(GetPreviousInterval(), mCurrentInterval,  
                    beginTime, updatedInterval)) {  
  
  if (mElementState == STATE_POSTACTIVE) {  
  
    MOZ_ASSERT(!mCurrentInterval,  
               "In postactive state but the interval has been set");  
    mCurrentInterval = new nsSMILInterval(updatedInterval);  
    mElementState = STATE_WAITING;  
    NotifyNewInterval();  
  }  

With this occurring, UpdateCurrentInterval() now makes a call to the RegisterMilestone() method.  This was not the case in Exhibit Two.  With a new interval having been created, the method will add a new entry in the mMilestoneEntries array of containerA's nsSMILTimeContainer object, resulting in the array being freed and reallocated elsewhere, leaving the pointer p from nsSMILTimeContainer::NotifyTimeChange() referencing invalid memory.

Exploitation Overview

Just because the pointer p in NotifyTimeChange() can be forced to point to free memory doesn't mean it's all over.  Firefox overwrites freed memory with 0x5a5a5a5a, which effectively mitigates a lot of classic UaF scenarios.  Secondly, there is no way to allocate memory in the freed region after the milestone array is relocated.  Given these conditions, it's becoming clear that the vulnerability cannot be exploited like a classic use-after-free bug.  If you forced me to categorize it and come up with a new buzz word as people are so apt to in this industry, I might call it a dangling index, or an iterator run-off.  Regardless of silly names, the exploit utilizes some artful trickery to overcome the hurdles inherent in the vulnerability.  As I mentioned at the offset, for the sake of brevity, I'm going to be glossing over a lot of the details with regards to heap determinism (the terms "heap grooming" and "heap massaging" irritate me more than the word "moist").

In the first step, the exploit defragments the heap by spraying 0x80 byte blocks of ArrayBuffers, and another 0x80 of milestone arrays.  Each of the milestone arrays is filled to capacity, and then one additional element is added to each.  This causes the arrays to be reallocated elsewhere, leaving 0x80 holes.  After filling these holes with vulnerable milestone arrays, assuming the last element of the array is the one that triggers the vulnerability, there is now a high probability that the next iteration of the NotifyTimeChange() loop will point within one of the 0x80 ArrayBuffer's that were allocated first.  It is important that the last element be the one to trigger the bug, as otherwise, the memory would be freed and overwritten before an attacker could take advantage of it.

The next obstacle in the process is bypassing the object reference count which, under normal circumstances, would cause the loop to exit.  Even if this were a full technical exposition, I'd leave this part as an exercise to the reader because of reasons.  I invite you to figure it out on your own, because it's both quite clever and critical to the success of the exploit.  Those pesky reasons though.  Seasoned exploitation engineers will see it quickly, and astute students will have truly learned when they unravel the knot.

_I'd like to think that this is a good hint, but the only certainty is that it comes up on my 3 AM debugging session playlist a lot_

In any case, after the exploit does it's thing, the exit condition of the loop

while (p < mMilestoneEntries.Elements() + mMilestoneEntries.Length()) 

will never be reached, and instead the loop will continue to iterate infinitely.  While this is great news, it also means that an attacker is unable to continue executing code.  The solution to this is one of the more brilliant aspects of this exploit, that being the use of a Javascript worker thread.

var worker = new Worker('cssbanner.js');  

With the worker thread, Javascript can continue being executed while the infinite loop within the main thread keeps spinning.  In fact, it's used to keep tabs on a lot of magical heap manipulation happening in the background, and to selectively exit the loop when need be.  From here, the exploit leverages a series of heap corruptions into a r/w primitive, and bypasses ASLR by obtaining the base address of xul.dll from said corruptions by parsing the files DOS header in memory.  This, along with resolving imports, is the main purpose of the PE(b,a) function in the leaked exploit.

With ASLR defeated, all that lies ahead is defeating Data Execution Prevention, as the Tor browser doesn't feature any sort of sandbox technology.  The exploit handles this beautifully by implementing an automatic ROP chain generation function, which can locate the addresses of required gadgets amongst multiple versions of Firefox/Tor browser.  After constructing the chain, the following shellcode is appended (I've converted all addresses to base 16 for readability and added comments):

ropChain[i++] = 0xc4819090;   // add esp, 800h  
ropChain[i++] = 0x0800;  
ropChain[i++] = 0x5050c031;   // xor eax, eax ; push eax ; push eax  
ropChain[i++] = 0x5b21eb50;   // push eax ; jmp eip+0x23 ; pop ebx  
ropChain[i++] = 0xb8505053;   // push ebx ; push eax ; push eax  
ropChain[i++] = CreateThread; // mov eax, kernel32!CreateThread  
ropChain[i++] = 0xb890d0ff;   // call eax  
ropChain[i++] = arrBase + 0x2040;   // mov eax, arrBase+0x2040  
ropChain[i++] = 0x5f58208b;   // mov esp, dword ptr [eax] ; pop eax ; pop edi  
ropChain[i++] = 0xbe905d58;   // pop eax ; pop ebp  
ropChain[i++] = 0xffffff00;   // mov esi, 0xffffff00  
ropChain[i++] = 0x000cc2c9;   // ret 0x0c  
ropChain[i++] = 0xffffdae8;   // call eip+0x21  
ropChain[i++] = 0x909090ff;   // placeholder for payload address  

The shellcode basically allocates stack space and makes a call to CreateThread with the address of the final payload, which is obtained via the jmp eip x023 ; pop ebx line, as it's argument.  It next performs stack cleanup and exits the current infinite NotifyTimeChange() loop to ensure clean process continuation.  At least, it's supposed to.  Initial findings I've read from other researchers seem to indicate that it does not continue cleanly when used against Tor browser.  I need to investigate this myself at the first lull in the holiday festivities.

I hope I managed to prove that exploiting buffer overflows should be an art

-Solar Designer

That wraps this up for now. Check back for updates in the future as I continue analysis on it. If you have questions about anything, feel free to ask either here or find me on Twitter @iamwilliamwebb. Happy holidays!

References

Original leaked exploit: [tor-talk] Javascript exploit