Last updated at Wed, 26 Jul 2017 14:45:45 GMT

Little has been said about how w3af is helping NeXpose's web application security scanner become best in class, and even less about how NeXpose is helping w3af, so I thought I'd write this short blog post and tell you all about it with a short story:

The never-ending fight against memory usage
When I started to work with NeXpose, it was clear that a lot of thought had been put into giving the web application scanner the lowest memory footprint possible. If you've ever developed a web application scanner, or any similar tool, you will know how difficult this is. For those who haven't, the problem is easy to understand: "You need to scan both a website that has 30 links AND a massive website like YouTube using the same amount of memory, let's say, 50MB". Another important requirement is to have insanely fast results, so storing everything on disk won't work because of the I/O overhead.

There are a couple of data structures we can use to keep the memory footprint under control without hurting disk I/O, and the Bloom filter is Rapid7's favorite.
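For readers who haven't met the structure before, here is a minimal sketch of a Bloom filter in Python. It is not the code used in NeXpose or w3af; the class name, the default sizes and the SHA-256 based double hashing are assumptions chosen for illustration. The point it shows is that memory is fixed when the filter is created, no matter how many URLs are added, at the price of occasional false positives (an item reported as seen when it wasn't). It never reports a false negative.

```python
import hashlib


class BloomFilter:
    """Illustrative fixed-size probabilistic set (hypothetical, not w3af's code)."""

    def __init__(self, size_bits=8 * 1024 * 1024, num_hashes=7):
        # Memory is allocated once: size_bits / 8 bytes, whatever gets added later.
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions from one digest (double hashing).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd, so never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.size_bits

    def add(self, item):
        # Set every bit that the item hashes to.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True only if all of the item's bits are set; may be a false positive.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )
```

A crawler could call seen.add(url) for every URL it visits and test url in seen before requesting a page again; whether the site has 30 links or millions, the filter occupies the same number of bytes.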

At this point in time, w3af had evolved from storing lots of objects in classic Python lists "[]", which use tons of memory, to a data structure created to address that issue by storing those items on disk (on a really inspired day, I called that structure disk_list). This was a good intermediate solution: w3af no longer used tons of memory, but it was slow at seeking the information we stored on disk.
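To make the trade-off concrete, here is a rough sketch of what a disk-backed list can look like. This is not w3af's actual disk_list; the DiskList class, the SQLite backing store and the table layout are all assumptions made for this example. Items live in a temporary file instead of in RAM, so memory stays low, but every membership test becomes a query against the disk.

```python
import os
import sqlite3
import tempfile


class DiskList:
    """Illustrative list-like container stored on disk (hypothetical sketch)."""

    def __init__(self):
        # Keep the items in a temporary SQLite file rather than in memory.
        fd, self._path = tempfile.mkstemp(suffix=".db")
        os.close(fd)
        self._db = sqlite3.connect(self._path)
        self._db.execute("CREATE TABLE items (value TEXT)")

    def append(self, value):
        self._db.execute("INSERT INTO items (value) VALUES (?)", (value,))

    def __contains__(self, value):
        # No index on purpose: each lookup scans the table, which is the
        # kind of slow seek described above.
        row = self._db.execute(
            "SELECT 1 FROM items WHERE value = ? LIMIT 1", (value,)
        ).fetchone()
        return row is not None

    def __len__(self):
        return self._db.execute("SELECT COUNT(*) FROM items").fetchone()[0]

    def close(self):
        # Drop the connection and delete the backing file.
        self._db.close()
        os.remove(self._path)
```

It can be used much like a normal list for append and in checks, which is what makes it an easy drop-in replacement and also what hides the cost of each lookup.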

After understanding how Bloom filters were used in NeXpose, I went back to w3af and refactored the code that used disk_lists to use Bloom filters. This gave us a 20% performance improvement and reduced disk usage, although at first sight it slightly increased our memory footprint.

After lots of testing, we discovered that the Bloom filter implementation we were using in w3af had some major issues, which made it consume more memory than expected. This pushed us to a different implementation of this probabilistic data structure: pybloomfiltermmap, which uses the mmap POSIX system call to keep the memory footprint as low as possible and is much quicker (1200%!) than the previous implementation we used.
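The idea behind an mmap-backed filter can be sketched with Python's standard mmap module. This is not how pybloomfiltermmap is implemented internally (it is a C extension with its own API); the MmapBloomFilter class, its parameters and the hashing scheme below are assumptions for illustration only. The bit array lives in a file that the operating system maps into the process's address space, so pages are loaded and written back by the kernel on demand instead of being held in a Python object.

```python
import hashlib
import mmap


class MmapBloomFilter:
    """Illustrative Bloom filter whose bit array is a memory-mapped file."""

    def __init__(self, path, size_bits=1 << 20, num_hashes=7):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        size_bytes = (size_bits + 7) // 8
        # Create the backing file at its final size, then map it.
        self._file = open(path, "w+b")
        self._file.truncate(size_bytes)
        self._bits = mmap.mmap(self._file.fileno(), size_bytes)

    def _positions(self, item):
        # Derive num_hashes bit positions from one digest (double hashing).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.size_bits

    def add(self, item):
        # Writes go to the mapped pages; the kernel flushes them to disk.
        for pos in self._positions(item):
            self._bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self._bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )

    def close(self):
        self._bits.close()
        self._file.close()
```

The interface is the same add and in pair as any other Bloom filter; only the storage behind the bits changes.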

Our next step is to apply the lessons we have learned in w3af to NeXpose and run some performance tests to verify whether we can lower the memory usage of its web application security scanner by re-implementing its Bloom filters to use mmap.

Continuous feedback

I could be writing blog posts like this, telling you about success stories of things we test in w3af that then enhance NeXpose (and the other way around) all day long.

Although very different, w3af and NeXpose face many of the same challenges and find similar ways of solving hard issues. Each tool has its benefits, beating the other in some areas and lagging behind in others. Working side by side allows us to keep improving both products by looking into both code bases and learning from each other's successes and errors.

An open source and a commercial solution, working together to benefit the community and the clients. That's the Rapid7 spirit!