The Web is not written only in ASCII. Most of us in the western hemisphere are used to reading languages which, except for a couple of letters like ñ and ç, can be represented with ASCII (see also: man ascii), but the world has more to offer: Cyrillic, Chinese, Greek, Arabic and thousands more languages, each with its own special encodings.
With the latest changes, which we've been working on together with Javier Andalia, we're opening new doors for our users: doors that allow them to successfully scan websites written in any language and encoding. In the past, we discovered that when crawling some non-ASCII websites we were not able to reach links like http://target.tld/fooñ/bar.html, inject into parameters like http://target.tld/foo.html?bñar=3, or properly parse HTTP response bodies. In some cases these issues showed an error to the user (see Trac#166422).
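To make the URL problem concrete, here is a minimal sketch of how a link such as http://target.tld/fooñ/bar.html can be turned into something that is safe to send over HTTP. It is written for Python 3 and uses only the standard library; the function name `percent_encode_url` and the idea of passing the page encoding as a parameter are assumptions made for this illustration, not w3af's actual code.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def percent_encode_url(url, encoding="utf-8"):
    """Percent-encode the non-ASCII characters in a URL's path and query.

    Hypothetical helper for illustration only. `encoding` is the charset
    the target site expects; UTF-8 is assumed when nothing else is known.
    """
    parts = urlsplit(url)
    # Keep "/" and existing "%" escapes untouched in the path.
    path = quote(parts.path, safe="/%", encoding=encoding)
    # Keep the "=" and "&" separators (and existing escapes) in the query.
    query = quote(parts.query, safe="=&%", encoding=encoding)
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


if __name__ == "__main__":
    print(percent_encode_url("http://target.tld/fooñ/bar.html"))
    print(percent_encode_url("http://target.tld/foo.html?bñar=3"))
```

The same character produces different bytes depending on the encoding (ñ is %C3%B1 in UTF-8 but %F1 in ISO-8859-1), which is why a crawler has to know the encoding of the page a link came from.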
After many weeks of development work and many more of testing, we merged the unicode branch into the trunk a couple of days ago. This is the culmination of many hours of work, in which we accomplished three main goals:
- Migrate all of our code from byte-strings to unicode strings, which allows support for any encoding
- Change our framework's logic to handle encodings in a way that's very similar to Firefox
- Improve performance by refactoring our SGML parser
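The first two goals can be illustrated with a short sketch that turns the raw bytes of an HTTP response body into a unicode string. The order used here (charset from the Content-Type header, then a charset declared in a meta tag, then a default) is a simplified assumption about how browsers pick an encoding; it is not the exact Firefox algorithm, and `decode_body` is a hypothetical name rather than a w3af function. The code targets Python 3.

```python
import codecs
import re

# Both patterns are deliberately loose; real-world sniffing is more involved.
HEADER_CHARSET_RE = re.compile(r'charset=["\']?([\w-]+)', re.IGNORECASE)
META_CHARSET_RE = re.compile(rb'<meta[^>]+charset=["\']?([\w-]+)', re.IGNORECASE)


def _is_known_codec(name):
    """Return True if Python has a codec registered under this name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


def decode_body(body, content_type="", default="utf-8"):
    """Decode response bytes, returning (text, encoding_used).

    Hypothetical helper: tries the header charset, then a <meta> charset
    found in the first 1024 bytes, then `default`.
    """
    candidates = []
    match = HEADER_CHARSET_RE.search(content_type)
    if match:
        candidates.append(match.group(1))
    match = META_CHARSET_RE.search(body[:1024])
    if match:
        candidates.append(match.group(1).decode("ascii"))
    candidates.append(default)

    for name in candidates:
        if not _is_known_codec(name):
            continue
        try:
            return body.decode(name), name
        except UnicodeDecodeError:
            continue
    # Last resort: never raise, mark undecodable bytes instead.
    return body.decode(default, errors="replace"), default
```

Once the body has been decoded at the edge like this, the rest of the code can work with unicode strings only and does not need to care about which encoding the site used.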
Learning from our errors, we've dedicated a long time to adding more unit tests, to make sure that quality keeps increasing with each feature we add and bug we fix. These changes are also beneficial for our contributors, who will see an improved architecture and easier-to-follow logic in the SGML parsers.
Our next steps will focus on applying what we learned from a couple of research sessions on Python's multiprocessing to w3af's core. Once again, this means that we'll be increasing the framework's performance! This time the focus will be on making optimal use of the available resources. The problem we'll be solving is that Python threads effectively use only one CPU at a time (not very savvy when most workstations today have four of them), while Python processes can run at the same time on different CPUs.
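The difference between threads and processes can be sketched with the standard multiprocessing module. The prime-counting task below is only a stand-in for CPU-bound work, and both function names are made up for this example; nothing here reflects how w3af's core will actually be restructured.

```python
import multiprocessing


def count_primes(limit):
    """CPU-bound toy task: count the primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def count_primes_parallel(limits, workers=None):
    """Run count_primes once per limit, spread over a pool of processes.

    `workers=None` lets multiprocessing use one process per available CPU.
    """
    with multiprocessing.Pool(processes=workers) as pool:
        return pool.map(count_primes, limits)
```

Calling `count_primes_parallel([200000] * 4)` from under an `if __name__ == "__main__":` guard lets each of the four tasks run in its own process, so they can occupy four CPUs at once. Running the same four tasks in threads would not speed things up in CPython, because only one thread executes Python bytecode at a time.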
Still not using w3af? Download it now!