Last updated at Thu, 28 Sep 2017 19:30:46 GMT

by Suchin Gururangan & Bob Rudis

At Rapid7, we are committed to engaging in research to help defenders understand, detect and defeat attackers. We conduct internet-scale research to gain insight into the volatile threat landscape and share data with the community via initiatives like Project Sonar [1] and Heisenberg [2]. As we crunch this data, we have a better idea of the global exposure to common vulnerabilities and can see emerging patterns in offensive attacks.

We also use this data to add intelligence to our products and services. We're developing machine learning models that use this daily internet telemetry to identify phishing sites and classify devices through their certificate and site configurations.

We have recently focused our research on how these tools can work together to provide unique insight into the state of the internet. Looking at the internet as a whole can help researchers identify stable, macro-level trends in the individual attacks between IP addresses. In this post, we'll give you a window into these explorations.

IPv4 Topology

First, a quick primer on IPv4, the fourth version of the Internet Protocol. The topology of IPv4 is characterized by three levels of hierarchy, from smallest to largest: IP addresses, subnets, and autonomous systems (ASes). IP addresses on IPv4 are 32-bit sequences that identify hosts or network interfaces. Subnets are groups of IP addresses, and ASes are blocks of subnets managed by public institutions and private enterprises. IPv4 is divided into about 65,000 ASes, at least 30M subnets, and 2^32 IP addresses.
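To make the hierarchy concrete, here is a minimal sketch using Python's standard `ipaddress` module. The example subnet and host come from a range reserved for documentation, not from our data.

```python
import ipaddress

# The entire IPv4 space is the /0 network: 32-bit addresses, so 2**32 of them.
ipv4 = ipaddress.ip_network("0.0.0.0/0")
total_addresses = ipv4.num_addresses

# A subnet is a contiguous block of addresses in CIDR notation.
# 192.0.2.0/24 is a documentation-only range, used purely as an example.
subnet = ipaddress.ip_network("192.0.2.0/24")
host = ipaddress.ip_address("192.0.2.17")

print(total_addresses)       # 4294967296
print(subnet.num_addresses)  # 256
print(host in subnet)        # True
```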

Malicious ASes

There has been a great deal of academic and industry focus on identifying malicious activity in and across autonomous systems [3,4,5,6], and for good reasons. Well over 50% of “good” internet traffic comes from a small subset of large, well-defined ocean-like ASes pushing content from Netflix, Google, Facebook, Apple and Amazon. Despite this centralization of “cloud” content, we'll show that the internet has become substantially more fragmented over time, enabling those with malicious intent to stake their claim in less friendly waters. In fact, our longitudinal data on phishing activity across IPv4 presented an interesting trend: a small subset of autonomous systems has regularly hosted a disproportionate amount of malicious activity. In particular, 200 ASes hosted 70% of phishing activity from 2007 to 2015 (data: cleanmx archives [7]). We wanted to understand what makes some autonomous systems more likely to host malicious activity.
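A concentration figure like the one above can be computed as the share of events hosted by the k busiest ASes. The sketch below is illustrative only: `top_k_share` is a hypothetical helper, the input format (one AS number per observed phishing event) is an assumption, and the AS numbers are documentation-reserved values rather than real data.

```python
from collections import Counter

def top_k_share(event_asns, k):
    """Fraction of events hosted by the k ASes with the most events."""
    counts = Counter(event_asns)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in counts.most_common(k))
    return top / total

# Hypothetical toy input: one AS number per phishing event.
observations = [64500] * 6 + [64501] * 3 + [64502]

print(top_k_share(observations, 1))  # 0.6
```

With real data, `k` would be 200 and the input would be the AS of each archived phishing URL's hosting IP.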

IPv4 Fragmentation

We gathered historical data on the mapping between IP addresses and ASes from 2007 to 2015 to generate a longitudinal map of IPv4. This map clearly suggested IPv4 has been fragmenting. In fact, the total number of ASes has grown 60% in the past decade. During the same period, there has been a rise in the number of small ASes and a decline in the number of large ones. These results make sense given that IPv4 address space has been exhausted. This means that growth in IPv4 access requires the reallocation of existing address space into smaller and smaller independent blocks.

AS Fragmentation

Digging deeper into the Internet hierarchy, we analyzed the composition, size, and fragmentation of malicious ASes.

ARIN, one of the primary registrars of ASes, categorizes subnets based on the number of IP addresses they contain. We found that the smallest subnets available made up on average 56±3.0 percent of a malicious AS.

We inferred the size of an AS by calculating its maximum amount of addressable space. Malicious ASes were in the 80th to 90th percentile in size across IPv4.
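One plausible way to compute these two measurements is sketched below. It is not necessarily the exact procedure we used. The assumptions are that an AS is represented by the list of CIDR prefixes mapped to it, that "addressable space" means the number of distinct addresses those prefixes cover, that the "smallest subnets" are /24 or longer, and that the share is taken by subnet count. Both function names are hypothetical.

```python
import ipaddress

def addressable_space(prefixes):
    """Count distinct addresses covered by an AS's prefixes (overlaps counted once)."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    # collapse_addresses merges adjacent blocks and drops nested ones.
    return sum(n.num_addresses for n in ipaddress.collapse_addresses(nets))

def smallest_subnet_share(prefixes, smallest_prefixlen=24):
    """Fraction of an AS's subnets that are at least as small as the threshold.

    The /24 default is an assumption, not a value taken from ARIN's categories.
    """
    nets = [ipaddress.ip_network(p) for p in prefixes]
    if not nets:
        return 0.0
    small = sum(1 for n in nets if n.prefixlen >= smallest_prefixlen)
    return small / len(nets)

# Hypothetical AS made of reserved benchmark and documentation ranges.
prefixes = ["198.18.0.0/16", "198.18.1.0/24", "198.51.100.0/24", "203.0.113.0/24"]

print(addressable_space(prefixes))      # 66048
print(smallest_subnet_share(prefixes))  # 0.75
```

Ranking every AS by `addressable_space` would then give the size percentiles.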

To compute fragmentation, subnets observed in ASes over time were organized into trees based on parent-child relationships (Figure 3). We then calculated the ratio of the number of root subnets, which have no parents, to the number of subsequent child subnets across the lifetime of the AS. We found that malicious ASes were 10-20% more fragmented than other ASes in IPv4.
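The root-to-child ratio described above can be sketched as follows. The sketch assumes that a subnet is a child whenever some other observed subnet in the same AS contains it, and that the input is the set of all prefixes seen over the AS's lifetime. The function names are hypothetical, and how the ratio translates into a "more fragmented" comparison is not specified here.

```python
import ipaddress

def root_child_counts(prefixes):
    """Split observed subnets into roots (no containing subnet) and children."""
    # Process widest blocks first so any parent is seen before its children.
    nets = sorted({ipaddress.ip_network(p) for p in prefixes},
                  key=lambda n: n.prefixlen)
    roots, children = [], []
    for net in nets:
        # Containment is transitive, so checking against roots is sufficient.
        if any(net.subnet_of(root) for root in roots):
            children.append(net)
        else:
            roots.append(net)
    return len(roots), len(children)

def fragmentation_ratio(prefixes):
    """Ratio of root subnets to child subnets; None if there are no children."""
    n_roots, n_children = root_child_counts(prefixes)
    if n_children == 0:
        return None
    return n_roots / n_children

# Hypothetical lifetime observations for one AS (reserved ranges only).
observed = ["198.18.0.0/16", "198.18.0.0/17", "198.18.128.0/17",
            "198.18.1.0/24", "203.0.113.0/24"]

print(root_child_counts(observed))  # (2, 3)
```

Note that `subnet_of` requires Python 3.7 or later.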

These results suggest that malicious ASes are large and deeply fragmented into small subnets. ARIN fee schedules [8] showed that smaller subnets are significantly less expensive to purchase. The inexpensive nature of small subnets may allow malicious registrars to purchase many IP blocks for traffic redirection, or to host proxy servers, to better float under the radar.

Future Work

Further work is required to characterize the exact cost structure of buying subnets, registering IP blocks, and setting up infrastructure in malicious ASes.

We'd also like to understand the network and system characteristics that cause attackers to choose to co-opt a specific autonomous system over another. For example, we used Sonar's historical forwardDNS service and our phishing detection algorithms to characterize all domains that have mapped to these ASes in the past two years. Domains hosted in malicious ASes had features that suggested deliberate use of specific infrastructure. In particular, 'wordpress' sites were over-represented in some malicious ASes (like AS4808), and GoDaddy was by far the most popular registrar for malicious sites across the board.

We can also use our SSL Certificate classifier to understand the distribution of devices hosted in ASes across IPv4, as seen in the chart below:

Each square above shows the probability distribution (a fancier, prettier histogram) of device counts of a particular type. Most ASes host fewer than 100 devices across a majority of categories. Are specific device types over-represented in malicious ASes, and are they being used to propagate phishing attacks?

Conclusion

Our research presents the following results:

  1. A small subset of ASes continues to host a disproportionate amount of malicious activity.
  2. Smaller subnets and ASes are becoming more common in IPv4.
  3. Malicious ASes are deeply fragmented.
  4. There is a concentrated use of specific infrastructure in malicious ASes.
  5. Attackers both co-opt existing devices and stand up their own infrastructure within ASes (a gut-check would suggest this is obvious, but having data to back it up also makes it science).

Further work is required to characterize the exact cost structure of buying subnets, registering IP blocks, and setting up infrastructure in malicious ASes along with what network and system characteristics cause attackers to choose to co-opt one device in one autonomous system over another.

This research represents an example of how Internet-scale data science can provide valuable insight into the threat landscape. We hope these explorations inspire similar macro-level research, and we will be bringing you more insights from Project Sonar & Heisenberg over the coming year.


  1. Sonar intro

  2. Heisenberg intro

  3. G. C. M. Moura, R. Sadre and A. Pras, “Internet Bad Neighborhoods: The spam case,” Network and Service Management (CNSM), 2011 7th International Conference on, Paris, 2011, pp. 1-8.

  4. B. Stone-Gross, C. Kruegel, K. Almeroth, A. Moser and E. Kirda, “FIRE: FInding Rogue nEtworks,” doi: 10.1109/ACSAC.2009.29

  5. C. A. Shue, A. J. Kalafut and M. Gupta, “Abnormally Malicious Autonomous Systems and Their Internet Connectivity,” doi: 10.1109/TNET.2011.2157699

  6. A. J. Kalafut, C. A. Shue and M. Gupta, “Malicious Hubs: Detecting Abnormally Malicious Autonomous Systems,” doi: 10.1109/INFCOM.2010.5462220

  7. Cleanmx archive

  8. ARIN Fee Schedule