Last updated at Tue, 20 Sep 2022 16:13:28 GMT
In the spring of 2018, we launched the Open Data initiative to provide security teams and researchers with access to research data generated from Project Sonar and Project Heisenberg. Our goal for those projects is to understand how the attack surface is evolving, what exposures are most common or impactful, and how attackers are taking advantage of these opportunities. Ultimately, we want to be able to advocate for necessary remediation actions that will reduce opportunities for attackers and advance security. This is also why we publish extensive research reports highlighting key security learnings and mitigation recommendations.
Our goal for Open Data has been to enable others to participate in these efforts, increasing the positive impact across the community. Open Data was an evolution of our participation in the scans.io project, hosted by the University of Michigan. Our hope was that security professionals would apply the research data to their own environments to reduce their exposure and researchers would use the data to uncover insights to help educate the community on security best practices.
Since we first launched Open Data, we’ve been mindful that sharing large amounts of raw data may neither maximize value for recipients nor lead to the best security outcomes. It is efficient for us, as it can be automated, but we have constantly sought more impactful and productive ways to share the data. Where possible, we’ve developed partnerships with key nonprofit organizations and government entities that share our goals and values around advancing security and reducing exposure. We’ve also looked for ways to make the information more accessible for internal security teams.
Fast forward to 2021, and wow, what a few years we’ve had. We’ve faced a global pandemic, which has really brought home our increased reliance on connected technologies, and amplified the need for privacy protections and understanding of digital threats. During the past few years, we have also seen an evolving regulatory environment for data protection. Back in 2018, GDPR was just coming into effect, and everyone was trying to figure out its implications. In 2020, we saw California join the party with the introduction of CCPA. It seems likely we will see more privacy regulations follow.
The surprising thing is not this focus on privacy, which we wholeheartedly support, but rather the classification and regulation of IP addresses as personal data or information. We believe security research supports better security outcomes, which in turn enables better privacy. It’s fundamentally challenging to maintain privacy without understanding and addressing security challenges.
Yet IP addresses make up a significant portion of our security research data. While we believe there is absolutely a legitimate interest in processing this kind of data to advance cybersecurity, we also recognize the need to apply appropriate balancing controls to protect privacy and ensure that the processing is “necessary and proportionate,” per the language of GDPR Recital 49.
Evolving data sharing
So what does this mean? To date, Open Data has included two elements:
- A free sign-up service that is subject to light vetting and terms of service, and provides access to current and historical research data
- Free access (no account required) to a one-month window of recent data from Project Sonar shared on the Rapid7 website
Beginning today, the latter (free, no-account access to recent Project Sonar data) will no longer be available. For the former, the sign-up service, we still want to be able to provide data to help security teams and researchers identify and mitigate exposures. Our goals and values on this have not changed in any way since the inception of Open Data. What has evolved, apart from the regulatory landscape, is our thinking on the best ways to do this.
For Rapid7 customers, we launched Project Doppler, a free tool that provides insight into an organization’s external exposures and attack surface. Most Rapid7 customers want to dig their own specific information out of our mountain of internet-wide scan data, and Doppler makes that much, much easier.
We are working on how we might practically make Project Doppler available more broadly to other internal infosec teams, while still protecting privacy in line with regulatory requirements.
For governments, ISACs, and other nonprofits working on security advocacy to reduce opportunities for attackers, please contact us; we believe we share a mission to advance security and want to continue to support you in this. We will continue to provide free access to the data with appropriate balancing controls (such as geo-filtering) and legal agreements (such as for data retention) in place.
For legitimate public research projects, we have a new submission process so you can request access to the Project Sonar data sets for a limited time and subject to conditions for sharing your findings to advance the public good. Please email email@example.com for more information or to make a submission.
While it was not the primary goal or intention behind the Open Data initiative, we recognize that there are also entities using the data for commercial projects. We are not intentionally trying to hinder this, but per privacy regulations, we need to ensure we have more vetting and controls in place. If you are interested in discussing options for incorporating Project Sonar data into a commercial offering, please contact firstname.lastname@example.org.
If you have a use case for Project Sonar data that does not fit into one of the categories above, please contact us. We welcome any opportunity to better understand how our data may be useful, and we want to continue to advance security and support the security community as best we can.
More advocacy, better outcomes
While these changes are being triggered by the evolving regulatory landscape, we believe that ultimately they will lead to more productive data sharing and better security outcomes. We’re still not sold on the idea that IP addresses should be treated as personal data, but we recognize the value of a more thoughtful and tailored approach to data sharing that both supports data protection values and promotes more security advocacy and remediation action.