Introduction
Critical systems, sensitive networks, automated quarantines: all of these can create massive roadblocks for an effective and successful test. In certain environments, scanning the wrong host could result in anything from a multi-million-dollar service outage to sanitation concerns to the loss of life or limb.
There is an oft-cited, anecdotal bell curve describing the skill growth a professional experiences over their career. The curve suggests that a professional makes more mistakes in the middle of that growth than at the beginning or the end. At the beginning, one is cautious enough to recognize the areas they don't know and to stay away from dangerous activities; at the end of their career, they have the experience to understand why activities are dangerous and to avoid them. But the middle? The middle is exactly how experts gain that knowledge in the first place: education and unfortunate accidents.
Pentesters often understand this and take it to heart, working hard to ensure that no outages occur during an assessment, since an outage harms the customer and damages the reputation of both the tester and their organization. But this caution can make testers risk averse, leading them to avoid tests that might yield significant findings even when those tests would be completely safe to perform.
At Rapid7, we perform internal network tests in sensitive environments, including Industrial Control Systems (ICS) and Operational Technology (OT) environments. These systems are the backbone of our critical infrastructure, and they have one massive underlying weakness: some of them fail into an unrecoverable state if they receive a packet they cannot interpret. For those hosts, scanning is off the table.
In such an environment, things can go wrong quickly. One Rapid7 client recalled during an assessment kickoff meeting that a previous consultancy had taken down their critical infrastructure for over eight hours, causing significant downtime for services in neighboring areas. Hearing this can strike fear into a consultant. How do you balance the need for security testing against the need to avoid operational impacts?
Fear is the Mind Killer
Pentesters have an obligation to give clients an accurate depiction of the risks posed by their configurations and deployments. A missed vulnerability could eventually result in outages similar to, or worse than, anything caused by testing. It is imperative to keep that in mind while in a sensitive environment. This is not to say that the risks of testing should be ignored, but rather that testers need to push through to the other side of the bell curve so they can work past such fears.
The first method of doing so is to identify the fear and understand exactly what you are afraid of. The tester should have a plan and understand the risks of testing; until those risks are truly understood, they cannot be avoided. This may require countless hours of research into unfamiliar technologies, but this step can, without exaggeration, save lives. Imagine a sensitive temperature sensor in a process environment at a glass manufacturer. That sensor might dictate when the equipment should vent heat or shut down in the event of a failure. If the environment's logic allows the sensor to fail without triggering the safety management systems, the pressure buildup from unregulated temperatures could cause an explosion.
After understanding the risks, testers can find ways in which devices can be safely tested. For example, test environments with cloned servers and logic can be deployed for ICS/OT environments so testing can proceed without fear. Such an environment allows for safe testing against devices that are either resilient or that will not cause operational impacts upon failure. This alleviates fear by giving the consultant a safety net: if nothing substantial is at risk, more extensive and demanding testing can be performed.
Several polled testers reported that if a client indicates a business-critical server is sensitive to testing or service scans, they will prioritize other resources first. On its face this makes sense as ethical, safe testing, but it excludes a business-critical server from important testing and can leave risks sitting unexamined within the environment. That gap raises equally serious concerns about the adequacy of testing. As shown above, a tester with sufficient foresight can work with the client to develop a plan for safe testing that poses no unacceptable risk.
This brings us to today’s story:
SQLwhy
Rapid7 performed an internal network assessment for a client with a small environment of fewer than 100 active hosts. Within the scope, the client called out one specific business-critical resource: a Windows 7 host running a web service. While Windows 7 hosts are significantly outdated, encountering them in certain contexts is not wholly unexpected. When testers identify such a device and note that it is critical and that downtime is unacceptable, many will limit themselves to audit-level findings about the operating system version and its susceptibility to known remote code execution (RCE) exploits.
This client had been tested annually for five years: one year by Rapid7 and the rest by other consultancies.
Upon accessing the environment, we found the client had protected their internal network well and was not susceptible to common attack vectors. Rapid7 discovered several small vulnerabilities but could not gain credentialed access to any systems. The network had been sufficiently explored, with the exception of the elephant in the room.
Except there were multiple elephants in the room. During enumeration, Rapid7 discovered a second Windows 7 host: a developer-environment clone of the critical device. We followed our own advice and determined that this clone might offer a safe way to test the business-critical host. After a discussion with the client and a review of the scope and impacts of testing, we were granted permission to perform invasive testing against the development machine. We immediately used EternalBlue (MS17-010) to gain RCE on the machine as the SYSTEM user. By dumping the credentials cached on the machine, we found that a Domain Administrator had logged onto it. Because of default credential-caching behavior on Windows versions prior to 8.1, that dump included the Domain Administrator's cleartext password, and we compromised the domain.
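As an aside, the reason such dumps yield cleartext passwords on older hosts is that Windows versions prior to 8.1 and Server 2012 R2 cache WDigest logon credentials in cleartext inside LSASS unless the host has been explicitly hardened. The short sketch below is illustrative only: it assumes it is run locally on the Windows host and uses Python's standard winreg module to check whether the documented hardening value is present.

```python
# Illustrative sketch: check whether WDigest cleartext credential caching
# has been mitigated on this host. Run locally on the Windows machine.
import winreg

WDIGEST_KEY = r"SYSTEM\CurrentControlSet\Control\SecurityProviders\WDigest"

def wdigest_cleartext_enabled() -> bool:
    """Return True if LSASS is likely caching cleartext WDigest credentials."""
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, WDIGEST_KEY) as key:
            value, _ = winreg.QueryValueEx(key, "UseLogonCredential")
            # 1 = cleartext caching explicitly enabled, 0 = disabled.
            return value == 1
    except FileNotFoundError:
        # On Windows 7/8/2008 R2/2012, the value being absent means the
        # default behavior applies, and the default is to cache cleartext.
        return True

if __name__ == "__main__":
    print("WDigest cleartext caching enabled:", wdigest_cleartext_enabled())
```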
However, there is even more to this story. Rapid7 proceeded to explore the web service on the host and found several vulnerabilities within the application itself. These included a Server-Side Request Forgery (SSRF) that enabled NTLMv2 authentication coercion in the context of the machine's administrator account, and a SQL injection. Either one allowed for compromise of the system and, because Domain Administrators logged onto the host, subsequent compromise of the domain.
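To illustrate the SSRF-to-coercion chain, the sketch below shows the general shape of the vulnerable pattern; the route and parameter names are hypothetical and this is not the client's code. The key point is that the server fetches whatever location the caller supplies, so a UNC path aimed at an attacker-controlled listener causes the Windows host to authenticate outbound over SMB with the web service's account.

```python
# Illustrative sketch of the vulnerable SSRF pattern; route and parameter
# names are hypothetical and do not reflect the client's application.
from flask import Flask, request

app = Flask(__name__)

@app.route("/fetch")
def fetch():
    # Attacker-controlled location, retrieved with the web service's identity.
    target = request.args.get("src", "")
    # On Windows, a value like \\attacker\share\x is treated as a UNC path,
    # so this open() forces the host to authenticate to the attacker's SMB
    # listener (for example, Responder or ntlmrelayx), leaking NTLMv2
    # credentials for the account running the service.
    with open(target, "rb") as handle:
        return handle.read()
```

Because the service in this engagement ran as the machine's administrator account, credentials captured or relayed this way were immediately valuable.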
Making the web exploitation even more trivial, however, were client-side requests found to be sending raw SQL queries to the server, which processed the input without validation. This provided an even easier and faster path to exploiting the server; tools such as sqlmap can use this kind of injection to gain a system-level shell through SQL commands. After gaining a shell, we found the process running in the context of a Domain Administrator, guaranteeing that credentials sufficient for domain compromise would always be present on the machine. The server could not be decommissioned because of its importance, and the level of technical debt made rebuilding the codebase tricky. This system, which could reasonably be called a time bomb, had existed in this state on the network for well over a decade.
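The anti-pattern described above generally looks like the first route in the following sketch: the client ships a complete SQL statement and the server runs it verbatim. The endpoint names and the sqlite3 backend are illustrative stand-ins, not the client's stack; the second route shows the parameterized alternative that keeps user input from altering the query's structure.

```python
# Illustrative sketch of the raw-SQL anti-pattern versus a parameterized
# query. Names and schema are hypothetical; sqlite3 stands in for the
# production database.
import sqlite3
from flask import Flask, request, jsonify

app = Flask(__name__)
db = sqlite3.connect("inventory.db", check_same_thread=False)

@app.route("/query", methods=["POST"])
def run_query():
    # VULNERABLE: the client supplies a complete SQL statement and the
    # server executes it verbatim -- exactly the pattern sqlmap automates.
    sql = request.get_data(as_text=True)
    return jsonify(db.execute(sql).fetchall())

@app.route("/parts")
def parts():
    # SAFER: the server owns the query; user input is bound as a parameter
    # and cannot change the statement's structure.
    name = request.args.get("name", "")
    rows = db.execute("SELECT id, name FROM parts WHERE name = ?", (name,))
    return jsonify(rows.fetchall())
```

Against an endpoint like that first route, backed by a database platform that supports command execution (such as Microsoft SQL Server's xp_cmdshell) and a highly privileged service account, sqlmap's --os-shell option yields exactly the system-level shell described above.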
Later, in the assessment’s post-compromise phase, we gained access to previous pentest reports, which repeated a common theme: “due to the sensitive nature of this host, such exploits were not explored.” Those reports spanned five years. While every pentest is performed on a “best-effort” basis, doing one’s best to test everything, it is important to note that de-prioritizing the critical server to avoid operational impacts likely played a substantial part in the consistent misses. As an industry, we can sometimes focus so much on the obvious dangers that we accidentally forgo our own methodology. Because testers were so caught up in being unable to use the easy, well-known exploits against the outdated operating system for fear of operational impact, the simple and quite apparent web application vulnerabilities were missed by a multitude of testers.
The client stated on a briefing call that they had been looking for testing like this for years, and were unsure how this could have been missed for so long.
Lessons Learned
We all understand the importance of avoiding operational impacts, but we can’t get so caught up in them that we silently avoid or even, in the most extreme cases, ignore specific machines during testing. If you are a client seeking a pentest and there is a critical server in your environment, work with the pentester ahead of time to see whether you can deploy a clone of the machine for testing, develop a recovery plan so that testing can occur within your risk appetite, or set aside hours when such testing is possible. If you are a pentester, do the work to understand your client’s environment to the best of your ability, whether before or during the engagement, and look for safe ways to test a host. Just because you can’t use the easy point-and-click exploits does not mean there aren’t other, equally easy vulnerabilities that can provide the same results.
Remember that the goal is to find vulnerabilities in such hosts, and that the more critical a server is, the more important it is to test it. If we do the work to safely push to the other side of the bell curve — to the other side of the fear — we can do great things.