*Updated July 2021
When you start out with one of our vulnerability management solutions, Nexpose or InsightVM, one of the first things you should build and set up is a best practices Scan Template. Because best practices are constantly changing, make sure you look at the date this blog was posted and make your decisions accordingly. Most of the best practices I plan to discuss have been around for several years and have been implemented on many engagements to refine and test them in live environments (though there may be deviations due to different environments).
This blog post will give you a ballpark best practice that applies to the majority of environments, as well as some descriptions that outline the thought process, math, and reasoning.
To get data into the database, we will need to create sites to scan the assets. And, to create the sites, we will need a good scan template. This will be the primary purpose of today’s conversation.
First, you will need to go to the Administration page, then click on Templates > Manage under the Scan options section.
Once you have this page open, you will see a bunch of options available. We are going to use a starting template of the “Full Audit without Web Spider,” so find the copy icon next to that scan template, and click on it.
While you can’t edit a built-in scan template, you can copy it and make all the changes you want to. The “Full Audit without Web Spider” template is the best place to start.
At the top of this template, there are four different “TYPES of CHECKS:”
- Asset Discovery
- Web Spider
Asset Discovery: This option is required for all scan templates, as it is the initial NMAP process we run to both find and guess the OS, as well as find open ports. If Asset Discovery is the only option selected, the scans are 100% free and do not count against your license. This can be combined with the option below, “Use Credentials,” to do Credentialed discovery scans to get strong OS fingerprints with a free discovery scan to get a better understanding of your environment. However, this only applies to the “Asset Discovery” check type.
Vulnerabilities: This option is optional. Combined with Asset Discovery, it will be used in a best practice scan template. Vulnerabilities means we are going to take the fingerprints from Assets, ports and fingerprints found in Asset discovery, apply any applicable creds, or banner grabs and kick off vulnerability checks against those assets. This will cost IP licenses.
Web Spider: Web Spider is essentially obsolete. It is used to spider HTML-based web pages, which are less common on the modern internet. However, I still recommend using this feature when scanning externally, as it can find a backported install of Apache struts. Just make sure to go into the options and set a timeout, as it will get stuck on Dynamic content. If you want a true web spider for today’s internet, you will need to look into AppSpider or InsightAppSec.
Policies: This option is used to scan your assets to see how they stack up against different hardening guidelines, such as CIS hardening guidelines. InsightVM has a fully featured policy scanning ability and is part of the defense-in-depth process of securing your environment. Scanning credentials are required for this feature, and I recommend enabling this feature in a scan template without Vulnerabilities and setting up OS-based scan templates targeting specific operating systems. I generally recommend holding off on Policy scanning until you have your vulnerability management program established and automated, as Policy can be very complex and is hard to justify in the early stages of your deployment.
Based on the above information I generally recommend your first Best Practice scan template to only check the ‘Asset Discovery’ and ‘Vulnerabilities’ check types.
Next is the Name, which will show as “Full audit without Web Spider Copy.” I recommend changing it to “!Full audit without Web Spider.” The exclamation point (!) at the beginning will float the scan template to the top as they are ordered alphabetically. This will help assure it’s the first scan template you see when creating a site.
Next, let’s talk about the options below the scan template name:
Enhanced Logging: Basically, DEBUG logging for scans that should be run before submitting logs to support for a potential false positive issue. Only turn it on when you need it, or it will generate a ton of extra logging.
Enable Windows services during a scan: This option is essentially obsolete. Back before remote registry scans and remote registry patching, there was a policy to block remote registry access. In today’s world, that option should be off on all assets—however, in the off chance it is not off, this option will bypass it and allow a remote registry scan. I recommend enabling it.
Maximum assets scanned simultaneously per Scan Engine: This is where some math comes in, as it is variable depending on whether you're using the local engine, the OS being used, whether you're using a dedicated scan engine, and the memory and CPU levels. It is arguably the most important option in the scan template, as it controls how fast you can scan, and the default of 10 is very low based on current best practices and needs to be updated.
One of the greatest changes fixed by the Rapid7 developers is around the “simultaneous assets” that were added to the product about a year ago. It basically made it so we now run out of CPU before we run out of memory, which has had a huge impact on customers not running out of memory.
In order for this to work, we need a 1:4 ratio of CPU to Memory. This means we really need to shoot for 4 CPUs to 16 GB, or 8 CPUs to 32 GB’s of memory (2 CPUs to 8 GB also works, but is the minimum requirement, which I generally do not recommend).
Operating systems also have an impact, as Windows-based operating systems tend to use a lot more memory to run windows GUI than the Linux GUI (if headless, there is no GUI). However, I do not recommend you go with Linux just to save on memory. Go with an operating system that best fits into your patch management strategy, as you do not want your server that has all of your network’s vulnerability information on it being the most vulnerable asset in your environment because you can’t patch it. Below is a table of some recommended starting points:
|Windows Dedicated Scan Engine||150||300||700|
|Windows Console Local Scan Engine||50||150||300|
|Linux Dedicated Scan Engine||200||400||800|
|Linux Console Local Scan Engine||75||200||400|
I have found from working with many customers in many environments that these numbers should work as long as the CPU/Memory is reserved or dedicated to the box. I’ve also found that with these numbers that the average Memory/CPU should be around 60%, so they could be increased further, but do so at your own discretion. I’ve pushed a 32GB/8core dedicated Linux engine to 999 (the field only goes to 999) before and it worked just fine, but I don’t necessarily recommend it.
What do these numbers mean? With an average credentialed scan taking around around 12 minutes, give or take, that means there will be five volleys of assets per hour, and on as 16GB/4core Linux dedicated engine running 400 simultaneous assets per engine, it would mean we would scan approximately 2,000 assets in an hour (with just the one engine). If we add more than one engine, just multiply that number by the number of engines in an engine pool. Scanning can be really fast.
Another factor to take into account is a new option in the “Vulnerability Checks” tab of the scan template called, “Skip checks performed by the Insight Agent.” This option will detect whether an agent is installed and only run the small percentage of checks that the agent cannot do. These scans will run much faster and probably average closer to five minutes per asset, so using the above equation, you could scan even more assets in an hour. For this reason, Agents are highly recommended on as many assets as possible in your environment.
The final option in the scan template is the “Maximum scan processes simultaneously used for each asset” with a default value of 10. I recommend leaving this at 10. This is all about scanning multiple services, and boxes should not have more than 10 open services. If they do, you should seriously look at why they have so many open services, as each service is a potential access vector for a vulnerability. In the end, it used to be 1, and 10 is 10 times better than it used to be. I generally do not recommend going bigger.
A lot of this information can also be found in this guide. There will be some differences, as a lot of the numbers in this guide are based on best practices from five years ago, and are not the best practices of today. This is another reason why I recommended checking the date this was written. Things are constantly getting better and improving!
Next we will talk about the ‘Asset Discovery’ tab on the left of the scan template options. I like to call this the dead asset mitigation tab, as its primary purpose is to root out all of the dead assets so we can focus on alive assets.
The first section, “Verify Live Assets,” includes the steps we take to determine whether an asset is alive. First, we try an ICMP Ping. If we get a response on a given asset, we’re done and don’t need to take any additional steps—fast and easy! However, if we don’t get an ICMP reply, we then try ARP. If the scan engine is in the local network segment, this might work and we might get a reply, and if we don’t, then we move on to TCP, and finally UDP.
If we get to TCP, we will send a SYN to the listed ports. We are hoping for an ACK reply, but a response isn’t a response, isn’t a response. Some TCP responses can be a bad thing, such as a TCP Reset response. TCP reset responses are generally sent by an IDS in an attempt to shroud the asset by responding with TCP resets for assets that don’t exist. This can cause a massive number of “ghost assets” or assets that show up as alive with no hostname or operating system, meaning thousands if not tens of thousands of assets show up that do not exist. To help with TCP resets, make sure to check the box for “Do not treat TCP reset responses as live assets.” I find this option to be almost required in most environments, as firewalls that you may scan across have some type of built-in IDS functionality that may cause this.
Last comes UDP—though it’s unlikely a box will be alive that doesn’t respond to ICMP, ARP, or TCP but is somehow found using UDP. When scanning large network ranges like class As or Bs, unchecking this option is just one less thing to try on every dead asset and would allow the scan to get through all of those dead assets much faster. It’s up to you whether you want to disable this, though I generally recommend it.
The next section, “Collect More Asset Info,” I would recommend leaving all default. “Find Other Assets on the Network” and “Collect WHOIS Information” do work, but it does not integrate that information into the database. So, get on your log spelunking hat, because that’s the only way to see those results. For this reason, I do not recommend turning those on.
Never turn off “Fingerprint TCP/IP Stacks,” as that is going to give you that OS guess that is better than nothing when not using scanning credentials. Also, the retries setting defines how many times InsightVM will repeat the attempt to fingerprint the IP stack, and unless you're in a very latent environment, I recommend leaving it at zero.
Finally, address the minimum certainty. This option will dump results under whatever is set as the minimum. Years ago, we used to recommend making this number a little bigger (like .65) but now I generally recommend default, as I would rather take something than nothing. Here are some basic fingerprints you might see:
- Anything below .70 or 70% is an IP stack analysis guess. We look at the ACK and NMAP makes a guess. In general, .60 to .70 tend to be more accurate guesses, and anything below are low-certainty guesses, but it’s better than nothing, in my opinion.
- .80 or 80% generally means we found a banner, usually an HTTP banner.
- .85 or 85% usually means credentials were successful, but they only achieved GUEST access and could not access the registry and could not get a software inventory.
- .90 or 90% generally means an SNMP banner.
- 1.0 or 100% means credentials worked, and you should be getting the exact OS of the asset, which generally equates to the best possible scan results.
The last section, “Report Unauthorized MAC Addresses” is explained in the template, and I don’t recommend using it unless you really need to.
I recommend leaving everything default in this section, especially when doing unauthenticated scans where the goal is to get an overall view of the network. The service discovery could also be called port discovery, and by default, we use SYN>ACK, which is very solid, stealthy and lightweight. The “Well Known Port Numbers” also tend to be good enough. For authenticated scans, where you just want to run patch checks, you can get much better performance by limiting to the ports known to be used with any credentials provided in the site configuration.
The only time I might recommend doing all 65K is when scanning your external attack surface assets. Never run all 65K UDP unless you want to wait a couple months for the scan results. If you are curious to see what the well-known ports list looks like, you can find the full NMAP command at the beginning of any scan log roughly a couple hundred lines down.
Another workflow that might not be a bad idea is to do a quarterly or yearly all 65K TCP scan just to get the delta on any ports that we are not finding with the “Well Known” ports, then add them to the additional port list. As for excluded ports, I’ve seen a lot of customers exclude the 9100-port range, as it can impact certain printers.
The Service Name file is another option with a description and use at your discretion, and I generally recommend just leaving it alone.
The last option, “Nmap Services Detection,” is a new option that adds roughly 88,700 additional NMAP fingerprints, at the cost of the scan taking five to 10 times longer. I only recommend enabling this option for sites that you do not have good credentials for. We can also account for some of the scan time increase by increasing the parallelism setting in the next tab, “Discovery Performance,” at the cost of increased bandwidth used by the scan. I have found that 500 almost completely mitigates the scan time impact.
Note: I’ve never been able to get solid testing numbers on the bandwidth impact of increasing parallelism to 500, so I recommend doing some testing with the network team if you decide to increase that number.
This brings us to the Discovery Performance tab:
These are all basic core NMAP options recast into tuning options on this page. For detailed explanations, you can visit this site to find in-depth reviews of each option—you just have to figure out which option is which.
Of the above options, the only tweak I ever make is to the initial timeout for highly latent assets with ping time greater than 500, which is extremely rare. I usually just look down at the below three options of:
- Scan Delay
- Packets-Per-Second Rate
By default, Scan Delay is 0 and Parallelism is 0. They are both dynamically set by the minimum packets per section rate (min-rate). Basically, the Scan Delay is how long we wait between sending volleys of SYNs to ports, and the parallelism is the size of the volley.
The minimum packets per second section is the primary way to make the Discovery part of the scan run fast, at the cost of bandwidth. With TCP being designed in the 1970s back when networks were extremely small, it was designed to always slow down rates to the minimum, so increasing the minimum will force the network to use more bandwidth and run the scan faster. The default of 450 packets per second, with an average packet size of 1,500, means you are using around 600kB/s of bandwidth for your scan. This is not an issue unless you are running on an old ISDN or DSL WAN connection. Most WAN links today are usually 100MB/s plus, so this number can be gradually increased to speed up the scan. I generally recommend trying 2,000, and work with your network team to ensure they don’t notice any impact and maybe go a little bigger.
On a 100 MB/s connection a couple MB of bandwidth is usually inconsequential and can greatly speed up asset discovery. For example, a Class A, over 16 million IP’s at 450 usually take a little over 3 days to scan, and with the PPS set to 2000 it will take less than a day.
Note: These PPS settings are per engine, so if you set it to 2000, which means ~2.5 MB/s, if you have a pool of 4 engines, that might use closer to 10 MBs of bandwidth.
The last section of the scan template that I generally discuss with customers is the ‘Vulnerability Checks’ tab:
There are four options that we can turn on or off at the top of this section and I’ll outline those below:
Perform unsafe checks: This is a remnant of the past where the PCI board required basic DoS and Buffer overflow testing for PCI ASV testing. The PCI board no longer requires this, and these checks are no longer used, but are still in the console. Note that most of these checks are extremely old and could still cause damage, so I recommend not checking this box.
Include potential vulnerability checks: It is not always possible for unauthenticated vulnerability checks to be perfectly reliable. When a higher false positive rate is acceptable, in order to minimize missed vulnerabilities, “potential” checks can be enabled. This is not recommended for default scans due to the lower signal-to-noise ratio.
Correlate reliable checks with regular checks: This option I have still seen used and can be of great help. It basically only applies to RHEL or CentOS operating systems running either PHP or Apache. Let’s use RHEL running Apache, for example. System Administrators like to run Apache on RHEL because when you update RHEL, it automatically updates Apache, which is nice. The challenge is, RHEL does not update the Apache banner when it updates Apache, so if you’re not using credentials and we can’t get a ‘Reliable’ check on your OS, we have no choice but to use the ‘Regular’ banner grab and depending on how many years old it is, it may mean thousands of false positives. This could still happen even with Credentials if this option is unchecked, so always make sure this option is checked so it will correlate the reliable credentialed OS grab with the regular banner grab to remove any vulns detected from the banner information.
Skip checks performed by the Insight Agent: We discussed this above and it's pretty self-explanatory. If we detect that the asset has an agent, we only run the few checks that the agent cannot do which will both greatly speed up the scan and also increase the accuracy of the scan by not trying to correlate scan results with agent results.
Last, in the ‘Selected Checks’ you can see all of the vulnerability categories which can be used for other parts of the tool, or search for checks using the ‘By individual check’ drop-down. None of these options should ever be touched.
Of the remaining sections:
- File Searching is a thing, and I don’t recommend using it.
- Spam relaying is essentially obsolete with Web Spider.
- Database servers can be used if you plan to do policy scanning config checks against non-windows databases so you can add the database names.
- Mail Servers, CVS servers and DHCP servers just leave default.
- Telnet Servers is used if you get Telnet false positives so you can add custom Telnet failed login responses to the regex if you run into those default account challenges.
Now you should have a great starting Scan Template for your InsightVM scanning deployment. Just hit save and now it will be the first result when creating a site to start bringing data into your database!