As the amount of data within our organizations continues to grow at a nearly exponential rate, we are forced to look for more efficient ways to process it all for security use cases. If we want to keep up as these problems and systems scale, we need to learn from industries much older than computing. Gold miners, for example, have graduated from a shovel and pan to automating as much of the process as possible, reserving manual labor for the areas machines just cannot handle. Working from the beginning of the data analysis process, there are four stages of security data processing that can be automated and a fifth that still needs human eyes and minds.
Automate the collection from all potentially valuable data deposits
If you were a fan of either Pale Rider in the 80s or White Fang in the 90s, you can probably recall the way gold was mined for centuries. Once someone decided they had a potential gold deposit, pickaxes and shovels were the popular tools for freeing the gold particles from the surrounding rock. Using blunt instruments in very specific places is quite similar to the original way of debugging software issues and looking for malicious activity: logging in to a specific system and searching through its logs. In recent years, over a dozen stages of crushing rock, pumping water, applying pressure, and various other methods of filtration have been introduced to reduce the need for luck when “digging for gold”. Thankfully for every security team's sanity, prospecting for data from system to system was similarly recognized as too laborious to scale around ten years ago, and relevant data is now centralized. You should demand a software solution that offers multiple options for data collection (read: not just a syslog feed) and lets you monitor your data sources for inactivity or sudden failures in collection.
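To make the idea of monitoring sources for inactivity concrete, here is a minimal sketch. It assumes you already track the timestamp of the last event received from each source; the source names, the 15-minute threshold, and the `find_silent_sources` helper are all hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta


def find_silent_sources(last_seen, now, max_silence=timedelta(minutes=15)):
    """Return the names of sources whose latest event is older than max_silence."""
    return sorted(
        name for name, seen_at in last_seen.items() if now - seen_at > max_silence
    )


# Hypothetical "last event received" timestamps, one per data source.
now = datetime(2024, 1, 1, 12, 0)
last_seen = {
    "firewall": datetime(2024, 1, 1, 11, 58),
    "dhcp": datetime(2024, 1, 1, 9, 30),
    "active_directory": datetime(2024, 1, 1, 11, 50),
}

silent = find_silent_sources(last_seen, now)
print(silent)
```

In practice a sensible threshold differs per source, since a busy firewall and a rarely used VPN go quiet on very different schedules.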
Automate the normalization of data
I won't force the analogy into every section: reducing large chunks of rock down to just the valuable minerals doesn't require understanding the rock or adding other rock, whereas interpreting data often requires more data. To simplify the later processing of the raw data, you must address an area less frequently automated by off-the-shelf software solutions: normalization. Many solutions put the onus on the customer to configure connectors, match specific fields in their logs to the “normalized” data fields, and maintain this parsing and translation as logs change over time and new devices are deployed. There is no reason this shouldn't be automated for the vast majority of data sources common to organizations today. Your team shouldn't have to deal with these manual tasks unless you are monitoring your own internal systems through custom logs.
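As a rough illustration of what normalization involves, the sketch below maps vendor-specific field names onto one shared schema. The source types, field names, and `normalize` helper are assumptions made up for this example and do not describe any particular product's connectors.

```python
# Hypothetical per-source mappings from raw field names to a shared schema.
FIELD_MAPS = {
    "vendor_a_firewall": {"src": "source_ip", "dst": "destination_ip", "usr": "user"},
    "vendor_b_proxy": {
        "client_addr": "source_ip",
        "server_addr": "destination_ip",
        "login": "user",
    },
}


def normalize(source_type, raw_event):
    """Rename known raw fields to their normalized names and drop the rest."""
    mapping = FIELD_MAPS[source_type]
    return {mapping[key]: value for key, value in raw_event.items() if key in mapping}


firewall_event = normalize(
    "vendor_a_firewall",
    {"src": "10.0.0.5", "dst": "203.0.113.9", "usr": "alice", "rule": "42"},
)
proxy_event = normalize(
    "vendor_b_proxy",
    {"client_addr": "10.0.0.5", "server_addr": "203.0.113.9", "login": "alice"},
)
print(firewall_event == proxy_event)
```

The maintenance burden described above lives in `FIELD_MAPS`: every new device or changed log format means another mapping to write and keep current, which is why it is worth having a vendor do it for common sources.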
Automate the internal attribution of activity
After the data is normalized, a massive amount of time is generally spent figuring out the actions that were actually taken, and it shouldn't be so time-consuming. If your automation stops at normalization, it takes a great deal of working knowledge just to correlate the normalized data with other information by a single shared attribute: concurrency, meaning the events happened at the same time. Your team shouldn't have to first identify a specific log line or other data point as being of interest, and then run a few more queries or “pivots” into other data, before determining which user was responsible for the action and on which asset. This attribution should be performed automatically as the data is ingested. Why would you ever want to look at the data without knowing these details?
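One way to picture attribution at ingestion time is the sketch below. It assumes two hypothetical lookup tables, IP address leases and logins, and resolves an event's source IP and timestamp to an asset and then to the user most recently seen on that asset. Real environments need far more than this (shared machines, VPNs, service accounts), so treat it as an illustration of the idea only.

```python
from datetime import datetime

# Hypothetical lease records: (ip, asset, lease_start, lease_end).
LEASES = [
    ("10.0.0.5", "laptop-17", datetime(2024, 1, 1, 8, 0), datetime(2024, 1, 1, 12, 0)),
    ("10.0.0.5", "laptop-42", datetime(2024, 1, 1, 12, 0), datetime(2024, 1, 1, 18, 0)),
]

# Hypothetical login records: (asset, user, login_time).
LOGINS = [
    ("laptop-17", "alice", datetime(2024, 1, 1, 8, 5)),
    ("laptop-42", "bob", datetime(2024, 1, 1, 12, 10)),
]


def attribute(event):
    """Return a copy of the event enriched with the responsible asset and user."""
    # Find which asset held the event's source IP at the time of the event.
    asset = next(
        (
            name
            for ip, name, start, end in LEASES
            if ip == event["source_ip"] and start <= event["time"] < end
        ),
        None,
    )
    # Pick the most recent login on that asset at or before the event.
    prior = [(when, user) for name, user, when in LOGINS if name == asset and when <= event["time"]]
    user = max(prior)[1] if prior else None
    return {**event, "asset": asset, "user": user}


enriched = attribute({"source_ip": "10.0.0.5", "time": datetime(2024, 1, 1, 13, 0)})
print(enriched["asset"], enriched["user"])
```

Note that the same IP address resolves to different assets and users depending on the time of the event, which is exactly the pivoting an analyst would otherwise do by hand.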
Automate the behavioral analysis
The last stage of automation you should demand of your monitoring software is an explanation of the behavior. What did the user actually do on the asset? You shouldn't have to diligently decipher these details every time you investigate the same kind of activity in your machine data. Without context such as the amount of data transmitted externally or the destination organization, a lot of time can be spent only to find out that your code repository vendor now has a few new public IP addresses. And as long as the previous stage is automated, you can immediately see that a software developer transmitted this sudden increase in data from her primary machine. These details should all be obtainable at a glance. Critical thinking should be reserved for determining intent and recognizing new, questionable behavior.
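A very simple form of this kind of analysis is comparing a user's outbound data volume against her own history. The sketch below flags a day whose volume sits more than a chosen number of standard deviations above the historical mean. The figures, the threshold of 3, and the `is_unusual_volume` helper are hypothetical, and real behavioral analytics would weigh many more signals than this one.

```python
from statistics import mean, stdev


def is_unusual_volume(history_mb, today_mb, threshold=3.0):
    """Flag today's volume if it is far above the user's own historical baseline."""
    baseline = mean(history_mb)
    spread = stdev(history_mb)
    if spread == 0:
        # With a perfectly flat history, any increase stands out.
        return today_mb > baseline
    return (today_mb - baseline) / spread > threshold


# Hypothetical daily outbound volumes (in MB) for one developer's machine.
history_mb = [100, 120, 110, 90, 105]

print(is_unusual_volume(history_mb, 5000))
print(is_unusual_volume(history_mb, 115))
```

A flag like this only says the volume is unusual; pairing it with the destination organization and the attributed user is what lets an analyst judge at a glance whether it is a routine push to a code repository or something worth investigating.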
Ease the manual analysis with usability, visualizations, and flexible search
Once you have automated these first four stages of data analysis, the security team can spend a lot more of its time deciding whether the activity is malicious and what should be done about it. This is like panning loose surface sediment in hopes of leaving nothing but the gold and other materials of high specific gravity for manual review. Manual searching alone doesn't scale, because without direction you don't know where to look in the data, but it is very effective as a complement to the automated analysis. With the four stages of automation in place, you already have enough direction and context to know where you're looking before taking this last action and planning remediation. By pairing the automation with data visualizations and rapid search capabilities, you can make this final stage as painless and quick as possible, so your team can act with confidence.
If your team needs to use the extensive log data and other machine data in your organization to detect attackers as they move laterally from initial compromise to multiple endpoints and, eventually, to the systems containing the most valuable data, you should not be forced to build every stage of the processing yourselves like in the old days.
If you want to learn about Rapid7's approach to automating the first 80%, check out an InsightIDR demo. If the last 20% of manual effort is your challenge, you can start a free trial of Logentries and search your data within seconds.