By Asaf Perlman, Incident Response Leader
Modern security tools continue to improve in their ability to defend organizations’ networks and endpoints against cybercriminals. But the bad actors still occasionally find a way in.
Security teams need to be able to stop threats and get back to normal operations as quickly as possible. That’s why it’s essential that these teams not only have the right tools but also understand how to effectively respond to an incident.
But they can’t just stop there. They need to be trained to adapt to rapidly evolving threats. Remember that a security incident can also serve as an educational opportunity that helps the organization better prepare for future incidents – or even to avoid them.
During my recent “Step-by-step approach to tackling an incident response” webinar, I shared the SANS Institute’s six steps of incident response (IR) along with some insights and tips I’ve learned from building IR strategies.
Missed my webinar? No worries! Here’s a recap of the six steps to a successful IR.
6 steps of IR
The six steps of incident response defined by the SANS Institute are:
- Preparation
- Identification
- Containment
- Eradication
- Recovery
- Lessons learned
While these phases follow a logical flow, it’s possible that you’ll need to return to a previous phase in the process to repeat specific steps that weren’t done correctly or completely the first time.
Yes, this slows down the IR. But it’s more important to complete each phase thoroughly than to save time by rushing steps.
1: Preparation
Goal: Get your team ready to handle events efficiently and effectively
Everybody who has access to your systems needs to be prepared for an incident – not just the incident response team.
Human error is to blame for the majority of cybersecurity breaches.
That’s why the first and most critical step in IR is educating your people about what to look for. Attackers will continue to evolve their social engineering and spear phishing techniques to try to stay one step ahead of training and awareness campaigns. While most everybody now knows to ignore a poorly written email that promises a reward in return for a small up-front payment, some targets will fall victim to an off-hours text message pretending to be their boss asking for help with a time-sensitive task.
So, your internal training must be updated regularly to reflect the latest trends and techniques.
Your incident responders – or security operations center (SOC), if you have one – will also require regular training, ideally using simulations of actual incidents. An intensive tabletop exercise can raise adrenaline levels and give your team a sense of what it’s like to experience a real-world incident. You might find that some team members shine when the heat is on, while others require additional training and guidance.
Another part of your preparation is outlining a specific response strategy. The most common approach is to contain and eradicate the incident. The other option is to watch an incident in progress so you can learn the attacker’s behavior and what their goals are.
Beyond training and strategy, technology plays a huge role in incident response. And logs are a critical component. Simply put, the more you log, the easier and more efficient it will be for the IR team to investigate an incident.
Also, using an endpoint detection and response (EDR) or extended detection and response (XDR) tool with centralized control will let you quickly take defensive actions like isolating machines, disconnecting them from the network, and executing counteracting commands at scale.
Other technology needed for IR includes a virtual environment where logs, files, and other data can be analyzed, along with ample storage to house this information. You don’t want to waste time during an incident setting up virtual machines and allocating storage space.
Finally, you’ll need a system for documenting your findings from an incident, whether that’s using spreadsheets or a dedicated IR documentation tool. Your documentation should cover the timeline of the incident, what systems and users were impacted, and what malicious files and indicators of compromise (IOC) you discovered (both in the moment and retrospectively).
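As a rough illustration of what such documentation might hold, here is a minimal Python sketch of an incident record. The class names, field names, and sample values are all hypothetical and not taken from any particular IR tool; a spreadsheet with the same columns would serve the same purpose.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class TimelineEvent:
    """One timestamped entry in the incident timeline."""
    when: datetime
    description: str


@dataclass
class IncidentRecord:
    """Hypothetical record covering the items listed above."""
    title: str
    timeline: list[TimelineEvent] = field(default_factory=list)
    affected_systems: set[str] = field(default_factory=set)
    affected_users: set[str] = field(default_factory=set)
    malicious_files: set[str] = field(default_factory=set)
    iocs: set[str] = field(default_factory=set)

    def add_event(self, when: datetime, description: str) -> None:
        """Add an event and keep the timeline in chronological order,
        so findings made retrospectively land in the right place."""
        self.timeline.append(TimelineEvent(when, description))
        self.timeline.sort(key=lambda event: event.when)


# Illustrative data only.
record = IncidentRecord(title="Example incident")
record.add_event(datetime(2024, 1, 10, 9, 30, tzinfo=timezone.utc),
                 "EDR alert on host-01")
record.add_event(datetime(2024, 1, 10, 8, 5, tzinfo=timezone.utc),
                 "Phishing email opened (found retrospectively)")
record.affected_systems.add("host-01")
record.iocs.add("203.0.113.7")  # placeholder IP from a documentation range

for event in record.timeline:
    print(event.when.isoformat(), event.description)
```

Sorting on insert means an event discovered later in the investigation still appears at the point in the timeline where it actually happened.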
2: Identification
Goal: Detect whether you have been breached and collect IOCs
There are a few ways you can identify that an incident has occurred or is currently in progress.
- Internal detection: an incident can be discovered by your in-house monitoring team or by another member of your org (thanks to your security awareness efforts), via alerts from one or more of your security products, or during a proactive threat hunting exercise.
- External detection: a third-party consultant or managed service provider can detect incidents on your behalf, using security tools or threat hunting techniques.
- Exfiltrated data disclosed: the worst-case scenario is to learn that an incident has occurred only after discovering that data has been exfiltrated from your environment and posted to internet or darknet sites. The implications are even worse if such data includes sensitive customer information and the news leaks to the press before you have time to prepare a coordinated public response.
No discussion about identification would be complete without mentioning alert fatigue.
If the detection settings for your security products are dialed too high, you will receive too many alerts about unimportant activities on your endpoints and network. That is a great way to overwhelm your team and can result in many ignored alerts.
The reverse scenario, where your settings are dialed too low, is equally problematic because you might miss critical events. A balanced security posture will provide just the right number of alerts so you can identify incidents worthy of further investigation without suffering alert fatigue. Your security vendors can help you find the right balance and, ideally, automatically filter alerts so your team can focus on what matters.
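To make the idea of filtering concrete, here is a minimal Python sketch that drops alerts below a chosen severity and collapses repeats of the same rule on the same host into a single entry. The alert fields (`host`, `rule`, `severity`), the severity scale, and the sample alerts are assumptions for illustration; real security products expose their own schemas and tuning options.

```python
from collections import Counter

# Hypothetical severity scale; real products define their own.
SEVERITY = {"low": 1, "medium": 2, "high": 3, "critical": 4}


def triage(alerts, min_severity="medium"):
    """Return one entry per (host, rule) pair at or above min_severity,
    most severe first, with a count of how often it fired."""
    threshold = SEVERITY[min_severity]
    counts = Counter()
    severity_of = {}
    for alert in alerts:
        level = SEVERITY[alert["severity"]]
        if level < threshold:
            continue  # below the threshold: not surfaced to the team
        key = (alert["host"], alert["rule"])
        counts[key] += 1
        severity_of[key] = max(severity_of.get(key, 0), level)
    ranked = sorted(counts, key=lambda k: (-severity_of[k], -counts[k], k))
    return [
        {"host": host, "rule": rule,
         "count": counts[(host, rule)], "severity": severity_of[(host, rule)]}
        for host, rule in ranked
    ]


# Illustrative alerts only.
alerts = [
    {"host": "host-01", "rule": "suspicious_powershell", "severity": "high"},
    {"host": "host-01", "rule": "suspicious_powershell", "severity": "high"},
    {"host": "host-02", "rule": "usb_inserted", "severity": "low"},
    {"host": "host-03", "rule": "credential_dump", "severity": "critical"},
]
queue = triage(alerts)
for entry in queue:
    print(entry)
```

Raising `min_severity` shrinks the queue at the risk of missing events, and lowering it does the opposite, which is the trade-off described above.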
During the identification phase, you will document all indicators of compromise gathered from alerts, such as compromised hosts and users, malicious files and processes, new registry keys, and more.
Once you have documented all IOCs, you will move to the containment phase.
3: Containment
Goal: Minimize the damage
Containment is as much a strategy as it is a distinct step in IR.
You will want to establish an approach fit for your specific org, keeping both security and business implications in mind. Although isolating devices or disconnecting them from the network may prevent an attack from spreading across the org, it might result in significant financial damage or other business impact. These decisions should be made ahead of time and clearly articulated in your IR strategy.
Containment can be broken down into both short- and long-term steps, with differing implications for each.
- Short-term: this includes steps you might take in the moment, like shutting down systems, disconnecting devices from the network, and actively observing the threat actor’s activities. There are pros and cons to each of these steps.
- Long-term: the best-case scenario is to keep the system offline so you can safely move to the eradication phase. This isn’t always possible, however, so you may need to take measures like patching, changing passwords, killing specific services, and more.
During the containment phase you will want to prioritize your critical devices like domain controllers, file servers, and backup servers to ensure they haven’t been compromised.
Additional steps in this phase include documenting which assets and threats were contained during the incident, as well as grouping devices based on whether they were compromised or not; if you aren’t sure, assume the worst. Once all devices have been categorized and meet your definition of containment, this phase is over.
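The "assume the worst" rule is easy to encode. Below is a minimal Python sketch, with hypothetical device names, in which each device is marked as compromised (`True`), verified clean (`False`), or not yet known (`None`), and anything that is not explicitly verified clean is grouped with the compromised devices.

```python
from enum import Enum


class Status(Enum):
    CLEAN = "clean"
    COMPROMISED = "compromised"


def group_devices(findings):
    """Group devices by status.

    findings maps a device name to True (compromised), False (verified
    clean), or None (unknown). Unknown devices are treated as compromised.
    """
    groups = {Status.CLEAN: [], Status.COMPROMISED: []}
    for device, compromised in sorted(findings.items()):
        status = Status.CLEAN if compromised is False else Status.COMPROMISED
        groups[status].append(device)
    return groups


# Illustrative findings only.
findings = {"dc-01": False, "file-01": True, "backup-01": None}
groups = group_devices(findings)
print("clean:", groups[Status.CLEAN])
print("compromised:", groups[Status.COMPROMISED])
```

Checking `compromised is False` rather than general truthiness is deliberate: only an explicit "verified clean" result moves a device out of the compromised group.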
Bonus step: Investigation
Goal: Answer questions such as who, what, when, and where
At this stage it is worth noting another important aspect of IR: investigation.
Investigation is not a phase of its own. It runs throughout the whole process and aims to answer questions such as which systems were accessed and what the root cause was. Once the incident has been contained, the team will want to capture as much relevant data as possible from sources like disk images, memory images, and logs.
I shared this flowchart during my webinar to help visualize the overall process:
[Figure: Generic malware investigation flow]
You may be familiar with the term digital forensics and incident response (DFIR), but it’s worth noting that the goals of IR forensics are different from those of traditional forensics. In IR, the primary goal of forensics is to help move from one phase to the next as efficiently as possible in order to resume normal business operations.
On the other hand, digital forensics techniques aim to extract as much useful information as possible from any evidence captured and turn it into useful intelligence that can help build a more complete picture of the incident, or even to aid in the prosecution of a bad actor.
Data points that add context to discovered artifacts might include how the attacker entered the network or moved around, which files were accessed or created, what processes were executed, and more. Of course, this can be a time-consuming process that might conflict with IR.
Notably, DFIR has evolved since the term was first coined. Today’s orgs have hundreds or thousands of machines, each of which has hundreds of gigabytes or even multiple terabytes of storage, so the traditional approach of capturing and analyzing full disk images from all compromised machines is no longer efficient.
A more surgical approach is now required, where specific information from each compromised machine is captured and analyzed.
4: Eradication
Goal: Make sure the threat is completely removed
With the containment phase complete, you can move to eradication, which can be handled by cleaning the disk, restoring from a clean backup, or fully reimaging the disk. Cleaning entails deleting malicious files and deleting or modifying registry keys. Reimaging means reinstalling the operating system.
Before taking any action, the IR team will want to refer to any organizational policies that, for example, call for specific machines to be reimaged in the event of a malware attack.
Similar to earlier steps, documentation also plays a role in eradication. The IR team should carefully document the actions taken on each machine to ensure that nothing was missed. As an additional check you can perform active scans of your systems for any evidence of the threat after the eradication process is complete.
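One simple form such a scan could take is checking files against the hashes of malicious files documented earlier in the incident. The Python sketch below assumes the IOCs are SHA-256 file hashes; the function names and the demo files are hypothetical. Hash matching only finds byte-for-byte copies of known files, so treat it as one check among several rather than proof that a system is clean.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_known_bad(root, bad_hashes):
    """Return the files under root whose SHA-256 is in bad_hashes."""
    matches = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            if sha256_of(path) in bad_hashes:
                matches.append(path)
        except OSError:
            continue  # unreadable file; a real scan should record this
    return matches


# Demo on a throwaway directory with made-up file contents.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "benign.txt").write_bytes(b"hello")
    (Path(tmp) / "dropper.bin").write_bytes(b"pretend payload")
    bad = {hashlib.sha256(b"pretend payload").hexdigest()}
    hits = [p.name for p in find_known_bad(tmp, bad)]

print(hits)
```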
5: Recovery
Goal: Get back to normal operations
All your efforts have been leading to the recovery phase – when you can resume business as usual.
Determining when to restore operations is the key decision at this point. Ideally, this can happen without delay, but it may be necessary to wait for your organization’s off-hours or other quiet period.
Perform one more check to verify that no IOCs remain on the restored systems. You will also need to determine whether the root cause still exists and implement the appropriate fixes.
Now that you have learned about this type of incident, you’ll be able to monitor for it in the future and establish protective controls.
6: Lessons learned
Goal: Document what happened and improve your capabilities
Now that the incident is comfortably behind you, it’s time to reflect on each major IR step and answer key questions. There are plenty of questions and aspects to review; below are a few examples:
- Identification: How long did it take to detect the incident after the initial compromise occurred?
- Containment: How long did it take to contain the incident?
- Eradication: After eradication, did you still find any signs of malware or compromise?
Probing these will help you step back and reconsider fundamental questions like: Do we have the right tools? Is our staff appropriately trained to respond to incidents?
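The "how long did it take" questions above can be answered directly from the documented incident timeline. Here is a minimal Python sketch; the timeline keys and timestamps are made up for illustration.

```python
from datetime import datetime


def elapsed_hours(start, end):
    """Hours between two timestamps."""
    return (end - start).total_seconds() / 3600


# Illustrative timestamps only.
timeline = {
    "initial_compromise": datetime(2024, 1, 10, 8, 0),
    "detected": datetime(2024, 1, 11, 14, 0),
    "contained": datetime(2024, 1, 11, 20, 30),
}

time_to_detect = elapsed_hours(timeline["initial_compromise"], timeline["detected"])
time_to_contain = elapsed_hours(timeline["detected"], timeline["contained"])

print(f"Time to detect: {time_to_detect:.1f} hours")
print(f"Time to contain: {time_to_contain:.1f} hours")
```

Tracking these two numbers across incidents gives you a simple way to see whether changes made during preparation are paying off.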
Then the cycle returns to preparation, where you can make necessary improvements like updating your technology and processes, and providing your people with better training.
3 pro tips to keep your org safe
I ended the webinar by sharing three things for everyone to keep in mind:
- The more you log, the easier investigation will be. Make sure you log as much as possible to save money and time.
- Stay prepared by simulating attacks against your network. This will reveal how your SOC team analyzes alerts and how well its members communicate, which is critical during a real incident.
- People are integral to your org’s security posture. Did you know 95% of cyber breaches are caused by human error? That’s why it’s important to perform periodic training for two groups: end users and your security team.
Curious to learn more? I’m glad!
You can watch the full recording of my “Step-by-step approach to tackling an incident response” webinar here.