The very essence of an XDR platform is facilitating detection and response to persistent threats by collecting and analyzing data from different sources, with endpoints being the most dominant points of origin.
Collecting all the correct data, however, is only one part of the equation. Just as important is: how long will your data be available? Put another way, how important is it to be able to go back in time? And, indeed, how far back can you go?
While having a rich data supply is obviously a necessary condition for any effective XDR platform, the platform is only as good as the longevity of its data. Threat actors are patient adversaries, and your XDR platform needs to be able to out-wait your attackers. So, when it comes to data retention, just how long is long enough?
The Need for Data Retention
Here’s a typical statement from a security researcher about an incident he handled:
“I was working for a large multinational corporation at the time when we found we had Winnti in our network for over a year. We only found it because of a report that was released by a security vendor at the time, with IOCs.
We only kept logs for three months, and we had no idea when the attack began. Finally, we found VPN login/location logs that were retained long enough that showed us a user was in the Middle East and logged out at the end of his workweek and that same night logged into the network from Africa.
After this incident, we purchased a SIEM and began planning for data retention. As often happens with SIEM projects, I left before that project was complete, and I’m not sure that even today they have more than a year’s worth of retention, as that company has 100Ks of endpoints, which means A LOT of data.”
This is just one case, but it does hint that security teams often discover how much data retention they need only when they come face-to-face with threats that linger in their environments for long periods. For many of them, it’s a case of hindsight being 20/20.
It also shows that even large, well-resourced corporations often choose not to invest in making sure they have the data they need for as long as they will need it.
The anonymous story above may remind readers of one of the most concerning campaigns of recent years: SUNBURST. After the attack was found, the related DNS data published by Cloudflare showed that infections began as early as April 2020, meaning the campaign took about eight months to discover.
If your data is only kept for 30 days and you were infected at the peak of the SUNBURST storm back in mid-April, how easy would it be to know whether you were hit, and whether the attack was contained?
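To make that question concrete, here is a minimal sketch of the arithmetic. The dates are assumptions chosen to match the timeline above (an infection in mid-April 2020 and discovery roughly eight months later), and the function names are hypothetical, not part of any product API.

```python
from datetime import date, timedelta


def oldest_retained_day(today: date, retention_days: int) -> date:
    """Earliest day for which data still exists under a rolling retention window."""
    return today - timedelta(days=retention_days)


def visibility_gap_days(infection: date, discovery: date, retention_days: int) -> int:
    """Days of attacker activity that aged out of storage before discovery."""
    oldest = oldest_retained_day(discovery, retention_days)
    return max((oldest - infection).days, 0)


# Assumed dates for illustration only.
infection = date(2020, 4, 15)   # "mid-April" infection
discovery = date(2020, 12, 13)  # roughly eight months later

for retention in (30, 90, 365):
    gap = visibility_gap_days(infection, discovery, retention)
    print(f"{retention:>3}-day retention: {gap} days of activity no longer visible")
```

Under these assumed dates, a 30-day or 90-day window leaves most of the intrusion timeline unrecoverable, while a one-year window still covers the initial infection.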
It might be tempting to think of these two cases as outliers. Surely, not all attacks are SUNBURST! But when we look at the aggregated statistics in the IBM Security Cost of a Data Breach Report 2020, they show that the average time to identify and contain a data breach is 280 days.
Data Retention in the Cybersecurity Industry
Thus far, we’ve demonstrated that data retention is essential. But where does the rubber meet the road? Next, we will take a look at what vendors in the industry offer. Are they doing the right thing and offering the data retention you need to reduce your risk?
Well, some will, and some… not really.
For example, some EDR vendors start you off with less than ten days of data by default. You can hunt threats, but only those that entered your systems within the past week or so. SUNBURST? Catch it within a week of infection, or wait until you are compromised.
Others don’t store all the data.
Can you simply upgrade if you want more? Not exactly. The furthest you can go back with almost all vendors is 90 days, which, as we saw, is just not enough. To add insult to injury, it’s also quite commonly cost-prohibitive.
How Does SentinelOne Deal With This Topic?
Data is at the very heart of everything we do as a company. Training our AI models, dynamic analysis of Storylines, and Singularity XDR, the industry’s leading solution to the problem raised earlier, all use big data to solve cybersecurity problems.
That’s why our very first acquisition was Scalyr, a leading big data analysis platform. With Scalyr at the core of our XDR platform, we will be able to absorb terabytes of data, store them, and, most importantly, provide customers with the tools to search and analyze that data effectively, enabling the hunting of APTs.
Our technology and platform enable SentinelOne to offer up to a full year of XDR data retention. Not just the malicious data, not some of the data – but ALL of it. Moreover, accessing the oldest data point is done in exactly the same way as accessing something that happened yesterday.
There are also multiple other parts of our platform that align with our data-centric approach.
One of these is Binary Vault, which makes executable files, malicious or benign, available in Singularity for you to download for further or future analysis.
At the end of the day, in a world that is becoming dominated by AI, cybersecurity is becoming more and more reliant on big data. As security and risk management professionals, it is our duty to make sure we have all the data we need, in an observable format that will help us react faster to the next attack, even if it is not always convenient for the vendor to retain it for us.