The Complete Guide to MITRE’s 2020 ATT&CK Evaluation

What is MITRE ATT&CK and Why Does It Matter?

The work MITRE is doing to bring a common language to cybersecurity is of monumental value to defenders everywhere. MITRE’s innovative approach to tool effectiveness evaluation has been broadly welcomed in the industry – both among vendors and enterprise customers. At SentinelOne, we have fully embraced our experiences participating in each MITRE ATT&CK evaluation, deeply integrating MITRE’s framework into the design and ongoing innovation of our solution. But what does the evaluation mean to your business, and how can you use it to better understand and use the security tools at your disposal?

In this post, we explain everything you need to know about the latest MITRE evaluation and the current round of tests to help you make the most of the upcoming results.

What is the MITRE ATT&CK Framework?

MITRE describes its framework as “a curated knowledge base and model for cyber adversary behavior, reflecting the various phases of an adversary’s attack lifecycle and the platforms they are known to target.” 

The key words here are phases and behavior. When an adversary has a strategic objective – think data exfiltration or establishing long-term command and control – they will use multiple tactics in phases. Each phase consists of behaviors, which are simply sets of techniques. Techniques, in turn, have varying sets of procedures. Therefore, the path to the end goal comprises an initial tactic with one or more techniques, followed by another tactic with its techniques, and so on until the adversary’s objective is met. This layering of general tactics down to specific procedures is where we get TTP: Tactic, Technique, Procedure.

In MITRE’s ATT&CK framework matrix, tactics are represented in the column headers, techniques in the items listed in each column, and procedures – the detailed implementation of a technique – are described in each entry’s listing.
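To make the layering concrete, here is a minimal sketch of how one column of the matrix could be modeled in Python. The class names and the procedure descriptions are illustrative assumptions, not part of any MITRE tooling; the technique IDs are real ATT&CK identifiers for Discovery techniques.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Technique:
    technique_id: str
    name: str
    # Procedures: concrete ways the technique is carried out (illustrative text)
    procedures: List[str] = field(default_factory=list)


@dataclass
class Tactic:
    name: str  # the column header in the matrix
    techniques: List[Technique] = field(default_factory=list)


discovery = Tactic(
    name="Discovery",
    techniques=[
        Technique("T1057", "Process Discovery", ["enumerate running process IDs"]),
        Technique("T1082", "System Information Discovery", ["query the OS version"]),
    ],
)


def matrix_column(tactic: Tactic) -> List[str]:
    """Return the entries listed under one tactic's column header."""
    return [f"{t.technique_id} {t.name}" for t in tactic.techniques]


print(discovery.name, matrix_column(discovery))
```

A full matrix would simply be a list of such Tactic objects, one per column.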


A Common Language For Threat Actor Behavior

The purpose of MITRE’s ATT&CK framework is to create a modular, common language to describe how threat actors operate so that we, as defenders, can use our detective security controls more effectively. 

To illustrate the point, think of a baker who creates an array of desserts and breads. What kind of baked goods does he tend to produce? How does he go about making them? We might describe a given product in terms of the recipe needed to produce it, which essentially must detail the many techniques, step by step, needed to achieve the desired result. 

For each item the baker produces, there will be a different recipe, but these recipes often have many steps in common. And insofar as the techniques vary here and there, the end results will differ to varying degrees. British scones are very similar to American biscuits, but both are quite different from a French baguette. However, all three have similar preparation steps and ingredients applied in subtly different ways to create differentiated end products. 

By predefining how recipes tend to flow at a high level (tactics) and the baking techniques and procedures used across that flow, we can define a baking model that can be used to define the factors and actions necessary for the creation of all sorts of treats, and maybe even attribute which baker made which particular treat.

MITRE points out that it is a “mid-level adversary model”, meaning that it is not too generalized and not too specific. High-level models like the Lockheed Martin Cyber Kill Chain® illustrate adversary goals but aren’t specific about how the goals are achieved. Conversely, exploit and malware databases very specifically define one or two jigsaw pieces in a large puzzle but aren’t necessarily connected to how the bad guys use them or to identifying who the bad guys are. MITRE’s TTP model is that happy medium: tactics are intermediate goals and the “why” of a technique, and procedures represent how each tactic is achieved.


So How Can MITRE’s Framework Help Defenders?

MITRE’s model represents the attacker’s perspective. It is a representation of how I, as the attacker, go through my process to exfiltrate data from you, the victim. Crucially – and herein lies the real power of integrating the MITRE ATT&CK framework into a security solution like SentinelOne – tactics do not exist in isolation. 


In an attack, each tactic is related to those preceding it and those that follow it. Understanding TTPs in context allows us to create a story that we can tell about a campaign, and consequently offers defenders a far more powerful means of detecting attacks. Integrating the MITRE ATT&CK framework into our detection capabilities means that we can recognize events in our environment which alone may be insignificant – think about the problem of distinguishing Living off the Land techniques from false positives. However, when several “LOL” binaries are executed in a particular sequence, they can be seen as related to each other and understood as a tactic to achieve an adversarial aim. 
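The idea of treating individually harmless events as significant only when they occur in sequence can be sketched as follows. This is a simplified illustration under stated assumptions, not how SentinelOne or any other product implements correlation: the function name, the event format, the example binary names and the time window are all hypothetical.

```python
from typing import List, Sequence, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, process name)


def contains_sequence(events: Sequence[Event], pattern: List[str],
                      window_seconds: float) -> bool:
    """Return True if the process names in `pattern` occur in order
    (other events may be interleaved) within `window_seconds` of the first match."""
    if not pattern:
        return False
    ordered = sorted(events)
    for start_index, (start_time, name) in enumerate(ordered):
        if name != pattern[0]:
            continue
        next_step = 1
        for time, later_name in ordered[start_index + 1:]:
            if time - start_time > window_seconds:
                break
            if next_step < len(pattern) and later_name == pattern[next_step]:
                next_step += 1
        if next_step == len(pattern):
            return True
    return False


# Hypothetical event stream: each binary alone is unremarkable
events = [
    (0.0, "whoami.exe"),
    (5.0, "notepad.exe"),
    (12.0, "net.exe"),
    (20.0, "certutil.exe"),
]
pattern = ["whoami.exe", "net.exe", "certutil.exe"]

print(contains_sequence(events, pattern, window_seconds=60))
print(contains_sequence(events, pattern, window_seconds=10))
```

With a 60-second window the three binaries are seen as one related chain; with a 10-second window they are not, which shows how much the result depends on the context a tool is able to keep.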

How Does MITRE ATT&CK Evaluate Security Products?

Now that we have a clear understanding of the framework and its relevance, let’s look at how the MITRE ATT&CK evaluation tests security vendors’ products. 

The evaluation sets out to emulate an attack from a known, real-world APT group. In Round 1, MITRE chose to emulate attacks used by APT3. In this year’s Round 2, they chose APT29. 

Attack emulation sets out to chain together a set of techniques that have been publicly attributed to the adversary in question. For example, if the adversary has been seen using certain privilege escalation and persistence techniques in their campaigns, the emulation may chain those together in the test, even though they may not have been used together in actual real-world attacks. The aim is to put together a complete, logical attack that moves through all the stages of a comprehensive, successful attack from initial compromise to persistence, lateral movement, data exfiltration, and so on. In other words, the emulation doesn’t necessarily follow the actual logic used by the adversary in the wild; it is a constructed logic based on the adversary’s known TTPs. 

The MITRE ATT&CK emulation does not aim to test each and every TTP in the framework; only known TTPs of the chosen adversary are tested.

The environment for the attack emulation involves providing vendors with a “lab” of several virtual machines, protected by the vendor’s products. The evaluation then sets out to penetrate these virtual machines using MITRE ATT&CK framework TTPs that have been seen in the wild used by (in the case of Round 2) APT29. Vendor solutions are awarded various “detections” (such as whether they produced an alert, or logged telemetry) for each MITRE TTP in the test. In the Round 2 evaluation, two attacks were performed over two days, with each attack having 10 stages comprising 70 sub-steps. In total, 140 sub-steps were used in the test.

For example, an adversary may aim to achieve discovery on a system by enumerating process IDs (T1057), gathering the OS version (T1082) and looking for AV and firewall software (T1063). For each sub-step, a vendor may receive one or more detection awards, depending on what information was presented as part of the detection that was recorded in the vendor product. 
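As a rough illustration of how per-sub-step awards could be tallied when reading the results, consider the sketch below. The award values are invented for the example and are not actual evaluation results for any vendor; the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical data: sub-step description -> awards recorded for that sub-step
results: Dict[str, List[str]] = {
    "enumerate process IDs (T1057)": ["Telemetry", "Technique"],
    "gather OS version (T1082)": ["Telemetry"],
    "look for AV and firewall software (T1063)": [],
}


def tally_awards(results: Dict[str, List[str]]) -> Counter:
    """Count how many times each award type was recorded across all sub-steps."""
    counts: Counter = Counter()
    for awards in results.values():
        counts.update(awards)
    return counts


def detected_fraction(results: Dict[str, List[str]]) -> float:
    """Share of sub-steps that received at least one award."""
    detected = sum(1 for awards in results.values() if awards)
    return detected / len(results)


print(tally_awards(results))
print(round(detected_fraction(results), 2))
```

Because a single sub-step can receive several awards, the total number of awards and the number of sub-steps detected are different figures and should not be compared directly.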

The detection awards have the potential to cause the most confusion for anyone trying to consume the MITRE evaluation results. In Round 2, MITRE is using a hierarchical award system for detections, which can roughly be understood as describing detections from “the richest” (at the top) to the “least rich” at the bottom. There are 13 possible categories, split into two types: “Main detection category” and “Detection Modifiers”. The full list appears on MITRE’s website. However, the most important categories in terms of minimizing dwell time – the time between an attack taking place and a detection occurring – are the main categories Technique, Tactic and General, and the modifier categories “Alert” and “Correlated”. From a defender’s point of view, “Alerts” (priority notifications) are crucial as they can decrease the dwell time rapidly, particularly if the solution has automated response capabilities.

The Technique, Tactic and General categories not only indicate a tool’s ability to detect an attack autonomously (and without human analysis delay) but also serve to indicate how ‘enriched’ the data is. MITRE awards the ‘Technique’ category to a tool when it provides rich data that answers the questions of precisely how something was done and why. The category of ‘Tactic’ is awarded when the tool provides sufficient information to answer the question of why the detected activity took place (e.g., a process set up persistence), and ‘General’ is awarded if the tool identifies malicious or abnormal behaviour but without sufficient enrichment to answer either the ‘how’ or the ‘why’ questions.
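One way to read a sub-step’s results against this hierarchy is to pick the richest main category it received and then check for the modifiers that matter most. The sketch below covers only the three main categories and two modifiers discussed here, not all 13; the numeric ranks and function names are assumptions made for illustration.

```python
from typing import List, Optional

# Higher number = richer detection. Only the categories discussed above are included.
RICHNESS = {"Technique": 3, "Tactic": 2, "General": 1}
KEY_MODIFIERS = {"Alert", "Correlated"}


def richest_main_category(awards: List[str]) -> Optional[str]:
    """Return the richest of Technique/Tactic/General among the awards, if any."""
    ranked = [award for award in awards if award in RICHNESS]
    return max(ranked, key=lambda award: RICHNESS[award]) if ranked else None


def key_modifiers(awards: List[str]) -> List[str]:
    """Return the Alert/Correlated modifiers present, in sorted order."""
    return sorted(set(awards) & KEY_MODIFIERS)


# Hypothetical awards for a single sub-step
sub_step_awards = ["General", "Technique", "Alert"]
print(richest_main_category(sub_step_awards), key_modifiers(sub_step_awards))
```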

What’s New in the MITRE ATT&CK 2019 Evaluation?

As noted above, for Round 2 MITRE has refined the detection categories since Round 1 and also chosen TTPs associated with APT29, a Russian state-sponsored threat actor with a history of targeting Western, Asian, African and Middle Eastern governments and organizations. Their recent activity has tended to fall into large-scale “smash-and-grab” spear-phishing campaigns that attempt to exfiltrate as much data as possible, and smaller targeted campaigns with a focus on stealth and persistence. The MITRE attack emulation aims to represent the first kind of campaign on Day One and the second on Day Two.

In terms of the actual TTPs that are in scope for Round 2, there is some overlap from Round 1. In the image below, yellow represents TTPs that are in scope for the first time with Round 2, purple those that were also in scope during Round 1, and red are TTPs from the earlier round that are no longer in scope. Being ‘in scope’ doesn’t mean the TTP will actually be used, only that it may be included in the attack by the testers.
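The three groups described here are plain set relationships between the two rounds’ scopes. A minimal sketch, using placeholder identifiers rather than the real scope lists:

```python
from typing import Dict, Set

# Placeholder technique IDs, NOT the actual Round 1 / Round 2 scopes
round1_scope: Set[str] = {"T-A", "T-B", "T-C"}
round2_scope: Set[str] = {"T-B", "T-C", "T-D"}


def compare_scopes(previous: Set[str], current: Set[str]) -> Dict[str, Set[str]]:
    """Split TTPs into new, carried over, and dropped between two rounds."""
    return {
        "new": current - previous,           # in scope for the first time (yellow)
        "carried_over": current & previous,  # in scope in both rounds (purple)
        "dropped": previous - current,       # no longer in scope (red)
    }


print(compare_scopes(round1_scope, round2_scope))
```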

Why Does the MITRE ATT&CK Evaluation Matter?

There are two general problems for enterprises when evaluating any security solution or tool. First: how can you be confident that it will work during a real attack? Second: how will it work during an attack? What responses will your SOC or IT team see? What will they need to do, and what should they be looking out for? 

Testing security solutions has long been problematic and ill-suited to determining real-world capability. From the original EICAR test to the dedicated third-party testing labs that have been around for some years now, there’s always been a strong disconnect between the artificial test and real-world efficacy. Vendors themselves have long been aware that their customers need both reassurance and training with their products, and they naturally set out to showcase their solutions in situations that best suit their own strengths.

What MITRE brings to the table is unique. First, the evaluation provides independent, non-partisan, and open test criteria and results. Importantly, the test does not seek to rank or judge vendor products against one another. The aim is to show how the product responds to specific stages of an attack. This helps enterprise users understand how the product they have adopted or may be considering adopting is likely to perform in the real world.

Second, with some caveats that we’ll note in a moment, it’s as close to a real-world experience as anything else currently available. By chaining together observed, in-the-wild TTPs and applying these in phases that emulate the behavior of an entire attack lifecycle, consumers get a far richer insight into how a product will perform than they can from testing against a compendium of known and unknown malware samples.

That said, it must be understood that the MITRE ATT&CK evaluation is still an emulation of an attack in artificial conditions. First, the lab environment used in the test has no real (or simulated) user activity. The attack is the only ‘noise’ in the environment, and that often makes a big difference as to how a security solution really performs in action. Second, the attack only emulates a limited number of TTPs from MITRE’s ATT&CK framework (see the previous section for which TTPs are in scope) and for that reason cannot be considered a way to measure a particular tool’s depth of coverage or behavior against TTPs that were not in scope for this particular test.


With those caveats in mind, however, we eagerly await MITRE publishing the Round 2 results in the near future. It is our hope that this blog post will enrich not only your understanding of MITRE’s ATT&CK framework but also highlight how you can use its evaluation results to inform your understanding of the vendors’ products included in this year’s round. Our foundational pointer for absorbing Round 2: follow the hierarchy of detections to understand each product’s capabilities.