The Art and Science of macOS Malware Hunting with radare2 | Leveraging Xrefs, YARA and Zignatures

Welcome back to our series on macOS reversing. Last time out, we took a look at challenges around string decryption, following on from our earlier posts about beating malware anti-analysis techniques and rapid triage of Mac malware with radare2. In this fourth post in the series, we tackle several related challenges that every malware hunter faces: you have a sample, you know it’s malicious, but

  • How do you determine if it’s a variant of other known malware?
  • If it is unknown, how do you hunt for other samples like it?
  • How do you write robust detection rules that survive malware author’s refactoring and recompilation?

The answer to those challenges is part Art and part Science: a mixture of practice, intuition and occasionally luck(!) blended with a solid understanding of the tools at your disposal. In this post, we’ll get into the tools and techniques, offer you tips to guide your practice, and encourage you to gain experience (which, in turn, will help you make your own luck) through a series of related examples.

As always, you’re going to need a few things to follow along, with the second and third items in this list installed in the first.

  1. An isolated VM – see instructions here for how to get set up
  2. Some samples – see Samples Used below
  3. Latest version of r2 – see the github repo here.

What are Zignatures and Why Are They Useful?

By now you might have wondered more than once if this post just had a really obvious typo: Zignatures, not signatures? No, you read that right the first time! Zignatures are r2’s own format for creating and matching function signatures. We can use them to see if a sample contains a function or functions that are similar to other functions we found in other malware. Similarly, Zignatures can help analysts identify commonly re-used library code, encryption algorithms and deobfuscation routines, saving us lots of reversing time down the road (for readers familiar with IDA Pro or Ghidra, think F.L.I.R.T or Function ID).

What’s particularly nice about Zignatures is that you can not only search for exact matches but also for matches with a certain similarity score. This allows us to find functions that have been modified from one instantiation to the other but which are otherwise the same.

Zignatures can help us to answer the question of whether an unknown sample is a variant of a known one. Once you are familiar with Zignatures, they can also help you write good detection rules, since they will allow you to see what is constant in a family of malware and what is variant. Combined with YARA rules, which we’ll take a look at later in this post, you can create effective hunting rules for malware repositories like VirusTotal to find variants or use them to help inform the detection logic in malware hunting software.

Create and Use A Zignature

Let’s jump into some malware and create our first Zignature. Here’s a recent sample of WizardUpdate (you might remember we looked at an older sample of WizardUpdate in our post on string decryption).

Loading the sample into r2, analyzing its functions, and displaying its hashes

We’ve loaded the sample into r2 and run some analysis on it. We’ve been conveniently dropped at the main() function, which looks like this.

WizardUpdate main() function

That main function contains some malware specific strings, so should make a nice target for a Zignature. To do so, we use the zaf command, supplying the parameters of the function name and the signature name. Our sample file happened to be called “WizardUpdateB1”, so we’ll call this signature “WizardUpdateB1_main”. In r2, the full command we need, then, is:

> zaf main WizardUpdate_main

We can look at the newly-created Zignature in JSON format with zj~{} (if you’re not sure why we’re using the tilde, review the earlier post on grepping in r2).

An r2 Zignature viewed in JSON format

To see that the Zignature works, try zb and note the output:

zb returns how close the match was to the Zignature and the function at the current address

The first entry in the row is the most important, as that gives us the overall (i.e., average) match (between 0.00000 and 1.00000). The next two show us the match for bytes and graph, respectively. In this case, it’s a perfect match to the function, which is of course what we would expect as this is the sample from which we created the rule.

You can also create Zignatures for every function in the binary in one go with zg.

Create function signatures for every function in a binary with one command

Beware of using zg on large files with thousands of functions though, as you might get a lot of errors or junk output. For small-ish binaries with up to a couple of hundred functions it’s probably fine, but for anything larger than that I typically go for a targeted approach.

So far, we have created and tested a Zignature, but it’s real value lies in when we use the Zignature on other samples.

Create A Reusable and Extensible Zignatures File

At the moment, your Zignatures aren’t much use because we haven’t learned yet how to save and load Zignatures between samples. We’ll do that now.

We can save our generated Zignatures with zos <filename>. Note that if you just provide the bare filename it’ll save in the current working directory. If you give an absolute path to an existing file, r2 will nicely merge the Zignatures you’re saving with any existing ones in that file.

Radare2 does have a default address from which it is supposed to autoload Zignatures if the autoload variable is set, namely ~/.local/share/radare2/zigns/ (in some documentation, it’s ~/.config/radare2/zigns/) However, I’ve never quite been able to get autoload to work from either address, but if you want to try it, create the above location and in your radare2 config file (~/.radare2rc) add the following line.

e zign.autoload = true

In my case, I load my zigs file manually, which is a simple command: zo <filename> to load, and zb to run the Zignatures contained in the file against the function at the current address.

Sample WizardUpdate_B2’s main function doesn’t match our Zignature

Sample WizardUpdate_B5’s main function is a perfect match for our Zignature

As you can see, the Sample above B5 is a perfect match to B1, whereas B2 is way off with the match only around 46.6%.

When you’ve built up a collection of Zignatures, they can be really useful for checking a new sample against known families. I encourage you to create Zignatures for all your samples as they will pay dividends down the line. Don’t forget to back them up too. I learned the hard way that not having a master copy of my Zigs outside of my VMs can cause a few tears!

Creating YARA Rules Within radare2

Zignatures will help you in your efforts to determine if some new malware belongs to a family you’ve come across before, but that’s only half the battle when we come across a new sample. We also want to hunt – and detect – files that are like it. For that, YARA is our friend, and r2 handily integrates the creation of YARA strings to make this easy.

In this next example, we can see that a different WizardUpdate sample doesn’t match our earlier Zignature.

The output from zb shows that the current function doesn’t match any of our previous function signatures

While we certainly want to add a function signature for this sample’s main() to our existing Zigs, we also want to hunt for this on external repos like VirusTotal and elsewhere where YARA can be used.

Our main friend here is the pcy command. Since we’ve already been dropped at main()’s address, we can just run the pcy command directly to create a YARA string for the function.

Generating a YARA string for the current function

However, this is far too specific to be useful. Fortunately, the pcy command can be tailored to give us however many bytes we wish at whatever address.

We know that WizardUpdate makes plenty of use of ioreg, so let’s start by searching for instances of that in the binary.

Searching for the string “ioreg” in a WizardUpdate sample

Lots of hits. Let’s take a closer look at the hex of the first one.

A URL embedded in the WizardUpdate sample

That URL address might be a good candidate to include in a YARA rule, let’s try it. To grab it as YARA code, we just seek to the address and state how many bytes we want.

Generating a YARA string of 48 bytes from a specific address

This works nicely and we can just copy and paste the code into VT’s search with the content modifier. Our first effort, though, only gives us 1 hit on VirusTotal, although at least it’s different from our initial sample (we’ll add that to our collection, thanks!).

Our string only found a single hit on VirusTotal

But note how we can iterate on this process, easily generating YARA strings that we can use both for inclusion and exclusion in our YARA rules.


This time we had better success with 46 hits for one string

This string gives us lots of hits, so let’s create a file and add the string.

pcy 32 >> WizardUpdate_B.yara
Outputting the YARA string to a file

From here on in, we can continue to append further strings that we might want to include or exclude in our final YARA rule. When we are finished, all we have to do is open our new .yara file and add the YARA meta data and conditional logic, or we can paste the contents of our file into VTs Livehunt template and test out our rule there.

Xrefs For the Win

At the beginning of this post I said that the answer to some of the challenges we would deal with today were “part Art and part Science”. We’ve done plenty of “the Science”, so I want to round out the post by talking a little about “the Art”. Let’s return to a topic we covered briefly earlier in this series – finding cross-references in r2 – and introduce a couple of handy tips that can make development of hunting rules a little easier.

When developing a hunting or detection rule for a malware family, we are trying to balance two opposing demands: we want our rule to be specific enough not to create false positives, but wide or general enough not to miss true positives. If we had perfect knowledge of all samples that ever had been or ever would be created for the family under consideration, that would be no problem at all, but that’s precisely the knowledge-gap that our rule is aiming to fill.

A common tip for writing YARA rules is to use something like a combination of strings, method names and imports to try to achieve this balance. That’s good advice, but sometimes malware is packed to have virtually none of these, or not enough to make them easily distinguishable. On top of that, malware authors can and do easily refactor such artifacts and that can make your rules date very quickly.

A supplementary approach that I often use is to focus on code logic that is less easy for author’s to change and more likely to be re-used.

Let’s take a look at this sample of Adload written in Go. It’s a variant of a much more prolific version, also written in Google’s Golang. Both versions contain calls to a legit project found on Github, but this variant is missing one of the distinctive strings that made its more widespread cousin fairly easy to hunt.

A version of Adload that calls out to a popular project on Github

However, notice the URL at 0x7226. That could be interesting, but if we hit on that domain name string alone in VirusTotal we only see 3 hits, so that’s way too tight for our rule.

Your rules won’t catch much if your strings are too specific
Let’s grab some bytes immediately after the C2 string is loaded

We might do better if we try grabbing bytes of code right after that string has been loaded, for while the API string will certainly change, the code that consumes it perhaps might not. In this case, searching on 96 bytes from 0x7255 catches a more respectable 23 hits, but that still seems too low for a malware variant that has been circulating for many months.

Notice the dates – this malware has probably far more than just 23 samples

Let’s see if we can do better. One trick I find useful with r2 is to hunt down all the XREFs to a particular piece of code and then look at the calling functions for useful sequences of byte code to hunt on.

For example, you can use sf. to seek to the beginning of a function from a given address (assuming it’s part of a function, of course) and then use axg to get the path of execution to that function all the way from main(). You can use pds to give you a summary of the calls in any function along the way, which means combining axg and pds is a very good way to quickly move around a binary in r2 to find things of interest.

Using the axg command to trace execution path back to main

Now that we can see the call graph to the C2 string, we can start hunting for logic that is more likely to be re-used across samples. In this case, let’s hunt for bytes where sym.main.main calls the function that loads the C2 URL at 0x01247a41.

Finding reusable logic that should be more general than individual strings

Grabbing 48 bytes from that address and hunting for it on VT gives us a much more respectable 45 TP hits. We can also see from VT that these files all have a common size, 5.33MB, which we can use as a further pivot for hunting.

Our hunt is starting to give better results, but don’t stop here!

We’ve made a huge improvement on our initial hits of 3 and then 23, but we’re not really done yet. If we keep iterating on this process, looking for reusable code rather than just specific strings, imports or method names, we’re likely to do much better, and by now you should have a solid understanding of how to do that using r2 to help you in your quest. All you need now, just like any good piece of malware, is a bit of persistence!

Conclusion

In this post, we’ve taken a look at some of r2’s lesser known features that are extremely useful for hunting malware families, both in terms of associating new samples to known families and in searching for unknown relations to a sample or samples we already have. If you haven’t checked out the previous posts in this series, have a look at Part 1, Part 2 and Part 3. If you would like us to cover other topics on r2 and reverse engineering macOS malware, ping me or SentinelLabs on Twitter with your suggestions.

Samples Used

File name SHA1
WizardUpdate_B1 2f70787faafef2efb3cafca1c309c02c02a5969b
WizardUpdate_B2 dfff3527b68b1c069ff956201ceb544d71c032b2
WizardUpdate_B3 814b320b49c4a2386809b0bdb6ea3712673ff32b
WizardUpdate_B4 6ca80bbf11ca33c55e12feb5a09f6d2417efafd5
WizardUpdate_B5 92b9bba886056bc6a8c3df9c0f6c687f5a774247
WizardUpdate_B6 21991b7b2d71ac731dd8a3e3f0dbd8c8b35f162c
WizardUpdate_B7 6e131dca4aa33a87e9274914dd605baa4f1fc69a
WizardUpdate_B8 dac9aa343a327228302be6741108b5279adcef17
Adload 279d5563f278f5aea54e84aa50ca355f54aac743