A Malware Analyst’s Guide to Bitcoin

Why Should You Care?

Criminals are using Bitcoin and other cryptocurrencies to handle payments for stolen data, for hacking services such as DDoS attacks, and for ransomware. If you want to follow the money and better understand threat actors, you’ll need to understand Bitcoin and how to analyze bitcoin transactions.

It’s important to keep up with the technology criminals are using so you are better equipped to understand new shifts and developments. Also, with more people analyzing Bitcoin and the blockchain, there is more opportunity to develop new tools and techniques for defending against the bad guys.

Criminals have historically been early adopters of new technologies because innovations tend to be adopted first where friction is highest and there’s an incentive. Criminals were among the first to figure out creative uses for cars (while most police were using bicycles or horses), mobile phones and beepers, and they’re of course pioneering ways of exploiting the Internet, Tor, encryption, and now Bitcoin. If criminals are first to adopt a technology, then those who try to understand and thwart criminals should be a very close second.

Bitcoin Terminology for a Malware Analyst

Before I talk about analysis, I want to make sure you’re familiar with some terms and basic concepts in Bitcoin. It’s remarkably easy to grok Bitcoin if you’re already familiar with common InfoSec concepts like public-key cryptography and hashing. This section is just an overview. If you want a deeper understanding, I encourage you to read the original Bitcoin paper (it’s only 9 pages!).

Bitcoin is essentially a public ledger: it keeps track of who owns what. The ledger itself is called the blockchain. The blockchain is special because, unlike a publicly shared spreadsheet, you can only add to the blockchain if you follow the rules. The rules were designed with an understanding of game theory and are strictly enforced by math to ensure fairness. Also, it’s decentralized, meaning no central authority controls the network and the network can’t be subverted by coercing a single entity. The blockchain uses hashes of public keys rather than people’s names to record ownership. In this way, it’s somewhat anonymous, though there are ways of correlating addresses and some limited ways of mapping addresses to actual identities.

Mining

New bitcoins are created by a process called mining. This tends to be the most interesting and mysterious process. After all, by mining a block the miner is rewarded with 25 bitcoins. At current market rates, that is worth over $25,000 USD. This usually gets people’s attention. A block is a collection of bitcoin transactions (records of people sending bitcoins), a hash of the previous block, and a number called a nonce. The nonce is chosen so that the hash of the whole block starts with some number of 0 bits. For example, let’s say you have a block which looks like this:

caleb sends joe 1.2 btc
udi sends joe 0.3 btc
joe sends aidan 1.5 btc
nonce: 0

If you were to save this to a file (no newline at the end) and calculate the SHA256 hash, the hash would be: b5b6ee30fae42aa131a84f705e5d7cf59133b3954e53c79b05c13a328b8d6f8a. You can check yourself (on a Mac) with shasum -a 256 block.txt. The first byte of this example block is 0xb5 which in binary is 10110101. This has zero leading 0 bits. If you increment the nonce to 4, the block’s hash becomes 01bdf748aeb6443595d1d29fc348418f4b4b2bbe5287c53e892735f467702308 and the first byte is 00000001 which has 7 leading 0 bits.
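To make the bit counting concrete, here is a minimal Python sketch that hashes a block of text and counts the leading 0 bits of the digest. The function names are my own, and the block text is assumed to be the four lines above joined by newlines with no trailing newline.

```python
import hashlib

def leading_zero_bits(digest):
    """Count the 0 bits that come before the first 1 bit in a bytes digest."""
    count = 0
    for byte in digest:
        if byte == 0:
            count += 8
            continue
        # 8 minus the bit length is the number of leading zeros in this byte.
        count += 8 - byte.bit_length()
        break
    return count

def block_hash(block_text):
    """Return the SHA-256 digest of the block text, encoded as UTF-8."""
    return hashlib.sha256(block_text.encode("utf-8")).digest()

block = "\n".join([
    "caleb sends joe 1.2 btc",
    "udi sends joe 0.3 btc",
    "joe sends aidan 1.5 btc",
    "nonce: 0",
])
digest = block_hash(block)
print(digest.hex(), leading_zero_bits(digest))
```

Changing the last line to a different nonce and rerunning shows how unpredictably the number of leading 0 bits moves.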

If the target number of 0 bits was really high, say 50 bits, you’d have no way of knowing a priori which nonce would work! You’d have to just hash many, many blocks with an incrementing nonce until you found a nonce value that gave the target number of leading 0 bits. When a Bitcoin miner creates a valid block (one whose hash has enough leading 0 bits), they announce it to the network, and the block itself proves that the miner had to work very hard. This is why mining is said to use proof of work: finding a valid block takes a lot of hashing, so having one proves you did the work to find it.

Once a transaction is included in a valid block, it becomes part of the blockchain. It becomes truth. At any given moment there are thousands of unconfirmed transactions waiting to be included in a freshly mined block. You can see a list of them here: Unconfirmed Transactions. The Bitcoin protocol automatically adjusts the target difficulty so that a block is mined, on average, every 10 minutes.
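As a rough illustration of that adjustment, here is a simplified Python sketch. The 2016-block interval and the factor-of-4 limit come from Bitcoin’s consensus rules rather than from this article, and the real client works on a 256-bit target with integer arithmetic, so treat the function below as a hypothetical model only.

```python
TARGET_SPACING_SECONDS = 10 * 60   # one block every 10 minutes on average
RETARGET_INTERVAL_BLOCKS = 2016    # difficulty is recalculated every 2016 blocks
MAX_ADJUSTMENT = 4                 # a single retarget is limited to a factor of 4

def retarget(old_difficulty, actual_timespan_seconds):
    """Scale difficulty by how fast the last interval was mined (simplified)."""
    expected = TARGET_SPACING_SECONDS * RETARGET_INTERVAL_BLOCKS
    # Clamp the measured timespan so one retarget cannot swing too far.
    clamped = min(max(actual_timespan_seconds, expected / MAX_ADJUSTMENT),
                  expected * MAX_ADJUSTMENT)
    return old_difficulty * expected / clamped

# Blocks arrived twice as fast as intended, so difficulty doubles.
print(retarget(100.0, 2016 * 300))
```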

If you’d like to play around with mining yourself, here’s a Python script which simulates mining. You can raise the difficulty by increasing the TARGET value.

#!/usr/bin/env python3

import hashlib
import itertools

# Difficulty target (number of leading 0 bits)
TARGET = 16

def generate_blocks(transactions):
  # Yield candidate blocks that differ only in their nonce.
  block_base = '\n'.join(transactions)
  for nonce in itertools.count():
    yield '{}\nnonce: {}'.format(block_base, nonce)

def hash_block(block):
  # SHA-256 operates on bytes, so encode the block text first.
  return hashlib.sha256(block.encode('utf-8')).digest()

def digest_to_binstr(digest):
  # Render the digest as a string of 256 '0' and '1' characters.
  return ''.join(format(byte, '08b') for byte in digest)

transactions = [
  'caleb sends joe 1.2 btc',
  'udi sends joe 0.3 btc',
  'joe sends aidan 1.5 btc',
]

target_str = '0' * TARGET
attempts = 0
for block in generate_blocks(transactions):
  attempts += 1
  digest = hash_block(block)
  binstr = digest_to_binstr(digest)
  # A block is valid when its hash starts with TARGET zero bits.
  if binstr[:TARGET] == target_str:
    print("Successfully mined block with {} difficulty after {} attempts!\n{}\nHash: {}".format(TARGET, attempts, block, binstr))
    break

Here’s what it looks like when executed:

$ ./fake-mine.py
Successfully mined block with 16 difficulty after 78169 attempts!
caleb sends joe 1.2 btc
udi sends joe 0.3 btc
joe sends aidan 1.5 btc
nonce: 78168
Hash: 0000000000000000000100011000110010001011110101011000101000011000101000101101001000100010110110000011010001000110101101110111110100110001100100011001000111001011111100111110111100101110010001110101100000000110001101101111101001101110011000000001101100110000

With a target difficulty of 16, it took over 78,000 attempts before it found a nonce that produced a valid block.
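That number is in line with what probability predicts. Assuming each hash is an independent trial that succeeds with probability 2^-TARGET, the following sketch (helper names are my own) computes the average number of attempts and the chance of succeeding within a given number of tries.

```python
def expected_attempts(target_bits):
    """Average number of hashes needed to get target_bits leading 0 bits."""
    return 2 ** target_bits

def probability_found_within(attempts, target_bits):
    """Chance that at least one of `attempts` hashes meets the target."""
    p_single = 2.0 ** -target_bits
    return 1.0 - (1.0 - p_single) ** attempts

print(expected_attempts(16))
print(probability_found_within(78169, 16))
```

Each additional bit of difficulty doubles the expected work, which is why a 50-bit target is out of reach for a script like this.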

Transactions

Every bitcoin is owned by an address, which is derived from a public key. Sending bitcoins really just means reassigning them to another public key, and this is done by broadcasting a transaction. A transaction includes a hash of the destination public key and one or more hashes of previous transactions that gave you the coins you are spending. These items are signed with your private key to prove you own the coins.
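The way transactions point back at earlier transactions can be sketched with a toy model. Everything here is hypothetical: the ToyTransaction class, the JSON serialization, and the use of plain SHA-256 for key hashes are simplifications, and signing is left out entirely.

```python
import hashlib
import json
from dataclasses import dataclass

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

@dataclass
class ToyTransaction:
    inputs: list            # IDs of earlier transactions being spent
    dest_pubkey_hash: str   # hash of the recipient's public key
    amount_btc: str

    def txid(self) -> str:
        # The ID is a hash over the transaction contents (toy serialization).
        payload = json.dumps([self.inputs, self.dest_pubkey_hash, self.amount_btc])
        return sha256_hex(payload.encode("utf-8"))

def trace_back(txid, tx_by_id):
    """Follow input references from one transaction to all known ancestors."""
    seen, stack = [], [txid]
    while stack:
        current = stack.pop()
        if current in seen or current not in tx_by_id:
            continue
        seen.append(current)
        stack.extend(tx_by_id[current].inputs)
    return seen

joe = sha256_hex(b"joe-public-key")
aidan = sha256_hex(b"aidan-public-key")
tx1 = ToyTransaction([], joe, "1.2")
tx2 = ToyTransaction([], joe, "0.3")
tx3 = ToyTransaction([tx1.txid(), tx2.txid()], aidan, "1.5")
tx_by_id = {tx.txid(): tx for tx in (tx1, tx2, tx3)}
print(trace_back(tx3.txid(), tx_by_id))
```

Because every transaction names the ones it spends, anyone with a copy of the ledger can walk the chain in this way, which is what makes following the money possible.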

An important detail for security researchers is that transactions are broadcast over the Internet. If you want to map identities to public keys spending coins, having the IP address of the client which sent the coins could be very informative. Sites like Blockchain.info include the IP address which a transaction was relayed by within the peer-to-peer network, but this is only a very rough proxy of the IP of the actual client, and it may not even correlate to the same country as the client. For research on actually getting the IP address, check out Deanonymisation of Clients in Bitcoin P2P Network.

Mining Pools

Solo mining in Bitcoin is difficult because unless you’ve invested a lot of money in equipment, you’re likely to never mine a block and all of your effort to do so would be wasted. Mining pools have developed to allow smaller miners to combine their hashing power. If someone in the pool mines a block, the reward is usually distributed in proportion to how much each miner contributed. Contributions are measured by counting the number of times a miner finds a block hash that meets a lower target difficulty than the one the Bitcoin network requires (these near misses are usually called shares).
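A proportional split can be sketched in a few lines. This assumes the pool simply counts each worker’s lower-difficulty solutions (shares) and ignores pool fees; the function name and worker names are hypothetical.

```python
def proportional_payouts(shares_by_worker, reward_satoshis):
    """Split a block reward in proportion to each worker's share count."""
    total_shares = sum(shares_by_worker.values())
    if total_shares == 0:
        return {worker: 0 for worker in shares_by_worker}
    # Integer division keeps amounts in whole satoshis (1 BTC = 100,000,000).
    return {worker: reward_satoshis * shares // total_shares
            for worker, shares in shares_by_worker.items()}

print(proportional_payouts({"worker_a": 3, "worker_b": 1}, 2_500_000_000))
```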

Malware that mines Bitcoin usually does so by contributing to a pool. However, mining Bitcoin is much less common than mining some other cryptocurrency such as Monero because Bitcoin mining is too difficult to be profitable. Also, Monero has the benefit of being designed to be much more anonymous than Bitcoin.

If malware is connecting to a pool, you should be able to identify the pool’s web address and the worker credentials. The worker credentials are used by the pool to know which address should receive payouts. Each pool is a little different, but some use payout addresses for worker usernames. You want to get the address the malware is using if possible because it allows you to do some extra analysis.
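One way to pull these details out of strings or a captured command line is a pair of regular expressions. This sketch assumes the sample uses a stratum+tcp:// pool URL and cpuminer-style -u/--user options; the pool host and wallet placeholder in the sample are made up.

```python
import re

# Pool URLs in the common stratum+tcp://host:port form.
STRATUM_URL = re.compile(r"stratum\+tcp://[A-Za-z0-9.\-]+:\d{1,5}")
# Worker name passed with -u or --user, separated by a space or '='.
USER_FLAG = re.compile(r"(?<!\S)(?:-u|--user)[ =](\S+)")

def extract_pool_config(text):
    """Return pool URLs and worker usernames found in the given text."""
    return {
        "pools": STRATUM_URL.findall(text),
        "workers": USER_FLAG.findall(text),
    }

sample = "minerd -a cryptonight -o stratum+tcp://pool.example.com:3333 -u WALLET_ADDRESS.worker1 -p x"
print(extract_pool_config(sample))
```

Malware that builds its configuration at runtime or talks to a private proxy will not match these patterns, so a miss here does not rule out mining.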

Analyzing a Bitcoin Address

If you can get your hands on a Bitcoin address associated with a malware campaign, you can watch it for activity. If there’s activity, you know there are still machines getting infected. If the address has no activity for a long time and then starts having activity again, it implies there was a new campaign and you should start directing resources into finding new samples. The amount of activity on the address also gives you some idea of how successful the malware is and how many machines it infected.
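Assuming you already have the timestamps of payments to an address (how you fetch them is up to you), a simple gap-based split can flag when a quiet address wakes up again. The 30-day threshold and the function name are arbitrary choices for illustration.

```python
from datetime import datetime, timedelta

def split_into_campaigns(timestamps, max_gap=timedelta(days=30)):
    """Group payment times into bursts separated by more than max_gap."""
    campaigns = []
    for ts in sorted(timestamps):
        if campaigns and ts - campaigns[-1][-1] <= max_gap:
            campaigns[-1].append(ts)
        else:
            # A long silence before this payment suggests a new campaign.
            campaigns.append([ts])
    return campaigns

payments = [
    datetime(2016, 1, 1), datetime(2016, 1, 5), datetime(2016, 1, 20),
    datetime(2016, 4, 1), datetime(2016, 4, 3),
]
for campaign in split_into_campaigns(payments):
    print(campaign[0].date(), "to", campaign[-1].date(), len(campaign), "payments")
```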

A real example of an analysis of an address associated with ransomware is in our report on CryptXXX: New CryptXXX Variant Discovered. The ransomware payment address was 18e372GNwjGG5SYeHucuD1yLEWh7a6dWf1. According to the malware’s ransom note, payment must be 1.2 bitcoin or some multiple depending on how long it takes the victim to pay. Knowing this, we can look at the transactions to the ransom address and see the number and amount of payments to this address. If we had been monitoring the IP addresses of transactions, we could’ve also had an idea of the geographic distribution of victims, which could give some insight into who was being targeted. With a better idea of who’s targeted and why, it’s easier to configure and deploy honeypots to try and collect new samples.
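Given a list of amounts received by the address, separating likely ransom payments from other deposits is straightforward. This sketch assumes amounts are in satoshis and that victims pay exact multiples of 1.2 bitcoin, which is a strict assumption; the function name and sample amounts are hypothetical.

```python
SATOSHIS_PER_BTC = 100_000_000
RANSOM_UNIT = 120_000_000  # 1.2 BTC expressed in satoshis

def classify_payments(amounts_satoshis, unit=RANSOM_UNIT):
    """Split received amounts into likely ransom payments and everything else."""
    likely, other = [], []
    for amount in amounts_satoshis:
        if amount > 0 and amount % unit == 0:
            likely.append(amount)
        else:
            other.append(amount)
    return likely, other

received = [120_000_000, 240_000_000, 50_000, 119_990_000]
likely, other = classify_payments(received)
print(len(likely), "likely ransom payments,",
      sum(likely) / SATOSHIS_PER_BTC, "BTC in total")
```

Working in integer satoshis avoids the floating-point rounding problems you would hit comparing values like 1.2 directly.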

Since all payments to this address happened within about two months, it can be assumed the address was for a single campaign and there must be other addresses associated with previous and future campaigns. If you monitor the blockchain and are clustering associated addresses, you may find new payment addresses before you find samples. This would help a malware analyst narrow their search to collect new samples for analysis and to ensure detection coverage. There aren’t any free, turn-key address clustering tools out there, but it is an active field of research. For more information, check out The Unreasonable Effectiveness of Address Clustering and BitSniffer.
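To give a flavor of how clustering can work, here is a sketch of one widely used heuristic, common-input ownership: addresses that appear together as inputs to the same transaction are assumed to belong to the same owner. This is not necessarily the method used in the research above, and the input format (a list of input-address lists) is an assumption.

```python
def cluster_addresses(transactions):
    """Group addresses that were ever spent together as transaction inputs.

    `transactions` is an iterable of lists of input addresses, one per transaction.
    """
    parent = {}

    def find(addr):
        parent.setdefault(addr, addr)
        while parent[addr] != addr:
            parent[addr] = parent[parent[addr]]  # path halving
            addr = parent[addr]
        return addr

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for input_addresses in transactions:
        for addr in input_addresses:
            find(addr)  # register every address, even lone ones
        for addr in input_addresses[1:]:
            union(input_addresses[0], addr)

    clusters = {}
    for addr in list(parent):
        clusters.setdefault(find(addr), set()).add(addr)
    return list(clusters.values())

print(cluster_addresses([["A", "B"], ["B", "C"], ["D"], ["E", "F"]]))
```

Tumblers and multi-party transactions break this assumption, so clusters produced this way are leads to investigate, not proof of common ownership.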

Tracking the flow of bitcoin is made more difficult by the use of tumblers. These are services which take bitcoin from many users, shuffle them around many times to lots of different addresses, and then redistribute them back to the original owners. Again, there are no point-and-click tools to unravel the tangled mess created by tumblers, but there is research on the topic. For more information, read Survey of Bitcoin Mixing Services: Tracing Anonymous Bitcoins.

I’ve personally found some interesting Bitcoin-related malware simply by searching for strings which look like addresses or mining pool connection strings. When these types of searches are combined with other types of static analysis, they can make for an interesting source of potential malware. This presentation gives some good examples of hunting for Bitcoin-related strings and artifacts and even comes with YARA rules: Tracing Bits of Coins in Disk and Memory.
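A string search for addresses produces fewer false positives if each candidate is checked against the Base58Check checksum built into Bitcoin addresses. The sketch below covers only Base58 addresses starting with 1 or 3; the regular expression and function names are my own.

```python
import hashlib
import re

BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
ADDRESS_CANDIDATE = re.compile(r"\b[13][1-9A-HJ-NP-Za-km-z]{25,34}\b")

def is_valid_base58check_address(candidate):
    """Decode a Base58 string and verify its 4-byte double-SHA-256 checksum."""
    number = 0
    for char in candidate:
        index = BASE58_ALPHABET.find(char)
        if index < 0:
            return False
        number = number * 58 + index
    try:
        # 1 version byte + 20-byte key hash + 4-byte checksum
        raw = number.to_bytes(25, "big")
    except OverflowError:
        return False
    payload, checksum = raw[:21], raw[21:]
    expected = hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]
    return checksum == expected

def find_addresses(text):
    """Return address-like strings in text that pass the checksum test."""
    return [c for c in ADDRESS_CANDIDATE.findall(text)
            if is_valid_base58check_address(c)]

# The address from the first Bitcoin block, embedded in some sample text.
print(find_addresses("wallet=1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa end"))
```

Running this over the output of a strings dump filters out most random text that merely looks like an address.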

Summary

Hopefully after reading this you have a better understanding of how Bitcoin works and you are armed with some new analysis techniques for the next ransomware or cryptocurrency-related malware you tear apart. I also hope that this introduction to various Bitcoin analysis and forensics research inspires you to create some new analysis tools and techniques which could help everyone in the industry keep up with the bad guys.
