Vault Data Backup Meets DNS Hijacking

Posted on Jul 17, 2024 | By Andrei Buzoianu | 10 minute read

Introduction

Recently, I set out to conquer the task of backing up a client’s Vault deployment. Let’s just say it involved a lot of backup chatter, but in summary the game plan was simple: protect the backup software (in this case a set of scripts), specifically its configuration files, and, obviously, the backup data. It turns out that saving backups on immutable storage, as unchangeable as my stubborn kid, not only shields them from pesky invaders but also calls for a clever approach to recovery when we only need to restore a subset of files. In addition to implementing measures that prevent future cyber attacks, it is crucial to verify that the restoration process does not reinstall the very malware that initiated the attack. You don’t want to invite back the party guest who brought the malware, am I right?

Enough about that. My biggest surprise came from a simple snapshot. Who would have thought that, out of all the things that could go wrong, it would be the trusty little vault operator raft snapshot save that threw a wrench in my plans?

The bug

Nowadays, safeguarding against data loss should be built in from the design stage. It often isn’t. As a standard operating procedure (SOP), backing up a Vault cluster should be as simple as running:

vault operator raft snapshot save backup.snap

The exact steps required for a backup differ depending on our Vault architecture. The assumption here is that we are taking a manual backup of a single Vault cluster using Integrated Storage. The documentation says, and I quote, “this will take the snapshot using a consistent mode that forwards the request to the cluster leader, and the leader will verify it is still in power before taking the snapshot”.
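In a scripted backup, the snapshot is usually written to a timestamped file and the exit status is checked, so a failed snapshot does not go unnoticed. The following is a hedged sketch, not the author’s script: the function name save_snapshot and the vault-<UTC timestamp>.snap naming scheme are hypothetical, and it assumes VAULT_ADDR and VAULT_TOKEN are already exported.

```shell
# Hypothetical helper: save a Raft snapshot to a timestamped file in $1
# (default: current directory) and report failures instead of ignoring them.
# Assumes VAULT_ADDR and VAULT_TOKEN are already exported.
save_snapshot() {
    local dir=${1:-.} file
    file="$dir/vault-$(date -u +%Y%m%dT%H%M%SZ).snap"
    if ! vault operator raft snapshot save "$file"; then
        echo "snapshot failed; is the request reaching the leader?" >&2
        rm -f "$file"    # do not keep a partial file around
        return 1
    fi
    # An empty file would not be a usable snapshot
    if [ ! -s "$file" ]; then
        echo "snapshot file $file is empty" >&2
        return 1
    fi
    printf '%s\n' "$file"
}
```

For example, `save_snapshot /var/backups/vault || exit 1` prints the path of the new snapshot on success.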

Let’s get acquainted with Vault issue 15258. Ah, the infamous bug, a creature capable of striking fear into the hearts of even the bravest programmers. Harmless-looking and, it seems, internally code-named VAULT-4568, this little bugger ate my time and made me unhappy.

Apparently, if the request goes through a load-balancer that can forward it to any unsealed Vault node, or if the backup command is issued from a non-leader node, the operation fails. Issued on the leader node, the same command succeeds and the backup is taken without any issues.
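To see which case we are in, we can ask the node that answers a request whether it is the leader: Vault’s /sys/leader endpoint includes an is_self field in its JSON response. Through a load-balancer, the answer only describes whichever node happened to respond, so treat this as a diagnostic. The sketch below is hypothetical (the helper name node_is_leader is ours). It avoids jq by pattern-matching the field and assumes the key and its value sit on the same line.

```shell
# Hypothetical helper: reads a /sys/leader JSON response on stdin and
# succeeds (exit status 0) only if the node that answered is the leader.
# Assumes "is_self" and its value appear on the same line of the JSON.
node_is_leader() {
    grep -Eq '"is_self"[[:space:]]*:[[:space:]]*true'
}

# Example usage, with VAULT_ADDR pointing at the node we want to check:
#   curl --silent "$VAULT_ADDR/v1/sys/leader" | node_is_leader && echo "leader"
```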

The Solution

Our Vault cluster runs a minimum of three nodes, one of which is the elected leader. Assuming we are using a load-balancer and VAULT_ADDR is set to https://vault.domain.tld:8200, we can run a backup script:

$ export VAULT_ADDR=https://vault.domain.tld:8200
$ export VAULT_TOKEN=hvs.************************
$ /opt/bin/vaultbackup

The only relevant line in the script is the backup command:

vault operator raft snapshot save backup.snap

When a non-leader node receives the snapshot request, the backup fails. If we could send our requests exclusively to the leader, however, the backup would work.

Hold on a second… We can totally do this! We are able to query the Vault API and obtain the IP address of the current leader. The remaining task is to alter the variable VAULT_ADDR.

We are faced with two options:

The less elegant solution is to point VAULT_ADDR straight at the leader, e.g. VAULT_ADDR=https://192.168.21.7:8200. By default this fails with an error similar to: certificate subject name ‘VAULT.DOMAIN.TLD’ does not match target host name ‘192.168.21.7’. Adding --tls-skip-verify makes it work (we are basically telling Vault not to care about the certificate), but it opens the door to man-in-the-middle attacks and defeats the security assumptions we make when running a Vault secrets management cluster.
The more elegant solution is to hijack DNS resolution, tricking the vault process into believing that vault.domain.tld resolves to 192.168.21.7, our leader IP. This way we keep encryption in transit, and the certificate for vault.domain.tld still validates.
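Before wiring anything into the backup script, the premise of the second option can be checked with curl. Its --resolve option pins host:port to a given address for that one invocation, while the certificate is still verified against the hostname. A hedged sketch: the function name check_leader_cert is hypothetical, and it assumes the leader’s certificate covers vault.domain.tld, which is exactly what the second option relies on.

```shell
# Hypothetical helper: query the leader by IP while presenting the cluster
# hostname, so TLS is verified against the hostname rather than the raw IP.
# $1 = leader IP, $2 = hostname (default vault.domain.tld), $3 = port (default 8200)
check_leader_cert() {
    local ip=$1 host=${2:-vault.domain.tld} port=${3:-8200}
    # --resolve only affects this curl process; /etc/hosts is untouched.
    # --fail makes HTTP errors return a non-zero exit status.
    curl --silent --show-error --fail \
        --resolve "${host}:${port}:${ip}" \
        "https://${host}:${port}/v1/sys/leader"
}

# Example: check_leader_cert 192.168.21.7 && echo "certificate OK via leader IP"
```

If this succeeds, a process that resolves vault.domain.tld to the leader IP will get a valid TLS session as well.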

Therefore, let’s do DNS hijacking and have some fun. Before we dive into it, how do we extract the leader IP?

Get the Vault leader IP

As mentioned, we can obtain and use the leader IP address. The leader status is available from the /sys/leader endpoint, which returns information about Vault’s high availability status and its current leader:

$ curl https://vault.domain.tld:8200/v1/sys/leader
{
  "ha_enabled": true,
  ...
  "leader_address": "https://192.168.21.7:8200/",
  ...
  "last_wal": 0,
  "raft_committed_index": 0,
  "raft_applied_index": 0
}

The following snippet, taken from the aforementioned /opt/bin/vaultbackup script, does exactly what we need to get the leader IP:

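$ cat /opt/bin/vaultbackup
# Take leader_address from /sys/leader (jq -r drops the JSON quotes) and keep only the host part
LEADER=$(curl --silent --header "X-Vault-Token: $VAULT_TOKEN" https://vault.domain.tld:8200/v1/sys/leader | jq -r .leader_address | awk -F'[/:]' '{print $4}')
echo "$LEADER"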

If we run it:

$ /opt/bin/vaultbackup
192.168.21.7
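The one-liner works, but if the request fails, LEADER silently ends up empty and the script carries on. Below is a slightly more defensive sketch. It assumes the JSON shape shown above and an IPv4 address or plain hostname in leader_address (bracketed IPv6 addresses are not handled). It uses sed and bash parameter expansion instead of jq and awk, and the function name get_leader_ip is hypothetical.

```shell
# Hypothetical helper: print the host part of the current leader_address,
# or fail with a message if it cannot be determined.
# $1 = base URL of the cluster (e.g. https://vault.domain.tld:8200)
get_leader_ip() {
    local json addr host
    # --fail turns HTTP errors into a non-zero exit status
    json=$(curl --silent --show-error --fail "$1/v1/sys/leader") || return 1
    # Extract the value of "leader_address" from the JSON text
    addr=$(printf '%s\n' "$json" |
        sed -n 's/.*"leader_address"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p')
    host=${addr#*://}   # drop the scheme: 192.168.21.7:8200/
    host=${host%%/*}    # drop the path:   192.168.21.7:8200
    host=${host%:*}     # drop the port:   192.168.21.7
    if [ -z "$host" ]; then
        echo "could not determine the Vault leader address" >&2
        return 1
    fi
    printf '%s\n' "$host"
}

# Example: LEADER=$(get_leader_ip https://vault.domain.tld:8200) || exit 1
```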

DNS Hijacking

DNS hijacking, often lumped together with DNS poisoning, is what happens when the internet fairy decides to redirect your DNS queries for her own amusement. More formally, it is the practice of redirecting, or subverting, the resolution of Domain Name System queries. To solve our initial issue, we need a way to trick the vault process into thinking that vault.domain.tld resolves to 192.168.21.7. Now, what’s the witty strategy here?

In my digital mining operations, I stumbled upon this hidden gem.

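tmpdir=$(mktemp -d)
# Our mapping first, then the rest of the real hosts file (localhost etc.)
{ echo '192.168.21.7 vault.domain.tld'; cat /etc/hosts; } >"$tmpdir/hosts"

# -U: new user namespace, -r: map our user to root in it, -m: new mount namespace
unshare -Urm bash <<END
    # Bind-mount only the hosts file: binding over all of /etc would also hide
    # resolv.conf, nsswitch.conf and the CA certificates vault needs for TLS
    mount --bind "$tmpdir/hosts" /etc/hosts
    ping -c1 vault.domain.tld
END
rm -rf "$tmpdir"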

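We use Linux namespaces to give the vault process that takes the snapshot its own isolated view of the system. A namespace wraps a global system resource in an abstraction that makes processes inside the namespace believe they have their own isolated instance of that resource. To run vault in new namespaces, we use unshare: -U creates a user namespace, -r maps our unprivileged user to root inside it (which gives the inner bash the privileges needed for the bind mount), and -m creates a mount namespace, so the mount stays invisible to the rest of the system.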

The unshare command creates the namespaces requested by its command-line options and then executes the specified program. On most *nix systems, name resolution consults /etc/hosts before asking DNS (the order comes from the hosts line in /etc/nsswitch.conf, typically files dns), so inside the isolated namespace we mount a special /etc/hosts containing the entry leader_ip vault.domain.tld.

In essence, we achieve this by populating a temporary file with the necessary data and exposing that file as /etc/hosts for our process.
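The “populate a temporary file” step can be packaged as a small helper. This sketch writes our mapping first and copies the real hosts file after it, dropping any existing line that already lists the same name so that only our mapping can be found; names such as localhost keep resolving inside the namespace. The function name and argument order are hypothetical.

```shell
# Hypothetical helper: write to $4 a hosts file that maps $2 to $1, followed by
# every line of $3 (normally /etc/hosts) that does not already list $2.
# $1 = leader IP, $2 = hostname, $3 = source hosts file, $4 = output file
write_hosts_override() {
    local ip=$1 name=$2 src=$3 out=$4
    {
        printf '%s %s\n' "$ip" "$name"
        # Fields 2..NF of a hosts line are hostnames; skip lines naming $name
        awk -v name="$name" '{ for (i = 2; i <= NF; i++) if ($i == name) next; print }' "$src"
    } >"$out"
}

# Example: write_hosts_override 192.168.21.7 vault.domain.tld /etc/hosts "$tmpdir/hosts"
```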

If we amend the script and include this code, the output will be:

$ /opt/bin/vaultbackup
PING vault.domain.tld (192.168.21.7) 56(84) bytes of data.
64 bytes from vs1.domain.tld (192.168.21.7): icmp_seq=1 ttl=64 time=0.016 ms

--- vault.domain.tld ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.016/0.016/0.016/0.000 ms

Now all we need to do is replace the ping -c1 vault.domain.tld command with the actual vault backup command. Since VAULT_ADDR and VAULT_TOKEN are exported, the commands run inside the namespace inherit them.
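Putting the pieces together, here is a minimal sketch of the backup step. It assumes VAULT_TOKEN is exported, jq is installed (as in the original script), unprivileged user namespaces are enabled, and the leader’s certificate is valid for vault.domain.tld. Only /etc/hosts is bind-mounted, so the rest of /etc, including the system CA bundle, stays visible to vault. The function name snapshot_via_leader and its arguments are hypothetical.

```shell
# Hypothetical wrapper: take a Raft snapshot from the current leader while
# still verifying TLS against the cluster hostname.
# $1 = output file (default backup.snap), $2 = hostname, $3 = port
snapshot_via_leader() {
    local out=${1:-backup.snap} host=${2:-vault.domain.tld} port=${3:-8200}
    local leader tmpdir status
    leader=$(curl --silent --fail --header "X-Vault-Token: $VAULT_TOKEN" \
        "https://${host}:${port}/v1/sys/leader" |
        jq -r .leader_address | awk -F'[/:]' '{print $4}')
    if [ -z "$leader" ]; then
        echo "could not determine the Vault leader" >&2
        return 1
    fi

    tmpdir=$(mktemp -d)
    # Leader mapping first, then the rest of the real hosts file
    { printf '%s %s\n' "$leader" "$host"; cat /etc/hosts; } >"$tmpdir/hosts"

    # New user + mount namespace; the bind mount exists only inside it.
    # The heredoc delimiter is unquoted, so $tmpdir, $host, $port and $out
    # are expanded here, before the inner bash runs.
    unshare -Urm bash <<END
set -e
mount --bind "$tmpdir/hosts" /etc/hosts
VAULT_ADDR="https://${host}:${port}" vault operator raft snapshot save "$out"
END
    status=$?
    rm -rf "$tmpdir"
    return "$status"
}

# Example: snapshot_via_leader "backup-$(date +%F).snap" || exit 1
```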

Conclusion

I am not a security expert by any means, so I advise you to take this information with a grain of salt. The method worked in my situation, and I thoroughly enjoyed discovering this solution. Maybe DNS hijacking can be used for good! Then again:

It’s not DNS

There’s no way it’s DNS

It was DNS