A snapshot of your computer with dd, pv and gzip - Part 1

Tue 18 June 2013
By Stephen Cripps

In GNU/Linux

tags: Linux, GNU, backup, bash


The amount of time it takes to back up and restore a computer depends on how much effort you want to put in. There is a massive selection of software that will attempt to guide you through the process.

I have been using the utilities that come with a GNU/Linux operating system for this task, for their brazen simplicity and reliability. In this post, I want to show you how to take a complete bit-for-bit copy of your hard drive that can be restored at any time.

There are some caveats. The image must be restored to a hard drive of the same size or larger, and restoring will overwrite everything on that drive. The benefit is that even if you absolutely ruin the data on your drive, restoration will bring back everything, including the boot sector, so on start-up it will be as if nothing ever happened.

Anyways, on to the commands.

What you may have already seen

The drive you are imaging must not be running the operating system you are currently using, so booting from a live CD is recommended, unless you are dual booting from a completely separate drive or something.

> sudo dd if=/dev/sda of=/media/myExternalDrive/myBackup.img

dd is the key here, often jokingly referred to as the "Disk Destroyer." Don't play around with this unless you know what you're doing, especially with root permissions (see the Wikipedia article for some history). This will accomplish our goal in the most inefficient manner possible. But let's go over it a little bit:

if=/dev/sda: is the special file that represents your hard drive in its most raw form. There are many ways to determine which one you want: a simple method is to open the drive in a graphical file manager, then go to the command line and type mount to see a list of mounted drives (another option is shown just below).

of=...: is the path to the output file. The file doesn't really have a specific format; I just append .img at the end for my own reference.
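
Another option is lsblk, which lists every block device along with its size and mount point. It is part of util-linux, so it should be available on any live CD (the device names and sizes below are just illustrative):

> lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda      8:0    0 55.9G  0 disk
└─sda1   8:1    0 55.9G  0 part /
sdb      8:16   0  1.8T  0 disk
└─sdb1   8:17   0  1.8T  0 part /media/myExternalDrive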

This has some pretty drastic drawbacks. First, there is no indication of your current progress through the operation; you just sit and watch the cursor blink, and maybe watch the output file grow in the file manager. Furthermore, the image is going to contain empty information: the free space on the disk, which is just zeros.

A step further

I now want to add two commands to the previous one. We are going to do this by piping the output from dd through the following:

> dd if=/dev/sda | pv | gzip --fast > /media/myExternalDrive/myBackup.img

pv: Pipe viewer, will provide information about the data as it passes through the pipe. Unfortunately, this command does not seem to come standard with most installations and you might need to install it (see the example after these descriptions). Don't worry though, a live CD will still allow you to install it.

gzip: Compression, will eliminate the zero data from the image, since long runs of zeros compress down to almost nothing.
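
Installing pv from a live session is painless. On a Debian or Ubuntu based live CD, for example (other distributions have an equivalent package):

> sudo apt-get install pv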

This is better: it will show how fast the data is going through the pipe and keep the image down to the size of the actual data (actually, there is one more caveat with this, read on for more).

However, pv has no idea how much data it's expecting to pass through it, so it can't guess how long the process is going to take. Furthermore, even with the --fast option, gzip can be a bottleneck since it only runs on a single core.

Figuring out the data size

For this I use the tool parted, which will tell me the size of the entire drive in bytes.

sudo parted /dev/sda
(parted) unit B
(parted) print

Model: ATA Corsair Force SS (scsi)
Disk /dev/sda: 60022480896B
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start     End           Size          Type     File system
 1      1048576B  60021538815B  60020490240B  primary  ext4

The output above is for my laptop's hard drive; the number I'm interested in is 60022480896, the size of the drive in bytes. We take this and give it to the -s argument of pv, without the B at the end (pv expects the number to be in bytes).
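
If you prefer a one-liner, blockdev (also part of util-linux) will report the same figure directly:

> sudo blockdev --getsize64 /dev/sda
60022480896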

Which now gives us:

> dd if=/dev/sda | pv -s 60022480896 | \
    gzip --fast > /media/myExternalDrive/myBackup.img

(Note: the \ simply tells bash to continue on the next line)

And now pv will tell you how long the transfer is going to take. Now let's take care of the gzip issue.

Speed up compression

I recently came across a utility called pigz (pronounced pig-zee), which creates completely compatible gzip archives whilst using multiple threads. The utility had no problem maxing out my CPU for the entire duration of the operation and sped up the entire process immensely. Check out the pigz homepage for more info.

So finally we have:

> dd if=/dev/sda | pv -s 60022480896 | \
    pigz --fast > /media/myExternalDrive/myBackup.img
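
By default pigz runs one compression thread per core. If you want to leave some headroom for other work, its -p option caps the thread count; for example, to use at most two:

> dd if=/dev/sda | pv -s 60022480896 | \
    pigz --fast -p 2 > /media/myExternalDrive/myBackup.img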

Some finishing touches

I really don't like using /dev/sda to specify the hard drive, since it is possible for this name to change between boots. Instead, try using:

/dev/disk/by-id/ata-Corsair_Force_SSD...

This gives you a unique identifier for each disk that should never change. (I believe it doesn't change across systems either, but I'll have to check.)
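
You can see the stable name of every attached disk by listing that directory; each entry is a symlink back to the current /dev/sdX node (the output below is illustrative):

> ls -l /dev/disk/by-id/
lrwxrwxrwx 1 root root 9 Jun 18 12:00 ata-Corsair_Force_SSD... -> ../../sda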

Also, remember when I said that gzip will eliminate the useless data? Well, it will, but only if it actually sees null values. The empty space on your hard drive might be filled with random data (especially if you use encryption). If you want to make sure the free space is actually zero, you could do:

> dd if=/dev/zero of=/path/to/somewhere.img; \
    rm /path/to/somewhere.img

This creates a file containing only zeros, which grows until it has consumed all the available free space, and then removes it. (Note the ; instead of &&: dd exits with a "No space left on device" error once the drive is full, so with && the rm would never run.) It takes a while, since it actually forces the drive to write zeros to the disk.
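
Remember that the file has to live on the drive you are about to image, so from a live CD you would mount that drive first (the mount point here is just an example):

> sudo mount /dev/sda1 /mnt
> sudo dd if=/dev/zero of=/mnt/zeros.img; \
    sudo rm /mnt/zeros.img
> sudo umount /mnt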

Restoration

You reverse the process: use a similar command to decompress the file, pipe it to pv, then to dd, which writes it back to the disk, such as:

> pigz -dc /media/myExternalDrive/myBackup.img | \
    pv -s 60022480896 | dd of=/dev/sda

The -d option specifies decompression and -c sends the result to standard output rather than a file; both options are the same for gzip. Note that you always want the decompressed data to be piped through pv, since that's what the byte count actually represents. I suppose you could instead give pv the size of the compressed image and place the decompression step after it.
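
Before overwriting a disk, it's also worth making sure the archive itself isn't corrupt; both gzip and pigz have a test mode for exactly this:

> pigz -t /media/myExternalDrive/myBackup.img && echo "archive OK"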

Improvements

I'm pretty certain it would be possible to mount the file system from the raw image file with losetup and some other tools, which would make this backup strategy way more useful.
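
As a rough sketch of what that might look like, assuming the image has been decompressed first and that your losetup is new enough to support the -P partition-scan flag (the loop device name will vary):

> pigz -dc /media/myExternalDrive/myBackup.img > myBackup.raw
> sudo losetup -fP --show myBackup.raw
/dev/loop0
> sudo mount -o ro /dev/loop0p1 /mnt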

Let me know if you found this interesting, and feel free to ask questions in the comments.
