Archiving and Compressing Data
This subject falls under the general “housekeeping” of your files.
Archiving with tar
Creating a file archive (tarfile) can help you organize your files for storage or transfer.
You can gather all of your related files and store them as one larger, separate file called a “tarfile.”
You can use a file archive (tarfile) when you want to keep files for later reference but are done with them at the moment, or when you want to compress groups of files for storage or transfer to other hosts.
tar (tape archive) is one UNIX command we use to accomplish this. tar can also send files directly to a magnetic tape, but for our purposes we will use it to make file archives. We used it on the first day to extract your UNIX_class subdirectory structure.
tar Practice
Start by moving into your UNIX_class subdirectory.
Creating a tarfile: Suppose we want to archive a directory. Let’s take dir2 from our UNIX_class examples. It is always a good idea to verify the name of the directory you want to archive.
    $ ls -l
    drwxrwxr-x 2 userid userid   512 Jun 28 17:52 Animal
    drwxrwxr-x 2 userid userid   512 Jun 18 21:31 Shakespeare
    drwxrwxr-x 2 userid userid   512 Jun 18 20:10 Wildcards
    drwxrwxr-x 2 userid userid   512 Jun 24 15:38 dir1
    drwxrwxr-x 4 userid userid   512 Jun 24 13:55 dir2
    drwxrwxr-x 4 userid userid   512 Jun 28 15:19 dir3
We will keep the example simple and create the archive file in the same directory. Type:
    $ tar -cvf dir2.tar dir2
    a dir2/ 0K
    a dir2/.DS_Store 7K
    a dir2/address_list 1K
    a dir2/final.paper 1K
    a dir2/history.txt 7K
    a dir2/picts/ 0K
    a dir2/picts/unixbutton.JPG 24K
    a dir2/cats/ 0K
    a dir2/cats/catsup/ 0K
    a dir2/cats/cathode/ 0K
    a dir2/cats/caterpillar/ 0K
    a dir2/cats/caterpillar/butterfly 1K
    a dir2/cats/caterpillar/larva 1K
    a dir2/cats/catalyst 1K
If we list our directory, we should see dir2.tar at the same level in the hierarchy as dir2.
    $ ls -l
    drwxrwxr-x 2 userid userid   512 Jun 28 17:52 Animal
    drwxrwxr-x 2 userid userid   512 Jun 18 21:31 Shakespeare
    drwxrwxr-x 2 userid userid   512 Jun 18 20:10 Wildcards
    drwxrwxr-x 2 userid userid   512 Jun 24 15:38 dir1
    drwxrwxr-x 4 userid userid   512 Jun 24 13:55 dir2
    -rw-rw-r-- 1 userid userid 49664 Jul  8 16:28 dir2.tar
    drwxrwxr-x 4 userid userid   512 Jun 28 15:19 dir3
You will see that you have a file named dir2.tar and that it is not a directory, but a regular file.
What we did here is create a tarfile called dir2.tar, which is a file archive of all the files and directories under dir2.
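If you would like to check what ended up inside a tarfile, here is a minimal sketch. It assumes a throwaway directory made with mktemp and a small stand-in for dir2 (the file contents are made up), and it uses the -t option, which this class has not covered: it lists the members of an archive without extracting them.

```shell
#!/bin/sh
set -eu
# Work in a throwaway directory so none of your real files are touched.
WORK=$(mktemp -d)
cd "$WORK"

# Build a small stand-in for dir2 (hypothetical contents).
mkdir -p dir2/cats
echo "draft text" > dir2/final.paper
echo "catalyst notes" > dir2/cats/catalyst

# Create the archive as in the practice step (no -v, to keep the output short).
tar -cf dir2.tar dir2

# -t lists the members stored in the archive without extracting anything.
tar -tf dir2.tar

# dir2.tar is a regular file, and the original directory is still in place.
[ -f dir2.tar ] && echo "dir2.tar is a regular file"
[ -d dir2 ] && echo "dir2 is still a directory"
```

Creating an archive never removes the original files, so dir2 and dir2.tar sit side by side afterwards.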
Let’s look at tar more closely. There are many options and different ways to use the command, but we will focus on the ones you will use most often.
    $ tar -cvf filename.tar directoryname
The options:
-c tells tar to “create” an archive.
-v tells tar to be verbose in the output it provides. This is not necessary, but it is helpful to see what is going on. It tells you what you are doing (appending or extracting) and the file names and sizes.
-f tells tar that we are using a tarfile and that the next [operand] has to specify the name of the tarfile. If we did not specify the -f option, tar on traditional UNIX systems would assume that your files are going to tape.
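To see what each option contributes, here is a small sketch under assumed names (verbose.tar, quiet.tar and the sample file are made up). It creates the same archive twice, once with the options bundled as -cvf and once with -c and -f given separately and no -v, then compares the member lists. Note that f comes last in the bundle so that the tarfile name follows it directly.

```shell
#!/bin/sh
set -eu
WORK=$(mktemp -d)
cd "$WORK"
mkdir dir2
echo "sample" > dir2/history.txt

# Options bundled together, verbose: tar names each file as it is added.
tar -cvf verbose.tar dir2

# Options given separately, without -v: same contents, no per-file output.
tar -c -f quiet.tar dir2

# Compare the member lists of the two archives (-t lists without extracting).
tar -tf verbose.tar > verbose.list
tar -tf quiet.tar > quiet.list
if cmp -s verbose.list quiet.list; then
    echo "same members with or without -v"
fi
```

The -v option only changes what tar prints, not what it stores.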
We use certain things in UNIX by convention and not by any rule.
When we specify a name for our tarfile (file archive), it is good practice to use the .tar extension.
I also recommend, if you are tarring a whole directory as in our example, giving the new file the same name as the original directory (dir2 becomes dir2.tar).
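The naming convention pays off when you archive several directories at once. The following is only a sketch: the three directories and their files are created on the spot in a temporary location, and the loop is one possible way to apply the convention, not a required one.

```shell
#!/bin/sh
set -eu
WORK=$(mktemp -d)
cd "$WORK"
mkdir dir1 dir2 dir3
echo "one" > dir1/a.txt
echo "two" > dir2/b.txt
echo "three" > dir3/c.txt

# Archive each directory into a tarfile named after it:
# dir1 -> dir1.tar, dir2 -> dir2.tar, dir3 -> dir3.tar
for d in dir1 dir2 dir3; do
    tar -cf "$d.tar" "$d"
done

ls
```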
If we wanted the tarfile to be stored somewhere else in our directory structure, we could specify an absolute or relative pathname in front of the tarfile name. It can be stored anywhere you have write permission.
    $ tar -cvf /home/jsmith/tarfiles/dir2.tar dir2
    $ tar -cvf ~/tarfiles/dir2.tar dir2
In the first command, /home/jsmith/tarfiles/ is the pathname, dir2.tar is the tarfile, and dir2 is the original directory.
Remember that the “tarfiles” directory has to already exist.
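Here is a hedged sketch of storing the tarfile in another directory. A temporary directory (called FAKE_HOME here, a made-up name) stands in for your home directory so that nothing real is touched, and mkdir -p is one way to make sure the destination exists before tar writes to it.

```shell
#!/bin/sh
set -eu
# A temporary directory stands in for your home directory (an assumption
# made so that the sketch does not touch real files).
FAKE_HOME=$(mktemp -d)
cd "$FAKE_HOME"
mkdir dir2
echo "sample" > dir2/address_list

# Create the destination first; tar does not create missing directories
# for the archive file itself.
mkdir -p "$FAKE_HOME/tarfiles"

# Absolute pathname in front of the tarfile name.
tar -cf "$FAKE_HOME/tarfiles/dir2.tar" dir2

# A relative pathname works the same way (hypothetical second file name).
tar -cf tarfiles/dir2-relative.tar dir2

ls -l tarfiles
```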
Extracting with tar
Extracting a tarfile: At some point you will probably need access to individual files that reside in your archive. You can extract a single file or the whole archive. For the purpose of our example, we will extract the contents of dir2.tar to another location.
First, from our home directory, let’s do a listing of our UNIX_class directory.
    $ cd                    (takes us to the top of our home directory)
    $ ls -l UNIX_class
    drwxrwxr-x 2 userid userid   512 Jun 28 17:52 Animal
    drwxrwxr-x 2 userid userid   512 Jun 18 21:31 Shakespeare
    drwxrwxr-x 2 userid userid   512 Jun 18 20:10 Wildcards
    drwxrwxr-x 2 userid userid   512 Jun 24 15:38 dir1
    drwxrwxr-x 4 userid userid   512 Jun 24 13:55 dir2
    -rw-rw-r-- 1 userid userid 49664 Jul  8 16:28 dir2.tar
    drwxrwxr-x 4 userid userid   512 Jun 28 15:19 dir3
We want to extract the tarfile to a different location. In this case, the top of our home directory.
When we extract our tarfile using this particular example, we need to be in the directory in which we want the files to end up. For more tar options, see the man page on tar.
Make sure you are in your home directory (not in UNIX_class).
    $ pwd                   (always a good habit to get into)
Type:
    $ tar -xvf UNIX_class/dir2.tar
The -x option means to extract.
    x dir2, 0 bytes, 0 tape blocks
    x dir2/.DS_Store, 6148 bytes, 13 tape blocks
    x dir2/address_list, 455 bytes, 1 tape blocks
    x dir2/final.paper, 716 bytes, 2 tape blocks
    x dir2/history.txt, 7091 bytes, 14 tape blocks
    x dir2/picts, 0 bytes, 0 tape blocks
    x dir2/picts/unixbutton.JPG, 24125 bytes, 48 tape blocks
    x dir2/cats, 0 bytes, 0 tape blocks
    x dir2/cats/catsup, 0 bytes, 0 tape blocks
    x dir2/cats/cathode, 0 bytes, 0 tape blocks
    x dir2/cats/caterpillar, 0 bytes, 0 tape blocks
    x dir2/cats/caterpillar/butterfly, 325 bytes, 1 tape blocks
    x dir2/cats/caterpillar/larva, 103 bytes, 1 tape blocks
    x dir2/cats/catalyst, 174 bytes, 1 tape blocks
The verbose output tells you that it is extracting (the “x” at the beginning) and lists the file names and sizes.
Verify that dir2 is there in your home directory:
    $ ls
    dir2    UNIX_class
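Earlier we mentioned that you can also extract a single file instead of the whole archive. A minimal sketch of that, assuming a small made-up dir2 and a scratch directory named restore: you name the member after the tarfile, spelled exactly as it appears in the archive listing.

```shell
#!/bin/sh
set -eu
WORK=$(mktemp -d)
cd "$WORK"
mkdir -p dir2/cats
echo "final draft" > dir2/final.paper
echo "notes" > dir2/cats/catalyst
tar -cf dir2.tar dir2

# Extract into a separate, empty directory so the result is easy to inspect.
mkdir restore
cd restore

# Naming a member after the tarfile extracts only that member.
# The name must match the path stored in the archive exactly.
tar -xf ../dir2.tar dir2/final.paper

ls -R
```

Only dir2/final.paper is restored; the rest of the archive stays packed.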
Remember that when you extract, your original tarfile is still there where you put it in the first place. You can extract it as many times as you like, to any other location where you have write permission.
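As a sketch of that point, the example below extracts one tarfile into two different directories and then confirms the tarfile is still there. The directory names copy1 and copy2 are made up, and everything happens in a temporary location.

```shell
#!/bin/sh
set -eu
WORK=$(mktemp -d)
cd "$WORK"
mkdir dir2
echo "sample" > dir2/history.txt
tar -cf dir2.tar dir2

# Extract the same archive into two different directories.
# Each subshell changes directory first, because tar extracts relative
# to the current directory.
mkdir copy1 copy2
(cd copy1 && tar -xf ../dir2.tar)
(cd copy2 && tar -xf ../dir2.tar)

# The tarfile itself is untouched by extraction.
ls -l dir2.tar copy1/dir2 copy2/dir2
```

The parentheses run each cd in a subshell, so the script itself never leaves its starting directory.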
Compressing and Uncompressing
Compressing reduces the file size using a special encoding. Tarring files and compressing often go hand in hand. You can compress a whole tarfile without having to compress each individual file.
This will be helpful if you have large directories that you’ve tarred and will not need for some time, and you want to save some disk space.
As you can imagine, compressing a file before sending it to another host will also save on upload/download times with ‘sftp’.
The compression command we recommend is gzip. It was designed to replace the older UNIX command simply called compress. It is more efficient and it is free, so it is widely supported on other platforms.
gzip is used to compress (zip) a file or files.
gunzip is used to expand (unzip) a file or files.
gzip produces files with a .gz extension.
Using gzip to compress a file.
Compressing Practice
Let’s find our previous tarfile (dir2.tar), which should reside at the top of your UNIX_class directory, and try this handy utility.
    $ cd UNIX_class
    $ ls -l
    drwxrwxr-x 2 userid userid   512 Jun 28 17:52 Animal
    drwxrwxr-x 2 userid userid   512 Jun 18 21:31 Shakespeare
    drwxrwxr-x 2 userid userid   512 Jun 18 20:10 Wildcards
    drwxrwxr-x 2 userid userid   512 Jun 24 15:38 dir1
    drwxrwxr-x 4 userid userid   512 Jun 24 13:55 dir2
    -rw-rw-r-- 1 userid userid 49664 Jul  8 16:28 dir2.tar
    drwxrwxr-x 4 userid userid   512 Jun 28 15:19 dir3
Type:
    $ gzip dir2.tar
    $ ls -l
    drwxrwxr-x 2 userid userid   512 Jun 28 17:52 Animal
    drwxrwxr-x 2 userid userid   512 Jun 18 21:31 Shakespeare
    drwxrwxr-x 2 userid userid   512 Jun 18 20:10 Wildcards
    drwxrwxr-x 2 userid userid   512 Jun 24 15:38 dir1
    drwxrwxr-x 4 userid userid   512 Jun 24 13:55 dir2
    -rw-rw-r-- 1 userid userid 29601 Jul  8 16:28 dir2.tar.gz
    drwxrwxr-x 4 userid userid   512 Jun 28 15:19 dir3
Let’s take a look at what we did.
    $ gzip filename
In its simplest form, the only [operand] you have to supply to gzip is the filename you want to compress (zip). In this case we used dir2.tar.
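In the listings above, dir2.tar went from 49664 bytes to 29601 bytes, a saving of roughly 40%. If you want to measure the effect on your own files, here is a minimal sketch. It assumes a made-up, highly repetitive sample file, so the saving it reports will be much larger than for typical data; the variable names are illustrative only.

```shell
#!/bin/sh
set -eu
WORK=$(mktemp -d)
cd "$WORK"
mkdir dir2

# Repetitive text compresses very well, which makes the effect easy to see.
i=0
while [ "$i" -lt 500 ]; do
    echo "the quick brown fox jumps over the lazy dog" >> dir2/history.txt
    i=$((i + 1))
done
tar -cf dir2.tar dir2

# wc -c counts bytes; tr strips the padding some systems add around the number.
before=$(wc -c < dir2.tar | tr -d ' ')
gzip dir2.tar                  # replaces dir2.tar with dir2.tar.gz
after=$(wc -c < dir2.tar.gz | tr -d ' ')

echo "before: $before bytes"
echo "after:  $after bytes"
echo "saved:  $(( (before - after) * 100 / before ))%"
```

Notice that gzip replaces the original file: after it runs, only dir2.tar.gz remains.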
Notice the file size difference between the unzipped file and the zipped one.
    before: -rw-rw-r-- 1 userid userid 49664 Jul  8 16:28 dir2.tar
    after:  -rw-rw-r-- 1 userid userid 29601 Jul  8 16:28 dir2.tar.gz
Also notice the .gz extension.
Using gunzip to uncompress a file.
Here again, the command can be very simple. Always look to the man page for more options. The basic usage of the gunzip command is to reverse the operation of the gzip command.
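If you want to convince yourself that gunzip really does reverse gzip, the sketch below keeps a copy of a tarfile, runs it through both commands, and compares the result with the copy. The file names are made up, and cmp is just one convenient way to compare two files byte for byte.

```shell
#!/bin/sh
set -eu
WORK=$(mktemp -d)
cd "$WORK"
mkdir dir2
echo "sample text" > dir2/final.paper
tar -cf dir2.tar dir2

# Keep a byte-for-byte copy to compare against later.
cp dir2.tar original.tar

gzip dir2.tar          # dir2.tar becomes dir2.tar.gz
gunzip dir2.tar.gz     # dir2.tar.gz becomes dir2.tar again

# cmp -s is silent and succeeds only if the two files are identical.
if cmp -s original.tar dir2.tar; then
    echo "round trip restored the tarfile exactly"
fi
```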
Uncompressing Practice
Type:
    $ gunzip dir2.tar.gz
Let’s take a look at what we did.
    $ ls -l
    drwxrwxr-x 2 userid userid   512 Jun 28 17:52 Animal
    drwxrwxr-x 2 userid userid   512 Jun 18 21:31 Shakespeare
    drwxrwxr-x 2 userid userid   512 Jun 18 20:10 Wildcards
    drwxrwxr-x 2 userid userid   512 Jun 24 15:38 dir1
    drwxrwxr-x 4 userid userid   512 Jun 24 13:55 dir2
    -rw-rw-r-- 1 userid userid 49664 Jul  8 16:28 dir2.tar
    drwxrwxr-x 4 userid userid   512 Jun 28 15:19 dir3
Now we are back to our original file size, and gunzip removed the .gz extension for us. Simple!
The End…
Next … Pipes and Redirects