Python Tutorial: Tarfile Module




Author(s): Adam Zajac
Contributor(s):
Last Revised: 2008.12.12

This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.

Working with Tar Files in Python

1. Introduction
  1.a Background Reading
2. Tutorial
  2.a Adding Files
  2.b File Information
  2.c Extracting Files
3. Examples
  3.a Archiving Select Files from a Directory
4. Extending
  4.a Removing Files

1. Introduction

"Tar" is an archiving format that has become rather popular in the open source world. In essence, it takes several files and bundles them into one file. Originally, the tar format was made for tape archives, hence the name; today it is often used for distributing source code or for making backups of data. Most Linux distributions have tools in the standard installation for creating and unpacking tar files.

Python's standard library comes with a module which makes creating and extracting tar files very simple. Examples of when individuals might want such functionality include programming a custom backup script or a script to create a snapshot of other personal projects.

Background Reading

There is significant documentation of both tar files and Python's tarfile module. In addition to this document, the following resources are recommended reading:

  • Wikipedia: tar file
  • Python Library Reference 12.5: tarfile

2. Tutorial

This is a basic tutorial designed to teach three things: how to add files to an archive, how to retrieve information on files in the archive, and how to extract files from the archive.

Adding Files

To begin, import the tarfile module. Then, create what is called a "TarFile Object". This is an object with methods for interacting with the tar file. In this case, we are opening the file "archive.tar.gz". Note that the mode is "w:gz", which opens the file for writing with gzip compression. As usual, "w" does not preserve the previous contents of the file. If the tar file already exists, use "a" to append files to the end of the archive (n.b.: you cannot append to a compressed archive; there is no such mode as "a:gz").

Create a TarFile Object

    >>> import tarfile
    >>> tar = tarfile.open("archive.tar.gz", "w:gz")
    >>> tar

Adding files to the archive is very simple. If you want the file to have a different name in the archive, use the arcname option.

Adding a File to the Archive

    >>> tar.add("file.txt")
    >>> tar.add("file.txt", arcname="new.txt")

Adding directories works in the same way. Note that by default a directory will be added recursively: every file and folder under it will be included. This behavior can be changed by setting recursive to False.

Adding a Directory to the Archive

    >>> tar.add("docs/")
    >>> tar.add("financial/", recursive=False)

As with normal file objects, always be sure to close a TarFile Object. Until it is closed, the archive on disk may be incomplete.

Close the TarFile Object

    >>> tar.close()
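The steps above can be combined into one short script. The following is a minimal sketch rather than part of the original tutorial: it uses a with statement so that the archive is closed automatically, and it creates its own sample files (the hypothetical file.txt and docs/readme.txt) in a temporary directory so that it can be run anywhere.

```python
import os
import tarfile
import tempfile

# Work in a throwaway directory; the file names below are illustrative only.
workdir = tempfile.mkdtemp()
file_path = os.path.join(workdir, "file.txt")
docs_path = os.path.join(workdir, "docs")
os.mkdir(docs_path)
for path in (file_path, os.path.join(docs_path, "readme.txt")):
    with open(path, "w") as handle:
        handle.write("example contents\n")

archive_path = os.path.join(workdir, "archive.tar.gz")

# The with statement closes the archive even if an error occurs.
with tarfile.open(archive_path, "w:gz") as tar:
    tar.add(file_path, arcname="file.txt")
    tar.add(file_path, arcname="new.txt")   # same file under a different name
    tar.add(docs_path, arcname="docs")      # directories are added recursively

# Reopen the archive for reading and list what it contains.
with tarfile.open(archive_path, "r:gz") as tar:
    names = tar.getnames()

print(names)
```

Passing arcname here also keeps the temporary directory's absolute path out of the archive, which is usually what is wanted when the files do not live in the current working directory.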

File Information

The tarfile module includes the ability to retrieve information about the individual contents of a tar file. Each item is accessed as a "TarInfo Object". For example, getmembers() will return a list of all TarInfo objects in a tar file:

Listing TarInfo Objects

    >>> import tarfile
    >>> tar = tarfile.open("archive.tar.gz", "r:gz")
    >>> members = tar.getmembers()
    >>> members
    [<TarInfo ...>, <TarInfo ...>]

Each TarInfo object has several attributes and methods associated with it. Some examples are below, and a full list can be found in the TarInfo section of the tarfile documentation.

TarInfo information

    >>> members[0].name
    'text.txt'
    >>> members[0].isfile()
    True
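As an illustration of working with TarInfo objects in a script, the sketch below loops over the members of an archive and collects each one's name, size, and type. To stay self-contained it builds a one-file archive in memory first; the member name text.txt and its contents are placeholders, not data from the original tutorial.

```python
import io
import tarfile

# Build a small gzip-compressed archive in memory (illustrative data only).
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
    data = b"hello, tar\n"
    info = tarfile.TarInfo(name="text.txt")
    info.size = len(data)            # addfile() reads exactly this many bytes
    tar.addfile(info, io.BytesIO(data))

# Rewind the buffer and read the archive back.
buffer.seek(0)
summary = []
with tarfile.open(fileobj=buffer, mode="r:gz") as tar:
    for member in tar.getmembers():
        if member.isfile():
            kind = "file"
        elif member.isdir():
            kind = "dir"
        else:
            kind = "other"
        summary.append((member.name, member.size, kind))

for name, size, kind in summary:
    print(name, size, kind)
```

The same loop works on an archive opened from disk with tarfile.open("archive.tar.gz", "r:gz").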

Extracting Files

Extracting the contents is a very simple process. To extract the entire tar file, simply use extractall(). This will extract the contents to the current working directory. Optionally, a path may be specified to have the tar extract elsewhere.

Extracting an entire tar file

    >>> import tarfile
    >>> tar = tarfile.open("archive.tar.gz", "r:gz")
    >>> tar.extractall()
    >>> tar.extractall("/tmp/")

If only specific files need to be extracted, use extract().

Extracting a single file from a tar file

    >>> import tarfile
    >>> tar = tarfile.open("archive.tar.gz", "r:gz")
    >>> tar.extract("text.txt")

You should be aware that there is at least one security concern to take into account when extracting tar files. Namely, a tar can be designed to overwrite files outside of the current working directory (/etc/passwd, for example) by giving its members absolute paths or paths containing "..". Never extract a tar as the root user if you do not trust it.
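One way to reduce the risk described above is to inspect every member before extracting anything. The function below is a minimal sketch, not an exhaustive defense: safe_extractall is a hypothetical helper name, and its policy of rejecting all symbolic and hard links is a deliberately conservative assumption. Python 3.12 and later also accept a filter argument to extractall() (for example filter="data") that performs similar checks inside the library.

```python
import os
import tarfile


def safe_extractall(archive_path, destination="."):
    """Extract an archive, refusing members that would land outside destination.

    Hypothetical helper for illustration; raises ValueError on a suspicious member.
    """
    destination = os.path.realpath(destination)
    with tarfile.open(archive_path, "r:*") as tar:   # "r:*" detects compression
        for member in tar.getmembers():
            # Resolve where the member would end up on disk.
            target = os.path.realpath(os.path.join(destination, member.name))
            if os.path.commonpath([destination, target]) != destination:
                raise ValueError("unsafe path in archive: " + member.name)
            # Links can point outside the destination, so reject them outright.
            if member.issym() or member.islnk():
                raise ValueError("link in archive: " + member.name)
        # Every member passed the checks; extract the whole archive.
        tar.extractall(destination)
```

Because the checks run before extractall() is called, a single bad member causes nothing at all to be written.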

3. Examples

Archiving Select Files from a Directory

archiver.py

    import os
    import tarfile

    # Only files with these extensions are archived.
    whitelist = ['.odt', '.pdf']
    contents = os.listdir(os.getcwd())

    tar = tarfile.open('backup.tar.gz', 'w:gz')
    for item in contents:
        # splitext() handles extensions of any length, unlike item[-4:].
        if os.path.splitext(item)[1] in whitelist:
            tar.add(item)
    tar.close()
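The script above only looks at files directly inside the current directory. To apply the same whitelist to a whole directory tree, one option is the filter argument of TarFile.add(), which is called for every member and may return None to leave that member out. The following is a sketch under that approach; keep_whitelisted and archive_tree are hypothetical names chosen for this example.

```python
import os
import tarfile

WHITELIST = {".odt", ".pdf"}


def keep_whitelisted(tarinfo):
    """Return the member to keep it, or None to leave it out of the archive."""
    if tarinfo.isdir():
        # Keep directories so that add() continues to descend into them.
        return tarinfo
    extension = os.path.splitext(tarinfo.name)[1].lower()
    return tarinfo if extension in WHITELIST else None


def archive_tree(source_dir, archive_path):
    """Archive source_dir recursively, keeping only whitelisted files."""
    top_name = os.path.basename(os.path.normpath(source_dir))
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(source_dir, arcname=top_name, filter=keep_whitelisted)
```

A call such as archive_tree("projects", "backup.tar.gz") would then store projects/ and its subdirectories, with only the .odt and .pdf files inside them. Directories that contain no matching files still appear in the archive as empty entries.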

4. Extending

Removing Files

The tarfile module does not contain any function to remove an item from an archive. It is presumed that this is because of the nature of tape drives, which were not designed to move back and forth (a post to the Python tutor mailing list discusses this). Nevertheless, other programs for creating tar archives do have a delete feature. The following code uses the popular GNU tar program that comes with most Linux distributions. The GNU tar manual documents the "--delete" flag; note that it warns against using the flag on an actual tape drive. GNU tar also cannot modify compressed archives, so this approach applies to uncompressed .tar files. The reliance on an external program obviously makes the code far less portable, but it is suitable for personal scripts.

Removing an Item from a Tar

    import subprocess

    def remove(archive, unwanted):
        # Confirm that the external tar program is GNU tar.
        external = subprocess.getoutput("tar --version")
        if not external.startswith("tar (GNU tar)"):
            raise Exception("err: need GNU tar to delete individual files.")
        # Pass the arguments as a list so file names need no shell quoting.
        command = ["tar", "--delete", "--file=" + archive, unwanted]
        # Return the exit status of tar (0 on success).
        return subprocess.run(command, capture_output=True).returncode
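A portable alternative that avoids the external program is to write a new archive containing every member except the unwanted one, then swap it into place. The function below is a minimal sketch of that idea, not something the tarfile module provides directly: remove_member is a hypothetical name, and the write_mode parameter (defaulting to gzip) is an assumption the caller should adjust to match the original archive's compression.

```python
import os
import tarfile
import tempfile


def remove_member(archive_path, unwanted, write_mode="w:gz"):
    """Rewrite the archive without the member named unwanted.

    Returns True if a member was dropped, False if no member had that name.
    """
    directory = os.path.dirname(os.path.abspath(archive_path))
    # Create the temporary file next to the archive so the final rename
    # stays on the same filesystem.
    handle, temp_path = tempfile.mkstemp(suffix=".tmp", dir=directory)
    os.close(handle)
    removed = False
    try:
        with tarfile.open(archive_path, "r:*") as source, \
                tarfile.open(temp_path, write_mode) as target:
            for member in source:
                if member.name == unwanted:
                    removed = True
                    continue
                # Only regular files carry data that must be copied across.
                fileobj = source.extractfile(member) if member.isfile() else None
                target.addfile(member, fileobj)
        os.replace(temp_path, archive_path)
    except BaseException:
        os.remove(temp_path)
        raise
    return removed
```

The whole archive is read and rewritten on every call, so this is slower than an in-place delete on large archives, but it works with compressed archives and on any platform where Python runs.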
