The Great Awk

POWERTOOLS

Samuel Palmer takes a look at the awk programming language, the perfect hacker’s tool for editing text files and analysing your data.

Awk sounds like the noise made by a dying seagull, but the odd name has relatively prosaic origins: it derives from the initials of the surnames of its creators, Alfred V Aho, Peter J Weinberger and Brian W Kernighan. Rightly or wrongly, most of the credit for awk is generally given to Kernighan, although the name suggests that this shouldn’t necessarily be so.

Awk is a powerful but simple programming language that, like grep and sed, has become an essential part of the Unix tool kit and, by extension, the Linux tool kit. It was designed to fulfil a common requirement on computer systems: to edit text files, especially where those files are used to store information, and to rearrange, classify, validate and analyse the data they contain. Such work is laborious and error-prone when done manually. The alternative, writing programs in C or another general-purpose programming language, is usually impractical and time-consuming. Awk could be said to be the godfather of the macro facilities provided with modern spreadsheet programs, but it is far more powerful, far more versatile, and can be used as a rapid solution in more diverse scenarios.

From awk to gawk

The first incarnation of the language appeared in 1977 and originated, like Unix and C, at Bell Laboratories, which has contributed a disproportionate share of the innovative technologies of the last half century. Kernighan is head of Bell’s Computing Structures Research Department, and is best known as the co-author, with Dennis Ritchie, of The C Programming Language.

There are several varieties of awk. The original specification, as released in 1977, is still referenced because it is the default version on some versions of Unix. A revised version, called nawk, or new awk, was released as part of Unix System V Release 3.1 in 1988, although it had already been in internal use within AT&T for several years. Nawk is the most usual implementation of awk on Unix. It added some new features to the language and cleaned up some “dark corners”, as Effective awk Programming puts it.

The preferred adaptation of the language for Linux, as implemented by the Free Software Foundation, is gawk, or GNU awk, which was written in 1986 by Paul Rubin and Jay Fenlason, and was reworked in 1989 by David Trueman and Arnold Robbins. Gawk contains a number of extensions to nawk that increase its functionality and power. The POSIX specification of awk includes feedback from both the gawk designers and the original awk designers. Gawk, like so much of the work of the GNU project, is an essential feature of Linux, and is just one of the many tools that give some justification to Richard Stallman’s often disparaged claim that Linux should be known to the world as GNU/Linux.

There is a further implementation of awk, mawk, or Mike’s awk, which is also free software and is available with some Linux distributions. Mawk was written by Mike Brennan, who claims as its main benefit that it is “the fastest awk implementation I know. It’s even a lot faster than GNU awk (which is much faster than the awks that Unix vendors ship with their systems)”.

LinuxUser/July-August 2001 5 9


On the command line

Get it from the source

Effective awk Programming is written by Arnold Robbins, one of the developers of gawk and the co-author of sed & awk. This book is required reading for the Linux programmer who wants to explore the potential of awk and gawk, and is generally considered to give the most in-depth coverage of the many titles available on the subject. The book was written under the auspices of the Free Software Foundation and is also available electronically, in which form it can be freely copied and distributed under the terms of the Free Software Foundation’s Free Documentation Licence. A portion of the proceeds from sales of the book goes to the FSF to support further development of free and open source software. Effective awk Programming is a complete guide to the gawk 3.1 implementation of the language, and also contains the most up-to-date and thorough elucidation of the POSIX standard for awk available anywhere.

It has been said that The AWK Programming Language, by Aho, Kernighan and Weinberger, the originators of the language, “is to AWK what The C Programming Language is to C. It’s the bible”. As the original guide to the language it offers some insight into the intentions of the authors, and provides a complete set of examples.

The purpose of awk is to allow complex pattern recognition and relatively complicated arithmetic in programs of one or two lines. An awk program contains a sequence of patterns and actions. Unlike conventional programming languages, awk can be said to be data-driven: it searches a file, or a set of specified files, for a required pattern of data, and then takes the appropriate action (or set of actions), which may be quite complex.

Awk is a natural extension of grep and sed, which can be used to perform similar tasks, and was conceived by its designers as a means of extending the processing capabilities of grep and sed to more complex forms of data. The difference is that awk has a much greater range of pattern recognition tools, can handle arithmetic, has flow control statements, can store values in user-defined variables, and supports user-defined functions. It can perform relatively complex pattern matching, file editing and analysis tasks over multiple files. As such, awk often removes the need for a full programming language, and makes global edits or data analysis quick to set up.

Typically awk might be used for one-off tasks, but an awk script can also be stored in a file. It is one of those classic Unix utilities that has all kinds of unpredictable uses far beyond the original remit of its design: a general-purpose programming language that doesn’t need extensive programming experience to achieve the desired results. Awk can be invoked in two forms, which can be conventionally defined as follows:

    awk [options] 'script' var=value file(s)
    awk [options] -f scriptfile var=value file(s)
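As a rough illustration of the two invocation forms, the sketch below runs the same one-line program first inline and then from a script file. The file names (data.txt, count.awk) and the variable name limit are made up for the example and do not come from the article.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample data: one number per line.
printf '3\n12\n7\n25\n' > data.txt

# Form 1: the script is given on the command line.
# limit=10 is a var=value assignment, processed before data.txt is read.
inline_out=$(awk '$1 > limit { print }' limit=10 data.txt)

# Form 2: the same script is stored in a file and loaded with -f.
printf '%s\n' '$1 > limit { print }' > count.awk
file_out=$(awk -f count.awk limit=10 data.txt)

echo "$inline_out"
echo "$file_out"
```

Both forms should print the same lines, since only the place where the script is stored differs.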

sed & awk was written by Dale Dougherty and Arnold Robbins, and has earned the same praise as the books above. The book progresses from a simple introduction to the benefits of both sed and awk, towards detailed descriptions of the tools, regular expression syntax and other intricacies. sed & awk is a standard textbook for Unix programmers and administrators. O’Reilly also publishes a Pocket Reference edition of sed & awk.

The options are -F, to define the field separator found in the data, and -v, to assign a variable that can be used in the script. The script may be written on the command line or contained in a file. Awk can be used to process multiple files that contain the defined pattern.

Patterns can be defined as combinations of regular expressions and comparison operations on strings, numbers, fields, variables and array elements. Actions may perform arbitrary processing on selected lines. The language is C-like but has no declarations; strings and numbers are built-in data types. Some benefits of awk include automatic file handling, associative arrays, user-defined and built-in functions, recursion, regular expressions, multidimensional arrays, and formatted output using printf and sprintf. Empty patterns and actions can be defined for specific purposes.

While typical examples of awk programs show one-line applications, awk can in fact be used to build quite complex operations, and a program that is being used to process data is more likely to be several lines long. Awk has the structures to support this. The simplest awk program might be as follows:

    awk '/LinuxUser/ {print}' *.txt

This program will scan all files in the current directory with the suffix .txt, search for any occurrence of the word LinuxUser, and print to the terminal all lines containing that text.
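The options and features described above can be tried on a small scale. The sketch below assumes a hypothetical colon-separated file, sales.txt, holding a region, a product and an amount on each line; the file, its contents and the variable name min are inventions for the example.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical data: region:product:amount
cat > sales.txt <<'EOF'
north:widgets:120
south:widgets:80
north:gadgets:45
south:gadgets:200
EOF

# -F sets the field separator; -v assigns a variable before the script runs.
# The pattern is a comparison on the third field; the action prints two fields.
big=$(awk -F: -v min=100 '$3 >= min { print $1, $2 }' sales.txt)

# An associative array keyed by region accumulates totals, and printf
# formats the report in the END action once all input has been read.
# The regions are named explicitly because "for (r in total)" does not
# guarantee any particular order.
report=$(awk -F: '
    { total[$1] += $3 }
    END {
        printf "%-6s %5d\n", "north", total["north"]
        printf "%-6s %5d\n", "south", total["south"]
    }' sales.txt)

echo "$big"
echo "$report"
```

The first command shows a pattern selecting lines and the second shows an action with no pattern, which runs for every line of input.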


Short cuts

Awk and nawk and gawk

A classic definition of the capabilities of, and differences between, the popular implementations of awk is given by Dale Dougherty and Arnold Robbins in sed & awk.

With original awk, you can:
  • Think of a text file as made up of records and fields in a textual database
  • Perform arithmetic and string operations
  • Use programming constructs such as loops and conditionals
  • Produce formatted reports

With nawk, you can also:
  • Define your own functions
  • Execute Unix commands from a script
  • Process the results of Unix commands
  • Process command-line arguments more gracefully
  • Work more easily with multiple input streams
  • Flush open output files and pipes (latest Bell Labs awk)

In addition, with GNU awk (gawk), you can:
  • Use regular expressions to separate records, as well as fields
  • Skip to the start of the next file, not just the next record
  • Perform more powerful string substitutions
  • Retrieve and format system time values
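Two of the nawk additions listed above, user-defined functions and processing the results of Unix commands, can be sketched in a few lines. The function name max and the echo commands standing in for a real Unix command are assumptions made for the example; only features that are also in POSIX awk are used, so nothing here depends on gawk.

```shell
# A user-defined function plus a command pipe. No input file is needed
# because all the work happens in the BEGIN action.
result=$(awk '
    # Hypothetical helper: return the larger of two numbers.
    function max(a, b) { return (a > b) ? a : b }

    BEGIN {
        # Run a Unix command and read its output line by line.
        cmd = "echo 4; echo 19; echo 7"
        while ((cmd | getline line) > 0)
            biggest = max(biggest, line + 0)
        close(cmd)
        print "largest value:", biggest
    }')

echo "$result"
```

Adding 0 to each line forces a numeric comparison, so the values are compared as numbers and not as strings.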

From a programmer’s point of view, an awk program can be seen as a quick subroutine that can be invoked on its own without the surrounding program superstructure. From a user’s point of view, awk allows anyone with a rudimentary knowledge of programming structures to process data according to his or her own requirements. Awk is, in fact, a scripting language that was designed to achieve a limited number of tasks. Some may argue that, as a language, it has been superseded by Perl and other scripting languages, but it is simpler to master and quicker to use.

Because awk uses a syntax that looks very much like C, it is attractive as a short cut for programmers who want to get a task done quickly. As such, awk is often used as a prototyping tool that lends itself to iterative testing of algorithms. Once the prototype is working it is relatively easy to convert the awk program into another language, or to embed it in a working script.

The authors claim that awk has been used for a diversity of applications “from databases to circuit design, from numerical analysis to graphics, from compilers to system administration, from a first language for non-programmers to the implementation language for software engineering courses”. The most typical application remains that for which it was originally designed: to scan and edit text files and to produce reports on the data held therein.

If you cannot determine when awk should be used in preference to other languages, take the advice that Robert M Slade gave in a review of the O’Reilly book, sed & awk: “The Enlightened Ones say that you should never use C if you can do it with a script, never use a script if you can do it with awk, never use awk if you can do it with sed, and never use sed if you can do it with grep.”
