Programming With Perl An Introduction September 2005
Structure Of This Course
This course is split into three parts:
1. Introduction (~0.5 hours)
2. Course material (~9.0 hours)
3. Labs/Tutorials/Exercises (~2.5 hours)
The goal is to cover 75% of the Perl language. All of the material in this course comes from “Programming Perl” (3rd edition) and the “Perl Cookbook”. If anything is not clear, ask as we go. One thing I would like from this course is feedback, so please fill in the course feedback form before you leave tomorrow.
The assumption is that everyone has some programming experience; this course isn’t going to teach programming. Some parts of Perl are not going to be covered: ties and DBM, formats, and many system functions. This is all reference material which you can find in any of the standard texts or in the man pages.
The course agenda has the same times on each of the two days. Labs and exercises happen as we go. If there are any problems or questions you wish to raise, just ask.
Each day runs from 09:00 to 16:00, with 30 minutes for lunch and a 15-minute break in both the morning and the afternoon. The agenda is flexible: if there are specific areas which I haven’t covered and in which you have an interest, then ask. There are lots of labs and exercises; most are small to start with and get more detailed as we approach the end of the course. By the end of the overview everyone should be capable of writing simple scripts which manipulate files and do simple pattern matching and substitution. We will largely be learning by example, and many of the examples in this course come from the Perl Cookbook.
The Pursuit Of Happiness (Or The Hard Sell)
Perl is a language for getting your job done. It is designed to make easy jobs easy without making hard jobs impossible. What are the easy jobs?
Manipulating numbers, text, files, directories, computers and networks. Running external programs and scanning their output. Developing, modifying and debugging your own programs, which should also be easy.
Perl is a glue language. Perl is especially popular with web programmers and developers, but only because they discovered it first. We will look at Perl from the viewpoint of helping in these areas: design, programming, verification, documentation/reporting and data analysis. Perl is an ideal language for data manipulation.
What Is Perl?
To those who like it: Practical Extraction and Report Language. To those who love it: Pathologically Eclectic Rubbish Lister. Above all, Perl is free, easy to use, and capable of anything from one-liners to whole projects. But be careful: you can write rubbish software in any language. If you’ve programmed before in BASIC, C, C++, Pascal, awk, Python or English, then you’ll probably feel comfortable with Perl.
Any language can be used to write code which is not maintainable, and Perl isn’t an exception to that rule. We will look at three different styles of programming:
1. Flat programming - simple scripts.
2. Procedural programming - larger programs based on procedures, control structures and simple data structures.
3. One-liners.
As part of the course notes there is a style guide for Perl. Follow it (or something like it). Some of the things we won’t have time to cover on this course are object-oriented (OO) Perl and advanced data structures. If these interest you, let me know, since a follow-up course is possible (even likely).
The History Of Perl
Larry Wall is the creator of Perl. Perl has been around since 1987 (Perl 1). 1988 sees Perl 2 and 1989 sees Perl 3. 1991 sees “Programming Perl” (1st edition) and Perl 4, just as the Internet explodes into growth. 1994 sees Perl 5. 2000 - Perl 6 is announced. 2005 onwards - Perl 6: we’re still waiting.
More About Perl
Perl is a rich language: Perl is modularly extensible. You can rapidly design, program, debug and deploy applications. You can extend the functionality of those applications as needed. You can embed Perl in other languages, and you can embed other languages in Perl. You can write object-oriented Perl.
A misconception: “Perl is interpreted, so it’s slow!”
In fact Perl first compiles your program into an intermediate form (similar in spirit to Java bytecode or Pascal P-code), which is then passed to the interpreter for execution. Hence: you can write faster code in C, but you can write code faster in Perl.
Great solutions often come from using pre-built Perl modules written in C, which give you C speed together with Perl’s convenience and flexibility.
For embedding, the language of choice is C, since Perl itself is written in C.
How To Get Perl
Unix: available on-site. See:
/pd/perl/5.005_503/bin/perl
/pd/perl/5.8.6/bin/perl
/usr/local/bin/perl
Windows: ActiveState Perl (version 5.8.6) from www.activestate.com
Linux: included as part of all standard Linux distributions (version 5.8.6)
Mac OS X: included as part of OS X (version 5.8.1 on OS X 10.3.9)
We have various versions on site; we recommend using 5.8.x. Perl/Tk is available on-site in version 5.8.x.
Places To Get Useful Information - I
Internet:
www.perl.com (The Perl homepage)
www.perl.org (The Perl mongers homepage)
www.oreilly.com
search.cpan.org (Go here to find Perl modules)
The comp.lang.perl newsgroup hierarchy: comp.lang.perl.misc, comp.lang.perl.moderated, comp.lang.perl.modules, comp.lang.perl.tk
man perl from a Unix command line: gives all the Perl help topics.
Ask!
All the newsgroups listed above are available in this building. Perl is probably the most widely used and understood programming language in Bristol. People can always come and ask me a question if they have a problem.
Places To Get Useful Information - II
Books:
Programming Perl (3rd edition), Larry Wall, Tom Christiansen & Jon Orwant - ISBN 0-596-00027-8
Learning Perl (3rd edition), Randal Schwartz & Tom Phoenix - ISBN 0-596-00132-0
Perl Cookbook (2nd edition), Tom Christiansen & Nathan Torkington - ISBN 1-56592-243-3
Mastering Algorithms With Perl, Jon Orwant, Jarkko Hietaniemi & John Macdonald - ISBN 1-56592-398-7
Advanced Perl Programming, Sriram Srinivasan - ISBN 1-56592-220-4
If you only buy one book, make it the Camel book (a.k.a. Programming Perl), followed by the Perl Cookbook. If you do buy Programming Perl, make sure it’s the 3rd edition and NOT the 2nd edition. There are two “Perl in 21 Days” books, one of which is available on-line at the CR&D bookshelf web-site (the on-line version can be found in the tutorial areas as a series of PDF files). Since a lot of this course is going to be Perl by example, I’ve placed a few programs into the various tutorial areas, which can all be used (and reused) as you wish. There’s also a copy of a Perl module (Netlist_Functions.pm) which contains a lot of useful functions that you can import into your own programs. Hey, why bother programming when you can steal! (This really is the philosophy you should be adopting in your own work.)
(Some of) The Perl Manpages
Manpage     Covers
perl        What Perl manpages are available
perldata    Data types
perlsyn     Syntax
perlop      Operators and precedence
perlre      Regular expressions
perlvar     Predefined variables
perlsub     Subroutines
perlfunc    Built-in functions
perlmod     How to make modules work
perlref     References
perlobj     Objects
perlipc     Inter-process communication
perlrun     How to run Perl commands, plus switches
perldebug   Debugging
perldiag    Diagnostic messages
(More About) The Perl Manpages
See also: perlfaq1 to perlfaq9.
As of Perl version 5.6.1 you can search individual Perl manpages by using the name of the manpage as a command and passing a Perl regular expression as the search pattern. Examples:
perlop comma
perlfunc split
perlvar ARG
perldiag 'assigned to typeglob'
When you don’t know where something is in the documentation, search all the FAQs:
perlfaq round
Some Terminology
Idiomatic Perl: widespread and accepted ways of doing certain things in Perl. For example, these two statements do the same thing:
if ( $variable != 56 ) { print "Your variable did not equal 56\n"; }
print "Your variable did not equal 56\n" unless ( $variable == 56 );
Interpolation: replacing a variable with the variable’s value.
Regexps: regular expressions.
CPAN: the Comprehensive Perl Archive Network. The place to go to get modules written and contributed by other Perl programmers. Don’t reinvent the wheel, or if you do, make sure it’s a better wheel. Share code within your office/group/site/business unit.
Idiomatic Perl is one of the most confusing bits of Perl, since there are so many different ways of doing things. This can be both useful (you can program in the way which suits you) and a drawback (reading other people’s code isn’t always easy). TMTOWTDI - There’s More Than One Way To Do It - is the Perl motto. Interpolation will be mentioned a lot by people who use Perl a lot; it’s just a fancy computer science term. Regexps are not exactly the same as regular expressions in other UNIX applications, so be careful. CPAN is pretty light on EDA-type code. Maybe we should start a forum!
Account Details
There are six user accounts: user1 to user6.
Password for each account is: ________
Each area holds:
Copies of all the course material as .pdf files.
Tutorial areas for all the labs.
A “How To” guide.
A document on “Perl Style”.
A list of some common regexps.
Issues 1 and 2 of the Perl Review (as .pdf files).
A Standard Header
This works in Bristol:
#!/usr/local/bin/perl
# Preamble
use strict;
use warnings;
use diagnostics;
# Some standard modules
use Carp;
use Cwd;
use Config;
# Extend the library path
use lib ( "/design/rmc/tools/Perl_Modules/tool/current/" );
use lib ( "/design/rmc/tools/Perl_Modules/tool/current/OS_SPECIFIC/$Config{archname}" );
# Add the directory containing this script
use FindBin qw( $Bin );
use lib $Bin;
# Site specific
use Netlist_Tools;
There are other ways of invoking the Perl binary (useful on systems where #! does not work) that use eval with some “magic”.
PREVIEW - Examples Of sprintf()
Field   Meaning
%%      A percent sign
%c      A character with the given number
%s      A string
%d      A signed integer, in decimal
%u      An unsigned integer, in decimal
%o      An unsigned integer, in octal
%x      An unsigned integer, in hexadecimal
%e      A floating-point number, in scientific notation
%f      A floating-point number, in fixed decimal notation
%g      A floating-point number, in %e or %f notation
See Chapter 29 (pages 797 to 799) of Programming Perl, 3rd edition. Be careful: sprintf() in Perl does its own formatting. It emulates the C library’s sprintf() but does NOT call it, except when formatting floating-point numbers.
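A small sketch to try the basic conversions above yourself. The values are my own examples, not taken from Programming Perl. The %e result is checked loosely because, for floating-point numbers, the exponent width comes from the C library and may vary by platform.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Each call formats one value with one of the conversions in the table.
my $char   = sprintf("%c", 65);        # character with code 65: "A"
my $string = sprintf("%s", "camel");   # a string
my $signed = sprintf("%d", -42);       # signed decimal integer
my $octal  = sprintf("%o", 8);         # unsigned octal: "10"
my $hex    = sprintf("%x", 255);       # unsigned hexadecimal: "ff"
my $fixed  = sprintf("%f", 3.5);       # fixed decimal, 6 places by default
my $sci    = sprintf("%e", 1234.5);    # scientific notation
my $pct    = sprintf("%d%%", 75);      # %% produces a literal percent sign

print join("\n", $char, $string, $signed, $octal, $hex, $fixed, $sci, $pct), "\n";
```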
PREVIEW - Examples Of sprintf() (continued)
Field   Meaning
%X      Like %x, but using uppercase characters
%E      Like %e, but using an uppercase “E”
%G      Like %g, but using an uppercase “E” if applicable
%b      An unsigned integer, in binary
%p      A pointer (the Perl value’s address in hexadecimal)
%n      Special: stores the number of characters output so far into the next variable in the argument list
In addition to the conversions in the previous table, Perl also supports the ones above. For backward compatibility, Perl also supports these conversions:
%i - a synonym for %d
%D - a synonym for %ld
%U - a synonym for %lu
%O - a synonym for %lo
%F - a synonym for %f
PREVIEW - Examples Of sprintf() (continued)
Perl allows the following flags between the % and the conversion character (see Chapter 29, pages 797 to 799, of Programming Perl, 3rd edition):
Flag      Meaning
space     Prefix positive number with a space
+         Prefix positive number with a plus sign
-         Left-justify within field
0         Use zeroes, not spaces, to right-justify
#         Prefix non-zero octal with “0”, non-zero hex with “0x”
number    Minimum field width
.number   “Precision”: digits after the decimal point for floating-point numbers, maximum length for a string, minimum length for an integer
l         Interpret integer as C type long or unsigned long
h         Interpret integer as C type short or unsigned short (if no flags are supplied, interpret integer as C type int or unsigned int)
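A minimal sketch of the flags in action, combined with a few of the extra conversions (%b, %X). The numbers are arbitrary examples chosen for illustration; the square brackets just make padding visible.

```perl
use strict;
use warnings;

my $padded  = sprintf("[%5d]", 42);       # minimum field width 5, right-justified
my $left    = sprintf("[%-5d]", 42);      # '-' left-justifies within the field
my $zeros   = sprintf("[%05d]", 42);      # '0' pads with zeroes instead of spaces
my $plus    = sprintf("[%+d]", 42);       # '+' prefixes positive numbers
my $alt_hex = sprintf("[%#x]", 255);      # '#' prefixes non-zero hex with 0x
my $alt_oct = sprintf("[%#o]", 8);        # '#' prefixes non-zero octal with 0
my $two_dp  = sprintf("[%.2f]", 3.14159); # precision: 2 digits after the point
my $trunc   = sprintf("[%.3s]", "camel"); # precision on a string: max length 3
my $binary  = sprintf("[%08b]", 5);       # binary, zero-padded to width 8
my $upper   = sprintf("[%X]", 255);       # uppercase hexadecimal

print "$_\n" for $padded, $left, $zeros, $plus, $alt_hex,
                 $alt_oct, $two_dp, $trunc, $binary, $upper;
```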
PREVIEW - Examples Of chop() And chomp()
Remember, chop is indiscriminate: it always removes something, so you’re supposed to know that the last character on a line is “\n”.
while (<>) {
    chomp;                # avoid \n on last field
    @array = split /:/;
    ...
}
chomp is more discriminating: by default it will only remove the last character if it’s a “\n”. You could also write s/\n$//; which is explicit.
You almost always want to use chomp() and not chop(). chop() always returns the character it removes, and if you chop() a list, every item in the list is chopped. The thing which ends up in $answer in the question on the slide is the character which was removed from the string $tmp; the thing you probably wanted was $tmp itself. chomp() is discriminating: by default it removes the last character of a string only if that character is “\n”, but the default can be overridden. What chomp() removes is the string contained in the Perl variable $/ (the input record separator), so chomp() can remove an arbitrary-length string from the end of an input string. chomp() returns the number of characters it deleted, not the characters themselves.
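A short sketch contrasting the two functions and showing chomp() honouring $/. The strings are made-up examples; the "\r\n" case assumes you want to strip Windows-style line endings.

```perl
use strict;
use warnings;

my $tmp    = "hello\n";
my $answer = chop($tmp);       # chop returns the character it removed: "\n"

my $chopped = "world";
chop($chopped);                # chop always removes something: now "worl"

my $line = "world";
my $gone = chomp($line);       # no trailing "\n", so nothing removed: returns 0

my $rec    = "a:b:c\n";
my $count  = chomp($rec);      # removes the newline, returns 1
my @fields = split /:/, $rec;  # ("a", "b", "c")

# chomp removes whatever $/ holds, so it can strip a longer string
my $crlf = "data\r\n";
{
    local $/ = "\r\n";         # change the input record separator temporarily
    chomp($crlf);              # now "data"
}
```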
PREVIEW - Examples Of hex() And oct()
$number = hex("ffff12c0");
$hexstr = sprintf "%lx", $number;   # (That's an ell, not a one.)
sprintf uses the same conventions as C’s sprintf.
perl -e 'print 0xffdc;'
A neat command-line alternative when you need a quick conversion.
$val = oct $val if $val =~ /^0/;
Does $val start with a “0”? Note that this also matches “0x” and “0b”, which oct() handles too.
Note that you can always set a variable to a hex value just by doing this: $h_number = 0xffdd; print $h_number; The hex() function interprets a string as a hex number, not a numeric value: if you pass it a number, the number is first turned into its decimal string, so hex(0x10) is hex("16"), which is 22. If the string begins with “0x”, this is ignored. To do the reverse conversion, use sprintf() as shown. Hex strings can only represent integers, and strings which would cause integer overflow will trigger a warning. oct() interprets a string as an octal value (a leading “0” is optional). If the string starts with “0x” it is interpreted as a hex value, and if it starts with “0b” it is interpreted as a binary value. Try this: perl -e 'print 0b11001001;' Is anyone (apart from me) sad enough to know which 80s/90s TV series had this as an episode title?
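A minimal sketch of these conversions, including the round trip through sprintf(). The input strings in @inputs are hypothetical examples; the map line applies the same "starts with 0" test as the one-liner above.

```perl
use strict;
use warnings;

my $from_hex = hex("ffff12c0");           # 4294906560
my $back     = sprintf("%x", $from_hex);  # reverse conversion: "ffff12c0"
my $also_hex = hex("0xff");               # a leading "0x" is accepted: 255

my $from_oct = oct("755");                # plain octal digits: 493
my $oct_hex  = oct("0x1f");               # oct() honours a 0x prefix: 31
my $oct_bin  = oct("0b11001001");         # oct() honours a 0b prefix: 201
my $literal  = 0b11001001;                # the same value as a binary literal

# Convert only strings that start with 0 (which includes 0x and 0b)
my @inputs    = ("0755", "0x1f", "0b101", "42");
my @converted = map { /^0/ ? oct($_) : $_ } @inputs;

print "@converted\n";
```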
Getting Started
For many programming tasks you’d like a language in which you can say:
print "Hello World!\n";
and expect the language to do just that. Perl is such a language. Some important points: this course is an overview. We’re going to cover a lot of Perl very quickly and there will be lots of examples. Many slides in this course have a special symbol in the top left corner; all such slides are gathered together into a single document called “How-to.pdf” in your labs and exercises directory.
This is a minimal (and complete) Perl program, but it illustrates some important points:
1. You don’t have to say much before you say what you want to say.
2. You don’t have to say much after you’ve said what you want to say either. Unlike many languages, Perl thinks it’s okay for you to just fall off the end of your program. You may use the exit() function to end a program (actually, you should use the exit() function to end a program), just as you may force yourself to pre-declare variables before you use them (actually …). It’s up to you!
Here are a few more important points:
1. The \n at the end of the print statement is a newline.
2. All statements are terminated by a semicolon.
LAB1 - HELLO_1
Variables, Arrays & Lists, Hashes
Variables And Their Syntax
A variable is a handy place to keep something: a place with a name. It might be private or public, and it might be temporary or permanent. This is what computer scientists call scope. We’ll learn about scope later (or look up my, our and local).
A variable is distinguished by the sort of data it holds:
Singular - one thing - strings and numbers.
Plural - many things - lists of strings or lists of numbers (or both).
We call a singular variable a scalar. We call a plural variable an array.
These are the two fundamental data types in Perl: one of a thing, and more than one of a thing. A variable which contains more than one thing is either an array/list or an associative array/hash.
Variables And Their Syntax
We can write a different version of our first example (in the Getting Started section) like this:
$phrase = "Hello, world!\n";    # Set a variable.
print $phrase;                  # Print the variable.
We didn’t have to predefine what type of variable $phrase was. The $ character tells Perl that phrase is a scalar. Perl has some other variable types with names like hash, handle and typeglob.
Later we’ll see that it is a good idea to force yourself to predeclare variables before you use them (using my()). Hash and handle we’ll cover later; typeglob won’t be covered in this course.
LAB1 - HELLO_2
LAB1 - HELLO_3
LAB1 - HELLO_4
Variables And Their Syntax
Type        Character   Example     Is a name for:
Scalar      $           $pounds     An individual value (number or string)
Array       @           @large      A list of values keyed by number
Hash        %           %interest   A group of values keyed by a string
Subroutine  &           &how        A callable chunk of Perl code
Typeglob    *           *struck     Everything named struck
Tips: the $ for scalar is a stylized S. The @ for array is a stylized A. Sadly the analogy breaks down after that. We’ll cover subroutines in detail later in the course. Typeglob won’t be covered in this course.
Quiz: What’s the value in $days after this has run?
my @days = qw( Monday Tuesday Wednesday Thursday Friday Saturday Sunday );
my $days = @days;
Review: scalars store a single value, and all scalars are prefixed by $. Arrays store many values: arrays are prefixed by @ and hashes by %. @ arrays are accessed by index; % arrays (hashes) are accessed by a string. Note that the range operator (..) has made an appearance: 1 .. 20 gives you all the integers between 1 and 20 inclusive. We’ll talk more about the range operator later. In the quiz example we’ve introduced a lot of new stuff. qw (think of this as “quote words”) lets you use barewords to create lists. This whole example is an illustration of context: what is the value of $days after the example has run?
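Spoiler for the quiz: a small sketch showing what the same array gives you in scalar context versus list context. The extra variable names ($first, $last_index) are my own.

```perl
use strict;
use warnings;

my @days = qw( Monday Tuesday Wednesday Thursday Friday Saturday Sunday );
my $days       = @days;   # scalar context: the number of elements
my ($first)    = @days;   # list context: the first element is copied
my $last_index = $#days;  # index of the last element (one less than the count)

print "$days $first $last_index\n";
```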
Numeric literals can be written as an integer, a floating-point number, in scientific notation, with underscores for legibility, or in octal, hexadecimal or binary.
You can’t use “,” inside numbers, since in Perl the comma is an operator, so we use _ instead. Octal numbers are prefixed with 0 (that’s zero). Hex numbers are prefixed with 0x (that’s zero x). Binary numbers are prefixed with 0b (that’s zero b).
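One literal of each kind listed above; the particular values are illustrative choices of mine. The three integer forms at the end all spell the same number, 255.

```perl
use strict;
use warnings;

my $integer  = 12345;          # integer
my $float    = 12345.67;       # floating point
my $sci      = 6.02e23;        # scientific notation
my $readable = 4_294_967_296;  # underscores for legibility
my $octal    = 0377;           # octal (leading zero)
my $hex      = 0xff;           # hexadecimal
my $binary   = 0b1111_1111;    # binary (underscores work here too)

print "$octal $hex $binary\n";
```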
Variable Types - Scalars
Scalars are assigned a new value with the = operator. Scalar variables can hold:
Integers.
Floating-point numbers.
Strings.
References to other variables (think C pointers).
Objects.
Double quotes "" do variable interpolation and backslash interpolation: substituting variables’ values and turning “\n” into a newline.
Single quotes '' suppress interpolation.
Backticks `` will execute an external program and return its output as a string.
The “=” symbol does assignment. Be careful, because the “==” symbol is used for numeric equality. At some point in your life you’ll accidentally confuse the two. Double quotes do variable and backslash interpolation; interpolation is a fancy computer-science name for replacing a variable with the contents of that variable. Single quotes suppress interpolation. Backticks (the ones which lean towards the left) will execute an external program and return its output to you in the form of a string.
A scalar can be assigned many kinds of value: an integer, a "real" number, a number in scientific notation, a string, a string with interpolation, a string without interpolation, another variable's value, a gastrochemical expression, the numeric status of a command, or the string output from a command.
Scalars can also hold references to data structures, subroutines and objects.
$ary = \@myarray;               # reference to a named array
$hsh = \%myhash;                # reference to a named hash
$sub = \&mysub;                 # reference to a named subroutine
$ary = [1,2,3,4,5];             # reference to an unnamed array
$hsh = {Na => 19, Cl => 35};    # reference to an unnamed hash
$sub = sub { print $state };    # reference to an unnamed subroutine
$fido = new Camel "Amelia";     # ref to an object
Variable interpolation:
$pet = "Camel";
$sign = "I love my $pet";
print $sign;
What do you think this will print out?
References will be covered extensively when we get to the in-depth look at Perl. References are the key to writing efficient Perl code with subroutines, and the only way to do OO programming in Perl. In the example $hsh = {Na => 19, Cl => 35}; the => is the same as a comma “,” (except that it also quotes a bareword on its left). This convenience lets us see easily where the keys and the values are; it’s often known as syntactic sugar.
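A minimal sketch of creating and dereferencing the kinds of references listed above. The named array, hash and subroutine (@myarray, %myhash, mysub) are hypothetical stand-ins, and the object example is left out because it needs a Camel class.

```perl
use strict;
use warnings;

my @myarray = (1, 2, 3);
my %myhash  = (Na => 19, Cl => 35);
sub mysub { return "called mysub" }

my $ary = \@myarray;                   # reference to a named array
my $hsh = \%myhash;                    # reference to a named hash
my $sub = \&mysub;                     # reference to a named subroutine

my $anon_ary = [1, 2, 3, 4, 5];        # reference to an unnamed array
my $anon_hsh = { Na => 19, Cl => 35 }; # => quotes the bareword on its left
my $anon_sub = sub { return "called anonymous sub" };

# Dereference with the arrow operator or with @{...}
my $second   = $ary->[1];              # 2
my $sodium   = $hsh->{Na};             # 19
my $chlorine = $anon_hsh->{Cl};        # 35
my $result   = $sub->();               # "called mysub"
my $count    = scalar(@$anon_ary);     # 5
```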
Variable Types - Scalars
If you use a variable which has never been assigned a value, the uninitialized variable springs into existence with the undefined value, undef, which behaves as 0 when used as a number and as "" when used as a string. Depending on how you use them, variables will be interpreted as strings, numbers, or true/false (boolean) values. This is context. Suppose you said this:
$camels = '123';
print $camels + 1, "\n";
Question: What do you think is printed in the example shown?
Answer: $camels is a string containing the text ‘123’. When Perl tries to add 1 to a string, it first converts the string containing the text ‘123’ into the number 123. It then adds 1 and gets 124. This is then converted back into a string containing the text ‘124’, which is printed, followed by a newline.
LAB2 - VARIABLES1
LAB2 - VARIABLES2
LAB2 - VARIABLES3
LAB2 - VARIABLES4_A
LAB2 - VARIABLES4_B
PRINTF and SPRINTF and CHOP and CHOMP
LAB2 - VARIABLES5_A, _B, _C
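A small sketch of context at work: the same scalar treated as a string, a number and a boolean. The `no warnings 'uninitialized'` blocks only silence the warning that `use warnings` would otherwise print when an undefined value is used as a number or a string.

```perl
use strict;
use warnings;

my $camels = '123';           # a string
my $sum    = $camels + 1;     # numeric context: '123' becomes 123, giving 124
my $text   = "$sum camels";   # string context: 124 becomes "124"

my $undefined;                # declared but never assigned: undef
my $as_number = do { no warnings 'uninitialized'; $undefined + 0 };  # 0
my $as_string = do { no warnings 'uninitialized'; "[$undefined]" };  # "[]"
my $truth     = $undefined ? "true" : "false";                       # "false"

print "$text $as_number $as_string $truth\n";
```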
Variable Types - Arrays And Hashes
Some kinds of variables hold multiple values: arrays and hashes. Like scalars, arrays and hashes spring into existence with nothing in them. When you assign to them they supply a list context (we’ll look at this later). Arrays and hashes differ from each other:
Use an array to look up something by number. A whole array is always denoted with the “@” symbol.
Use a hash to look up something by name. A whole hash is always denoted with the “%” symbol.
What’s the difference between a list and an array?
Arrays are also called lists, and the distinction is blurred. Strictly, a list is a sequence of values, while an array is a variable that holds a list. In practice, when an array is used with subscripts it’s generally regarded as an array; when it’s used as an ordered list with push(), pop(), shift() and unshift() it’s generally regarded as a list. It also depends upon context, as well as how you think about a particular problem. TMTOWTDI.
Variable Types - Arrays
An array is an ordered list whose elements are accessed by their position in the list.
@home = ("couch", "chair", "table", "stove");
($potato, $lift, $tennis, $pipe) = @home;
($alpha, $omega) = ($omega, $alpha);
$home[0] = "couch";
$home[1] = "chair";
$home[2] = "table";
$home[3] = "stove";
The list can contain numbers, strings, or a mixture of both. It can also contain references to variables, objects, other arrays or other hashes. To assign a list value to an array you simply group the values together with “(” and “)”. If you use @home in a list context (on the right side of a list assignment) you’ll get the list back, so you could set four scalar variables as shown. List assignments happen in parallel, so you can swap two scalar variables as shown in the third example. Arrays are 0-based (as in C), so while the list contains 4 elements, the elements are numbered 0 to 3. Array subscripts are enclosed in “[” and “]”, so an individual element is referred to as $home[n]. Since the element is a scalar (a single thing), it is preceded by $.
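The array examples above, made runnable under `use strict` with my() declarations. The starting values for $alpha and $omega are my own, since the original swap example leaves them unspecified.

```perl
use strict;
use warnings;

my @home = ("couch", "chair", "table", "stove");
my ($potato, $lift, $tennis, $pipe) = @home;  # list assignment, in order
my $third = $home[2];                         # "table": arrays are 0-based
my $count = @home;                            # 4 elements, numbered 0 to 3

my ($alpha, $omega) = ("first", "last");
($alpha, $omega) = ($omega, $alpha);          # parallel assignment swaps them

print "$potato $pipe $third $count $alpha $omega\n";
```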
Review: an array variable is able to store a series of values, each uniquely identified by an integer known as its index. The contents of an array are accessed collectively by giving the array name prefixed by an @.
@dwarfs = ("Happy", "Sleepy", "Grumpy", "Dopey", "Sneezy", "Bashful", "Doc");
@deadly_sins = ("Gluttony", "Sloth", "Anger", "Envy", "Lust", "Greed", "Pride");
print "@dwarfs never commit @deadly_sins\n";
In the examples shown:
1: The array contains three items.
2: What does $stuff contain?
3: What does $stuff contain?
4: What does @x contain?
5: What does that last “,” do?
6: But look, we can do away with “,” entirely as long as the list items do not contain white-space.
List Assignment Examples:
1: my ($a, $b, $c) = (1, 2, 3);
2: ($map{red}, $map{green}, $map{blue}) = (0xff0000, 0x00ff00, 0x0000ff);
3: my ($dev, $ino, undef, undef, $uid, $gid) = stat($file);
4: my ($a, $b, @rest) = split;
   my ($a, $b, %rest) = @arg_list;
5: while (($login, $password) = getpwent) {
       if (crypt($login, $password) eq $password) {
           print "$login has an insecure password!\n";
       }
   }
@days + 0;       # implicitly force @days into a scalar context
scalar(@days)    # explicitly force @days into a scalar context
1: Parallel assignment of three scalars.
2: Parallel assignment of three scalars which are values in an existing hash %map (you can’t declare individual hash elements with my).
3: If you don’t want some of the things returned in a list, throw them away by assigning them to undef.
4: Here we take $a and $b from the list, and then the rest of the list goes into @rest. Here’s an important principle: the first array or hash in the list (so to speak) gets everything else in the list! In the next example $a and $b get the first two values from @arg_list, and then the hash %rest gets everything else. There’s an issue here: the number of items left in the list when it is assigned to the hash %rest needs to be a multiple of 2 (key/value pairs).
The last two examples show how you can force things into scalar context; the scalar() function is one way.
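A runnable sketch of the list-assignment patterns above. The contents of @arg_list and the split string are hypothetical, and I avoid $a/$b because Perl reserves those names for sort().

```perl
use strict;
use warnings;

my @arg_list = ("x", "y", "colour", "red", "size", "large");
my ($first, $second, %rest) = @arg_list;  # %rest swallows the remaining pairs

my ($head, @tail) = split ' ', "alpha beta gamma delta";  # @tail gets the rest

my %map;                                  # declare the hash once...
($map{red}, $map{green}, $map{blue}) = (0xff0000, 0x00ff00, 0x0000ff);  # ...then assign elements

my (undef, $wanted) = (10, 20, 30);       # undef throws away the first value

my @days    = (1 .. 7);
my $how_many = scalar(@days);             # explicit scalar context: 7
```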
List And Array Examples
# Stat returns a list value.
$modification_time = (stat($file))[9];
# SYNTAX ERROR HERE.
$modification_time = stat($file)[9];    # OOPS, FORGOT ()
# Find a hex digit.
$hexdigit = ('a','b','c','d','e','f')[$digit-10];
# Get multiple values as a slice.
($day, $month, $year) = (localtime)[3,4,5];
Note: lists grow dynamically, so if you have a 4-element list like this:
my @list = qw( fred barney wilma betty );
and say this:
$list[656] = "dino";
Perl will create all the intervening array slots for you (they will all have the value undef). If you create a big array and you’d later like to delete it (to save memory, perhaps), then you can do this:
my @big_array = ();          # create the array
@big_array = <SOME_FILE>;    # load a ton of stuff into it
undef @big_array;            # delete the array
Be careful: @big_array = undef; does not empty the array. It leaves it holding one element whose value is undef.
If you want to remove all the entries in an array without undef’ing it, then just do this: @my_array = (); The same works for hashes; to empty a hash just do this: %my_hash = ();
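A small sketch showing array growth and the difference between `undef @array`, `@array = ()` and the common mistake `@array = undef`. The element values are arbitrary.

```perl
use strict;
use warnings;

my @list = qw( fred barney wilma betty );
$list[656] = "dino";                    # the array grows to 657 elements
my $size         = @list;               # 657
my $gap_is_undef = !defined $list[10];  # intervening slots are undef

my @wrong = (1, 2, 3);
@wrong = undef;                         # NOT empty: one element whose value is undef
my $wrong_size = @wrong;                # 1

my @emptied = (1, 2, 3);
@emptied = ();                          # empty, still ready for reuse

my @freed = (1, 2, 3);
undef @freed;                           # empty, and its storage is released
```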
Variable Types - Arrays
Since arrays are ordered, you can do useful operations on them, such as the stack operations push(), pop(), shift() and unshift().
Perl regards an array as an ordered list. The end of the array (i.e. the right-hand part of the list) is considered the top of the stack. push() and pop() work on the top of the stack; shift() and unshift() work on the other end. shift() takes one element from the start of a list, and unshift() puts new elements at the start of the list. What do you think is printed on the last line of the example?
What does the list @home contain once the example has been run?
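A minimal sketch of the four stack operations on a throwaway array (the values are my own). The comment on each line shows the array afterwards.

```perl
use strict;
use warnings;

my @stack = ("a", "b");
push(@stack, "c");            # ("a", "b", "c"): added at the top (right-hand end)
my $top = pop(@stack);        # "c": removed from the top, leaves ("a", "b")
unshift(@stack, "z");         # ("z", "a", "b"): added at the bottom (left-hand end)
my $bottom = shift(@stack);   # "z": removed from the bottom, leaves ("a", "b")

print "$top $bottom @stack\n";
```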
How Do I … Specify A List In A Program?
You want to include a list in your program.
@a = ("quick", "brown", "fox");        # a comma-separated list
@a = qw( Why are you bugging me? );    # use qw() if you have a lot of single-word elements
# Use something like this if you want to read a list from a file:
@bigarray = ();
open(DATA, "< mydatafile") or die "Couldn't read from datafile: $!\n";
while (<DATA>) {
    chomp;
    push(@bigarray, $_);
}
# Use the quoting operators. These two lines are equivalent: q() is the same as single quotes.
$banner = 'The Mines of Moria';
$banner = q(The Mines of Moria);
# Use the quoting operators. qq() is the same as double quotes: both forms below interpolate $name.
$name = "Gandalf";
$banner = "Speak, $name, and enter!";
$banner = qq(Speak, $name, and welcome!);
More info: See The Perl Cookbook, section 4.1, page 91.
How Do I … Specify A List In A Program? (continued)
You want to include a list in your program.
@banner = ('Costs', 'only', '$4.95');        # these 3 are identical
@banner = qw(Costs only $4.95);
@banner = split(' ', 'Costs only $4.95');
@banner = qw|The vertical bar (\|) looks and behaves like a pipe.|;    # different quoting character
More info: See The Perl Cookbook, section 4.1, page 91.
qx() and backticks behave the same way: both interpolate Perl variables into the command. If you don’t want Perl variables to be expanded, use a single quote as the delimiter (qx'...') to stop this. q(), qq() and qx() quote single strings. qw() quotes a list of single-word strings by splitting its argument on whitespace, without variable interpolation. If you don’t want to change the quoting character, use a backslash to escape the delimiter in the string.
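A quick check that the three list forms above really produce the same list, plus the escaped-delimiter case. The shortened qw|...| string is my own trimmed version of the example.

```perl
use strict;
use warnings;

my @one   = ('Costs', 'only', '$4.95');
my @two   = qw(Costs only $4.95);          # no interpolation, so $4 is left alone
my @three = split(' ', 'Costs only $4.95');

# Inside qw|...| a backslashed | becomes a literal |
my $bars = join(",", qw|The vertical bar (\|) looks|);

print "$bars\n";
```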
How Do I … Change The Size Of An Array?
You want to enlarge or truncate an array.
Solution: assign to $#ARRAY.
# grow or shrink @ARRAY
$#ARRAY = $NEW_LAST_ELEMENT_INDEX_NUMBER;
# or grow it by assigning to an element past the end
$ARRAY[$NEW_LAST_ELEMENT_INDEX_NUMBER] = $VALUE;
$#ARRAY is the index of the last element in @ARRAY. If you assign it a number smaller than its current value, the array is truncated, and truncated elements are lost. If you assign it a number bigger than its current value, the array grows, and all new elements have the value undef. $#ARRAY is not the same as the number of elements, scalar(@ARRAY): for a 0-based array, $#ARRAY is one less.
More info: See The Perl Cookbook, section 4.3, page 95.
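A sketch of truncating and growing an array through $#array, using a throwaway five-element array of my own.

```perl
use strict;
use warnings;

my @array = qw( a b c d e );
my $last_index = $#array;     # 4: index of the last element
my $count      = @array;      # 5: one more than $#array

$#array = 1;                  # truncate to ("a", "b"); "c" to "e" are lost
$#array = 3;                  # grow to ("a", "b", undef, undef)
my $new_count = @array;       # 4
$array[5] = "f";              # assigning past the end also grows the array

print scalar(@array), "\n";   # 6
```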
How Do I … Swap Values Without Using A Temporary Variable?
You want to exchange the values of two variables, but don’t want to use a temporary variable.
Solution:
($VAR1, $VAR2) = ($VAR2, $VAR1);
Normally you would do something like this (say in C):
$temp = $a;
$a    = $b;
$b    = $temp;
You can swap more than two things at a time:
($alpha, $beta, $production) = qw(January March August);
# move beta to alpha,
# move production to beta,
# move alpha to production
($alpha, $beta, $production) = ($beta, $production, $alpha);
More info: See The Perl Cookbook, section 1.3, page 8. Most programming languages require you to use a temporary variable when swapping the values of two variables. Perl, however, tracks both sides of the assignment and guarantees that you won’t accidentally clobber any of your values. This lets you eliminate the temporary variable. You can also exchange more than two variables at once.
How Do I … Append One Array To Another?
You want to join two arrays together by adding all the items of one to the end of the other.
push(@ARRAY1, @ARRAY2);
Output of the example on the slide:
Time Flies Like An Arrow
Fruit Flies Like A Banana
More info: See The Perl Cookbook, section 4.9, page 108. push() is optimised for appending one array to another. If you use list flattening instead (@ARRAY1 = (@ARRAY1, @ARRAY2)), beware that this takes more memory and is slower. If you want to insert elements of one array into the middle of another, use splice().
The splice() function: we’ve already seen push, pop, shift and unshift. They are all special cases of a more general function called splice(). splice() takes up to four arguments: the array to be modified, the index at which it is to be modified, the number of elements to be removed (starting at that index), and a list of extra elements to be inserted at that index (after the previous elements are removed). The function returns a list of the elements which were removed.
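A sketch that produces the two output lines above using push() and splice(). It is my own reconstruction in the spirit of the Cookbook recipe, not the slide's actual code.

```perl
use strict;
use warnings;

my @members   = ("Time", "Flies");
my @initiates = ("An", "Arrow");

push(@members, @initiates);            # Time Flies An Arrow
splice(@members, 2, 0, "Like");        # remove 0, insert at index 2: Time Flies Like An Arrow
my $first_line = join(" ", @members);

splice(@members, 0, 1, "Fruit");       # replace 1 element at index 0
my @removed = splice(@members, -2, 2, "A", "Banana");  # replace the last 2; returns them
my $second_line = join(" ", @members); # Fruit Flies Like A Banana

print "$first_line\n$second_line\n";
```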
List Flattening
Contrary to what you might expect:
@virtues = ( "Faith", "Hope", ( "Love", "Charity" ) );
does not produce a hierarchical list of three elements where the third element is itself a two-element list. Each element of a list must be a scalar, not another list. The example above is actually the same as:
@virtues = ( "Faith", "Hope", "Love", "Charity" );
It is easy to make a hierarchical list in Perl - see references.
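A short sketch contrasting a flattened list with a hierarchical one built from an array reference ([...]), which is the usual way to nest lists in Perl.

```perl
use strict;
use warnings;

my @flat = ("Faith", "Hope", ("Love", "Charity"));    # inner parentheses flatten away
my $flat_count = @flat;                               # 4

my @nested = ("Faith", "Hope", ["Love", "Charity"]);  # [...] builds an array reference
my $nested_count = @nested;                           # 3
my $charity      = $nested[2][1];                     # "Charity"
my $inner_count  = scalar(@{ $nested[2] });           # 2

print "$flat_count $nested_count $charity $inner_count\n";
```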
$single = q!I said, "You said, 'She said it.'"!; $double = qq(Can't we get some "good" $variable?);
Some of these forms are syntactic sugar which let you avoid putting lots of escapes in strings (which can be confusing and lead to mistakes). In the first example we've used ! as the quote character, which means we can freely use both " and ' in the string we wish to build. We could have used normal quotes and escaped the " and ' characters inside the string, but it would have been very hard to read. Any character which would otherwise be interpreted specially can always be included in a string by escaping it: for example, to put a " inside a double-quoted string, write it as \". In a double-quoted string, a backslash followed by a non-alphanumeric character (such as ", $, @ or \) stands for that character itself; a backslash followed by a letter may be a special escape such as \n or \t.
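The two quoting forms above can be checked side by side. In this sketch the value of $variable ("coffee") is an assumption, and the qq() result is compared against the same string written with ordinary, escaped double quotes:

```perl
use strict;
use warnings;

my $variable = "coffee";    # illustrative value

# q!...! behaves like single quotes: no interpolation, and both kinds of
# quote mark can appear freely inside.
my $single = q!I said, "You said, 'She said it.'"!;

# qq(...) behaves like double quotes: $variable is interpolated.
my $double = qq(Can't we get some "good" $variable?);

# The same string written with ordinary double quotes needs escapes.
my $escaped = "Can't we get some \"good\" $variable?";

print "$single\n$double\n";
print "same string\n" if $double eq $escaped;
```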
Variable Types - Hashes
Hashes are arrays accessed by a string. Hashes are also called associative arrays. push() and pop() and shift() and unshift() have no meaning for hashes. A hash has no beginning and no end.
[Figure: the array @home holds four elements (Couch, Chair, Table, Stove) in numbered positions; the hash %longday holds unordered key/value pairs: Sun/Sunday, Mon/Monday, Tue/Tuesday, Wed/Wednesday, Thu/Thursday, Fri/Friday, Sat/Saturday.]
The % character is used to mark hash names.
Hash keys are not implied by position. In fact the concept of position has no meaning for a hash, so you can't walk through a hash by position the way foreach walks through an array; to visit every entry you loop over its keys instead (for example with keys()). You must supply a key as well as a value when populating a hash. You can assign a list to a hash (just like an array), but successive pairs of items from the list will be interpreted as key/value pairs in the hash. So you can say this:
@list = ( "Sat", "Saturday", "Sun", "Sunday", ..., "Fri", "Friday" );
%hash = @list;
This is the same as:
%hash = ( "Sat" => "Saturday", "Sun" => "Sunday", ..., "Fri" => "Friday" );
Variable Types - Hashes
%longday could be declared like this:
%longday = ("Sun", "Sunday", "Mon", "Monday", "Tue", "Tuesday", "Wed", "Wednesday", "Thu", "Thursday", "Fri", "Friday", "Sat", "Saturday");
This is hard to read, so Perl provides => as an alternative to the comma:
%longday = (
    "Sun" => "Sunday",
    "Mon" => "Monday",
    "Tue" => "Tuesday",
    "Wed" => "Wednesday",
    "Thu" => "Thursday",
    "Fri" => "Friday",
    "Sat" => "Saturday",
);
As in the example from the previous slide, suppose you wanted to translate abbreviated day names to their corresponding full names. You could write the list assignment as shown in the first example. This is visually noisy, so Perl provides the => operator (a comma that is easier to see) so that, with a bit of creative formatting, the same statement can be written as shown in the second example. Remember: hashes have no order, and all access is done via the keys. Don't expect a loop over a hash to visit the entries in any particular order; if you need an order, loop over sort(keys(%longday)).
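Looping over a hash is done through its keys. This minimal sketch reuses the %longday pairs from above and sorts the keys so that the output order is predictable (alphabetical by abbreviation, not calendar order):

```perl
use strict;
use warnings;

my %longday = (
    Sun => "Sunday",    Mon => "Monday",   Tue => "Tuesday",
    Wed => "Wednesday", Thu => "Thursday", Fri => "Friday",
    Sat => "Saturday",
);

# keys() returns the keys in no particular order, so sort them when the
# order of the output matters.
foreach my $abbrev (sort keys %longday) {
    print "$abbrev => $longday{$abbrev}\n";
}
```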
Variable Types - Hashes
Hashes still hold scalars. Select an individual hash element using { and }. Example: the value associated with "Wed" in our example is:
$longday{"Wed"};
Note that we're dealing with a scalar value, so there's a $ on the front, not a %.
Example: suppose we have a hash called %wife:
$wife{"Tony"} = "Cherie";
Here the $ shows that the element is a scalar, wife is the name of the hash, the { and } are needed because this is a hash, "Tony" is the key and "Cherie" is the value.
You can assign a list to a hash (see our previous examples): each pair of items in the list is taken as a key and a value respectively. You can also assign a hash to a list; if you do, the hash is converted into a list of key/value pairs. Often we use the keys() function to extract a list of just the keys. This list will also be unordered (the keys won't come back in the order in which they were entered into the hash), but it can be sorted using sort(). Remember that a single element of a hash is still a scalar, so it is always prefixed by a $ and not a %. The % refers to the whole hash and not to individual elements. You also need to use { and }. In general, things don't come back out of a hash in the same order that they went in (for example, when you get all the keys back out with the keys() function). Do not try to use push(), pop(), shift() or unshift() with hashes. They don't work: remember, position in a hash has no meaning.
Functions Which Work With Hashes
A limited set of functions work with Perl hashes:
keys      List all the keys in a hash
values    List all the values in a hash
each      Used to iterate over key/value pairs
exists    Tells you whether a hash key exists
delete    Deletes a hash key/value pair
my @keys   = keys( %my_hash );
my @values = values( %my_hash );
while ( my ( $key, $value ) = each %my_hash ) {
    print $key . " " . $value . "\n";
}
In the same way that an array can be emptied and freed with undef (undef @my_array;), so can a hash. So to delete a hash, do this: undef %my_hash; (Note that %my_hash = undef; does not do this: it assigns a one-element list, leaving a single key "" with an undefined value, and warns about an odd number of elements.) If you just want to remove all the entries in the hash, do this: %my_hash = (); i.e. assign the empty list to the hash.
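The hash functions listed above can be tried together. A minimal sketch; the keys and values (apple, banana, cherry) are made up for illustration:

```perl
use strict;
use warnings;

my %my_hash = (apple => 3, banana => 5);

$my_hash{cherry} = 7;                  # add a key/value pair
print "have banana\n" if exists $my_hash{banana};

my $removed = delete $my_hash{apple};  # delete returns the removed value (3)
my $count   = keys %my_hash;           # keys in scalar context: 2

%my_hash = ();                         # empty the hash, keep the variable
# undef %my_hash;                      # or release it completely
```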
How Do I … Create A Hash? You want to create and populate a hash with key/value pairs. %age = ( "Nat", 24, "Jules", 25, "Josh", 17 );
Solution: A hash can be initialised with a list, where each pair of values in the list is interpreted as a key/value pair.
You can also use the => operator to initialise a hash like this:
%age = ( Nat => 24, Jules => 25, Josh => 17 );
The => operator automatically quotes a simple word (an identifier) on its left, so you can omit the quotes on the keys.
More info: See The Perl Cookbook, section 5.0 Page 129. Solution: assign a list of pairs of items to the hash. You can also use the => operator to do the same thing; it is visually easier to see what is happening and where the key/value pairs are located in the list. Using => will automatically quote a simple word on its left. Simple-word hash keys inside { } are also automatically quoted, so you can write $hash{"somekey"} as $hash{somekey}. Hashes are stored in whatever order is convenient for the implementation, which means that the extraction order is not the same as the insertion order.
How Do I … Add An Element To A Hash? You need to add an element to a hash. $HASH{$KEY} = $VALUE;
Solution: Simply add a new entry like this
More info: See The Perl Cookbook, section 5.1 Page 130. Solving this problem is easy: just add any new entry as shown. Perl will take care of all memory management for you, and just as with arrays and lists, you don't need to worry about overflow. If you use undef as a hash key it will be turned into the empty string "". If you try to get a value for a key which isn't in the hash you'll also get undef, so you can't simply use if ($hash{key}) to see whether a key exists. You need to use exists($hash{key}) to test whether the key is in the hash, defined($hash{key}) to see whether its value is undef, and if ($hash{key}) to test whether its value is true or false.
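The three tests differ on keys whose values are false or undefined. This sketch uses a hypothetical %hash with one key holding 0, one holding undef, and one key that is missing entirely:

```perl
use strict;
use warnings;

my %hash = (zero => 0, empty => undef);

for my $key (qw(zero empty missing)) {
    printf "%-8s exists=%d defined=%d true=%d\n",
        $key,
        exists  $hash{$key} ? 1 : 0,
        defined $hash{$key} ? 1 : 0,
        $hash{$key}         ? 1 : 0;
}
```

Only exists can tell "zero" and "empty" (present but false) apart from "missing".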
Hashes
Remember: a hash is just an array where things are looked up by name. If you assign a list to a hash, pairs of items become key/value associations.
%map = ('red', 0xff0000, 'green', 0x00ff00, 'blue', 0x0000ff);

%map = ();    # clear the hash first
$map{red}   = 0xff0000;
$map{green} = 0x00ff00;
$map{blue}  = 0x0000ff;

%map = (
    red   => 0xff0000,
    green => 0x00ff00,
    blue  => 0x0000ff,
);

$field = radio_group(
    NAME      => ...,
    VALUES    => ...,
    DEFAULT   => ...,
    LINEBREAK => ...,
    LABELS    => ...,
);
The => operator has the nice side effect of quoting a simple word on its left, so we can leave the quotes off red, green and blue in the third example. The value on the right of => still needs quotes if it is a character string. The last example uses named parameters to invoke a complex function. A hash is stored in an internal order of its own, so the values generally don't come back out in the order they went in. In the Perl versions current when these notes were written, you can't use scalar( %hash ) (or %hash in scalar context) to find out how many things are in the hash. If you want to know that, use: scalar( keys( %hash ) ); or scalar( values( %hash ) );
LAB4 - HASH_1 LAB4 - HASH_2 LAB4 - HASH-3 LAB4 - HASH_4 LAB4 - HASH_5
Array And Hash Slices
Slicing an array:
print $tragedy[3], $tragedy[4], $tragedy[5];
print @tragedy[3,4,5];
These are equivalent.
Slicing an array: the things in the array slice are not copies; they are the same elements. So assigning to the array slice also assigns to the original array elements. (The same is true for a hash slice.) The slice is a list (hence the @) and the brackets are [ and ]. Slicing a hash: the values() function returns hash values in an apparently random order, so to create a list of values from a hash in a specific order we often have to do something similar to what is shown in the example. Instead of putting a single key in the curly braces, we put a list of keys in the curly braces. The slice is a list (hence the @ and NOT a $ or a %) and the brackets are { and }.
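Both kinds of slice are shown in this sketch. The contents of @tragedy and the three-day %longday are illustrative assumptions; the point is the @ sigil, the [ ] versus { } brackets, and that assigning to a slice writes through to the original elements:

```perl
use strict;
use warnings;

my @tragedy = qw(Hamlet Lear Macbeth Othello Romeo Titus);
my @three   = @tragedy[3, 4, 5];    # Othello Romeo Titus

# Assigning to a slice assigns to the original elements.
@tragedy[0, 1] = qw(HAMLET LEAR);

my %longday = (Sun => "Sunday", Mon => "Monday", Tue => "Tuesday");
# Hash slice: a list of keys in { }, and @ because the result is a list.
my @in_order = @longday{qw(Sun Mon Tue)};    # Sunday Monday Tuesday
```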
Scalar And List Context
Examples:
$x        = Funkshun();    # scalar context
$x[1]     = Funkshun();    # scalar context
$x{"ray"} = Funkshun();    # scalar context
Funkshun() should always figure out what it is supposed to return.
The first three examples are all evaluated in scalar context. The second set of examples are all evaluated in list context, even if the assignment only picks out a single value from such a list. The rules don't change when using my to declare variables. A well-designed function can figure out what context it's been called in (using wantarray) and return what is appropriate. The wantarray function is used like this:
if (wantarray) { return @an_array; } else { return $a_scalar; }
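Here is the wantarray pattern as a complete function. The lowercase funkshun() and its return values are hypothetical; in scalar context it returns a count, in list context the whole list:

```perl
use strict;
use warnings;

# Hypothetical function: a list in list context, a count otherwise.
sub funkshun {
    my @an_array = ("red", "green", "blue");
    if (wantarray) {
        return @an_array;            # list context
    } else {
        return scalar(@an_array);    # scalar (or void) context
    }
}

my $x       = funkshun();    # scalar context: 3
my @x       = funkshun();    # list context: ("red", "green", "blue")
my ($first) = funkshun();    # still list context, keeps only "red"
```

The last line shows the point made above: parentheses on the left make it a list assignment, even though only one value is kept.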
Variable Types - Simple Data Structures
Arrays and hashes are simple, flat data structures. How do we build more complex data structures? Here's the wrong way and the right way to do it:
$wife{"Jacob"} = ("Leah", "Rachel", "Bilhah", "Zilpah");    # WRONG
$wife{"Jacob"} = ["Leah", "Rachel", "Bilhah", "Zilpah"];    # RIGHT
Once this is done you can refer to individual elements like this:
$wife{"Jacob"}[0] = "Leah";
$wife{"Jacob"}[1] = "Rachel";
$wife{"Jacob"}[2] = "Bilhah";
$wife{"Jacob"}[3] = "Zilpah";
Sometimes you need to build not-so-lovely and not-so-simple data structures. Perl lets you do this by pretending that complicated values are really simple ones. We want $wife{"Jacob"} to hold a single thing (it's a scalar), so it must hold a Perl reference, and a reference to a list is created using [ and ], not ( and ). We are telling Perl to pretend that a whole list is in fact a scalar. The statement creates an anonymous array (i.e. an array without a name) and puts a reference to it into the hash element $wife{"Jacob"}. This is how Perl deals with both multi-dimensional arrays and nested data structures. You can see in the second example how this looks like a multi-dimensional array with one string subscript and one numeric subscript. We'll discuss this in more detail tomorrow. This example (and the one on the following page) is here to demonstrate that making complex data structures is easy.
Suppose we not only wanted to know the names of Jacob's wives, but also the names of all the sons of all his wives. In this case we want to store a whole hash in a scalar, and we use { and } to create a reference to an anonymous hash. Now we have an array in a hash in a hash. Adding another level to a nested data structure is like adding another dimension to a multi-dimensional array. The important point is that Perl lets you pretend that something which is complex is a simple scalar. Perl's whole object-oriented structure is built upon this kind of encapsulation. Again, we'll discuss this in detail tomorrow.
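A sketch of the "array in a hash in a hash" described above. The name %kids_of_wife is an assumption for illustration; { ... } builds the anonymous hash and each [ ... ] an anonymous array:

```perl
use strict;
use warnings;

my %kids_of_wife;
$kids_of_wife{"Jacob"} = {
    "Leah"   => ["Reuben", "Simeon", "Levi", "Judah", "Issachar", "Zebulun"],
    "Rachel" => ["Joseph", "Benjamin"],
    "Bilhah" => ["Dan", "Naphtali"],
    "Zilpah" => ["Gad", "Asher"],
};

# Two string subscripts, then one numeric subscript.
print $kids_of_wife{"Jacob"}{"Rachel"}[0], "\n";              # Joseph

# @{ ... } treats the inner array reference as a whole array.
print scalar(@{ $kids_of_wife{"Jacob"}{"Leah"} }), "\n";      # 6
```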
Variable Types - Packages
Why use packages?
Use other people's code.
Lets us split up our own code into manageable units.
Is the basis for the whole of Perl's OO system.
Ensures that our code (subroutine and variable names) does not clash with imported code.

# This file is Matrix.pm
...
sub print_me {
    # Code to print out a matrix
}

# This file is Solve.pm (this is our code)
...
use Matrix;
sub print_me {
    # Code to print out an equation
}
Packages are a way of splitting up your code. They are roughly comparable to C's #include or Spice/Verilog .include statements. Suppose we pick up Matrix.pm from somewhere, and it has a subroutine called print_me. We import Matrix.pm, and we also have a subroutine called print_me which does something completely different. When we call print_me, which subroutine do we get?
Variable Types - Packages
Suppose you want to talk about matrices. You would start off by saying this in Matrix.pm:
package Matrix;
The effect of this is that from this point onwards any global name in Matrix.pm will be prefixed by Matrix::. So if you say:
package Matrix;
$result = &print_me();
then the real name of $result is $Matrix::result and the real name of &print_me() is &Matrix::print_me().
In computer science, and in Perl, each of these packages establishes a "namespace". You can have as many namespaces as you want, but you're only ever in one at a time. If we don't use a package declaration in our program then the default package is main (written main::). This means that the previous example will work, since print_me() in Matrix.pm is really &Matrix::print_me() while print_me in Solve.pm is really &main::print_me(). {We would be better off in Solve.pm using a declaration like package Solve; - what would the &print_me() subroutine be called then?} Code which is brought into a program like this with a use command is also called a module. The standard is to give the module the same name as the package it contains (with an initial uppercase letter) and a .pm filename suffix. Thus the code for package Matrix; would be contained in a file called Matrix.pm. The nice thing about Perl is that there are a *lot* of packages "out there" that you can use to solve all sorts of problems.
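The namespace idea can be demonstrated in a single file. Both packages are placed in one file here only so that the sketch is self-contained (normally each would live in its own .pm file), and the subroutines return strings rather than printing so the result is easy to check:

```perl
use strict;
use warnings;

package Matrix;
sub print_me { return "printing a matrix" }

package Solve;
sub print_me { return "printing an equation" }

package main;    # back to the default package

# Fully qualified names pick the subroutine from the right namespace.
print Matrix::print_me(), "\n";
print Solve::print_me(), "\n";
```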
Variable Types - Pragmas
In the previous section we used the "use" command to load in some new code (a module). Some of the built-in modules in Perl don't add code. Rather, they change the way that the language behaves. These special modules are called pragmas. Example: use strict;
Pragmas change the way the language works. In the example shown, it tightens up some of the rules which Perl uses by default and requires the programmer to be explicit. This example would require that you declare all your variables before use. This is usually a good thing; see the section on style in about five minutes' time.
How Do I … Round Floating-Point Numbers? You want to round a floating-point number to a certain number of decimal places. $rounded = sprintf("%FORMATf", $unrounded);
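More info: See The Perl Cookbook, section 2.4 Page 46. The precision in an "f" format (for example %.2f) lets you specify how many decimal places the number should be rounded to. sprintf rounds the number's binary value, so the rule you learned at school (round up if the next digit is 5 or greater, down otherwise) doesn't always hold: many decimal fractions can't be stored exactly, so a value that looks like an exact half may really be stored slightly below or above it, and rounds accordingly.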
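Filling in the FORMAT placeholder with a concrete precision looks like this. The numbers are illustrative; the last line shows the binary-storage effect described above:

```perl
use strict;
use warnings;

my $unrounded = 3.14159;
my $rounded   = sprintf("%.2f", $unrounded);    # "3.14"
my $whole     = sprintf("%.0f", 2.7);           # "3"

# 2.675 is stored in binary as slightly less than 2.675, so it rounds down.
my $tricky    = sprintf("%.2f", 2.675);         # "2.67"

print "$rounded $whole $tricky\n";
```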
How Do I … Compare Floating-Point Numbers?
You want to compare floating-point numbers to know if they're equal to a certain level of significance.
# equal(NUM1, NUM2, ACCURACY) : returns true if NUM1 and NUM2 are
# equal to ACCURACY significant digits
sub equal {
    my ($A, $B, $dp) = @_;
    return sprintf("%.${dp}g", $A) eq sprintf("%.${dp}g", $B);
}
More info: See The Perl Cookbook, section 2.2 Page 45. Floating-point arithmetic isn't precise, so you should never do a direct comparison using ==. The solution is to turn the floating-point numbers into strings using sprintf and then compare those strings. Note that the %g format counts significant digits; use %.${dp}f instead if you want to compare to a fixed number of decimal places. Alternatively, multiply both numbers by a large factor (like 1000000), turn the results into integers and then use ==, but this demands that you have some idea of the magnitude of the numbers before you start. If the number of decimal places is fixed, this latter solution is easier.
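The classic case is 0.1 + 0.2, which is not exactly 0.3 in binary floating point. This sketch repeats the equal() subroutine so that it runs on its own and contrasts it with ==:

```perl
use strict;
use warnings;

# equal(NUM1, NUM2, DIGITS): true if NUM1 and NUM2 agree to DIGITS
# significant digits (%g counts significant digits, not decimal places).
sub equal {
    my ($A, $B, $dp) = @_;
    return sprintf("%.${dp}g", $A) eq sprintf("%.${dp}g", $B);
}

my $sum = 0.1 + 0.2;
print "== says different\n" unless $sum == 0.3;    # binary rounding error
print "equal() says same\n" if equal($sum, 0.3, 10);
print "3.14159 and 3.14 agree to 3 digits\n" if equal(3.14159, 3.14, 3);
```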
How Do I … Convert Binary And Decimal Numbers?
You have an integer whose binary representation you would like to print out, or a binary number which you would like to print as an integer.
sub dec2bin {
    my $str = unpack("B32", pack("N", shift));
    $str =~ s/^0+(?=\d)//;    # otherwise you'll get leading zeros
    return $str;
}
sub bin2dec {
    return unpack("N", pack("B32", substr("0" x 32 . shift, -32)));
}
$num    = bin2dec('0110110');    # $num is 54
$binstr = dec2bin(54);           # $binstr is 110110
More info: See The Perl Cookbook, section 2.3 Page 48. Older versions of Perl's sprintf had no "print in binary" format, so the Cookbook uses pack and unpack, which convert between Perl values and strings of binary data. Both functions take a template as their first argument which specifies how the data should be converted. Since Perl 5.6, sprintf also supports a %b format, and oct() accepts binary strings with a 0b prefix.
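On Perl 5.6 or later the same conversions can be done without pack/unpack. A minimal sketch using sprintf's %b format and oct() with a "0b" prefix; the %08b width is just an example of zero padding:

```perl
use strict;
use warnings;

my $binstr = sprintf("%b", 54);      # "110110"
my $num    = oct("0b0110110");       # 54

# A width with a leading zero pads the binary string.
my $byte   = sprintf("%08b", 5);     # "00000101"

print "$binstr $num $byte\n";
```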
How Do I … Control Case?
A string in uppercase needs converting to lowercase, or vice versa.
use locale;    # obey the language environment
Then convert the case either with functions (uc, lc, ucfirst, lcfirst) or with string escapes inside double quotes (\U, \L, \u, \l).
# You can do case insensitive string comparisons like this: if (uc($a) eq uc($b)) { print "a and b are the same\n"; }
More info: See The Perl Cookbook, section 1.9 Page 19. The two ways of doing the conversions (functions and string escapes) look different, but do the same thing. You can set the case of either the first character or the whole string. The use locale directive tells the Perl case conversion functions and pattern matching engine to respect your language environment, allowing for umlauts, accent marks, cedillas and other diacritics used in many languages. You can also use the case conversion functions, or a pattern match with the /i modifier, to do case-insensitive string comparisons.
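The function and string-escape forms side by side. This sketch leaves out use locale so that the results don't depend on the machine's language settings; the sample strings are illustrative:

```perl
use strict;
use warnings;

my $little = "bo peep";
my $big    = "JOHN";

# Functions
my $up1    = uc($little);       # "BO PEEP"
my $down1  = lc($big);          # "john"
my $first1 = ucfirst($little);  # "Bo peep"

# String escapes inside double quotes do the same thing
my $up2    = "\U$little";       # "BO PEEP"
my $down2  = "\L$big";          # "john"
my $first2 = "\u$little";       # "Bo peep"

# Case-insensitive comparison with a pattern match
print "match\n" if "Bo Peep" =~ /^bo peep$/i;
```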
How Do I … Find Out Today's Date?
You need to find out the year, month and day values for today's date.
($day, $month, $year) = (localtime)[3,4,5];
printf("The current date is %04d %02d %02d\n", $year+1900, $month+1, $day);
# prints - The current date is 2005 08 08
# Could also have been written -
($day, $month, $year) = (localtime)[3..5];

use Time::localtime;
$tm = localtime;
($DAY, $MONTH, $YEAR) = ($tm->mday, $tm->mon, $tm->year);
This is an object-oriented version of localtime().
More info: See The Perl Cookbook, section 3.1 Page 73. Solution: use localtime() and extract the information you want from the list it returns. Or use Time::localtime, which overrides localtime() to return a Time::tm object. You can then use that object's methods (mday, mon, year and so on) to get the values you want. These follow the same conventions as the list form: mon counts months from 0 and year counts years from 1900, so adjust them as in the printf above.
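Another option, not shown in the recipe above, is POSIX's strftime(), which formats the list returned by localtime() directly. A minimal sketch; the "%Y %m %d" format is chosen to match the printf output above:

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# %Y is the full year and %m the month as 01..12, so no +1900 or +1
# adjustments are needed.
my $today = strftime("%Y %m %d", localtime);
print "The current date is $today\n";
```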
Style, File Handles & Operators
Notes:
Running Perl Programs And Scripts
If you're doing something simple, this will work:
% perl -e 'print "Hello World!\n";'
For longer scripts put the code into a file and say this: % perl grading
The most convenient way is to make the file executable (chmod +x grading) and ensure this line is at the top of the file:
#!/usr/local/bin/perl -w
% grading
(Use ./grading if the current directory isn't in your PATH.)
% at the start of the following lines is the Unix shell prompt. % perl -e : You’re basically trying to cram everything onto one line. % perl grading : Feed the program explicitly to Perl. % grading : Let the shell call Perl to run the script. Useful tip - never just use this at the top of your file to invoke Perl: #!/usr/local/bin/perl But rather use this instead: #!/usr/local/bin/perl -w This will turn on lots of warning messages.
Good Programming Practice
#!/usr/local/bin/perl -w
use lib "/a/unix/path/to/my/Perl/Modules";

# Pull in some modules
use strict;
use Netlist_Functions;

# Define a constant
use constant PI => 3.14159265358979;

# Create some variables
my @args = ();
my $flag = 1;    # true

# ALL YOUR PROGRAM CODE GOES HERE

exit 0;

# Put all your subroutines here
A more extensive version of this template can be found in the tutorial area and in your notes. Note: once you use strict, all your variables will have to be declared like this: my $variable; or my $variable = 56; You'll get compile-time errors if you don't use my. With warnings turned on, Perl will also tell you about global variables that you mention only once, which is often a sign of a typo. For any program other than a one-liner, ALWAYS use a methodology like this: it will save you lots of time debugging applications. We'll talk more about strict later.
Style Guidelines
See the separate document provided with the course notes. Here's a brief summary:
Enable warnings with "#!/usr/local/bin/perl -w" or use warnings;
Use "use strict;".
Use "==" for numeric tests and "eq" for string tests.
Don't confuse "==" and "=".
Don't confuse "=" and "=~".
Use a consistent indent when writing code.
Use consistent bracket matching.
Never, ever use "goto".
Don't use printf when print will do, which is nearly always.
Use comments - lots of comments.
Document your code.
Note that there’s a complete style guide included in the course notes. There’s also a separate style presentation later in the course.
Filehandles
A filehandle is a name given to a file, device, socket or pipe. Filehandles hide the complexity of buffering from your program. They also provide a symbolic name. You create a filehandle using the open() function. open() needs two parameters: the filehandle and a filename. The filename string also specifies how the file is to be used (read, write, append or pipe). STDIN, STDOUT and STDERR are predefined for you.
Notes:
Filehandles
Using open():
open(SESAME, "filename")               # read from existing file
open(SESAME, "<filename")              # (same thing, explicitly)
open(SESAME, ">filename")              # create file and write to it
open(SESAME, ">>filename")             # append to existing file
open(SESAME, "| output-pipe-command")  # set up an output filter
open(SESAME, "input-pipe-command |")   # set up an input filter

print STDOUT "Enter a number: ";            # ask for a number
$number = <STDIN>;                          # input the number
print STDOUT "The number is $number.\n";    # print the number

chop($number = <STDIN>);    # input number and remove newline

$number = <STDIN>;          # input number
chop($number);              # remove newline
You can use open to create filehandles for a variety of purposes (input, output, piping). Once opened, the filehandle can be used to access the file or device until it is closed with close(). Using open again with the same filehandle will close the file previously opened on it. Once a file is open it can be read from using the line-reading operator <>. An empty <> reads from the files named on the command line, or from STDIN if there aren't any. What is STDOUT doing with the print statement in the second example? Since it's the default, you don't need it. The last two examples do the same thing; you'll most frequently see the first, which is one of Perl's common idioms. chomp() is usually a safer choice than chop(): chomp removes only a trailing newline, while chop removes the last character whatever it is. Note that when you do use a filehandle with a print statement, there's no comma between the filehandle and the text.
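Since Perl 5.6, open() also accepts three arguments, with the mode separate from the filename, and a lexical variable as the filehandle. This is a swapped-in alternative to the two-argument form above, sketched under the assumption that a temporary file from File::Temp can stand in for a real file:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A temporary file stands in for a real one so the sketch is self-contained.
my (undef, $filename) = tempfile();

open(my $out, '>', $filename) or die "Can't write $filename: $!";
print $out "first line\nsecond line\n";    # no comma after the filehandle
close($out) or die "Can't close $filename: $!";

open(my $in, '<', $filename) or die "Can't read $filename: $!";
my @lines;
while (my $line = <$in>) {
    chomp($line);                          # removes only a trailing newline
    push @lines, $line;
}
close($in);
unlink($filename);
```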
How Do I … Process All The Files In A Directory?
You want to do something to each file in a particular directory.
opendir(DIR, $dirname) or die "can't opendir $dirname: $!";
while (defined($file = readdir(DIR))) {
    # do something with "$dirname/$file"
}
closedir(DIR);
Solution: Use opendir to open the directory and readdir to retrieve all the filenames
Example: print the names of the text files, adding the directory path to each filename before testing it with -T.
$dir = "/usr/local/bin";
print "Text files in $dir are:\n";
opendir(BIN, $dir) or die "Can't open $dir: $!";
while ( defined($file = readdir BIN) ) {
    print "$file\n" if -T "$dir/$file";
}
closedir(BIN);
More info: See The Perl Cookbook, section 9.5 Page 318. The opendir, readdir and closedir functions operate on directories the same way that open, <> and close operate on files. Both use handles, but the handles used by the directory functions are different from those used by files. readdir returns bare names (without the directory path), and these include the entries . and .. . In scalar context readdir returns the next filename from a directory until it runs out of names, at which point it returns undef. In list context it returns the rest of the filenames in the directory, or an empty list if there are no filenames left.
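readdir in list context collects every name at once. This sketch builds a throwaway directory with File::Temp (the file names a.txt and b.txt are illustrative), reads all entries, and filters out . and .. with grep:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dirname = tempdir(CLEANUP => 1);
for my $name ("a.txt", "b.txt") {
    open(my $fh, '>', "$dirname/$name") or die "Can't create $name: $!";
    print $fh "hello\n";
    close($fh);
}

opendir(my $dh, $dirname) or die "can't opendir $dirname: $!";
# List context: all remaining names, including "." and "..".
my @files = grep { $_ ne '.' && $_ ne '..' } readdir($dh);
closedir($dh);
@files = sort @files;

print "$dirname/$_\n" for @files;    # add the path back on
```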
Operators - Arithmetic
Example     Name             Result
$a + $b     Addition         Sum of $a and $b
$a * $b     Multiplication   Product of $a and $b
$a % $b     Modulus          Remainder of $a divided by $b
$a ** $b    Exponentiation   $a to the power $b
You can work out subtraction and division for yourself. You can always use ( and ) to force the order of evaluation you want.
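A few results that often surprise newcomers: / always does floating-point division, and % with a negative left operand gives a result with the sign of the right operand. A minimal sketch with illustrative numbers:

```perl
use strict;
use warnings;

my $quotient  = 7 / 2;         # 3.5: / is floating-point division
my $truncated = int(7 / 2);    # 3: int() drops the fraction
my $remainder = 7 % 3;         # 1
my $negmod    = -7 % 3;        # 2: takes the sign of the right operand
my $power     = 2 ** 10;       # 1024
my $grouped   = (2 + 3) * 4;   # 20: parentheses force the order
```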
Operators - String
There is an addition operator for strings that performs concatenation. Perl uses . for this.
$a = 123;
$b = 456;
print $a + $b;    # prints 579
print $a . $b;    # prints 123456
There's also a "multiply" operator for strings, called the repeat operator, x.
$a = 123;
$b = 3;
print $a * $b;    # prints 369
print $a x $b;    # prints 123123123
Note in the above how Perl is converting from numbers to strings as needed. String concatenation is also implied in interpolation which occurs in double-quoted strings.
Operators - String
The following three statements all print the same thing.
print $a . ' is equal to ' . $b . ".\n";    # dot operator
print $a, ' is equal to ', $b, ".\n";       # list
print "$a is equal to $b.\n";               # interpolation
Of the three different ways of printing shown above, interpolation is the easiest to understand.
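$line .= "\n";    # Append newline to $line.
$fill x= 80;      # Make string $fill into 80 repeats of itself.
$val ||= "2";     # Set $val to 2 if it isn't already "true".

$a = $b = $c = 0;        # C programmers will be familiar with this
($temp -= 32) *= 5/9;
chop($number = <STDIN>);

The first three assignments use the op= syntax, which works for most of Perl's binary operators: $x OP= $y means $x = $x OP $y. An assignment returns the variable that was assigned to, so assignments can be chained ($a = $b = $c = 0), applied again (($temp -= 32) *= 5/9 converts a Fahrenheit temperature to Celsius in place), or passed to another function (chop($number = <STDIN>)).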
Operators - Unary Arithmetic
You can use something like $variable += 1 as shorthand. Perl also has autoincrement and autodecrement operators.
Example       Name            Result
++$a, $a++    Autoincrement   Add 1 to $a
--$a, $a--    Autodecrement   Subtract 1 from $a
If you place the operator in front of the variable it is known as pre-increment or pre-decrement: the value is changed before it is used. If you place the operator after the variable it is known as post-increment or post-decrement: the value is changed after it is used.
If you've used C before, this is exactly the same as pre/post increment/decrement in that language.
$count = 3;
$limit = $count++;
print "Count=$count and Limit=$limit\n";    # Count=4 and Limit=3
or
$count = 3;
$limit = ++$count;
print "Count=$count and Limit=$limit\n";    # Count=4 and Limit=4

$a = 5;        # $a is assigned 5
$b = ++$a;     # $b is assigned the incremented value of $a, 6
$c = $a--;     # $c is assigned 6, then $a is decremented to 5
Operators - Logical
Also known as short-circuit operators. They allow the program to make decisions without using lots of "if" statements.
Example      Name   Result
$a && $b     And    $a if $a is false, $b otherwise
$a || $b     Or     $a if $a is true, $b otherwise
! $a         Not    True if $a is not true
$a and $b    And    $a if $a is false, $b otherwise
$a or $b     Or     $a if $a is true, $b otherwise
not $a       Not    True if $a is not true
$a xor $b    Xor    True if $a or $b is true, but not both
open(GRADES, "grades") or die "Can't open file grades: $!\n";
They are called short-circuit operators because they skip the evaluation of rightward arguments once they have enough information to decide the overall result. The word forms and, or and not do the same job as &&, || and ! but have much lower precedence, which is why or is the usual choice after open(). The bottom example is from our grading program. Perl tries to open the file called "grades". If it succeeds then the program continues with the statements which follow this line; otherwise Perl issues an error message via the die() function and stops. Note that this code is easy on the eye: the important thing which the line is trying to do comes first, and secondary actions are off to the right.
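The Result column above means the operators return the deciding value, not just 1 or "". This sketch (with illustrative values) shows that, the short-circuit skip, xor, and the precedence difference between || and or:

```perl
use strict;
use warnings;

my $name  = "" || "anonymous";             # "anonymous": "" is false
my $both  = 5 && 7;                        # 7: 5 is true, so 7 decides
my $first = 0 && die "never evaluated";    # 0: die is skipped

my $one = (1 xor 0) ? "yes" : "no";        # "yes": exactly one is true

# Precedence: || binds more tightly than =, or more loosely.
my $r = 0 || 1;    # $r is 1
my $s;
$s = 0 or 1;       # $s is 0: 'or' applies to the whole assignment
```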
Operators - Numeric And String Comparison
There are two sets of operators: one for numbers and one for strings.
Comparison              Numeric   String   Return Value
Equal                   ==        eq       True if $a is equal to $b
Not equal               !=        ne       True if $a is not equal to $b
Less than               <         lt       True if $a is less than $b
Greater than            >         gt       True if $a is greater than $b
Less than or equal      <=        le       True if $a is not greater than $b
Greater than or equal   >=        ge       True if $a is not less than $b
Comparison              <=>       cmp      0 if equal, 1 if $a greater, -1 if $b greater
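Choosing the wrong set gives different answers on the same data. This sketch uses illustrative values and also shows <=> and cmp as sort comparators:

```perl
use strict;
use warnings;

my $num_cmp = 10 <=> 9;        # 1: numerically 10 is greater
my $str_cmp = "10" cmp "9";    # -1: as strings, "1" sorts before "9"

my $num_eq = (10 == 10.0)     ? 1 : 0;    # 1: same number
my $str_eq = ("10" eq "10.0") ? 1 : 0;    # 0: different strings

my @numeric = sort { $a <=> $b } (10, 9, 100);    # 9 10 100
my @textual = sort { $a cmp $b } (10, 9, 100);    # 10 100 9
```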
Notes:
Operators - File Test
File test operators let you find out information about files before you blindly muck about with them. Here are a few of the file test operators.
Example   Name        Result
-e $a     Exists      True if the file named in $a exists
-r $a     Readable    True if the file named in $a is readable
-w $a     Writable    True if the file named in $a is writable
-d $a     Directory   True if the file named in $a is a directory
-f $a     File        True if the file named in $a is a regular file
-T $a     Text file   True if the file named in $a is a text file
-e "/usr/bin/perl" or warn "Perl is improperly installed\n";
-f "/vmlinuz" and print "I see you are a friend of Linus\n";
There are a lot more operators not listed - see the Perl man pages or Programming Perl etc.
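The file tests from the table, applied to a directory, a file and a missing path. This sketch creates them with File::Temp; the name notes.txt is illustrative:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/notes.txt";
open(my $fh, '>', $file) or die "Can't create $file: $!";
print $fh "plain text\n";
close($fh);

for my $path ($dir, $file, "$dir/missing") {
    my @facts;
    push @facts, "exists"    if -e $path;
    push @facts, "directory" if -d $path;
    push @facts, "file"      if -f $path;
    push @facts, "readable"  if -r $path;
    print "$path: @facts\n";
}
```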
More On Input Operators
The command input operator `` (also known as backtick, or qx//).
The most heavily used input operator is <> (also called the diamond operator). Examples:
while (defined($_ = <STDIN>)) { print $_; }   # the longest way
while ($_ = <STDIN>) { print; }               # explicitly to $_
while (<STDIN>) { print; }                    # the short way
for (;<STDIN>;) { print; }                    # while loop in disguise
print $_ while defined($_ = <STDIN>);         # long statement modifier
print while $_ = <STDIN>;                     # explicitly to $_
print while <STDIN>;                          # short statement modifier
All of these lines are equivalent.
$_ is the default variable, which is used implicitly (when you're not explicit).
You can use the backtick operator to execute any system command like this:
$info = `finger $user`;    # Or - qx/finger $user/;
The command undergoes variable interpolation, so $user is replaced by a real user name; then the command is passed to the shell, and all the output from the command is passed back to Perl and put into the variable $info. The numeric status of the command is stored in the Perl variable $?. Because $user is not escaped, Perl interpolates it before the shell sees the command; if you need to pass a literal $ to the shell, escape it with \.
Be careful how you use <>. If you do this:
$one_line  = <MYFILE>;    # Get one line
@all_lines = <MYFILE>;    # Get all lines - are you sure?
the first reads one line, but the second reads the entire rest of the file into memory. If you just use <> without a filehandle, Perl reads from the files named on the command line, or from STDIN if there are none. So when no files are named, $input = <STDIN>; and $input = <>; both do the same thing: read a line of input from STDIN. You can use this to advantage with Perl one-liners where STDIN is actually a pipe from a shell command like this (the $ is the shell prompt):
$ cat myfile.pl | perl -e 'while (<>) { print if m/^\s*sub/; }'
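The scalar/list difference can be seen without touching the disk. This sketch assumes Perl 5.8 or later, where opening a reference to a scalar gives an in-memory filehandle that stands in for MYFILE:

```perl
use strict;
use warnings;

my $text = "line one\nline two\nline three\n";

open(my $fh, '<', \$text) or die "Can't open in-memory file: $!";
my $one_line = <$fh>;    # scalar context: one line, "line one\n"
my @the_rest = <$fh>;    # list context: all remaining lines
close($fh);

print "first: $one_line";
print "remaining: ", scalar(@the_rest), " lines\n";
```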
A Special Case Of Using <>
Normally when you use the <> operator, you use it like this:
my $line = <STDIN>;    # Assign explicitly to a variable
There is one case where assignment is automatic: when the <> operator is the only thing inside the conditional of a while() loop, the input is assigned to $_. This is used a lot when writing Perl one-liners. This loop:
while (<>) {
    ...    # code for each line
}
does exactly the same as this:
@ARGV = ('-') unless @ARGV;    # assume STDIN if empty
while (@ARGV) {
    $ARGV = shift @ARGV;       # shorten @ARGV each time
    if (!open(ARGV, $ARGV)) {
        warn "Can't open $ARGV: $!\n";
        next;
    }
    while (<ARGV>) {
        ...    # code for each line
    }
}
Remember, this special "magic" requires that the only thing inside the while loop's condition is the <> operator; if you use the <> operator anywhere else, you must assign the result explicitly if you want to keep the value. LAB5 - FILES_1 LAB5 - FILES_2 LAB5 - FILES_3
The Range Operator ..
Examples:
1: for (101 .. 200) { print; }    # prints 101102...199200
2: @foo = @foo[0 .. $#foo];       # an expensive no-op
3: @foo = @foo[ -5 .. -1];        # slice last 5 items
4: @alphabet = ('A' .. 'Z');      # all the uppercase letters
1: Uses $_ as the default variable of the loop. 2: $#foo is the index of the last item in @foo; this is true for all arrays. 3: Using a negative subscript on an array counts backwards from the end of the array. If the left value is greater than the right value in a .. expression, an empty list is returned. If what you really wanted was to count backwards, do this: for (reverse 27 .. 56) { print; }    # prints 565554 … 2827
4: When used with strings we get some magic: this gives all the uppercase letters in the English alphabet. In scalar context .. behaves differently (as a "flip-flop"): it is false as long as its left operand is false. Once the left operand is true, the .. operator is true until the right operand is true, after which the .. operator becomes false again.
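The list-context forms and the scalar-context flip-flop in one sketch. The @lines data and the START/END markers are made up for illustration; this pattern is a common way to pick out a block of lines from a file:

```perl
use strict;
use warnings;

my @alphabet = ('A' .. 'Z');      # 26 uppercase letters
my @down     = reverse(1 .. 5);   # 5 4 3 2 1: count backwards with reverse
my @empty    = (5 .. 1);          # left > right gives an empty list

# Scalar context: true from the line matching /START/ up to and
# including the line matching /END/.
my @lines = ("a", "START", "b", "c", "END", "d");
my @inside;
for (@lines) {
    push @inside, $_ if /START/ .. /END/;
}
print "@inside\n";    # START b c END
```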
The Conditional Operator ?:
Just like the C version. It is a ternary operator: its two parts (? and :) separate three expressions like this:
condition ? then : else
Examples:
$a = $ok ? $b : $c;    # get a scalar
@a = $ok ? @b : @c;    # get an array
$a = $ok ? @b : @c;    # get a count of an array's elements
printf "I have %d camel%s.\n", $n, $n == 1 ? "" : "s";
What the first example says is: look at the value of $ok; if it's true then $a = $b, otherwise $a = $c.
Example:
$result = ( $count == 10 ) ? 88 : 99;
Here ( $count == 10 ) is the 1st expression, 88 is the 2nd expression and 99 is the 3rd expression.
The condition part is always evaluated in scalar context, for truth or falsity. Question: in the example, what will the value of $result be if $count is 12?
How Do I … Establish Default Values? You would like to give a default value to a variable, but only if it doesn’t already have one. $a = $b || $c;
More info: See The Perl Cookbook, section 1.2 Page 6. The difference between the two types of solution is what they test for: something being defined, or something being true. Three defined values are false: 0, "0" and "". If a variable already held one of those values and you wanted to keep that value, then || won’t work, because it replaces any false value with the default.
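To see the difference, the sketch below runs both tests over the awkward values; the two helper subs are hypothetical names used only for this illustration.

```perl
use strict;
use warnings;

# || replaces any false value, including a legitimate 0, "0" or "".
sub default_true    { my ($v, $d) = @_; return $v || $d }

# Testing defined() keeps false-but-defined values.
sub default_defined { my ($v, $d) = @_; return defined($v) ? $v : $d }

for my $v (undef, 0, "", 5) {
    printf "%-7s  ||: %-4s  defined: %s\n",
        defined $v ? "'$v'" : 'undef',
        default_true($v, 'dflt'), default_defined($v, 'dflt');
}
```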
How Do I … Establish Default Values?
You would like to give a default value to a variable, but only if it doesn’t already have one.
# find the user name on Unix systems
$user = $ENV{USER}
     || $ENV{LOGNAME}
     || getlogin()
     || (getpwuid($<))[0]
     || "Unknown uid number $<";
The first expression which is true is the result which is assigned to $user.
More info: See The Perl Cookbook, section 1.2 Page 6. LAB5 - FILE_4
Control Structures
Control Structures - Truth
We’ve seen that some operators return a true or false value. Here are the rules for the values a scalar can hold:
1. Any string is true except for "" and "0".
2. Any number is true except for 0.
3. Any reference is true regardless of what it refers to.
4. Any undefined value is false.
0            # would become the string "0", so false.
1            # would become the string "1", so true.
10 - 10      # 10-10 is 0, would convert to string "0", so false.
0.00         # equals 0, would convert to string "0", so false.
"0"          # the string "0", so false.
""           # a null string, so false.
"0.00"       # the string "0.00", neither "" nor "0", so true!
"0.00" + 0   # the number 0 (coerced by the +), so false.
\$a          # a reference to $a, so true, even if $a is false.
undef()      # a function returning the undefined value, so false.
Loop Statements
LABEL while (EXPR) BLOCK
LABEL while (EXPR) BLOCK continue BLOCK
LABEL until (EXPR) BLOCK
LABEL until (EXPR) BLOCK continue BLOCK
LABEL for (EXPR; EXPR; EXPR) BLOCK
LABEL foreach (LIST) BLOCK
LABEL foreach var (LIST) BLOCK
LABEL foreach var (LIST) BLOCK continue BLOCK
LABEL BLOCK
LABEL BLOCK continue BLOCK
Continue BLOCKs are always optional.
LABELs are always optional.
All these statements have an optional LABEL. The while statements execute as long as EXPR is true. If while is replaced with until, then the sense of the test is reversed. Note that, unlike some languages which have do-until loops, in Perl the until test is made at the start of the loop, not the end. It is customary to make LABEL names all uppercase. The while and until statements can have an optional continue block. This block is executed every time the main block is continued, either by falling off the end of the main block or by an explicit next (a loop-control operator which goes to the next iteration of the loop).
Loop Control
We’ve already seen that a loop can have a label. It’s used with the loop control operators next, last and redo. The label names the loop as a whole, not the top of the loop, so the loop control operator doesn’t “go to” the label. The syntax for the loop control operators is:
last LABEL
next LABEL
redo LABEL
The last operator immediately exits the loop; any continue block is not executed. The next operator skips the rest of the current iteration and starts the next one; if there’s a continue block, it is executed first. The redo operator restarts the loop block without evaluating the condition again; any continue block is not executed.
The LABEL is optional: if it’s missing, then last, next and redo apply to the innermost enclosing loop. But if you want to jump out of nested loops then the LABEL is needed. Even though I’ve talked about continue blocks a lot, not many people use them.
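Loop Control - An Example
LABEL: while (condition) {
    # Code
    if (something) { redo; }    # restart this iteration; condition not re-tested
    # Code
    if (something) { next; }    # run the continue block, then the next iteration
    # Code
    if (something) { last; }    # leave the loop; continue block skipped
    # Code
}
continue {
    # Code
}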
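A runnable sketch of labelled next and last with a continue block, using a hypothetical grid of numbers: it skips zeros, stops both loops at the first negative value, and counts visited cells in the continue block.

```perl
use strict;
use warnings;

my @grid = ([1, 0, 2], [3, -4, 5], [-6, 7, 8]);   # illustrative data
my ($visited, $found) = (0, undef);

ROW: for my $r (0 .. $#grid) {
    COL: for my $c (0 .. $#{ $grid[$r] }) {
        my $cell = $grid[$r][$c];
        next COL if $cell == 0;     # skip this cell; the continue block still runs
        if ($cell < 0) {
            $found = "$r,$c";
            last ROW;               # leave BOTH loops; the continue block is skipped
        }
    }
    continue {
        $visited++;                 # runs after each normal pass and after next
    }
}
print "first negative at $found after $visited cells\n";
```

Note that the cell holding -4 is not counted: last skips the continue block.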
Compound Statements - If And Unless
A sequence of statements is called a BLOCK. Compound statements are built from expressions and BLOCKs. Blocks are always surrounded by { and }.
if (EXPR) BLOCK
if (EXPR) BLOCK else BLOCK
if (EXPR) BLOCK elsif (EXPR) BLOCK ..
if (EXPR) BLOCK elsif (EXPR) BLOCK .. else BLOCK
Note: it’s elsif NOT elseif. unless simply reverses the true/false value of if. Note that unless also works with else and elsif. There’s no such thing as elseunless.
Compound Statements - If And Unless
Examples:
unless ($x == 1) ...
if ($x != 1) ...
if (!($x == 1)) ...
Compound Statements - If And Unless
Examples:
unless (open(FOO, $foo)) { die "Can't open $foo: $!" }
if (!open(FOO, $foo))    { die "Can't open $foo: $!" }

die "Can't open $foo: $!" unless open(FOO, $foo);
die "Can't open $foo: $!" if !open(FOO, $foo);

open(FOO, $foo) || die "Can't open $foo: $!";
open FOO, $foo  or die "Can't open $foo: $!";

I tend to prefer this:
chdir $dir       or die "chdir $dir: $!";
open FOO, $file  or die "open $file: $!";
@lines = <FOO>   or die "$file is empty?";
close FOO        or die "close $file: $!";
($! is the error code)
In the preferred example there’s no if and no unless: we’re relying on short-circuit evaluation. $! holds the error from the last failed system call (the operating system's errno, shown as a message in string context). It is set by open, chdir and close, and by lots of other operations that call the operating system.
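The same "do it or die" idiom in a self-contained sketch. Two assumptions differ from the slide: it uses the three-argument open with a lexical filehandle (available since Perl 5.6), and it creates its own scratch file with the core File::Temp module so it can run anywhere.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small scratch file so the example is self-contained.
my ($tmp_fh, $file) = tempfile(UNLINK => 1);
print {$tmp_fh} "line one\nline two\n";
close $tmp_fh or die "close $file: $!";

# Low-precedence 'or' means no parentheses are needed around open's arguments.
open my $in, '<', $file or die "open $file: $!";
my @lines = <$in>;
@lines    or die "$file is empty?";
close $in or die "close $file: $!";

print scalar(@lines), " lines read\n";
```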
Control Structures - If And Unless
Examples:
if ($debug_level > 0) {
    # Something has gone wrong. Tell the user.
    print "Debug: Danger, Will Robinson, danger!\n";
}

if ($city eq "New York") {
    print "New York is northeast of Washington, D.C.\n";
}
elsif ($city eq "Chicago") {
    print "Chicago is northwest of Washington, D.C.\n";
}
else {
    print "I don't know where $city is, sorry.\n";
}
Note - if has else and elsif. unless does not have an elseunless.
Control Structures - If And Unless
More examples - compare with the previous page:
print "Danger, Will Robinson, danger!\n" if ($debug_level > 0);
print "I'm not going home.\n" unless ( $destination eq $home );
Another example of idiomatic Perl. You’ll see the interchangeability of statements like this a lot.
Control Structures - While And Until
Perl has four main looping constructs: while and until, and for and foreach. while and until act like if and unless, except that they loop repeatedly.
1. First the condition is checked.
2. If the condition is met, that is, the condition is:
   a. true for a while loop, or
   b. false for an until loop,
3. then the block of code is executed, and the process repeats.
while ($tickets_sold < 10000) {
    $available = 10000 - $tickets_sold;
    print "$available tickets are available. How many would you like: ";
    $purchase = <STDIN>;
    chomp($purchase);
    $tickets_sold += $purchase;
}

while ( $line = <GRADES> ) { ...
Note: If the condition is not met the first time it is checked, the loop is never entered. If you intend to leave the loop at some point, make sure the loop body changes whatever the condition depends on. The bottom example assigns the next line from the GRADES file to the variable $line and returns the value of the line, so the condition of the while statement can be evaluated for truth. You might wonder if Perl will exit prematurely when it sees blank lines in the file. It won’t, because a blank line is a “\n” (newline) character, and this is not false. When we reach the end of the file the line input operator returns undef, which always evaluates to false, so at this point the loop terminates. There’s no need for an explicit test because the input operator is set up to work smoothly in a conditional context.
While Loops
while (my $line = <STDIN>) {
    $line = lc $line;
}
continue {
    print $line;    # still visible
}
# $line now out of scope here
A variable declared in the while condition (here with my $line) exists only inside the loop, including its continue block. If you want $line to be visible after the loop has ended, declare the variable before the loop begins. We’ll discuss scope shortly. Also, the continue block here is redundant: we could easily have put its statements inside the main while block. We’ll also discuss last, next and redo shortly.
Control Structures - While And Until
You will often see command line arguments processed like this:
while (@ARGV) {
    process(shift @ARGV);
}
The shift operator removes one element from the argument list each time through the loop and sends it to a subroutine for processing (here called process()).
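A self-contained sketch of the loop above. The process() subroutine here is a hypothetical stand-in (it just upper-cases and records each argument), and @ARGV is filled in by hand instead of coming from the shell.

```perl
use strict;
use warnings;

my @seen;
# Hypothetical stand-in for the course's process() subroutine.
sub process { my ($arg) = @_; push @seen, uc $arg }

@ARGV = qw(alpha beta gamma);    # a real script gets these from the command line
while (@ARGV) {
    process(shift @ARGV);        # shift removes and returns the first element
}
print "@seen\n";                 # @ARGV is now empty
```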
Control Structures - For And Foreach
Examples:
for ($sold = 0; $sold < 10000; $sold += $purchase) {
    $available = 10000 - $sold;
    print "$available tickets are available. How many would you like: ";
    $purchase = <STDIN>;
    chomp($purchase);
}

foreach $user (@users) {
    if (-f "$home{$user}/.nexrc") {
        print "$user is cool... they use a perl-aware vi!\n";
    }
}

foreach $key (sort keys %hash) {...
Common Perl idiom for getting the keys from a hash.
The for loop takes three expressions: an initial expression, evaluated only once; a condition, tested before every iteration; and an expression to modify the loop variable. The foreach loop is used to iterate through the elements of a list. It always evaluates the expression in ( and ) in list context, even if there’s only one element in the list. Each element is then aliased to the loop variable in turn. IMPORTANT: MODIFYING THE LOOP VARIABLE ALSO MODIFIES THE ORIGINAL ARRAY.
For Loops
The for loop has three expressions:
1. An expression which initializes the loop.
2. A condition which keeps the loop executing, and
3. An expression which updates the loop variable after each iteration.
All three expressions are optional, but the two “;” separators are not. If the condition is missing, it is treated as always true. These two loops are equivalent:
LABEL: for (my $i = 1; $i <= 10; $i++) {
    ...
}

{
    my $i = 1;
    LABEL: while ($i <= 10) {
        ...
    }
    continue {
        $i++;
    }
}
For Loop Examples
Examples:
for ($i = 0, $bit = 1; $i < 32; $i++, $bit <<= 1) {
    print "Bit $i is set\n" if $mask & $bit;
}
# the values in $i and $bit persist past the loop

for (my ($i, $bit) = (0, 1); $i < 32; $i++, $bit <<= 1) {
    print "Bit $i is set\n" if $mask & $bit;
}
# loop's versions of $i and $bit now out of scope

You can do more than one thing in each of the three parts of the loop by separating the expressions with commas. The <<= 1 part of the loop shifts the value of $bit 1 bit to the left, so $bit steps through 1, 2, 4, 8 and so on. $bit must start at 1: starting at 0 would leave it 0 forever, and no bit would ever be reported.
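A runnable version of the second loop, with an example mask (the value is an assumption chosen so that bits 0, 2, 5 and 7 are set). It collects the bit numbers instead of printing them one per line.

```perl
use strict;
use warnings;

my $mask = 0b1010_0101;    # example mask: 128 + 32 + 4 + 1
my @set;

for (my ($i, $bit) = (0, 1); $i < 32; $i++, $bit <<= 1) {
    push @set, $i if $mask & $bit;    # $bit is 1, 2, 4, 8, ... in turn
}
print "Bits set: @set\n";
```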
This is the usual way to get all of the keys out of a hash.
With foreach there isn’t any way to know where you are in a list (unless you keep track of it yourself with a counter). If the list contains modifiable values (i.e. variables, not constants), then you can modify those variables by modifying the loop variable inside the loop, because the loop variable is an alias for the element in the list.
for ($scalar, @array, values %hash) {
    s/^\s+//;    # strip leading whitespace
    s/\s+$//;    # strip trailing whitespace
}
On the last slide we said that the variable inside the loop in a foreach loop was an implicit alias for the variable in the list which is passed to foreach. So when we alter the variable in the loop ($pay in the top example) we’re actually altering the variable in the list which we are reading through.
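A short sketch showing the aliasing in action; the salary figures and strings are illustrative only.

```perl
use strict;
use warnings;

my @salaries = (1000, 2000, 3000);     # illustrative data
foreach my $pay (@salaries) {
    $pay *= 2;                          # $pay IS the array element, not a copy
}
print "@salaries\n";                    # the array itself has changed

my $scalar = "  padded  ";
my @array  = (" a ", "b  ");
for ($scalar, @array) {                 # $_ aliases each variable in turn
    s/^\s+//;                           # strip leading whitespace
    s/\s+$//;                           # strip trailing whitespace
}
print "[$scalar] [@array]\n";

# Aliasing a literal fails at run time, because constants are read-only:
# foreach my $n (1, 2) { $n++ }   # dies: "Modification of a read-only value attempted"
```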
Control Structures - Breaking Out - Next & Last
It’s not unusual to have special cases in loops. next skips to the end of the loop body and starts the next iteration. last skips to the end of the loop and exits it.
Example:
foreach $user (@users) {
    if ($user eq "root" or $user eq "lp") {
        next;
    }
    if ($user eq "special") {
        print "Found the special account.\n";
        # do some processing
        last;
    }
}
Control Structures - Breaking Out - Next & Last
It’s possible to break out of nested loops by labeling your loops and specifying which loop you want to break out of.
LINE: while ($line = <>) {
    last LINE if $line eq "\n";    # stop on first blank line
    next LINE if $line =~ /^#/;    # skip comment lines
    # your ad here
}
LINE is a label. Would anyone care to speculate on what this piece of code does?
Case Statements
Perl doesn’t have a case statement, but it’s simple to build one.
SWITCH: {
    if (/^abc/) { $abc = 1; last SWITCH; }
    if (/^def/) { $def = 1; last SWITCH; }
    if (/^xyz/) { $xyz = 1; last SWITCH; }
    $nothing = 1;
}
OR
SWITCH: {
    /^abc/ && do { $abc = 1; last SWITCH; };
    /^def/ && do { $def = 1; last SWITCH; };
    /^xyz/ && do { $xyz = 1; last SWITCH; };
    $nothing = 1;
}
Perl doesn’t have a case/switch structure since it is so easy to build one. SWITCH is a label (remember the convention that labels are all upper-case), not some Perl keyword we haven’t discussed yet. A bare block like this counts as a loop that executes once, which is why last SWITCH can leave it. We haven’t covered do yet (it’s on the next page), but think of it as a way of writing a block of statements where an expression is expected. The first three lines of the second version all use short-circuit evaluation: reading from left to right, as soon as the pattern match is false the whole && expression is false, and the do block is not evaluated. Remember: in short-circuit evaluation it’s the first false operand of an && and the first true operand of an || that controls the flow of the program. Once a short-circuit evaluation has enough information to determine truth or falsity, none of the remaining clauses are evaluated. If those other clauses also do assignment, then those assignments won’t happen.
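A runnable sketch of the second SWITCH form wrapped in a hypothetical classify() sub. One change from the slide: it matches against a named variable with =~ instead of the implicit $_, and records which branch fired.

```perl
use strict;
use warnings;

# Returns which branch of the SWITCH block fired for a given string.
sub classify {
    my ($str) = @_;
    my $kind;
    SWITCH: {
        $str =~ /^abc/ && do { $kind = 'abc'; last SWITCH; };
        $str =~ /^def/ && do { $kind = 'def'; last SWITCH; };
        $str =~ /^xyz/ && do { $kind = 'xyz'; last SWITCH; };
        $kind = 'nothing';                # the default case
    }
    return $kind;
}

print "$_ => ", classify($_), "\n" for qw(abcdef define xyzzy hello);
```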
The do (BLOCK) Construct
# process to place all LFSR stage results in a single file
while (<>) {
    /LFSR\s\=\s(\w+)/ && do { print LFSRFILE "$1\n" };
    $lastfile = $1;
}
This is a way of grouping a lot of statements into a single block.
The do BLOCK executes a sequence of statements in the BLOCK and returns the value of the last expression evaluated in the BLOCK. It can be modified with a while or an until statement modifier. If so then Perl executes the BLOCK before it tests the loop condition. The do BLOCK itself does not count as a loop, so the loop control statements next, last, redo cannot be used to leave or restart the BLOCK.
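A minimal sketch of both behaviours described above: with a while modifier the BLOCK runs once before the condition is tested, and a plain do BLOCK returns the value of its last expression. The variable names are illustrative.

```perl
use strict;
use warnings;

my $n      = 10;
my $passes = 0;
do {
    $passes++;             # runs once before the condition is tested
} while ($n < 5);          # false from the start, yet the body ran once

my $value = do { my $x = 3; $x * 2 };    # do BLOCK returns its last expression
print "passes=$passes value=$value\n";
```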
The do (FILE) Construct
# read in config files: system first, then user
for $file ("/design/C6RAM/defaults/defaults.rc", "$ENV{HOME}/.someprogrc") {
    unless ($return = do $file) {
        warn "couldn't parse $file: $@" if $@;
        warn "couldn't do $file: $!"    unless defined $return;
        warn "couldn't run $file"       unless $return;
    }
}
If do can’t read the file, it returns undef and sets $! to the error.
If do can read the file but can’t compile it, it returns undef and sets an error message in $@.
If the file compiles and runs, the value returned is the value of the last expression evaluated.
The do FILE form uses the value of FILE as a filename and executes the contents of the file as a Perl script. Its original use was to include subroutines from a Perl subroutine library, but it has been superseded by use and require. It is still useful for loading things like configuration data into your program, as shown in the example. If the file can be read but doesn’t compile, then an error is set in $@. If the file can’t be read, then an error is set in $!.
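A self-contained sketch of loading configuration with do FILE. It assumes a made-up config format (a file whose last expression is a hash reference) and writes that file to a temporary location with the core File::Temp module, so the path is absolute and do does not search @INC.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a hypothetical config file whose last expression is a hash reference.
my ($fh, $rc) = tempfile(UNLINK => 1);
print {$fh} <<'RC';
my %cfg = (colour => 'blue', retries => 3);
\%cfg;
RC
close $fh or die "close $rc: $!";

my $config = do $rc;    # compile and run the file; get its last value
unless ($config) {
    die "couldn't parse $rc: $@" if $@;
    die "couldn't do $rc: $!" unless defined $config;
    die "couldn't run $rc";
}
print "colour=$config->{colour} retries=$config->{retries}\n";
```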
Goto
Perl does support goto - so that’s at least one thing they got wrong then! You can use:
goto LABEL
goto EXPRESSION
goto &name    (subroutine)
Examples:
goto(("FOO", "BAR", "GLARCH")[$i]);      # hope 0 <= $i < 3

@loop_label = qw/FOO BAR GLARCH/;
goto $loop_label[rand @loop_label];      # random teleport
How Do I … Do Something With Every Element In A List?
You want to repeat a procedure for every element in a list.
foreach $item (LIST) {
    # do something with $item
}
The code in the loop can call last to jump out of the loop, next to move on to the next element, or redo to jump back to the first statement inside the block.
More info: See The Perl Cookbook, section 4.4 Page 97. The variable set to each value in the list is called the loop iterator. If no variable is supplied then the global variable $_ will be used. $_ is the default variable used in many of Perl’s string, list and file functions.
How Do I … Do Something With Every Element In A List?
You want to repeat a procedure for every element in a list.
while (<>) {               # $_ is set to the line just read
    chomp;                 # $_ has a trailing \n removed, if it had one
    foreach (split) {      # $_ is split on whitespace into a list,
                           # then $_ is set to each chunk in turn
        $_ = reverse;      # the characters in $_ are reversed
        print;             # $_ is printed
    }
}
Perl’s $_ value is preserved through any nested foreach loops.
To be sure of what is happening, it is always better to declare and use your own lexical variable.
The foreach construct has another feature: each time through the loop the iterator variable is an alias, not a copy.
More info: See The Perl Cookbook, section 4.4 Page 97. IMPORTANT NOTE: The top example works the way we might hope. The value of $_ in the while loop is preserved when the foreach loop is executed. However, if the while loop had been the inner loop then BAD THINGS would have happened, since the while construct clobbers the value of the global $_ (i.e. it doesn’t localize it). Consider this to be a bug or a feature; either way it’s an accident waiting to happen. See the full explanation on page 99 of the Perl Cookbook. I would always recommend using lexical variables. They are scoped to the block in which they are declared, so the risk of side-effects is much reduced. Also note that with a foreach loop, the loop iterator is not a copy of the variable from the list; it actually is the variable in the list. Change the variable and it changes in the list. This is important: it’s not a copy, it’s an alias.
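Following the advice to prefer lexical variables, here is a sketch of the word-reversing loop with named lexical loop variables instead of $_. The helper name reverse_words() and the in-memory input string are assumptions for the sake of a runnable example.

```perl
use strict;
use warnings;

# Reverse every whitespace-separated word, using named lexical loop
# variables instead of the global $_.
sub reverse_words {
    my ($text) = @_;
    my @out;
    foreach my $line (split /\n/, $text) {
        foreach my $word (split ' ', $line) {
            push @out, scalar reverse $word;    # scalar: reverse the characters
        }
    }
    return @out;
}

print join(' ', reverse_words("hello world\nperl rocks")), "\n";
```

The scalar keyword matters: in list context reverse $word would reverse a one-element list and leave the word unchanged.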
How Do I … Find Elements In One List But Not In Another?
You want to find the elements which are in one list but not in another.
Straight-forward version:
# assume @A and @B are already loaded
%seen = ();     # lookup table to test membership of B
@aonly = ();    # answer

# build lookup table
foreach $item (@B) { $seen{$item} = 1 }

# find only elements in @A and not in @B
foreach $item (@A) {
    unless ($seen{$item}) {
        # it's not in %seen, so add to @aonly
        push(@aonly, $item);
    }
}
More info: See The Perl Cookbook, section 4.7 Page 104. Solution: Build a hash of the keys in @B to use as a lookup table. Then iterate through @A, checking whether each item in @A is in the lookup table. If it is, then it’s in both @A and @B. If it’s not, then it’s in @A but not in @B.
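How Do I … Find Elements In One List But Not In Another?
You want to find the elements which are in one list but not in another.
my %seen;     # lookup table
my @aonly;    # answer

# build lookup table with a hash slice
@seen{@B} = ();

foreach my $item (@A) {
    push(@aonly, $item) unless exists $seen{$item};
}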
More info: See The Perl Cookbook, section 4.7 Page 104. The two different answers vary in how they build the hash. The first (previous slide) iterates over @B. This one uses a hash slice. Given these two assignments:
$hash{"key1"} = 1;
$hash{"key2"} = 2;
the equivalent hash slice is:
@hash{"key1", "key2"} = (1, 2);
The list in {} holds the keys while the list on the right holds the values. In this second example we say this: @seen{@B} = (); This uses the items in @B as keys for %seen, setting each value to undef (because the list on the right is empty). We later check for the existence of the key, not the logical truth or the definedness of the value.
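A self-contained sketch of the hash-slice version, packaged as a hypothetical a_only() sub that takes two array references and keeps @A's order; the fruit lists are sample data.

```perl
use strict;
use warnings;

# Elements of @$a_ref that do not appear in @$b_ref, in @$a_ref's order.
sub a_only {
    my ($a_ref, $b_ref) = @_;
    my %seen;
    @seen{@$b_ref} = ();                       # hash slice: keys from B, values undef
    return grep { !exists $seen{$_} } @$a_ref;
}

my @A = qw(apple banana cherry date);
my @B = qw(banana date fig);
print join(' ', a_only(\@A, \@B)), "\n";
```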
How Do I … Extract Unique Elements From A List?
You want to remove duplicate elements from a list.
%seen = ();
@uniq = ();
foreach $item (@list) {
    unless ($seen{$item}) {
        # if we get here, we have not seen it before
        $seen{$item} = 1;
        push(@uniq, $item);
    }
}
Solution: Use a hash to record the values and then keys() to extract the values
More info: See The Perl Cookbook, section 4.6 Page 102. Solution: Use a hash to record which items have been seen and then use keys on the hash to extract them. Warning: using a hash like this can use up a lot of memory, and the keys function returns the keys in an unpredictable order (not the insertion order). If order matters, build the result list as you go, as the loop above does with @uniq.
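A more compact variant of the same order-preserving approach, written as a hypothetical uniq_in_order() helper: grep keeps an item only the first time its count in %seen is zero.

```perl
use strict;
use warnings;

# Keep the first occurrence of each item, preserving the original order.
sub uniq_in_order {
    my %seen;
    return grep { !$seen{$_}++ } @_;    # post-increment: test first, then count
}

print join(' ', uniq_in_order(qw(b a b c a))), "\n";
```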
How Do I … Extract Unique Elements From A List?
You want to remove duplicate elements from a list.
# generate a list of users logged in, removing duplicates
%ucnt = ();
for (`who`) {
    s/\s.*\n//;     # kill from first space till end-of-line, yielding username
    $ucnt{$_}++;    # record the presence of this user
}
# extract and print unique keys
@users = sort keys %ucnt;
print "users logged in: @users\n";
More info: See The Perl Cookbook, section 4.6 Page 102.
How Do I … Reverse An Array?
You want to reverse an array.
Solution: Use the reverse() function
# reverse @ARRAY into @REVERSED
@REVERSED = reverse @ARRAY;
Solution: Use a for loop
for ($i = $#ARRAY; $i >= 0; $i--) {
    # do something with $ARRAY[$i]
}
More info: See The Perl Cookbook, section 4.10 Page 109. The reverse() function reverses a list. The for loop processes the list in reverse order but keeps the list in its original order. If you use reverse() to reverse a list you just sorted, make sure it’s in the order you want. The sort() function takes an optional code block which lets you replace the default string comparison with your own. This block is called each time sort() has to compare two values. The values are loaded into $a and $b, which are automatically localised, so they won’t interfere with any variables you already have called $a or $b. The comparison should return a negative number if $a should appear before $b in the output list, 0 if the two compare equal, and a positive number if $a should appear after $b. Perl has two operators that behave this way: <=> for sorting numbers in ascending order, and cmp for sorting strings in ascending order. By default sort() uses cmp-style comparisons. Of course, you can always provide your own comparison subroutine.
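A quick sketch contrasting the default string sort with <=>, and sorting in descending order by swapping $a and $b instead of calling reverse afterwards. The numbers are sample data.

```perl
use strict;
use warnings;

my @nums       = (10, 9, 100, 1);
my @as_strings = sort @nums;                  # default cmp order: 1 10 100 9
my @ascending  = sort { $a <=> $b } @nums;    # numeric: 1 9 10 100
my @descending = sort { $b <=> $a } @nums;    # swap $a and $b for descending
print "@as_strings | @ascending | @descending\n";
```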
How Do I … Traverse A Hash?
You want to perform an action on each entry in a hash.
Solution: Use each() with a while loop
while (($food, $color) = each(%food_color)) {
    print "$food is $color.\n";
}
Output:
Banana is yellow.
Apple is red.
Carrot is orange.
Lemon is yellow.

Solution: Use keys with a foreach loop
foreach $food (keys %food_color) {
    my $color = $food_color{$food};
    print "$food is $color.\n";
}
Output:
Banana is yellow.
Apple is red.
Carrot is orange.
Lemon is yellow.
WARNING: foreach cannot walk a hash’s entries directly (use keys or each, as shown above), and push(), pop(), shift() and unshift() do not work on hashes.
More info: See The Perl Cookbook, section 5.4 Page 135.
How Do I … Delete Something From A Hash?
You want to remove an entry from a hash.
# remove $KEY and its value from %HASH
delete($HASH{$KEY});
Solution: Use the delete() function
Don’t try to delete a key by setting its value to undef. All that will do is set the key’s value to undef! The delete() function is the only way to remove a specific hash entry. Once a key is deleted it will no longer show up in the list returned by keys() or in an each() iteration, and exists() will return false for that key.
More info: See The Perl Cookbook, section 5.3 Page 133. You can’t delete a key by setting its value to undef, since undef is a value which a hash can store. You must use the delete() function. If you want to clear a hash, simply assign the empty list to it, like this: %hash = ();
How Do I … Sort A Hash?
You need to work with the elements of a hash in a particular order.
# %hash is the hash to sort; criterion() stands for your comparison
@keys = sort { criterion() } (keys %hash);
foreach $key (@keys) {
    $value = $hash{$key};
    # do something with $key, $value
}
More info: See The Perl Cookbook, section 5.9 Page 144. Solution: Get a list of keys and sort it based on the ordering you want. By default, sort compares values as strings. The optional code block passed to sort will be called every time sort needs to compare two values. $a and $b are localised sort variables.
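One concrete replacement for the criterion() placeholder: sorting the keys by their values, largest first, with ties broken by key. The %age data is illustrative.

```perl
use strict;
use warnings;

my %age = (Fred => 40, Wilma => 35, Pebbles => 2);    # illustrative data

# Sort the keys by their values, largest first; ties fall back to the key.
my @by_age = sort { $age{$b} <=> $age{$a} or $a cmp $b } keys %age;
print "$_ is $age{$_}\n" for @by_age;
```

The low-precedence or returns the first non-zero comparison, which is a common way to chain sort keys.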
How Do I … Test For The Presence Of A Key In A Hash?
You need to know if a hash has a particular key.
%age = ();
$age{"Toddler"}  = 3;
$age{"Unborn"}   = 0;
$age{"Phantasm"} = undef;
foreach $thing ("Toddler", "Unborn", "Phantasm", "Relic") {
    print "$thing: ";
    print "Exists "  if exists $age{$thing};
    print "Defined " if defined $age{$thing};
    print "True "    if $age{$thing};
    print "\n";
}
Output:
Toddler: Exists Defined True     (exists, defined, true)
Unborn: Exists Defined           (exists, defined)
Phantasm: Exists                 (exists)
Relic:                           (none of the above)
More info: See The Perl Cookbook, section 5.2 Page 131. Toddler: It exists because we gave it a value in the hash, that value is defined (3) and since it’s non-zero, it is true. Unborn: It exists because we gave it a value in the hash, that value is defined (0) and since it’s zero it is not true. Phantasm: It exists because we gave it a value in the hash, that value is undefined so it fails the defined test and since undef is false it fails the truth test as well. Relic: It doesn’t exist since we never put it into the hash. So it fails all three tests.
How Do I … Invert A Hash?
You have a hash and a value for which you want to find the corresponding key.
# %LOOKUP maps keys to values
%REVERSE = reverse %LOOKUP;
What happens if two different keys happen to have the same value? Result - The inverted hash will only have one. For a solution to this see the “Perl Cookbook” pages 140 and 141.
More info: See The Perl Cookbook, section 5.8 Page 142. Use reverse() to create an inverted hash whose values are the original hash’s keys and whose keys are the original hash’s values. Suppose %surname holds ("Mickey" => "Mantle", "Babe" => "Ruth"). When we treat %surname as a list it becomes ("Mickey", "Mantle", "Babe", "Ruth") or ("Babe", "Ruth", "Mickey", "Mantle"), because we can’t predict the order in which things come out of hashes (each key always stays next to its value). Reversing this list (assume the first list is the one we get) gives this: ("Ruth", "Babe", "Mantle", "Mickey"). When we treat this list as a hash it becomes: ("Ruth" => "Babe", "Mantle" => "Mickey").
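A runnable sketch of the %surname example, followed by a second (illustrative) hash showing that duplicate values collapse to a single key after inversion.

```perl
use strict;
use warnings;

my %surname    = (Mickey => 'Mantle', Babe => 'Ruth');
my %first_name = reverse %surname;      # values become keys
print $first_name{Mantle}, "\n";

# With duplicate values only one key survives the inversion.
my %fruit_color = (Apple => 'red', Cherry => 'red');
my %by_color    = reverse %fruit_color;
print scalar(keys %by_color), " key(s) left\n";
```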
How Do I … Test For The Presence Of A Key In A Hash?
You need to know if a hash has a particular key.
# does %HASH have a value for $KEY ?
if (exists($HASH{$KEY})) {
    # it exists
} else {
    # it doesn't
}
Solution: Use the exists() function

# %food_color per the introduction
foreach $name ("Banana", "Martini") {
    if (exists $food_color{$name}) {
        print "$name is a food.\n";
    } else {
        print "$name is a drink.\n";
    }
}
Output:
Banana is a food.
Martini is a drink.
More info: See The Perl Cookbook, section 5.2 Page 131. exists() checks for the existence of a key in a hash. It doesn’t say anything about the key’s value (if the key exists).
How Do I … Print A Hash? You want to print a hash, but neither print “%hash” nor print %hash works. while ( ($k,$v) = each %hash ) { print "$k => $v\n"; }
Solution: Iterate using each()
print map { "$_ => $hash{$_}\n" } keys %hash;
Solution: Use map to generate a list of strings
print "@{[ %hash ]}\n";
Solution: Interpolate the hash as a list and print that
{
    my @temp = %hash;
    print "@temp";
}
Solution: Use a temporary array to hold the hash and print that
You can print in key order at the cost of doing a sort()
More info: See The Perl Cookbook, section 5.5 Page 137. The best solution is probably the first one.
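To print in key order, as mentioned above, sort the keys first. A minimal sketch with sample data; it builds the output in a string so the result is easy to check.

```perl
use strict;
use warnings;

my %hash = (banana => 'yellow', apple => 'red', carrot => 'orange');    # sample data
my $out  = '';
foreach my $k (sort keys %hash) {    # sort() makes the order predictable
    $out .= "$k => $hash{$k}\n";
}
print $out;
```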
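How Do I … Delete Something From A Hash?
You want to remove an entry from a hash.
# %food_color as per Introduction
sub print_foods {
    my @foods = keys %food_color;
    my $food;
    print "Keys: @foods\n";
    print "Values: ";
    foreach $food (@foods) {
        my $color = $food_color{$food};
        print defined($color) ? "$color " : "(undef) ";
    }
    print "\n";
}
print "Initially:\n";
print_foods();
undef $food_color{"Banana"};
print "\nWith Banana undef\n";
print_foods();
delete $food_color{"Banana"};
print "\nWith Banana deleted\n";
print_foods();

Output:
Initially:
Keys: Banana Apple Carrot Lemon
Values: yellow red orange yellow

With Banana undef
Keys: Banana Apple Carrot Lemon
Values: (undef) red orange yellow

With Banana deleted
Keys: Apple Carrot Lemon
Values: red orange yellow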
More info: See The Perl Cookbook, section 5.3 Page 133. You can’t delete a key by setting its value to undef, since undef is a value which a hash can store. You must use the delete() function. As the example shows, setting $food_color{"Banana"} to undef doesn’t delete the key from the hash; it only makes the value undef. delete() really does remove it from the hash. delete() can also work with a hash slice to remove multiple keys from a hash, like this: delete @food_color{"Banana", "Apple", "Cabbage"};
How Do I … Merge Hashes?
You need to make a new hash with the entries of two existing hashes.
%merged = (%A, %B);

%merged = ();
while ( ($k,$v) = each(%A) ) {
    $merged{$k} = $v;
}
while ( ($k,$v) = each(%B) ) {
    $merged{$k} = $v;
}
Solution: Treat the hashes as lists and join them as you would lists. Keys which appear in both hashes will only appear once in the final hash, with the value from %B (the later value wins).
Alternative: Loop over the hashes’ elements and build a new hash.
More info: See The Perl Cookbook, section 5.10 Page 145.
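A small sketch confirming which value survives when both hashes share a key; the fruit data is illustrative.

```perl
use strict;
use warnings;

my %A = (apple => 'red',   banana => 'yellow');
my %B = (apple => 'green', lime   => 'green');    # 'apple' is in both
my %merged = (%A, %B);    # later pairs overwrite earlier ones, so %B wins
print "$_ => $merged{$_}\n" for sort keys %merged;
```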
How Do I … Traverse A Hash?
You want to perform an action on each entry in a hash.
Solution: Use each() with a while loop
while (($key, $value) = each(%HASH)) {
    # do something with $key and $value
}
Solution: Use keys with a foreach loop
foreach $key (keys %HASH) {
    $value = $HASH{$key};
    # do something with $key and $value
}
More info: See The Perl Cookbook, section 5.4 Page 135. The each() function returns a two-element list (a key and its value) from the hash each time it is called. Remember, order has no meaning in hashes, so regardless of the order in which you put values into the hash, it is very unlikely that they will come back out in that same order. It is possible to retrieve items in insertion order, but that is beyond the scope of this course.
How Do I … Find The Most Common Anything?
You want to know how many times a value in an array or in a hash occurs in the array or hash.
%count = ();
foreach $element (@ARRAY) {
    $count{$element}++;
}
Solution: Use a hash to count how many times each element (for an array) or key (for a hash) occurs. The foreach adds one to $count{$element} for every occurrence of $element.
More info: See The Perl Cookbook, section 5.14 Page 150.
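To actually pick out the most common element, the counts can be combined with a value sort. A sketch with illustrative input; ties are broken alphabetically so the answer is deterministic.

```perl
use strict;
use warnings;

my @words = qw(a b a c b a);    # illustrative input
my %count;
$count{$_}++ for @words;        # same counting idea as the foreach above

# Highest count first; ties broken alphabetically.
my ($most) = sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count;
print "most common: $most ($count{$most} times)\n";
```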
How Do I … Operate On A Series Of Integers?
You want to perform an operation on a series of integers between X and Y.
Range operator:
foreach ($X .. $Y) {
    # $_ is set to every integer from X to Y, inclusive
}
Range operator:
foreach $i ($X .. $Y) {
    # $i is set to every integer from X to Y, inclusive
}

for ($i = $X; $i <= $Y; $i++) {
    # $i is set to every integer from X to Y, inclusive
}

for ($i = $X; $i <= $Y; $i += 7) {
    # $i is set to every integer from X to Y, stepsize = 7
}
Remember, for and foreach are synonyms, so that gives us another 4 variations
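More info: See The Perl Cookbook, section 2.5 Page 49. Solution: use a for loop, or a foreach with the range operator (..). When iterating over consecutive integers, the third method (the C-style for loop) avoids the memory cost that older versions of Perl paid for building the whole range as a list; current versions of Perl do not build a temporary list when the range operator is used in a foreach loop.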
Regular Expressions
Regular Expressions
Regular expressions (a.k.a. regexes, regexps, REs) are used in tools such as grep, awk, findstr, sed, vi and Emacs. A regular expression is a way of describing a set of strings without listing them all.
if (/Windows 95/) { print "Time to upgrade?\n" }
s/Windows/Linux/;
Be careful - regular expressions in Perl are not identical to regular expressions in other languages. When you see something that looks like /foo/ you’re looking at a pattern match operator (the / and the /). If you can find patterns in a string then you can also replace those patterns with something else. So when you see something like s/Windows/Linux/ you’re looking at a substitution of Linux for Windows (which some people might say is a good thing)! Finally patterns can also specify where something isn’t. This is used with the split operator - see next slide.
Regular Expressions
An example of the split operator:
($good, $bad, $ugly) = split( /,/ , "vi,emacs,teco");
($good, $bad, $ugly) is the list which gets the results of the split operator.
/,/ is the pattern (the comma between / and /) which split uses to chop up the string on its right.
"vi,emacs,teco" is the text which split operates on.
Tip - the best way to split a string which contains lots of white space: @words = split( /\s+/ , $line );
We haven’t covered the \s character class yet - but it stands for any white-space character. The \s+ means any string containing one or more consecutive white-space characters (it can be different numbers at different places on a line of text - the fields on which the split occurs don’t all have to be the same length).
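A sketch of both split examples. It also shows one detail worth knowing: with /\s+/ a line that starts with whitespace produces an empty first field, while the special pattern ' ' (a single space in quotes) skips leading whitespace. The sample line is made up.

```perl
use strict;
use warnings;

my ($good, $bad, $ugly) = split /,/, "vi,emacs,teco";

my $line   = "  leading   and trailing  ";
my @fields = split /\s+/, $line;    # leading whitespace gives an empty first field
my @words  = split ' ', $line;      # special ' ' pattern skips leading whitespace
print scalar(@fields), " vs ", scalar(@words), "\n";
```

Trailing empty fields are dropped in both cases, which is why neither list ends with an empty string.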
Regular Expressions
The simplest regular expressions are those which match several characters in a row:
while ($line = <>) {
    if ($line =~ /http:/) {
        print $line;
    }
}

This uses $_ for both the input operator and the string to search for a pattern match:
while (<>) {
    print if /http:/;
}

while (<>) {
    print if /http:/;
    print if /ftp:/;
    print if /mailto:/;
    # What next?
}
In the first example we’re looking for all lines containing exactly /http:/. The =~ operator is called the binding operator. It tells Perl to look for a match in the variable $line. If we don’t use the =~ operator, then Perl by default searches the special variable $_. This is a special scalar variable which is used in many places in Perl, not just pattern matching. In the second example we’re using the default variable $_ (which is also set by the <> operator). In the third example we’re looking for lots of different types of links: http, ftp, mailto. What happens if this later needs to be extended? Wouldn’t it be easier to look for any number of alphabetic characters followed by a colon?
Regular Expressions In regular expression speak that would be: /[a-zA-Z]+:/
The [ and ] define a character class. a-z and A-Z represent all the alphabetic characters (the - means all characters between the starting and ending character, inclusive). The + means “one or more of whatever is immediately in front of me”. That’s an example of a quantifier: something which says how many times something is allowed to repeat. Remember that the / and / are not part of the pattern. They’re like quotes in that they contain the pattern but are not part of it.
Regular Expressions - Character Classes These are some common Perl character classes.
Name              ASCII definition     Code
Whitespace        [ \t\n\r\f]          \s
Word character    [a-zA-Z_0-9]         \w
Digit             [0-9]                \d
Note that these match single characters. A \w will match a single word character, not a word; you can say \w+ to match a word. Perl also allows negation of these classes by using the upper-case version of the code. \D matches a non-digit character, \W a non-word character and \S a non-white-space character. There’s one special character class, written with a “.”, that will match any character except a newline. The /s modifier makes it match a newline too.
Example: /a./ will match any string containing an “a” that is not the last character in the string. Why? It will match “at” or “am” or “a!” but not “a”, since there’s nothing after the “a” for the dot (any character) to match. It’ll also match “camel” and “oasis”, but not “sheba”. It matches “caravan” on the first “a”.
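This small sketch checks the claims above about \d, \w, \s, the upper-case negations and the dot. The test strings are illustrative. Without the /s modifier the dot does not match a newline, and the last two entries show that difference.

```perl
use strict;
use warnings;

my ($word_run) = 'hi_there!' =~ /^(\w+)/;    # \w+ stops at the "!"

my %result = (
    digit     => ('a7'        =~ /\d/  ? 1 : 0),   # 7 is a digit
    no_digit  => ('1234'      =~ /\D/  ? 1 : 0),   # every character is a digit
    space     => ("tab\there" =~ /\s/  ? 1 : 0),   # a tab is white space
    camel     => ('camel'     =~ /a./  ? 1 : 0),
    sheba     => ('sheba'     =~ /a./  ? 1 : 0),   # "a" is the last character
    newline   => ("a\n"       =~ /a./  ? 1 : 0),   # "." does not match "\n"
    newline_s => ("a\n"       =~ /a./s ? 1 : 0),   # ... unless /s is used
);

print "word run: $word_run\n";
print "$_ => $result{$_}\n" for sort keys %result;
```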
Regular Expressions - Quantifiers The character classes we’ve seen so far all match one character. You can match a word with \w+, and the “+” is one kind of quantifier. General quantifiers look like this: {min,max}
Example      Matches
\d{6,8}      A run of between 6 and 8 digits
\d{5,5}      A run of exactly 5 digits (usually written \d{5})
\d{5,}       A run of 5 digits or more
\d{0,5}      A run of 5 digits or fewer (write {0,5}: the {,5} form is only understood by Perl 5.34 and later, and older versions treat it as literal text)
The common quantifiers are shorthand for general ones:
Code    Meaning
+       {1,}
*       {0,}
?       {0,1}
Be very, very careful using “*”. Why?
Regular Expressions - Quantifiers Exercise: What does this do, i.e. what will be in $line after the substitution? $line = "Fred xxxxxxxx barney"; $line =~ s/x*//; print $line;
One last thing: a quantifier applies only to the item immediately before it (a single character, character class or group), so /bam{2}/ will match "bamm" but not "bambam"
To apply a quantifier to more than one character, use ( and ) like this: /(bam){2}/ will match "bambam"
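One other thing to note: quantifiers in Perl are greedy by default, so Perl will match as much as it can. Adding a ? after a quantifier (for example .*? or +?) makes it match as little as possible instead.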
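This sketch shows greediness and the grouping rule from the last two points. The HTML-like sample string is made up. The non-greedy form .+? is standard Perl, but it is not otherwise covered on these slides.

```perl
use strict;
use warnings;

my $html = '<b>bold</b> and <i>italic</i>';

my ($greedy) = $html =~ /<(.+)>/;    # .+ grabs as much as it can
my ($lazy)   = $html =~ /<(.+?)>/;   # .+? stops at the first ">"

print "greedy: $greedy\n";
print "lazy:   $lazy\n";

# Quantifiers bind to the single item in front of them unless you group.
my $one = 'bamm'   =~ /^bam{2}$/   ? 'yes' : 'no';   # m{2}
my $two = 'bambam' =~ /^(bam){2}$/ ? 'yes' : 'no';   # (bam){2}
print "bamm: $one, bambam: $two\n";
```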
Regular Expressions - Anchors Examples: /\bFred\b/ would match in
String                   Answer and reason
"The Great Fred"         Yes
"Fred The Great"         Yes
"Frederick The Great"    No - Fred is followed by "e", a word character, so there is no word boundary after it.
There are also characters for matching at the start of a line “^” and at the end of a line “$”. Strictly, these are the start and end of the string unless the /m modifier is used. (Don’t worry, Perl won’t confuse this $ with a variable.) So when we said: next LINE if $line =~ /^#/;
What were we saying?
When you try to pattern match, Perl will try to match at every position in the string until it succeeds. An anchor allows you to specify where a pattern can match. The special symbol \b matches at a word boundary. A word boundary is the “nothing” which lies between a word character “\w” and a non-word character “\W”, or at the start or end of the string next to a word character. Answer: go to the next iteration of the loop if the first character on a line is the “#” character. Also, when we said that \d{6,8} would match a number of between 6 and 8 digits, that wasn’t quite true: it would also match inside any number of 9 or more digits. To get the desired result we have to combine quantifiers with anchors. Exercise: write a pattern which will match a number of 5 or 6 digits but will fail to match one of more than 6 digits.
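The anchor table above can be checked with a few lines of Perl, using the slide's own test strings plus two sample lines for the ^ anchor. This sketch deliberately leaves the exercise unanswered.

```perl
use strict;
use warnings;

my %hit;
for my $name ('The Great Fred', 'Fred The Great', 'Frederick The Great') {
    $hit{$name} = $name =~ /\bFred\b/ ? 1 : 0;
    print "\"$name\" ", ($hit{$name} ? 'matches' : 'does not match'), "\n";
}

# ^ anchors the pattern to the start of the string.
my @lines   = ("# a comment\n", "code(); # trailing comment\n");
my @skipped = grep { /^#/ } @lines;
print "skipped ", scalar(@skipped), " line(s)\n";
```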
Regular Expressions - Back References Use ( and ) to remember bits of patterns which match. Example:
/\d+/      Both these patterns match the same thing - a number
/(\d+)/    But this one remembers what was matched
What does this do? s/(\S+)\s+(\S+)/$2 $1/
When you match patterns you can use “(” and “)” to remember the bits of a string which did match. The “(” and “)” don’t change what matches. How you refer to what was matched depends on where you want to use it. Inside the same pattern, the text matched by each group is available as the back-references \1 \2 \3 etc.; the match from the first pair of “(” and “)” is in \1 and so on. Outside the pattern, the same text is stored in the variables $1 $2 $3 etc. Be careful: the next successful pattern match replaces the old values of $1 $2 $3 etc. A failed match leaves them unchanged, which can be just as confusing. If you want to keep the values, copy them into your own variables. There’s no limit to how many bits of the pattern can be remembered: after \9 or $9, Perl continues with \10 and $10 and so on. Whoops - no easy answer here this time - you’ll have to work it out.
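This sketch shows both kinds of remembered text. \1 is used inside the pattern to find a doubled word, and $1, $2 are used outside it. Following the advice above, the values are copied into my variables straight after the match. The sample strings are illustrative.

```perl
use strict;
use warnings;

my $text = "Paris in the the spring";

# \1 inside the pattern: a word followed by white space and the same word.
my $doubled = $text =~ /\b(\w+)\s+\1\b/ ? $1 : '';
print "doubled word: $doubled\n";

# $1, $2 outside the pattern: copy them before another match replaces them.
my ($year, $month) = ('', '');
if ("2005-09" =~ /^(\d{4})-(\d{2})$/) {
    ($year, $month) = ($1, $2);
}
print "year=$year month=$month\n";

# In list context a match returns all the captures at once.
my ($first, $second) = "key=value" =~ /^(\w+)=(\w+)$/;
print "$first -> $second\n";
```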
Earlier we mentioned the terms scalar and list context. So far most things have been in scalar context, where we’ve seen single results. Lots of Perl operators can produce either scalar results or list results, depending on how they are used. They just “know” what is expected of them. In the first example @array is a four-element list. In the second example each of @dudes, @chicks and other() returns a list. All the lists are then joined together to produce a single (big) list, which is passed to sort(). Some operators produce lists (like keys), while some consume them (like print). You can stack several list operators in a row - see example 3. It takes all the keys from %hash and lower-cases them with the lc operator (via map { }). It then passes that list to sort and the sorted list to reverse, and finally print prints the result. If you do a pattern match in list context then all the captured groups are returned as a list - see example 4 and example 5. TMTOWTDI.
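The slide examples this note refers to are not reproduced here. This is a separate sketch of the two ideas it describes: a chain of list operators (keys, map with lc, sort, reverse), and a pattern match in list context returning its captures. The hash contents and the date string are made up.

```perl
use strict;
use warnings;

my %hash = (Banana => 1, apple => 2, Cherry => 3);

# keys produces a list, map { lc } transforms it, sort and reverse each
# consume a list and produce a new one.
my @names = reverse sort map { lc } keys %hash;
print "@names\n";

# A pattern match in list context returns the captured groups as a list.
my ($day, $month, $year) = "10/09/2005" =~ m{^(\d+)/(\d+)/(\d+)$};
print "day=$day month=$month year=$year\n";

# The same array gives its element count in scalar context.
my $count = @names;
print "count=$count\n";
```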
How Do I … Parse Comma-Separated Data? You have a file containing comma-separated values that you need to read in, but these data fields may have quoted commas or escaped quotes in them.
    sub parse_csv {
        my $text = shift;      # record containing comma-separated values
        my @new  = ();
        push(@new, $+) while $text =~ m{
            # the first part groups the phrase inside the quotes.
            # see explanation of this pattern in MRE
            "([^\"\\]*(?:\\.[^\"\\]*)*)",?
              |  ([^,]+),?
              |  ,
        }gx;
        push(@new, undef) if substr($text, -1, 1) eq ',';
        return @new;           # list of values that were comma-separated
    }
This procedure is from “Mastering Regular Expressions”
    use Text::ParseWords;
    sub parse_csv { return quotewords(",", 0, $_[0]); }
Use the standard ParseWords module
More info: See The Perl Cookbook, section 1.15 Page 31. Comma-separated data sounds simple to parse, but it is actually a complex format, since the fields themselves can contain commas. This makes the pattern-matching solution complex and rules out a simple split /,/. Text::ParseWords hides all this complexity from you. Pass its quotewords() function two arguments followed by the CSV string. The first argument is the separator (in this case a comma, given as a pattern). The second is a true or false value which controls whether the strings returned keep their quotes.
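This is a minimal sketch of the Text::ParseWords route, run on a made-up record. quotewords takes the delimiter pattern, a keep flag and the text to parse. With the keep flag set to 0, the surrounding quotes are removed from each field.

```perl
use strict;
use warnings;
use Text::ParseWords;    # core module; exports quotewords, shellwords, parse_line

sub parse_csv {
    my ($line) = @_;
    return quotewords(',', 0, $line);    # delimiter, keep-quotes flag, text
}

my @fields = parse_csv(q{apple,"banana, split",cherry});
print scalar(@fields), " fields:\n";
print "  [$_]\n" for @fields;
```

The quoted comma stays inside the second field, which a plain split /,/ would have broken into two.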
How Do I … Check If A String Is A Valid Number? You want to check if a string contains a valid number.
General solution:
    if ($string =~ /PATTERN/) {
        # is a number
    } else {
        # is not
    }
Specific solutions:
    warn "has nondigits"        if     /\D/;
    warn "not a natural number" unless /^\d+$/;             # rejects -3
    warn "not an integer"       unless /^-?\d+$/;           # rejects +3
    warn "not an integer"       unless /^[+-]?\d+$/;
    warn "not a decimal number" unless /^-?\d+\.?\d*$/;     # rejects .2
    warn "not a decimal number" unless /^-?(?:\d+(?:\.\d*)?|\.\d+)$/;
    warn "not a C float"
        unless /^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/;
More info: See The Perl Cookbook, section 2.1 Page 44. This is a common task when validating input, for example in a CGI script. The solution is easy as long as you can decide what you mean by a number. You can then write a regular expression (or a series of expressions) to look for the pattern you want. If numbers can have leading or trailing white space, remove that space first with a substitution like this: $probable_number =~ s/^\s+|\s+$//g;
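This sketch wraps the trimming step and one of the decimal-number patterns above in a helper. The name is_decimal is made up for this example. Which pattern you pick is exactly the "what do you mean by a number" decision described above.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the string is a decimal number such as
# 42, -3, 3.14, 2. or .5, allowing leading and trailing white space.
sub is_decimal {
    my ($candidate) = @_;
    return 0 unless defined $candidate;
    (my $trimmed = $candidate) =~ s/^\s+|\s+$//g;   # strip outer white space only
    return $trimmed =~ /^-?(?:\d+(?:\.\d*)?|\.\d+)$/ ? 1 : 0;
}

for my $s ('42', ' -3.5 ', '.2', '1e5', '12 34', 'abc') {
    printf "%-8s %s\n", "'$s'", is_decimal($s) ? 'number' : 'not a number';
}
```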
How Do I … Copy And Substitute Simultaneously? You want an easy way of copying a string and substituting in the copy at the same time.
You want to avoid this:
    $dst = $src;
    $dst =~ s/this/that/;
So do this:
    ($dst = $src) =~ s/this/that/;
More examples:
    # Make All Words Title-Cased
    ($capword = $word) =~ s/(\w+)/\u\L$1/g;
    # /usr/man/man3/foo.1 changes to /usr/man/cat3/foo.1
    ($catpage = $manpage) =~ s/man(?=\d)/cat/;
    ($a = $b) =~ s/x/y/g;      # copy $b into $a, then change $a
    $a = ($b =~ s/x/y/g);      # change $b, count goes in $a
More info: See The Perl Cookbook, section 6.1 Page 164.
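This sketch runs the idiom, including the two easily confused forms from the slide. The variable names are illustrative. The /r modifier at the end returns the modified copy directly. It is not in the original material and needs Perl 5.14 or later.

```perl
use strict;
use warnings;

my $manpage = '/usr/man/man3/foo.1';
(my $catpage = $manpage) =~ s/man(?=\d)/cat/;    # copy, then change the copy

my $word = 'hello big WORLD';
(my $capword = $word) =~ s/(\w+)/\u\L$1/g;      # title-case each word

my $src     = 'xxy';
my $changes = ($src =~ s/x/y/g);                 # changes $src; count goes in $changes

# Perl 5.14+: /r leaves the original alone and returns the new string.
my $copy = $manpage =~ s/man(?=\d)/cat/r;

print "$catpage\n$capword\n$changes $src\n$copy\n";
```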
How Do I … Match Only Letters When Pattern Matching? You want to see whether a value consists of only alphabetic characters.
Use this if you don’t care about locale:
    if ($var =~ /^[A-Za-z]+$/) {
        # it is purely alphabetic
    }
Use this if you do care about locale:
    use locale;
    if ($var =~ /^[^\W\d_]+$/) {
        print "var is purely alphabetic\n";
    }
More info: See The Perl Cookbook, section 6.2 Page 165. The obvious way of doing this isn’t good enough in the general case, since it doesn’t respect a user’s locale setting. If you need to match letters with diacritical marks, use something like the second example, which matches against a negated character class. \w matches one alphabetic character, one numeric character or _, so \W matches any character that is none of those. The negated character class [^\W\d_] specifies a character which must not be a non-word character, a digit, or an underscore. That leaves nothing but alphabetics.
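This sketch contrasts the ASCII-only class with the Unicode property \p{L}, which matches any letter. \p{L} is not from the original slide. It is a standard Perl alternative that always applies Unicode rules, so its result does not depend on the locale. The accented sample is written with the escape \x{e9} (e-acute) to keep the source ASCII. The labels are made up.

```perl
use strict;
use warnings;

my %samples = (
    accented   => "caf\x{e9}",    # "cafe" with an e-acute, written as an escape
    plain      => 'cafe',
    digit      => 'cafe2',
    underscore => 'snake_case',
);

my %is_letters;
for my $label (sort keys %samples) {
    my $s = $samples{$label};
    $is_letters{$label} = {
        ascii   => ($s =~ /^[A-Za-z]+$/ ? 1 : 0),   # ASCII letters only
        unicode => ($s =~ /^\p{L}+$/    ? 1 : 0),   # any Unicode letter
    };
    print "$label: ascii=$is_letters{$label}{ascii} unicode=$is_letters{$label}{unicode}\n";
}
```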
How Do I … Match Only Words When Pattern Matching? You want to pick out words from a string.
    /\S+/                  # as many non-whitespace bytes as possible
    /[A-Za-z'-]+/          # as many letters, apostrophes, and hyphens
    /\b([A-Za-z]+)\b/      # usually best (probably what I would choose)
    /\s([A-Za-z]+)\s/      # fails at ends or w/ punctuation
You need to decide what you want a word to be, and then write a pattern to detect it. For example, is sheep-shearing a word? What about Shepherd’s?
More info: See The Perl Cookbook, section 6.3 Page 167. What you mean by a word varies between languages. Perl doesn’t have a built-in definition of what a word is; you must build one from character classes and quantifiers. There is no simple, straightforward answer to this question, so be careful.
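This sketch runs three of the candidate word patterns over the same made-up sentence, so you can see how the choices differ. Which result counts as "right" is the decision discussed above.

```perl
use strict;
use warnings;

my $text = "Sheep-shearing at Shepherd's farm, 2005.";

my @non_space = $text =~ /(\S+)/g;            # everything between spaces
my @lettery   = $text =~ /([A-Za-z'-]+)/g;    # letters, apostrophes, hyphens
my @bounded   = $text =~ /\b([A-Za-z]+)\b/g;  # letters between word boundaries

print join('|', @non_space), "\n";
print join('|', @lettery),   "\n";
print join('|', @bounded),   "\n";
```

The \b version splits "Sheep-shearing" and "Shepherd's" into pieces. /\S+/ keeps the punctuation attached to "farm," and "2005.".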
How Do I … Comment Regular Expressions? You want to comment regular expressions.
    # Find duplicate words in paragraphs, possibly spanning line boundaries.
    #   Use /x for space and comments, /i to match both `is'
    #   in "Is is this ok?", and use /g to find all dups.
    $/ = "";        # paragraph mode
    while (<>) {
        while ( m{
                    \b            # start at a word boundary
                    (\w\S+)       # find a wordish chunk
                    (
                        \s+       # separated by some whitespace
                        \1        # and that chunk again
                    ) +           # repeat ad lib
                    \b            # until another word boundary
                 }xig
              )
        {
            print "dup word '$1' at paragraph $.\n";
        }
    }
More info: See The Perl Cookbook, section 6.4 Page 168. Use the /x modifier. It makes the regular expression engine ignore most whitespace inside a regular expression and also allows you to insert comments. The allowed whitespace is space, tab, and newline. To match a literal space or # under /x, escape it (\ or \#) or put it in a character class.
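Here is a second small sketch of /x: a commented pattern for an ISO-style date and optional time. The format is just an example. The real space before the time is written as [ ], because /x would otherwise ignore it.

```perl
use strict;
use warnings;

my $date_re = qr{
    ^
    (\d{4})                       # year: four digits
    -
    (\d{2})                       # month: two digits
    -
    (\d{2})                       # day: two digits
    (?: [ ] (\d{2}) : (\d{2}) )?  # optional " HH:MM"; [ ] is a real space
    $
}x;

my %parsed;
for my $stamp ('2005-09-14', '2005-09-14 09:30', '14/09/2005') {
    my @parts = $stamp =~ $date_re;    # empty list when there is no match
    $parsed{$stamp} = \@parts;
    print "$stamp -> ",
        (@parts ? join(',', map { defined $_ ? $_ : '-' } @parts) : 'no match'), "\n";
}
```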
How Do I … Find The Nth Occurrence Of A Match? You want to find the Nth match in a string, not just the first one.
Example: Find the word preceding the third occurrence of “fish”. Use the /g modifier in a while loop and keep count of the number of matches.
Input: One fish two fish red fish blue fish
    $WANT = 3;
    $count = 0;
    while (/(\w+)\s+fish\b/gi) {
        if (++$count == $WANT) {
            print "The third fish is a $1 one.\n";
            # Warning: don't `last' out of this loop
        }
    }
Output: The third fish is a red one.
Use a repetition count and a repeated pattern:
    /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;
More info: See The Perl Cookbook, section 6.5 Page 170. The /g modifier creates a progressive match which can be used in a while loop. To find the Nth match, it’s easiest to keep your own counter and then whenever you reach the count you want, do whatever is appropriate.
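This sketch shows the counting loop on the slide's input. It also shows an alternative that is not on the slide: collect every match in list context and index the resulting list. The variable names are illustrative.

```perl
use strict;
use warnings;

my $input = 'One fish two fish red fish blue fish';
my $want  = 3;

# 1. Count matches of a progressive /g match in scalar context.
my ($count, $nth) = (0, undef);
while ($input =~ /(\w+)\s+fish\b/gi) {
    if (++$count == $want) {
        $nth = $1;    # keep looping (no 'last') so pos() is reset when the match fails
    }
}

# 2. Collect every captured word in list context and index the list.
my @colours = $input =~ /(\w+)\s+fish\b/gi;
my $third   = $colours[$want - 1];

print "The third fish is a $nth one.\n";
print "Via the list: $third\n";
```

Letting the loop run to the end matters for the second approach. If the loop had stopped early, the list-context /g match would start from the leftover pos() instead of the beginning of the string.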
How Do I … Read Records With A Pattern Separator? You want to read in records separated by a pattern.
    undef $/;
    @chunks = split(/pattern/, <FILEHANDLE>);
Solution: Read in the whole file and use split().
    # .Ch, .Se and .Ss divide chunks of STDIN
    {
        local $/ = undef;
        @chunks = split(/^\.(Ch|Se|Ss)$/m, <>);
    }
    print "I read ", scalar(@chunks), " chunks.\n";
This creates a localised copy of $/, which is restored when the enclosing block exits. Because the split pattern contains capturing parentheses, the captured separators are also returned in the final array.
An example: The input stream is a text file that consists of lines separated by “.Ch”, “.Se”, and “.Ss”, which are codes used in troff. We want to find the text that falls between them.
More info: See The Perl Cookbook, section 6.7 Page 176. Example 1: (Note: $/ is Perl’s input record separator). $/ cannot be a pattern - it must be a fixed string. To get round this we undefine $/ so that the next read operation gets the whole of the rest of the file. Then we split that huge string using whatever pattern we choose.
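This sketch runs the troff example on a tiny made-up document. It reads from an in-memory filehandle instead of STDIN so that it is self-contained. It shows the empty leading field, the captured separators, and $/ being restored after the block.

```perl
use strict;
use warnings;

my $document = ".Ch\nIntroduction\n.Se\nFirst section\n";

# An in-memory filehandle stands in for STDIN here.
open my $fh, '<', \$document or die "cannot open in-memory file: $!";

my @chunks;
{
    local $/;                              # undef: read the whole file at once
    @chunks = split /^\.(Ch|Se|Ss)$/m, <$fh>;
}                                          # $/ is restored here
close $fh;

print "I read ", scalar(@chunks), " chunks.\n";
```

The result alternates between text and captured separator codes. It starts with an empty string because the document begins with a separator.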
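How Do I … Read A Range Of Lines? You want to read all lines from one starting pattern to an ending pattern.
Solution: use the range operator with patterns:
    while (<>) {
        if (/BEGIN PATTERN/ .. /END PATTERN/) {
            # line falls between BEGIN and END in the
            # text, inclusive.
        }
    }
Solution: use the range operator with line numbers:
    while (<>) {
        if ($. == $FIRST_LINE_NUM .. $. == $LAST_LINE_NUM) {
            # line number is between $FIRST_LINE_NUM and
            # $LAST_LINE_NUM, inclusive.
        }
    }
Perl compares an operand with the current line number $. automatically only when it is a constant, such as 23. When the limits are held in variables, you must compare them with $. yourself, as above.
You don’t need to count lines in your code: Perl does it for you in the $. variable.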
More info: See The Perl Cookbook, section 6.8 Page 177. Solution: use the range operator .., either with patterns or with line numbers. Here’s a very interesting Perl one-liner which makes use of this feature:
    perl -ne 'print if 23 .. 72' any_old_file.txt
This prints just lines 23 to 72 of the file shown.
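This sketch runs both forms of the range (flip-flop) operator over a made-up file held in an in-memory filehandle. Each .. in the loop keeps its own on/off state, so the two ranges don't interfere.

```perl
use strict;
use warnings;

my $text = "preamble\nBEGIN\nline one\nline two\nEND\ntrailer\n";
open my $fh, '<', \$text or die "cannot open in-memory file: $!";

my (@by_pattern, @by_number);
while (my $line = <$fh>) {
    chomp $line;
    push @by_pattern, $line if $line =~ /^BEGIN$/ .. $line =~ /^END$/;
    push @by_number,  $line if $. == 2 .. $. == 3;   # explicit $. tests
}
close $fh;

print "patterns: @by_pattern\n";
print "numbers:  @by_number\n";
```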
How Do I … Match From Where The Last Pattern Left Off? You want to match again from where the last pattern left off. while (/(\d+)/g) { print "Found $1\n"; }
Solution: Use a combination of the /g modifier, the \G pattern anchor and the pos function.
Use \G to anchor the next match to the end of any previous match.
More info: See The Perl Cookbook, section 6.14 Page 190. If you use the /g pattern modifier, the Perl regular expression engine keeps track of its position when it finishes matching. The next time you match with /g the engine starts looking for a match from the remembered position. This lets you use a while loop to extract the information you want from the string.
How Do I … Match From Where The Last Pattern Left Off? You want to match again from where the last pattern left off.
Find all the numbers:
    $_ = "The year 1752 lost 10 days on the 3rd of September";
    while (/(\d+)/gc) {
        print "Found number $1\n";
    }
Now find what follows the last number:
    if (/\G(\S+)/g) {
        print "Found $1 after the last number.\n";
    }
The output is:
    Found number 1752
    Found number 10
    Found number 3
    Found rd after the last number.
More info: See The Perl Cookbook, section 6.14 Page 190. By default, when your match fails (say when you run out of numbers in the example above), the remembered position is reset to the start. If you don’t want this to happen because you want to carry on matching then use the /c modifier with /g. This pattern: /\G(\S+)/g will find whatever non-whitespace characters follow the last number (rd, in this case).
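The same /gc and \G combination is the usual way to write a small tokenizer: each pattern is tried at the current position, and a failed attempt leaves pos() alone. The sketch below illustrates this. The tokenize sub and its token labels are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical mini-lexer: each /gc match starts exactly where the last one
# stopped (\G), and a failed /gc attempt leaves pos() where it was.
sub tokenize {
    my ($src) = @_;
    my @tokens;
    pos($src) = 0;
    while (pos($src) < length $src) {
        if    ($src =~ /\G\s+/gc)       { next }                     # skip white space
        elsif ($src =~ /\G(\d+)/gc)     { push @tokens, "NUM($1)" }
        elsif ($src =~ /\G([a-z]+)/gci) { push @tokens, "WORD($1)" }
        elsif ($src =~ /\G(.)/gcs)      { push @tokens, "PUNCT($1)" }
    }
    return @tokens;
}

print join(' ', tokenize('width = 42;')), "\n";
```

Without /c, each failed attempt would reset pos() to the start and the loop would never finish.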
How Do I … Expand And Compress Tabs? You want to convert the tabs in a string into the appropriate number of spaces, or vice versa.
1.
    while ($string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e) {
        # spin in empty loop until substitution finally fails
    }
2.
    use Text::Tabs;
    @expanded_lines  = expand(@lines_with_tabs);
    @tabulated_lines = unexpand(@lines_without_tabs);
3.
    while (<>) {
        1 while s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;
        print;
    }
4.
    use Text::Tabs;
    $tabstop = 4;
    while (<>) { print expand($_) }
More info: See The Perl Cookbook, section 1.7 Page 15. 1. Either use a funny-looking substitution, 2. or use the standard Text::Tabs module. 3. 1 while (CONDITION) is the same as while (CONDITION) { }, a loop with an empty body. 4. Use the standard Text::Tabs module; setting $tabstop changes the tab width from its default of 8. LAB6 - REGEXP_1 LAB6 - REGEXP_2
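This sketch checks that the hand-written substitution and Text::Tabs agree on a sample line with 8-column tab stops. The wrapper name expand_by_hand is made up. On Perls before 5.20, merely mentioning $& or $` anywhere slows down every regular expression in the program, which is another reason to prefer the module.

```perl
use strict;
use warnings;
use Text::Tabs;    # core module; exports expand, unexpand and $tabstop

# The "funny looking substitution", wrapped in a sub: $` is the text before
# the tab run, so length($`) % 8 is how far we already are past a tab stop.
sub expand_by_hand {
    my ($string) = @_;
    1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;
    return $string;
}

my $line        = "ab\tcd\t\te";
my $by_hand     = expand_by_hand($line);
my ($by_module) = expand($line);          # $tabstop defaults to 8

print "same result\n" if $by_hand eq $by_module;
```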
Scope, Pragmas, Modules, Subroutines, References
Notes:
Scope What do we mean by scope? Variables are visible from the point at which they are declared. Private versus Public:
Left example (private):
    foreach my $pw (@password_list) {
        my $pw_length = length( $pw );
        if ( $pw_length < 8 ) { print "$pw is too short\n"; }
    }
    # $pw and $pw_length don't exist here
Right example (public):
    my ( $pw , $pw_length );
    foreach $pw (@password_list) {
        $pw_length = length( $pw );
        if ( $pw_length < 8 ) { print "$pw is too short\n"; }
    }
    # $pw and $pw_length do exist here
Scope means whether a variable is temporary or permanent, and private or public. By default (if you do nothing at all) Perl’s variables are global and permanent; later we’ll see that these are called package variables. This makes writing short programs very easy, but such programs can be difficult to debug. In both examples all variables are declared (using my) before they are used; that doesn’t affect the point being made. In the left example $pw and $pw_length only exist inside the loop. In the right example the same two variables still exist after the loop has finished. One subtlety: a foreach loop variable is restored to its previous value when the loop ends. So after the right-hand loop $pw exists but is undef again, while $pw_length still holds the length of the last password. Subroutine declarations are global declarations: wherever you place them, they are visible to all code in your package.
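This runnable version of the two scope examples uses a made-up password list. It collects results instead of printing inside the loop, so that the behaviour after each loop is easy to check.

```perl
use strict;
use warnings;

my @password_list = ('secret', 'correcthorse', 'abc');

# Left-hand version: $pw and $pw_length exist only inside the loop.
my @too_short;
foreach my $pw (@password_list) {
    my $pw_length = length $pw;
    push @too_short, $pw if $pw_length < 8;
}

# Right-hand version: both variables are declared outside the loop.
my ($pw, $pw_length);
foreach $pw (@password_list) {
    $pw_length = length $pw;
}
# $pw_length keeps the last length, but the foreach variable $pw is
# restored to its previous value (undef) when the loop finishes.
print "too short: @too_short\n";
print "after loop: pw is ", (defined $pw ? $pw : 'undef'), ", length $pw_length\n";
```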
Pragmas A special kind of module that affects how your program is compiled. Invoked by a use or a no. Example:
    use strict;
    use integer;
    {
        no strict 'refs';    # allow symbolic references
        no integer;          # resume floating point arithmetic
        # ....
    }
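Pragmas use constant;
    sub deg2rad { PI * $_[0] / 180 }
    print "This line does nothing" unless DEBUGGING;
Each use constant NAME => VALUE statement defines one constant. Newer versions of the constant pragma also let you define several at once by passing a hash reference: use constant { A => 1, B => 2 }; By convention all constants are named in upper case.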
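The definitions of PI and DEBUGGING did not survive on this slide. The values below are assumptions chosen to make the two slide lines runnable. The last use constant shows the hash-reference form; MAX_USERS and COURSE_NAME are made-up names.

```perl
use strict;
use warnings;

use constant PI        => 4 * atan2(1, 1);   # assumed definition
use constant DEBUGGING => 0;                 # assumed: debugging switched off
use constant {                               # several constants at once
    MAX_USERS   => 100,
    COURSE_NAME => 'Perl course',
};

sub deg2rad { PI * $_[0] / 180 }

print "debugging output\n" if DEBUGGING;     # dropped at compile time when false
printf "90 degrees is %.4f radians\n", deg2rad(90);
print COURSE_NAME, " allows ", MAX_USERS, " users\n";
```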
Pragmas use integer;
    use integer;
    $x = 10/3;      # $x is now 3, not 3.33333333333333333

    use integer;
    $x = 1.8;
    $y = $x + 1;
    $z = -1.8;
This pragma tells the compiler to use integer arithmetic only from now to the end of the enclosing block. In the second example you’ll be left with $x == 1.8, $y == 2 and $z == -1. The case for $z is special because the - sign in front of the 1.8 counts as an operation (unary minus), so the value 1.8 is truncated to 1 before it is negated.
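This sketch reproduces both examples. use integer is lexically scoped, so the sketch confines it to one subroutine (the name integer_demo is made up). The rest of the program keeps normal floating-point arithmetic.

```perl
use strict;
use warnings;

sub integer_demo {
    use integer;          # integer arithmetic until the end of this sub
    my $q = 10 / 3;       # integer division
    my $x = 1.8;          # plain assignment: no arithmetic, so no truncation
    my $y = $x + 1;       # operands truncated before adding
    my $z = -1.8;         # unary minus is an operation too
    return ($q, $x, $y, $z);
}

my ($q, $x, $y, $z) = integer_demo();
print "$q $x $y $z\n";
```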
Pragmas use lib;
    #!/usr/bin/perl -w
    use lib ( "/design/analog/software/Modules" );
    use strict;
    use Carp;
    use English;
    use My_Constants;
    use Netlist_Functions;
    use Mosfet;
    use Capacitor;
    use Resistor;
    use Diode;
    use Instance;
This is used to modify the list of places (the @INC array) in which Perl will look to find library modules. It’s roughly equivalent to adding a directory to your Unix PATH variable. The strict, Carp and English modules are all standard Perl modules, and Perl always knows how to find these. The modules My_Constants, Netlist_Functions, Mosfet, Capacitor, Resistor, Diode and Instance are all loaded from our user-defined directory. The arguments to use lib are prepended to Perl’s search path.
Pragmas use strict;
    use strict;                # Install all three strictures.

    use strict "vars";         # Variables must be predeclared.
    use strict "refs";         # Can't use symbolic references.
    use strict "subs";         # Bareword strings must be quoted.

    use strict;                # Install all...
    no strict "vars";          # ...then renege on one.

    use strict 'subs';
    $x = whatever;             # WRONG: bareword error!
    $x = whatever();           # This always works, though.

    sub whatever;              # Predeclare function.
    $x = whatever;             # Now it's ok.
This pragma changes what Perl considers to be legal code. Sometimes these strictures seem too strict for casual programming - until you spend an hour looking for a bug which wouldn’t have happened if you’d used this pragma. There are three things we can be strict about: subs, vars, and refs. Symbolic references are suspect for a lot of reasons; it’s pretty easy to use one even when you don’t mean to. With this stricture in effect you can only use real (hard) references. So, what are symbolic references? A symbolic reference uses a string as the name of a variable. If $name contains "foo", then $$name refers to the package variable $foo. Strict vars will trigger a compile-time error if you attempt to access a variable which doesn’t meet one of the following criteria:
1. Predefined by Perl itself (i.e. a built-in variable).
2. Declared with our (for a global) or my (for a lexical).
3. Imported from another package.
4. Fully qualified using its package name and the :: package separator.
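This sketch shows a symbolic reference in action. The sub bump_by_name and the variable $counter are made up. The first use works because no strict 'refs' is switched on for one block only. The second is refused at run time because strict refs is in force, and the eval catches the error.

```perl
use strict;
use warnings;

our $counter = 0;    # package variable $main::counter

# Symbolic reference: the *string* in $name is used as a variable name.
sub bump_by_name {
    my ($name) = @_;
    no strict 'refs';              # allowed only inside this block
    ${"main::$name"}++;
}

bump_by_name('counter');
print "counter is now $counter\n";

# With strict refs in force the same code dies at run time.
my $name = 'counter';
my $ok   = eval { ${"main::$name"}++; 1 };
print "strict refs refused: $@" unless $ok;
```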
Standard Modules
Carp      - Report errors from a user’s perspective.
Cwd       - Finds the current working directory.
English   - Allows use of English variable names.
Exporter  - Determines what a module exports.
There are lots of other modules - see Chapter 32 of “Programming Perl”.
Carp lets you report errors from the perspective of the caller. If a user fails to use your modules correctly, the error messages show up not as problems in your code (which of course you’ve thoroughly debugged) but in the user’s code. In other words, this is a blame shifter. Cwd is a module which lets you find out the current working directory. On Unix this may not seem very useful, since you can always use $cwd = `pwd`; (which leaves a trailing newline to chomp). However, Cwd is guaranteed to work on all systems where Perl is installed, even those without a shell command that would let them do $cwd = `pwd`; English lets you use English names instead of the standard punctuation names for built-in variables. Exporter is used with modules to determine which subroutines (and variables) can be seen from outside the module.
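This sketch shows Carp shifting the blame and Cwd finding the working directory. The package Widget and its set_width sub are made up. croak reports from the caller’s point of view only when the caller is in a different package, which is why set_width lives in its own package here.

```perl
use strict;
use warnings;

package Widget;    # hypothetical module-like package
use Carp;          # exports carp, croak and confess into Widget

sub set_width {
    my ($width) = @_;
    croak "set_width() needs a positive width"
        unless defined $width && $width > 0;
    return $width;
}

package main;
use Cwd;           # exports getcwd and cwd into main

my $ok = eval { Widget::set_width(-1); 1 };
print "Caught: $@" unless $ok;    # the message names the caller's file and line
print "Current directory: ", getcwd(), "\n";
```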
Subroutines Syntax: To declare a named subroutine without defining it, use one of these:
    sub NAME
    sub NAME PROTO
    sub NAME ATTRS
    sub NAME PROTO ATTRS
To declare and define a named subroutine, add a BLOCK:
    sub NAME BLOCK
    sub NAME PROTO BLOCK
    sub NAME ATTRS BLOCK
    sub NAME PROTO ATTRS BLOCK
This all looks pretty complicated, but this is what we normally do:
    sub say_hello {
        print "Hello world.\n";
    }
    say_hello();
A subroutine is a small self-contained sub-program. It is invoked by its name, it may have arguments passed to it, and it can return a scalar or a list value. It’s defined using the sub keyword followed by the subroutine code in {}. Subroutines can be defined anywhere in your program, loaded in from other files via do, require or use, or generated at run time with eval. You can call a subroutine in three ways:
1. Directly, by name.
2. Indirectly, through a variable containing either its name or a reference to the subroutine.
3. Through an object, letting the object determine which subroutine should really be called.
To create an anonymous subroutine, just leave out the name. PROTO and ATTRS stand for prototype and attributes respectively; they’re not so important here. Only NAME (for a named subroutine) and BLOCK (for a definition) are essential. For forms without a name you need some other way to call the subroutine, so store a reference to it: $subref = sub BLOCK; Later on you can say $subref->(ARGS); or the older &$subref(ARGS);
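This sketch shows anonymous subroutines stored in a scalar and in a hash, called through their references. The names $greet and %dispatch are illustrative.

```perl
use strict;
use warnings;

# An anonymous subroutine: no NAME, just a BLOCK, stored in a scalar.
my $greet = sub {
    my ($who) = @_;
    return "Hello, $who.";
};

print $greet->('world'), "\n";       # arrow call with an argument list
print &$greet('Perl'), "\n";         # the older &$subref(...) form

# A hash of code references: choosing a subroutine at run time.
my %dispatch = (
    double => sub { return 2 * $_[0] },
    square => sub { return $_[0] ** 2 },
);
print $dispatch{square}->(7), "\n";
```

The hash version is a common way to pick which code to run from a string, for example a command name read from input.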
Subroutines The return function causes execution of the subroutine to finish, and the value given after return is returned as the result. Using a return statement is optional, but it’s good style to use one. If there is no return, the value returned is the value of the last statement executed.
    @sorted = dictionary_order( "eat" , "at" , "Joes" );
    @sorted = dictionary_order( @unsorted );
    @sorted = dictionary_order( @sheep , @goats , "shepherd" , $goatherd );

    sub get_next { return <>; }

    prompt();             # always okay since ()
    $next = get_next();   # always okay since ()
    prompt;               # error under use strict - hasn't seen definition yet
    $next = get_next;     # okay: get_next definition already seen

    sub prompt { print "next> "; }
Just as in previous examples, the lists passed to a subroutine are all flattened. So the third call to dictionary_order receives the contents of the array @sheep, followed by the contents of the array @goats, then the string "shepherd", and finally the scalar value stored in $goatherd. You can pass two or more arrays to a subroutine and have them keep their integrity (i.e. stay unflattened) by passing references to them instead, as in \@sheep. If the subroutine does not require arguments then it can be passed an empty argument list. The list can also be omitted completely, as long as Perl already knows it’s a subroutine. Like variables, subroutines have a leading symbol which indicates what they are: the name of a subroutine may be preceded by an & when calling it. The & must be used when calling a subroutine in certain contexts (we’ll see these in a minute). It can’t be used when defining the subroutine, however. So this won’t work:
    sub &dictionary_order { return sort @_; }    # FATAL Compile Time Error
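This sketch shows passing references so that two arrays keep their integrity. The sub count_each and the sample arrays are made up. For comparison, it also shows the single flattened list that a plain call would receive.

```perl
use strict;
use warnings;

# Hypothetical example: pass references so the two arrays stay separate.
sub count_each {
    my ($sheep_ref, $goats_ref) = @_;        # two array references
    return (scalar @$sheep_ref, scalar @$goats_ref);
}

my @sheep = ('Dolly', 'Shaun', 'Lamb Chop');
my @goats = ('Billy');

my @flat = (@sheep, @goats);                 # what a plain call would receive
my ($n_sheep, $n_goats) = count_each(\@sheep, \@goats);
print "flattened: ", scalar(@flat), " items; sheep=$n_sheep goats=$n_goats\n";
```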
Other Ways To Call Subroutines Subroutines which have been defined earlier can be called without “(” and “)”.
    sub make_sequence    # from, to, step_size
    {
        my @list = ();
        for ( my $n = $_[0] ; $n < $_[1] ; $n += $_[2] ) {
            push @list , $n;
        }
        return @list;
    }
    @stepped_sequence = make_sequence $min , $max , $step_size;

    &my_subroutine;      # Means my_subroutine( @_ ); shares the current @_
    my_subroutine;       # Means my_subroutine(); if already declared
Arguments passed to a subroutine are available via the @_ array. Example 1: A subroutine that has already been defined can be called without the “(” and “)” around the argument list. Example 2: Another way to call a subroutine is to use the & prefix without passing any arguments. In this case the subroutine receives the caller’s current @_ array instead (the same array, not a copy). This was used to call subroutines from within other subroutines. It is almost never used in new code but may be present in old code. Always call subroutines as shown in the style section of this course.
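This sketch compares the three calling forms from inside another subroutine. The names count_args and wrapper are made up. Only the bare &count_args form quietly inherits the caller’s arguments.

```perl
use strict;
use warnings;

sub count_args { return scalar @_ }

sub wrapper {
    my $shared = &count_args;      # no parens: sees wrapper's own @_
    my $empty  = count_args();     # empty argument list
    my $passed = count_args(@_);   # explicit, and the recommended form
    return ($shared, $empty, $passed);
}

my @result = wrapper('a', 'b', 'c');
print "@result\n";
```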
Named Subroutine Arguments Suppose we had a subroutine which took a lot of arguments:
    ls( "*" , "any" , 1 , 1 , 0 , 0 , "alpha" , 4 , 1 );
    ls( undef , undef , 1 , 1 , undef , undef , undef , 4 , 1 );
    ls( cols => 1 , pages => 4 , width => 80 );

    sub ls {
        my %arg = @_;    # convert a list to a hash
        $arg{ pages } = "*" unless exists $arg{ pages };
        $arg{ cols }  = 1   unless exists $arg{ cols };
        # etc
    }
Example 1: You don’t want to pass 9 arguments to this subroutine when only a few are going to change. Example 2: You could arrange that passing undef as a parameter chooses a default value, but the call is still long, as shown. Example 3: Perl supports named arguments: you pass a list of name/value pairs, using the => operator to associate a name with each argument. Inside the subroutine you initialise a hash with the contents of the @_ array. This documents the call better, and since hash entries can be given in any order we don’t need to remember the order of parameters in the call.
Named Subroutine Arguments (Continued)
Set up some defaults:
    %std_listing = ( cols => 2 , pages => 4 );
Use the defaults:
    ls ( files => "*.txt" , %std_listing );
    ls ( files => "*.log" , %std_listing );
    ls ( files => "*.hlp" , %std_listing );
Override some of the defaults:
    ls ( files => "*.dat" , %std_listing , cols => 8 );
In the first example we set up default values for some arguments. In the second set of examples we use the standard set of parameters. In the third example we use the default set of arguments and then override some of it. This works because, when a list is assigned to a hash, a later value for the same name replaces an earlier one.
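This sketch shows named arguments with defaults. The sub list_options stands in for ls and is made up. Putting the defaults first in the hash initialiser lets any name the caller also supplies override them.

```perl
use strict;
use warnings;

# Hypothetical stand-in for ls(): the defaults come first in the list, so any
# name the caller also supplies overrides them (later pairs win in a hash).
sub list_options {
    my %arg = (files => '*', cols => 1, pages => 1, @_);
    return %arg;
}

my %std_listing = (cols => 2, pages => 4);

my %txt = list_options(files => '*.txt', %std_listing);
my %dat = list_options(files => '*.dat', %std_listing, cols => 8);

print "txt: files=$txt{files} cols=$txt{cols} pages=$txt{pages}\n";
print "dat: files=$dat{files} cols=$dat{cols} pages=$dat{pages}\n";
```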
Aliasing Of Parameters - Pass By Reference
    #!/usr/bin/perl -w
    use strict;
    my $line   = "Mary had a little";
    my $animal = "lamb";
    Print_Rhyme( $line , $animal );    # prints "Mary had a little lamb"
    Print_Rhyme( $line , $animal );    # prints "Mary had a little dog"
    exit;

    sub Print_Rhyme    # Parameters passed in @_ as aliases
    {
        print $_[0] . " " . $_[1] . "\n";
        $_[1] = "dog";
        return 0;
    }
In this code the parameters are passed in @_ (this is always true) and used in the subroutine as aliases. So when we change one or more elements of @_ in the subroutine, we are actually changing the variables in the calling code as well. Therefore $_[1] = "dog"; has the same effect as assigning $animal = "dog"; in the calling code. This is nearly always *NOT WHAT YOU WANT*
Aliasing Of Parameters - Pass By Value
    #!/usr/bin/perl -w
    use strict;
    my $line   = "Mary had a little";
    my $animal = "lamb";
    Print_Rhyme( $line , $animal );    # prints "Mary had a little lamb"
    Print_Rhyme( $line , $animal );    # prints "Mary had a little lamb"
    exit;

    sub Print_Rhyme
    {
        my ( $line , $animal ) = @_;   # Parameters passed in @_ and copied
                                       # into private variables
        print $line . " " . $animal . "\n";
        $animal = "dog";               # This change is isolated to the Print_Rhyme subroutine.
        return 0;
    }
In this code the parameters are passed in @_ (this is always true) and used in the subroutine as values, by copying them into private my variables. When we change one of those variables in the subroutine, the change stays inside the subroutine. So the assignment $animal = "dog"; has no effect on the calling code. This is the way you should use subroutines.
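This sketch puts the two previous slides side by side. The names by_alias and by_value are made up. It records $animal after each call so that the difference is visible.

```perl
use strict;
use warnings;

sub by_alias { $_[1] = 'dog'; return "$_[0] $_[1]" }   # writes through @_

sub by_value {
    my ($line, $animal) = @_;    # private copies
    $animal = 'dog';             # only the copy changes
    return "$line $animal";
}

my $line   = 'Mary had a little';
my $animal = 'lamb';

by_value($line, $animal);
my $after_value = $animal;       # unchanged by by_value
by_alias($line, $animal);
my $after_alias = $animal;       # changed through the alias

print "after by_value: $after_value, after by_alias: $after_alias\n";
```

The alias version also has a sharper edge. Passing a literal, as in by_alias('x', 'lamb'), dies with "Modification of a read-only value attempted".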
A Standard Way Of Using Subroutines
    sub _interpolate_value
    {
        my ( $t1 , $v1 , $t2 , $v2 , $time ) = @_;
        croak( "No t1 value in Waveform::_interpolate_value()" ) unless defined( $t1 );
        croak( "No v1 value in Waveform::_interpolate_value()" ) unless defined( $v1 );
        croak( "No t2 value in Waveform::_interpolate_value()" ) unless defined( $t2 );
        croak( "No v2 value in Waveform::_interpolate_value()" ) unless defined( $v2 );
        croak( "No time in Waveform::_interpolate_value()" )     unless defined( $time );

        if ( $t1 == $time ) { return( $v1 ); }
        if ( $t2 == $time ) { return( $v2 ); }

        my $delta_t = $t2 - $t1;
        my $delta_v = $v2 - $v1;
        croak( "Error - divide by zero in Waveform::_interpolate_value()" )
            if ( $delta_t == 0 );

        my $dv_by_dt = $delta_v / $delta_t;
        # Straight-line interpolation measured from the first point (t1, v1)
        my $interpolated_value = $v1 + ( $time - $t1 ) * $dv_by_dt;
        return( int $interpolated_value );
    }
Elements of the @_ array are special. They are not copies of the actual arguments; they are aliases to them. If $_[0], $_[1] etc. are changed, then the argument in the calling routine is changed, i.e. the parameters are effectively passed by reference. This behaviour is useful but can lead to hard-to-find bugs. We usually prefer to pass by value, so explicitly copy the @_ array into new variables and, to be doubly safe, declare them with my(). The above code is a fragment of an object-oriented program. The _ at the front of the subroutine name is a convention in OO code for internal subroutines, called only from within the object’s own code. croak() comes from the Carp module (use Carp;). It corresponds to die(), but reports the error from the caller’s point of view.
Subroutine Calling Context When a subroutine is called, it can detect whether it was expected to return a scalar, a list or nothing at all. The contexts in which a subroutine can be called are:
    ls( @files );                   # void context: no return value expected
    $listing = ls( @files );        # scalar context: scalar return value expected
    @missing = ls( @files );        # list context: list return value expected
    ( $f1 , $f2 ) = ls( @files );   # list context: list return value expected
    print( ls( @files ) );          # list context: list return value expected
The information about the calling context is obtained from the wantarray function. wantarray returns:
undef (false and undefined) if the subroutine was called in void context.
"" (false but defined) if it was called in scalar context.
1 (true and defined) if it was called in list context.
We can use this information to decide what value a subroutine needs to return.
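This sketch turns the wantarray results into a label for each call and records them in a list. The sub report_context is made up. The record is needed because a value returned in void context is thrown away.

```perl
use strict;
use warnings;

my @seen;    # records the context of each call

sub report_context {
    my $want    = wantarray;
    my $context = !defined $want ? 'void'
                : $want          ? 'list'
                :                  'scalar';
    push @seen, $context;
    return $context;
}

report_context();                    # void
my $scalar  = report_context();      # scalar
my @list    = report_context();      # list
my ($first) = report_context();      # list: parentheses on the left
print "@seen\n";
```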
Subroutine Prototypes A subroutine can be defined with a prototype: a series of specifiers which restrict the type and number of its arguments.
    sub add_two_param ( $$ ) { return( $_[0] + $_[1] ); }
The prototype is the ( $$ ) part. It restricts the arguments to two scalars. But note: if you pass an array, it is evaluated in scalar context, so the subroutine receives the length of the array rather than its elements. Prototypes only affect calls that Perl compiles after it has seen the declaration. They are ignored when the subroutine is called with & or as a method. See the perlsub man page.
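This sketch shows the array-length surprise, and how an & call bypasses the prototype. The sample arrays are made up.

```perl
use strict;
use warnings;

sub add_two_param ($$) { return $_[0] + $_[1] }   # declared before use

my @first  = (10, 20, 30);
my @second = (1, 2);

my $with_proto = add_two_param(@first, @second);   # ($$): lengths 3 + 2
my $bypassed   = &add_two_param(@first, @second);  # & ignores the prototype: 10 + 20
print "with prototype: $with_proto, with &: $bypassed\n";
```

The & call receives the flattened list (10, 20, 30, 1, 2), and the sub simply adds its first two elements.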
Notes:
How Do I … Access Subroutine Arguments You have written a function and want to access the arguments passed by its caller.
Solution:
    sub hypotenuse {
        return sqrt( ($_[0] ** 2) + ($_[1] ** 2) );
    }
Invoke like this:
    $diag = hypotenuse(3,4);    # $diag is 5
Better version with private variables:
    sub hypotenuse {
        my ($side1, $side2) = @_;
        return sqrt( ($side1 ** 2) + ($side2 ** 2) );
    }
More info: See The Perl Cookbook, section 10.1 Page 335. All values passed as arguments are in the special array @_, so the first argument is in $_[0] and so on. The number of arguments is scalar(@_). Subroutines should always start by copying the arguments into private variables. To return a value from a subroutine, use the return function. If there is no return statement, the value returned by the subroutine is the value of the last statement it executed.
How Do I … Make Variables Private To A Function
You want to use temporary variables in your function.
Solution: Use my to declare variables private to the subroutine.
sub somefunc {
    my $variable;
    my ($another, @an_array, %a_hash);
    # ...
}
You can combine my variables with an assignment:
my ($name, $age) = @ARGV;
my $start = fetch_time();
In the following code, $x and $y are private to check_x, so run_check() can't see them:
my ($a, $b) = @pair;
my $c = fetch_time();
sub check_x {
    my $x = $_[0];        # $x and $y are private to this function
    my $y = "whatever";
    run_check();          # run_check() can't see $x or $y
    if ($condition) {
        print "got $x\n";
    }
}
However, check_x can see $a, $b and $c, since they are declared in the enclosing scope.
More info: See The Perl Cookbook, section 10.2 Page 337. $variable is only visible and accessible within the function somefunc(). To declare several private variables with one my, put them in parentheses as a list, like this: my ($another, @an_array, %a_hash); Variables declared with my have lexical scope, which means that they only exist within a certain textual area of your code, and such a variable is normally destroyed when control leaves that area. Usually the area is a block with braces around it, like this: { # Your Code Here } Since a lexical scope is usually a block, you will often hear lexical variables described as being visible only within their block.
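A minimal illustration of lexical scope: an inner block can declare its own $x, which hides the outer one only until the closing brace.

```perl
use strict;
use warnings;

my $x = "outer";
my $seen_inside;
{
    my $x = "inner";        # a new lexical $x, private to this block
    $seen_inside = $x;      # "inner"
}
my $seen_outside = $x;      # "outer": the inner $x no longer exists
```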
How Do I … Create Persistent Private Variables
You want a variable to retain its value between calls to a subroutine but not to be visible outside that subroutine.
Solution: Wrap the function in another block and declare my variables in the block's scope rather than the function's.
{
    my $variable;
    sub mysub {
        # ... accessing $variable
    }
}
Use a BEGIN block if you need to perform initialisation:
BEGIN {
    my $variable = 1;    # initial value
    sub othersub {
        # ... accessing $variable
    }
}
By default the initial value in $counter is undef, which is treated as zero the first time next_counter() is called:
{
    my $counter;
    sub next_counter { return ++$counter }
}
Do this to initialise to anything other than 0:
BEGIN {
    my $counter = 42;
    sub next_counter { return ++$counter }
    sub prev_counter { return --$counter }
}
More info: See The Perl Cookbook, section 10.3 Page 339. Lexical variables don't need to vanish when their scope ends. If something more permanent, such as a named subroutine, still refers to the lexical, then it will be kept alive (Perl does this by reference counting).
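A runnable version of the BEGIN idiom above. The calls appear earlier in the file than the block; because BEGIN runs at compile time, $counter already holds 42 when they execute. With a plain bare block in the same position, the my $counter = 42 assignment would not yet have run when the calls were made.

```perl
use strict;
use warnings;

my $first  = next_counter();   # 43
my $second = prev_counter();   # 42

BEGIN {
    my $counter = 42;          # runs once, at compile time
    sub next_counter { return ++$counter }
    sub prev_counter { return --$counter }
}
```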
How Do I … Detect Return Context
You want to return a value that depends upon the calling context.
Solution: Use wantarray()
sub mysub {
    if (wantarray()) {
        print "In list context\n";
        return @many_things;
    } elsif (defined wantarray()) {
        print "In scalar context\n";
        return $one_thing;
    } else {
        print "In void context\n";
        return;    # nothing
    }
}
mysub();           # void context
$a = mysub();      # scalar context
if (mysub()) { }   # scalar context
@a = mysub();      # list context
print mysub();     # list context
More info: See The Perl Cookbook, section 10.6 Page 344. Solution: Use wantarray(), which returns one of three things depending on how the function was called. A function can decide what context it was called in and then return something which is appropriate to that context. List context is indicated by a true return value. Scalar context is indicated by a false return value which is defined. Void context is indicated by an undef return value.
References
Two kinds of references:
Hard (real references, a bit like pointers in C and C++).
Symbolic (use the name of one thing to access some other thing).
A reference allows a variable or a subroutine to be accessed indirectly. A reference is not a variable; it's a means of accessing a variable. To create a reference we use the \ operator. This takes an ordinary variable and returns a reference to it, like this:
$ref_to_scalar = \$my_scalar;
$ref_to_array  = \@my_array;
$ref_to_hash   = \%my_hash;
$ref_to_sub    = \&my_sub;
We are going to discuss hard references here and symbolic references (only in passing) at the end of this section. When we say references we will always mean a hard reference. Once we have a reference, we can get at the thing it refers to by prefixing the reference (optionally in { and }) with the appropriate symbol. To refer to $my_scalar we write one of these: ${\$my_scalar}; $$ref_to_scalar; ${$ref_to_scalar}; So we can access @my_array like this: @{\@my_array}; @$ref_to_array; @{$ref_to_array}; and so on. If you prefix a reference by the wrong symbol then you’ll get an error.
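A self-contained sketch of creating each kind of reference and dereferencing it with the matching prefix (the variable names follow the examples above; the values are made up). The last line shows the error you get from using the wrong prefix, caught with eval.

```perl
use strict;
use warnings;

my $my_scalar = 42;
my @my_array  = ( 1, 2, 3 );
my %my_hash   = ( one => 1 );
sub my_sub { return "called" }

my $ref_to_scalar = \$my_scalar;
my $ref_to_array  = \@my_array;
my $ref_to_hash   = \%my_hash;
my $ref_to_sub    = \&my_sub;

my $s = $$ref_to_scalar;        # 42
my @a = @{$ref_to_array};       # (1, 2, 3)
my %h = %$ref_to_hash;          # (one => 1)
my $r = &$ref_to_sub();         # "called"
$$ref_to_scalar = 43;           # changes $my_scalar itself

# Wrong prefix: dies with "Not an ARRAY reference"
my $ok = eval { my @oops = @$ref_to_hash; 1 };
```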
References
Accessing the elements of an array or hash through a reference:
$a = ${ $hash_ref }{ "first" };
${$array_ref}[0] = $h{ "first" };
The same accesses written with the arrow operator:
$a = $hash_ref->{ "first" };
$array_ref->[0] = $h{ "first" };
The arrow operator takes a reference on its left and either an array index in [] or a hash key in {} on its right. It locates the array or hash that the reference refers to and then accesses the appropriate element.
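The two access styles side by side; the hash and array contents are made up for illustration.

```perl
use strict;
use warnings;

my %h         = ( first => "Fred", second => "Barney" );
my @names     = ("Wilma");
my $hash_ref  = \%h;
my $array_ref = \@names;

my $block_form = ${ $hash_ref }{"first"};   # "Fred"
my $arrow_form = $hash_ref->{"first"};      # "Fred": the same element
$array_ref->[0] = $h{"second"};             # $names[0] is now "Barney"
```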
References And The ref() Function
If $reference contains:                   Then ref( $reference ) returns:
A plain (non-reference) scalar value      "" (the empty string, which is false)
A reference to a scalar                   "SCALAR"
A reference to an array                   "ARRAY"
A reference to a hash                     "HASH"
A reference to a subroutine               "CODE"
A reference to a filehandle               "IO" or "IO::Handle"
A reference to a typeglob                 "GLOB"
A reference to a precompiled pattern      "Regexp"
A reference to another reference          "REF"
Object references are missing from the above list because, for an object, ref() returns the name of the class (package) the object was blessed into. This, of course, changes as you use different kinds of objects.
Because dereferencing a reference with the wrong prefix can cause errors, it's sometimes necessary to be able to figure out what kind of referent a specific reference is referring to. The built-in ref() function takes a scalar value and returns a description of the kind of reference it contains. If a reference is used where a string is expected, Perl converts it to a string made of the referent's type followed by a unique hex number representing the internal memory address of the referent (for an object, the class name and an = come first). This means that printing out a reference usually produces something like: HASH(0x10027588) If you use the ref() function on an object, the class name is returned:
my $graphics_object = Polygon->new( 0, 0, 5, 5, 10, 32, 70, 10, 12, 18 );   # Polygon coordinates
print ref( $graphics_object );   # Will print "Polygon"
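A quick way to see ref() at work. Polygon is only used as a class name for bless, standing in for the hypothetical graphics class above; no Polygon module is needed.

```perl
use strict;
use warnings;

my $object = bless {}, 'Polygon';

my %kind = (
    plain  => ref(42),           # "" (not a reference)
    scalar => ref( \42 ),        # "SCALAR"
    array  => ref( [] ),         # "ARRAY"
    hash   => ref( {} ),         # "HASH"
    code   => ref( sub { } ),    # "CODE"
    ref    => ref( \\42 ),       # "REF"
    regexp => ref(qr/x/),        # "Regexp"
    glob   => ref( \*STDOUT ),   # "GLOB"
    object => ref($object),      # "Polygon": the class it was blessed into
);
my $as_string = "$object";       # something like "Polygon=HASH(0x...)"
```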
The first example doesn't work because of list flattening. So we need to use references to solve this problem. Each element in a Perl array can store a scalar, and a reference is a scalar (albeit a special kind of scalar). The bottom half of the slide shows how to set this up using references. The elements of the rows can be accessed using the arrow -> notation. $table->[1]->[2]; This means: find the array referred to by the reference in $table (i.e. @cols) and then get the element at index 1. That element stores a reference (a reference to @row2); then get the element at index 2 of that array. What's the result? This is a popular way of creating data structures so Perl provides some simple assistance. If we place the list values in [] instead of () we create a reference to a nameless (or anonymous) array. The array is automatically initialised to the specified values.
The bottom example is identical to the data structure we set up on the previous page except that all the internal arrays are anonymous - so you can’t access @cols or @rows. The only access to the array elements is via the reference to the overall table. As a final piece of help, in any expression like: print $table->[$x]->[$y]; Any arrow between a closing square or curly bracket and an opening square or curly bracket can be removed. So the above can be rewritten like this: print $table->[1][2]; which is much neater.
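A small two-dimensional table built entirely from anonymous arrays; the numbers are sample data, not the values from the slide.

```perl
use strict;
use warnings;

my $table = [
    [ 1, 2, 3 ],
    [ 4, 5, 6 ],
    [ 7, 8, 9 ],
];

my $with_arrows    = $table->[1]->[2];          # 6
my $without_arrows = $table->[1][2];            # 6: the same element
my $n_rows         = scalar @$table;            # 3
my $n_cols         = scalar @{ $table->[0] };   # 3
```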
Like the [] array constructor the {} hash constructor creates a reference which must be assigned to a scalar variable ($association), not to a hash (%association). Like the array reference, the values in the hash are only accessible via the hash reference: print $association->{ cat }; You can even nest hashes as well. Just like arrays, any -> between } and { can be omitted.
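A sketch of the {} constructor, including nested hashes; the keys and values are made up for illustration.

```perl
use strict;
use warnings;

my $association = { cat => "nap", dog => "walk" };
my $nested      = { pets => { cat => { legs => 4 } } };

my $cat_value = $association->{cat};          # "nap"
my $legs      = $nested->{pets}{cat}{legs};   # 4: arrows between } and { omitted
my @keys      = sort keys %$association;      # ("cat", "dog")
```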
How Do I … Return More Than One Array Or Hash
You want to return more than one array or one hash.
($array_ref, $hash_ref) = somefunc();
Solution: Return references to the hashes or arrays
sub somefunc {
    my @array;
    my %hash;
    # ...
    return ( \@array, \%hash );
}
sub fn {
    # .....
    return (\%a, \%b, \%c);
    # or
    return \(%a, %b, %c);    # same thing
}
More info: See The Perl Cookbook, section 10.9 Page 347. Just as all lists are flattened when multiple lists are passed to a function, the same happens with lists returned from functions with the return statement. Therefore to maintain the integrity of the arrays and hashes which are returned from a function, the arrays and hashes must be returned as references.
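A runnable sketch of the solution; this somefunc() fills its array and hash with made-up data, and the caller gets both back intact through the two references.

```perl
use strict;
use warnings;

sub somefunc {
    my @array = ( 1, 2, 3 );            # made-up contents
    my %hash  = ( a => 1, b => 2 );
    return ( \@array, \%hash );         # two scalars: nothing is flattened
}

my ( $array_ref, $hash_ref ) = somefunc();
my $n_items = scalar @$array_ref;       # 3
my @keys    = sort keys %$hash_ref;     # ("a", "b")
```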
Creating Data Structures Suppose you write this as the first line of your program: $sue{ children }->[1]->{ age } = 10;
That’s pretty minimalist (and neat).
Perl creates a hash called %sue, gives it a new hash element indexed by the string children, points that to a newly allocated array whose second entry is made to refer to a newly allocated hash, which gets an entry indexed by the string age.
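A quick check of what that one line creates (this is called autovivification), run under use strict:

```perl
use strict;
use warnings;

my %sue;
$sue{children}->[1]->{age} = 10;   # autovivifies the array and the inner hash

my $kind     = ref $sue{children};            # "ARRAY"
my $n_kids   = scalar @{ $sue{children} };    # 2: slot 0 exists but is undef
my $age      = $sue{children}[1]{age};        # 10
my $first_ok = defined $sue{children}[0];     # false
```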
References To Subroutines
Anonymous subroutines can be created like this:
sub { print "Hello $_[0]\n"; }
The above is useless since there's no way to execute the subroutine, so do this:
$sub_ref = sub { print "Hello $_[0]\n"; };
We can then call it like this:
$sub_ref->( "Steve" );
Notes: The ";" at the end of the second example is required since the whole line is an assignment statement; the anonymous sub { ... } is just an expression within it. The third example executes the code in the subroutine reference. We pass a parameter to the subroutine by enclosing it between "(" and ")" after the ->.
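A sketch of passing a code reference to another subroutine and calling it there, much like the call() subroutine in the Packages section. apply_to is a hypothetical helper, and the anonymous sub returns its greeting instead of printing it so the result can be checked.

```perl
use strict;
use warnings;

my $sub_ref = sub { return "Hello $_[0]\n" };

# Hypothetical helper: takes a code reference plus arguments and calls it.
sub apply_to {
    my ( $code, @args ) = @_;
    return $code->(@args);
}

my $greeting = apply_to( $sub_ref, "Steve" );   # "Hello Steve\n"
my $direct   = $sub_ref->("Steve");             # the same string
my $old_form = &$sub_ref("Steve");              # older syntax, same result
```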
Passing Subroutine Arguments As References
sub mysub {
    # Arrays are references, counts are scalars
    # (it might be useful to prefix references with ref_)
    my ( $array1 , $count1 , $array2 , $count2 ) = @_;
    my $item1 = $array1->[ $count1 ];
    my $item2 = $array2->[ $count2 ];
    # Suppose $item1 = 15 and $item2 = 36
    return( $item1 , $item2 );
}
# Call the above like this (assumes arrays and counts already set up)
my ( $r1 , $r2 ) = mysub( \@array1 , $count1 , \@array2 , $count2 );
print "$r1 $r2\n";    # prints 15 36
References provide a way of passing unflattened arrays or hashes to a subroutine (remember that when we pass more than one array to a subroutine their identity is lost because of array flattening). In this code we are expecting four parameters to be passed to mysub: two arrays, and two scalars which will be interpreted as indices into those arrays. The arrays are passed by reference, the scalars by value. Note that we can return more than one value from a subroutine; in this case we return two.
Returning Subroutine Results As References
This is an example of how not to do it. What do you think is wrong with this?
sub make_random_list {
    # Counts are scalars
    my ( $count1 , $count2 ) = @_;
    my @new_array = ();
    foreach my $index ( $count1 .. $count2 ) {
        $new_array[ $index ] = rand();
    }
    return( @new_array );
}
# Call the above like this:
my @big_random_array = make_random_list( 42 , 14826504 );
# Do stuff with big_random_array
print $big_random_array[ 137 ];
Subroutines can return references as well as receiving them. This example shows a subroutine which generates a large list of random numbers and then copies that list back to the code which called the subroutine. As shown above, the list is returned by value, i.e. a big copy of the list is passed back to the calling code as a large array. This means that in the program there exists: 1 copy of the array in the subroutine; once the subroutine ends and @new_array goes out of scope, that array is destroyed by Perl. 1 copy of the array, which is brought into existence in the main program when the end of the subroutine is reached and each of the values in @new_array is copied into @big_random_array. TIMTOWTDI (There's More Than One Way To Do It).
Returning Subroutine Results As References
This is an example of how to do it. What do you think is wrong with this?
sub make_random_list {
    # Counts are scalars
    my ( $count1 , $count2 ) = @_;
    my @new_array = ();
    foreach my $index ( $count1 .. $count2 ) {
        $new_array[ $index ] = rand();
    }
    return( \@new_array );
}
# Call the above like this:
my $big_random_array = make_random_list( 42 , 14826504 );
# Do stuff with big_random_array
print $big_random_array->[ 137 ];
In this code there is only ever one copy of the list, and it's the one created in the subroutine. Normally, when the subroutine ends, Perl would arrange for the list to be destroyed (since it's local to the subroutine and it's about to go out of scope). However, since the subroutine passes back a reference to the array, Perl arranges for the array to remain in existence. Only when the last reference to the array ceases to exist will Perl delete the array which was created inside the subroutine. Perl does this using a mechanism called reference counting. Basically it means that all of Perl's garbage collection is done for you. If you wanted to force Perl to delete the array created inside the subroutine (to save on memory, say) then all you need to do is: undef $big_random_array; Perl will reduce the reference count on the array, and if it is zero then the array will be deleted. Also, since only one thing (a scalar which is a reference) is passed back from the subroutine to the calling code, it's very quick and efficient.
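A scaled-down, runnable version of the reference-returning make_random_list (small counts so it runs instantly). Because my @new_array creates a fresh array on every call, two calls return references to two different arrays.

```perl
use strict;
use warnings;

sub make_random_list {
    my ( $count1, $count2 ) = @_;
    my @new_array;                          # a fresh array on every call
    $new_array[$_] = rand() for $count1 .. $count2;
    return \@new_array;                     # only one scalar is copied back
}

my $list     = make_random_list( 3, 7 );
my $other    = make_random_list( 3, 7 );
my $size     = scalar @$list;               # 8: indices 0..2 exist but are undef
my $distinct = ( $list != $other );         # true: two separate arrays
undef $other;    # drops the only reference, so Perl can free that array
```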
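Symbolic References
$name = "bam";
$$name = 1;          # Sets $bam
$name->[0] = 4;      # Sets the first element of @bam
$name->{X} = "Y";    # Sets the X element of %bam to Y
@$name = ();         # Clears @bam
keys %$name;         # Yields the keys of %bam
&$name;              # Calls &bam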
With symbolic references Perl is using the value of one variable as the name of another variable. This can be error prone and confusing, so I tend not to use this type of reference. You can force Perl to make all of the above examples into errors by using: use strict; which I would recommend. If you then have a desperate need to use a symbolic reference for a while, you can always countermand the stricture with: no strict 'refs';
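A sketch showing that use strict really does reject a symbolic reference, and that no strict 'refs' relaxes this only inside the block where it appears. $bam has to be a package variable (declared here with our), because symbolic references only ever reach package variables, never lexicals.

```perl
use strict;
use warnings;

our $bam;
my $name = "bam";

{
    no strict 'refs';
    $$name = 1;                 # sets the package variable $main::bam
}

# Outside that block strict refs is back in force, so this dies.
my $ok    = eval { $$name = 2; 1 };
my $error = $@;                 # Can't use string ("bam") as a SCALAR ref ...
```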
Packages
sub call {
    ( $sub_ref , @args ) = @_;
    $sub_ref->( @args );
}

package phone;
sub call {
    if ( dial() ) { talk(); }
}

package poker;
sub call {
    $pot = 21;
    deal();
}

package main;
call( $ref , @args );
This defines three completely distinct subroutines named call. The first is in the main namespace. The second is in the phone namespace. The third is in the poker namespace. If we do this, which call are we calling?
We would all like to use popular variable names like $count, $filename, $i. If we did this there wouldn't be any way to use other people's code, since they would have used the same variable names. Perl solves this problem by assigning each named variable and each named subroutine to a particular family, known as a package. Each package maintains its own symbol table or namespace, so two different packages may each have variables and subroutines with identical names in their own namespaces. By default Perl assumes that code is written in the namespace of the main package (which is called, appropriately enough, "main"). You can change that default by using the package keyword. A package declaration changes the namespace until another package declaration is made or until the end of the current enclosing block, eval, subroutine or file. The example defines three subroutines called "call" in three different packages. The first is in package main, since no package has been declared before it. If we wanted to call one of the other subroutines called call, we could either switch to that package or call the subroutine explicitly by prefixing the subroutine name with the package name, like this: poker::call();
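A runnable sketch of three subroutines called call living side by side. The bodies here just return their own fully qualified names (dial(), talk() and deal() from the slide are not defined in these notes).

```perl
use strict;
use warnings;

package phone;
sub call { return "phone::call" }

package poker;
sub call { return "poker::call" }

package main;
sub call { return "main::call" }

my $plain = call();          # "main::call": we are in package main
my $cards = poker::call();   # "poker::call": fully qualified name
my $ring  = phone::call();   # "phone::call"
```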
Package Variables
Perl variables come in two flavours:
Package variables.
Lexical variables.
Package variables belong to a particular package. These are the standard, no-preparation-necessary, instant variables we all use most of the time.
package html;
$i = 56;
for ( $i = 0 ; $i < 100 ; $i++ ) { print "$i\n"; }
$i is created when it is first referenced and it exists until the end of the program, since it isn't a lexical variable; it belongs to the current package (html). We can force the use of a variable in another package by prefixing the name of the variable with the name of the package followed by a ::, for example $html::i or $main::i.
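A sketch of reaching a package variable from another package through its fully qualified name. Under use strict the unqualified package variable has to be declared with our, which the slide does not need because it doesn't use strict.

```perl
use strict;
use warnings;

package html;
our $i = 56;                    # the package variable $html::i

package main;
my $from_main = $html::i;       # 56: the fully qualified name works anywhere
$html::i++;                     # and can modify it too
```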
Lexical Variables
Lexical variables are declared explicitly with the keyword my.
package main;
my $i;                           # A lexical variable
for ( $i = 0 ; $i < 100 ; $i++ ) {
    my $time = localtime();      # A lexical variable
    print "$i at time=$time\n";
}
Lexical variables differ from package variables in three ways: 1 They don't belong to any package, so you can't prefix them with a package name. 2 They can only be accessed within the physical boundaries of the code block or file scope in which they are declared. In the code shown, the variable $time is only accessible to code physically located in the for loop and not to code appearing before or after the loop. 3 They usually cease to exist each time the program leaves the code block in which they were declared. In the example the variable $time ceases to exist at the end of each iteration of the for loop (it is recreated at the beginning of each iteration of the loop).
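One way to see point 3 in action: each pass through the loop gets a brand new $square, which this sketch captures in an anonymous subroutine (a closure) so that each value survives after its iteration ends.

```perl
use strict;
use warnings;

my @subs;
for my $n ( 1 .. 3 ) {
    my $square = $n * $n;              # a new $square every iteration
    push @subs, sub { return $square };
}
my @values = map { $_->() } @subs;     # (1, 4, 9)
```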
Modules
Modules are the re-use part of Perl. A Perl module is a text file with the suffix .pm containing some Perl code. It's placed in a "standard" place. You can add to the "standard" places with a use lib 'directory'; statement. When the compiler encounters a use statement in a program it searches through the standard directories (listed in @INC), locates the file, and loads the code. Modules come in two flavours: Traditional: interface available by exporting symbols. Object Oriented: interface available by method calls. When you have created a module you can control what is visible to a user with the Exporter module. See the example at the end of this section.
The easiest way to see how to use modules is by example. An example of exporting a module interface with symbols follows on the next slide. An example of exporting a module's interface with method calls will be shown when we come to Object Oriented Perl. (Generally object-oriented modules export nothing, since the whole idea of methods is that Perl finds them for you automatically based on the type of the object.)
An Example Of Building A Module
To build a module called Bestiary, create a file called Bestiary.pm that looks like this:
package Bestiary;
require Exporter;

our @ISA       = qw(Exporter);
our @EXPORT    = qw(camel);     # Symbols to be exported by default
our @EXPORT_OK = qw($weight);   # Symbols to be exported on request
our $VERSION   = 1.00;          # Version number

### Include your variables and functions here

sub camel { print "One-hump dromedary" }

$weight = 1024;

1;    # This is very important
In the example a program can now do this: use Bestiary; to be able to access the camel function (but not the $weight variable), and: use Bestiary qw( camel $weight ); to access both the function and the variable. When you use a module, the module usually makes some variables or functions available to your program: some symbols are exported from your module. Most modules use Exporter to do this. When modules are loaded they must return a TRUE value to indicate that the loading was successful. This is usually done by making the last statement in the file a true value, such as the 1; on the last line of the example.
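A single-file sketch of using Bestiary, so it can be run without creating Bestiary.pm. Putting the package inside a BEGIN block and setting $INC{'Bestiary.pm'} is a trick assumed here purely to keep everything in one file: it tells use that the module is already loaded. In real code the package lives in Bestiary.pm as shown above. camel() returns its string instead of printing it so the result can be checked.

```perl
use strict;
use warnings;

BEGIN {
    package Bestiary;
    require Exporter;
    our @ISA       = qw(Exporter);
    our @EXPORT    = qw(camel);       # exported by default
    our @EXPORT_OK = qw($weight);     # exported on request
    our $VERSION   = 1.00;
    our $weight    = 1024;
    sub camel { return "One-hump dromedary" }
    $INC{'Bestiary.pm'} = __FILE__;   # mark as loaded so use won't search @INC
}

use Bestiary qw(camel $weight);

my $description = camel() . " weighing $weight";
```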
An Example Of Building A Module require Exporter; our @ISA = ("Exporter");
These two lines make the module inherit from the Exporter class (described in object-oriented Perl). Bestiary can now export symbols into other packages with lines like this.
our @EXPORT    = qw($camel %wolf ram);       # Export by default
our @EXPORT_OK = qw(leopard @llama $emu);    # Export by request
our %EXPORT_TAGS = (                         # Export as group
    camelids => [qw($camel @llama)],
    critters => [qw(ram $camel %wolf)],
);
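You can include any of these statements to import symbols from the Bestiary module:
use Bestiary;                            # Import @EXPORT symbols
use Bestiary ();                         # Import nothing
use Bestiary qw(ram @llama);             # Import the ram function and @llama array
use Bestiary qw(:camelids);              # Import $camel and @llama
use Bestiary qw(:DEFAULT);               # Import @EXPORT symbols
use Bestiary qw(/am/);                   # Import $camel, @llama, and ram
use Bestiary qw(/^\$/);                  # Import all scalars
use Bestiary qw(:critters !ram);         # Import the critters, but exclude ram
use Bestiary qw(:critters !:camelids);   # Import critters, but no camelids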
The first two lines make the module inherit from the Exporter class. The second set of lines tells Bestiary what it is allowed to export into packages which use it. The third set of lines can all be used in any program which uses Bestiary to determine what is and what is not imported into the current package. Leaving a symbol off the export lists does not render that symbol inaccessible to the program using the module. The program will always be able to access the contents of the module's package by fully qualifying the package name, like this: $Bestiary::number_of_lambs;
POD, Special Variables, Internal Perl Functions Command Line Switches, Perl One-liners
POD
Perl supports a simple mark-up language called POD (Plain Old Documentation). You can embed POD in any sort of file, including Perl scripts/programs. Perl simply skips over the POD when compiling. The Perl lexer starts skipping when a line begins with an = sign followed by an identifier, at a point where a new statement could begin.
=head1 Here There Be Dragons!

All of the text from here until the lexer sees =cut will be ignored.

=item snazzle

The snazzle() function will behave in the most spectacular form possible.

=cut

sub snazzle {
    my $arg = shift;
    ....
}
If you ever download CPAN modules you'll find that a lot of them have POD documentation included within the code. This is confusing at first until you realise that the compiler just skips over all the POD. Perl ships with tools to convert files containing POD into various printable file formats:
pod2text File.pm | more
pod2man File.pm | nroff -man | more
or
pod2man File.pm | troff -man -Tps -t > tmppage.ps
ghostview tmppage.ps
pod2html File.pm > tmppage.html
For a complete overview of POD see Chapter 26 of Programming Perl 3rd edition. Look at Mosfet.pm in the Examples/OO_Code area. Also see Mosfet.pod_text, Mosfet.man, Mosfet.postscript and Mosfet.html in the same area.
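A tiny script showing that the compiler really does skip POD: the assignment between =pod and =cut is never compiled. (=pod is a general-purpose POD command, used here in place of =head1.)

```perl
use strict;
use warnings;

my $x = 1;

=pod

$x = 2;    # this line is documentation, not code

=cut

my $result = $x;    # still 1
```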
Some Special Variables (the long names need use English;)
Long name          Short name   What it does
@ARG               @_           Argument list passed to subroutine
$ARG               $_           Default input and search pattern
%ENV                            Hash containing your current environment variables
$LIST_SEPARATOR    $"           Separator used when an array is interpolated into a string; defaults to a space
$MATCH             $&           The string matched in the last successful pattern match
$POSTMATCH         $'           The string following what was last matched
$PREMATCH          $`           The string preceding what was last matched
$OS_ERROR          $!           Error from the last failed system call (errno)
STDERR                          Special filehandle for standard error in any package
STDIN                           Special filehandle for standard input in any package
STDOUT                          Special filehandle for standard output in any package
This is not an exhaustive list; see Chapter 28 of Programming Perl, 3rd edition. Items without a short name don't need the use English; pragma.
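A short sketch of $" and the three match variables from the table (the strings are made up). local restores $" automatically at the end of the block.

```perl
use strict;
use warnings;

my @words = qw(alpha beta gamma);
my $joined;
{
    local $" = "-";           # $LIST_SEPARATOR, restored when the block ends
    $joined = "@words";       # "alpha-beta-gamma"
}
my $default = "@words";       # "alpha beta gamma": back to a space

my ( $pre, $match, $post );
if ( "hello world" =~ /o w/ ) {
    ( $pre, $match, $post ) = ( $`, $&, $' );   # ("hell", "o w", "orld")
}
```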
Some Perl Functions (By Category) Flow of program control: continue, die, eval, exit, goto, last, next, redo, return, sub, wantarray. Miscellaneous: defined, eval, scalar, undef. Process and process groups: alarm, exec, fork, kill, pipe, setpriority, sleep, system, wait, waitpid. Library modules: import, package, require, use. Classes and objects: bless, package, ref, use. Time: gmtime, localtime, time.
There are also extensive categories for: 1. Low-level socket access. 2. Inter-process communication. 3. Fetching user and group information. 4. Fetching network information.
Examples Of chop() And chomp()
Remember, chop is indiscriminate: it always removes something, so you're supposed to know that the last character on a line is "\n".
while (<>) {
    chomp;              # avoid \n on last field
    @array = split /:/;
    ...
}
chomp is more discriminating: it will only remove the last character if it's a "\n". You could also do s/\n$//; which is explicit.
You almost always want to use chomp() and not chop(). chop() always returns the character it removes. If you chop() a list, then every item in the list is chopped. The thing which ends up in $answer in the question on the slide is the character which was removed from the string $tmp. The thing you probably wanted was $tmp. chomp() is discriminating: by default it removes the last character of a string only if that character is "\n", but the default can be overridden. What chomp() removes is the string contained in the Perl variable $/ (the input record separator, "\n" by default), so chomp() can remove an arbitrary-length string from the end of an input string. chomp() returns the number of characters it deleted, not the characters themselves.
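A sketch of the return values described above, plus chomp() with a non-default $/ (the "--" record separator is made up for illustration).

```perl
use strict;
use warnings;

my $line    = "hello\n";
my $removed = chomp($line);   # 1: one newline removed, $line is "hello"
my $again   = chomp($line);   # 0: no trailing newline, so nothing removed
my $char    = chop($line);    # "o": chop removes the last character regardless

my $record = "data--";
my $n;
{
    local $/ = "--";          # chomp removes whatever string $/ holds
    $n = chomp($record);      # 2: $record is now "data"
}
```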
Examples Of hex() And oct()
$number = hex("ffff12c0");
sprintf "%lx", $number;     # (That's an ell, not a one.)
sprintf uses the same conventions as C's sprintf.
perl -e 'print 0xffdc;'
A neat command line alternative when you need a quick conversion.
$val = oct $val if $val =~ /^0/;
If $val starts with "0" (which includes the "0x" and "0b" prefixes), oct() converts it from octal, hex or binary as appropriate.
Note that you can always set the value of any variable with a hex value just by doing this: $h_number = 0xffdd; print $h_number; The hex() function interprets a string as a hexadecimal number and returns its value. If the string begins with "0x", the prefix is ignored. To do the reverse conversion use sprintf() as shown. Hex strings can only represent integers. Strings which would cause integer overflow will trigger a warning. oct() interprets a string as an octal value. If the string starts with "0" it is interpreted as octal. If the string starts with "0x" it is interpreted as a hex value. If it begins with "0b" it is interpreted as a binary value. Try this: perl -e 'print 0b11001001;' # Is anyone (apart from me) sad enough to know from what 80's/90's TV series this was an episode title?
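A few conversions in both directions, to check the rules above (the input strings are made up for illustration).

```perl
use strict;
use warnings;

my $h1   = hex("ff");              # 255
my $h2   = hex("0x1f");            # 31: a leading "0x" is allowed
my $back = sprintf( "%x", 255 );   # "ff": the reverse conversion
my $o1   = oct("755");             # 493
my $o2   = oct("0x1f");            # 31: hex, because of the "0x" prefix
my $o3   = oct("0b11001001");      # 201: binary, because of the "0b" prefix
```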
Examples Of sprintf()
Field    Meaning
%%       A percent sign
%c       A character with the given number
%s       A string
%d       A signed integer, in decimal
%u       An unsigned integer, in decimal
%o       An unsigned integer, in octal
%x       An unsigned integer, in hexadecimal
%e       A floating-point number, in scientific notation
%f       A floating-point number, in fixed decimal notation
%g       A floating-point number, in %e or %f notation
See Chapter 29 (pages 797 to 799) of Programming Perl, 3rd edition. Be careful - sprintf() in Perl does its own formatting - it is NOT calling the underlying sprintf() function in the C library.
Examples Of sprintf()
Field    Meaning
%X       Like %x, but using uppercase letters
%E       Like %e, but using an uppercase "E"
%G       Like %g, but with an uppercase "E" (if applicable)
%b       An unsigned integer, in binary
%p       A pointer (the Perl value's address in hexadecimal)
%n       Special: stores the number of characters output so far into the next variable in the argument list
In addition to the formats on the previous slide, Perl supports the conversions above. For compatibility, Perl also supports these synonyms:
%i   a synonym for %d
%D   a synonym for %ld
%U   a synonym for %lu
%O   a synonym for %lo
%F   a synonym for %f
Examples Of sprintf()
Perl allows the following flags between the % and the conversion character:
Flag      Meaning
space     Prefix positive number with a space
+         Prefix positive number with a plus sign
-         Left-justify within field
0         Use zeroes, not spaces, to right-justify
#         Prefix non-zero octal with "0", non-zero hex with "0x"
number    Minimum field width
.number   "Precision": digits after the decimal point for floating-point numbers, maximum length for a string, minimum length for an integer
l         Interpret integer as C type long or unsigned long
h         Interpret integer as C type short or unsigned short (if no flags are supplied, interpret integer as C type int or unsigned int)
See Chapter 29 (pages 797 to 799) of Programming Perl, 3rd edition.
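A handful of conversions and flags from the three tables, applied to made-up values:

```perl
use strict;
use warnings;

my @out = (
    sprintf( "%5.2f", 3.14159 ),    # " 3.14": width 5, 2 decimal places
    sprintf( "%-6s|", "ab" ),       # "ab    |": left-justified in 6 columns
    sprintf( "%05d", 42 ),          # "00042": zero padded
    sprintf( "%+d", 7 ),            # "+7"
    sprintf( "%#x %#o", 255, 8 ),   # "0xff 010"
    sprintf( "%X %b", 255, 5 ),     # "FF 101"
    sprintf( "%c", 65 ),            # "A"
    sprintf("%%"),                  # "%"
);
```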
Examples Of split()
@chars  = split //,  $word;
@fields = split /:/, $line;
@words  = split " ", $paragraph;
@lines  = split /^/, $buffer;
Question: What does this produce?
print join ':', split / */, 'hi there';
($login, $passwd, $remainder) = split /:/, $_, 3;
split /([-,])/, "1-10,20";
# Produces the list (1, '-', 10, ',', 20);
split /(-)|(,)/, "1-10,20"; # Produces the list (1, '-', undef, 10, undef, ',', 20)
$string = join(' ', split(' ', $string));
Syntax:
split /PATTERN/, EXPR, LIMIT
split /PATTERN/, EXPR
split /PATTERN/
split
split() scans a string and splits it into sub-strings, returning the resulting list in list context, or the number of sub-strings in scalar context. The separators are found by matching the regular expression given as the PATTERN, so the separators need not be the same size, and need not be the same string, on every match. Normally the separators are not returned, but if the pattern contains () then the substring matched by each pair of () IS included in the resulting list, interspersed with the fields which are normally returned. If more than one pair of () is used then one substring is returned for each pair (some may be undef, so be careful). If the pattern doesn't match at all then split() returns the original string as a single element. If a limit is supplied then Perl will not return more than that number of sub-strings. If no string is supplied then Perl uses $_. If no pattern is supplied, or the pattern is the literal string " " (a single space), then the function splits on whitespace, /\s+/, after skipping any leading whitespace.
while (<>) { foreach $word (split) { $count{$word}++; } }
Both examples make use of defaults. In both cases the input text is extracted with the <> operator and thus the splitting occurs on “$_”. In the second case split() is passed no string (so it uses “$_”) and no pattern (so it strips all leading whitespace and then splits on whitespace).
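A few of the forms above applied to made-up strings:

```perl
use strict;
use warnings;

my @chars  = split //, "abc";                   # ("a", "b", "c")
my @fields = split /:/, "root:x:0:0:root", 3;   # ("root", "x", "0:0:root"): LIMIT 3
my @parts  = split /([-,])/, "1-10,20";         # (1, "-", 10, ",", 20)
my @words  = split " ", "  two   words";        # ("two", "words"): leading space skipped
```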
Examples Of stat() And unlink() ($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$size, $atime,$mtime,$ctime,$blksize,$blocks) = stat $filename;
if (-x $file and ($d) = stat(_) and $d < 0) { print "$file is executable NFS file\n"; }
$mode = (stat($filename))[2]; printf "Permissions are %04o\n", $mode & 07777;
use File::stat; $sb = stat($filename); printf "File is %s, size is %s, perm %04o, mtime %s\n", $filename, $sb->size, $sb->mode & 07777, scalar localtime $sb->mtime;
The stat() function returns a 13-element list giving statistics for a file. If a field isn't supported on a particular file system then the corresponding entry will typically be zero or the empty string. See page 801 of "Programming Perl, 3rd edition" for more details. The File::stat module provides a convenient, by-name access mechanism. The unlink() function deletes a list of files and returns the number of files which were successfully deleted. BE CAREFUL: this is 'rm' in disguise.
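A sketch that creates a scratch file with the standard File::Temp module (an assumption of this example, so it doesn't touch any real file), inspects it through File::stat and then removes it with unlink().

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use File::stat;                      # makes stat() return an object

my ( $fh, $filename ) = tempfile();
print $fh "hello";                   # 5 bytes
close $fh;

my $sb      = stat($filename);
my $size    = $sb->size;                              # 5
my $perm    = sprintf "%04o", $sb->mode & 07777;      # e.g. "0600"
my $deleted = unlink $filename;                       # 1: files deleted
```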
gmtime And localtime
#0    1    2     3     4    5     6     7     8
($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = gmtime;
$london_month = (qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec))[(gmtime)[4]];
All elements of the lists returned by gmtime() and localtime() are numeric, and the month and weekday count from zero, so January is month 0 and Sunday is day 0. The day of the month ($mday) counts from 1. $year is the number of years since 1900.
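A sketch that turns the list into a readable date, using the epoch (time 0) so the result is fixed; note the + 1900 and + 1 adjustments.

```perl
use strict;
use warnings;

my @t    = gmtime(0);   # 00:00:00 on 1 January 1970, UTC
my $date = sprintf "%04d-%02d-%02d", $t[5] + 1900, $t[4] + 1, $t[3];   # "1970-01-01"
my $day  = (qw(Sun Mon Tue Wed Thu Fri Sat))[ $t[6] ];                  # "Thu"
```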
system And exec And ``
@args = ("command", "arg1", "arg2");
system(@args) == 0 or die "system @args failed: $?";
# If the program succeeds, then life goes on
@args = ("command", "arg1", "arg2");
exec(@args);
# You will never get here (unless exec itself fails).
my $current_directory = `pwd`;
This example uses backticks to capture the output of the "pwd" command. Why is this a bad example?
The system() and exec() functions execute any program on your system for you. system() returns the program's exit status (encoded as in $?, so the actual exit value is $? >> 8), not the program's output. To capture the output from a program you must use backticks or qx//. The difference between the two functions is that system() does a fork first and then waits for the executed program to finish. That is, it runs the program for you and returns when it is done. exec() replaces your running program with the new one, so it never returns if the replacement succeeds (it only returns, with a false value, if it fails). See "Programming Perl", 3rd Edition, page 811, for more details. In the last example on the slide we use backticks to figure out what our current directory is. This is an example of how you can capture the output of an external program, but it is a bad example: what will happen if you put this script on your web page, someone downloads it, and then they find out it doesn't run because their system doesn't have a pwd command?
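A more portable sketch. It runs the current perl interpreter ($^X) as the child program so no other external command is assumed, and uses getcwd() from the standard Cwd module instead of running pwd. The backticks line assumes a Unix-like shell and that the path in $^X contains no spaces.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);

my $here = getcwd();                              # no external pwd needed

my $status    = system( $^X, "-e", "exit 3" );    # list form: no shell involved
my $exit_code = $? >> 8;                          # 3: the child's exit value

my $output = `$^X -e "print 42"`;                 # backticks capture stdout: "42"
```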
Command Line Switches And Writing Perl One-Liners The -e switch allows you to write scripts directly on the command line. perl -e ’print “Hello World\n”;’
Perl programs can read their input from standard input or from files named on the command line. Standard input:
    cat myfile | perl -e 'while(<>){ print unless /^\s*#/; }'
Perl one-liners fit the whole of a Perl program onto one line (a command line). See the accompanying article in the second edition of the Perl Review (contained as a .pdf file in the Examples directory). Also see the whole of Chapter 19 of “Programming Perl”, 3rd edition, pages 486-503 inclusive. The first example is something you’ve already seen. In the second example the pipe operator | takes the output of cat and makes it the standard input of the Perl program. The diamond operator <> reads lines from standard input, so this example prints the contents of the file “myfile”, skipping the lines that the pattern match identifies as comments (lines that start with a #). The third example does the same as the second but uses the file redirection operator (<). The fourth example uses the fact that the diamond operator can also open and read a file named on the command line, so it is exactly equivalent to both examples 2 and 3. The last example prints out this: alpha.doc beta.txt gamma.eps
Perl Command Line Switches Useful For One-Liners
Switch         Effect
-e             Used to enter one or more lines of a script.
-i             Specifies that files processed by <> are to be edited in place.
-iEXTENSION    Specifies that files processed by <> are to be edited in place, keeping a backup copy of each original file with EXTENSION appended to its name.
-mMODULE       Loads MODULE as if you had executed use MODULE (); (-MMODULE does a plain use MODULE;, which also imports MODULE’s default symbols).
-n             Causes Perl to assume a loop around your code which makes it iterate over filename arguments; lines are not printed automatically. See Example.
-p             Causes Perl to assume a loop around your code which makes it iterate over filename arguments; each line is printed after your code has run. See Example.
Use the -i option with care. It renames the input file, opens an output file with the original name and then selects that output file for all print, printf and write statements. If you use the -i option on its own then NO BACKUP COPY OF YOUR ORIGINAL FILE IS MADE. The original file will be overwritten. If you do specify EXTENSION then the original file is kept as a backup, named by appending EXTENSION to the original name. Here’s an example:
    perl -p -i'.orig' -e 's/foo/bar/' xyz    # Note that the -p option has not yet been discussed.
This renames the file called xyz to xyz.orig, opens a new file called xyz for output, and runs the substitution on each line of the original contents, writing the results into the new file (still called xyz).
An Example Of A Perl One-Liner
    #!/usr/bin/perl
    $extension = '.orig';
    LINE: while (<>) {
        if ($ARGV ne $oldargv) {
            if ($extension !~ /\*/) {
                $backup = $ARGV . $extension;
            }
            else {
                ($backup = $extension) =~ s/\*/$ARGV/g;
            }
            unless (rename($ARGV, $backup)) {
                warn "cannot rename $ARGV to $backup: $!\n";
                close ARGV;
                next;
            }
            open(ARGVOUT, ">$ARGV");
            select(ARGVOUT);
            $oldargv = $ARGV;
        }
        s/foo/bar/;
    }
    continue {
        print;    # this prints to original filename
    }
    select(STDOUT);
This script does exactly the same as this one-liner:
    perl -p -i'.orig' -e 's/foo/bar/' xyz
The one-liner from the previous slide is expanded here into the minimum script needed to reproduce its functionality.
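You can get the same in-place editing from inside an ordinary script, without the long expansion, by setting $^I (the variable that holds the -i extension) and loading @ARGV before a while (<>) loop. This is a sketch of that idiom, assuming a scratch file called xyz in a temporary directory; replace_in_file is a hypothetical name.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Edit a file in place from inside a script: $^I plays the role of -i'.orig',
# and print() inside the <> loop writes to the new copy of the file.
sub replace_in_file {
    my ( $file, $backup_ext ) = @_;
    local ( $^I, @ARGV ) = ( $backup_ext, $file );
    while (<>) {
        s/foo/bar/;    # same substitution as the one-liner
        print;         # goes to the new version of $file
    }
}

# Demonstration on a scratch file (the name xyz mirrors the slide).
my $dir  = tempdir( CLEANUP => 1 );
my $file = "$dir/xyz";
open( my $out, '>', $file ) or die "Cannot create $file: $!";
print {$out} "foo one\nno match\nfoo foo\n";
close($out) or die "Cannot close $file: $!";

replace_in_file( $file, '.orig' );    # leaves the original contents in xyz.orig
```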
The Perl -n And -p Command Line Switches
The -n switch causes Perl to assume the following loop around your script, which makes it iterate over the filename arguments much as sed -n or awk do.
    LINE:
    while (<>) {
        ...    # your script goes here
    }
The -p switch causes Perl to assume the following loop around your script, which makes it iterate over the filename arguments much as sed does.
    LINE:
    while (<>) {
        ...    # your script goes here
    }
    continue {
        print or die "-p destination: $!\n";
    }
In both cases you can use LINE as a loop label from within your script, even though you can’t actually see it in your file. With the -n switch, lines are not printed by default. With the -p switch, lines are printed automatically. In both cases BEGIN and END blocks may be used to capture control before or after the implicit loop - just like awk.
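To see the difference between the two switches, this sketch runs two small one-liners from Perl on a scratch file and captures what they print. It assumes a Unix-like system (it uses the list form of a piped open, which bypasses the shell), and run_one_liner is a made-up helper; the -n example also uses an END block to print a total after the implicit loop.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A small input file for the one-liners to read.
my ( $fh, $input ) = tempfile( UNLINK => 1 );
print {$fh} "alpha\n# comment\nbeta\n";
close($fh) or die "Cannot close $input: $!";

# Run "perl SWITCH -e CODE FILE" and return everything it prints.
sub run_one_liner {
    my ( $switch, $code, $file ) = @_;
    open( my $pipe, '-|', $^X, $switch, '-e', $code, $file )
        or die "Cannot run $^X: $!";
    local $/;                   # read all the output at once
    my $output = <$pipe>;
    close($pipe);
    return defined $output ? $output : '';
}

# -n: nothing is printed unless you print it; END runs after the implicit loop.
my $count = run_one_liner( '-n', '$n++ unless /^\s*#/; END { print $n }', $input );

# -p: each line is printed after your code has run on it.
my $upper = run_one_liner( '-p', '$_ = uc', $input );

print "Non-comment lines: $count\n";
print $upper;
```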
Other Perl Command Line Switches
Switch    Effect
-c        Causes Perl to check the syntax of the script and then exit without executing what has just been compiled.
-d        Runs the script under control of the Perl debugger.
-h        Prints a summary of Perl’s command line options.
-T        Turns on “taint” checks - an extra form of security useful for running CGI scripts.
-v        Prints the version number and patch level of the Perl executable.
-w        Prints warnings about variables which are used only once, variables which are used before being set, and many other dubious constructs. See Chapter 33 of “Programming Perl” 3rd edition.
We will discuss the Perl debugger later. Everyone should always run Perl with the -w option, either on the command line as here or, more generally, on the #! line of the script:
    #!/usr/local/bin/perl -w
There are many more command line switches than those listed. See the whole of Chapter 19 of “Programming Perl”, 3rd edition for a complete description.
Command Line Arguments etc.
Item     Description
ARGV     The special filehandle that iterates over command line filenames in @ARGV.
$ARGV    Contains the name of the current file when reading from the ARGV handle using <>.
@ARGV    The array containing the command-line arguments intended for the script. $#ARGV is the number of arguments minus one. $ARGV[0] is the first argument, not the command name (that is in $0). Use scalar @ARGV for the number of program arguments.
@ARG     With use English, an alias for @_.
@_       Within a subroutine, this array holds the argument list passed to that subroutine.
Adding Command Line Arguments To Your Own Programs
There are two options:
Use the standard Getopt::Std (which provides getopts) or Getopt::Long module.
Write your own code - like this:
    use Carp;
    use constant TRUE => 1;

    sub Process_Command_Line_Arguments
    {
        my ( $ref_arguments ) = @_;

        # Process all arguments until the list is empty
        while ( @$ref_arguments )
        {
            my $next_arg = shift( @$ref_arguments );
            SWITCH:
            {
                if ( $next_arg =~ m/^\-i/i ) { $main::infile  = shift( @$ref_arguments ); last SWITCH; }
                if ( $next_arg =~ m/^\-o/i ) { $main::outfile = shift( @$ref_arguments ); last SWITCH; }
                if ( $next_arg =~ m/^\-d/i ) { $main::debug   = TRUE;                     last SWITCH; }
                if ( $next_arg =~ m/^\-/   ) { croak( "Unknown command line switch $next_arg" ); }
            }
        }
        return TRUE;
    }

    Process_Command_Line_Arguments( \@ARGV );
Note that the argument list is passed in via a reference, and the arguments are shifted off it as they are processed. You should also include some code to look for something like -h or -help, print out something useful and then exit the program.
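If you’d rather not hand-roll the parsing, here’s a sketch using the core Getopt::Long module that handles the same -i, -o and -d switches plus -h/-help. It assumes Perl 5.10 or later for GetOptionsFromArray, which parses a list you pass in rather than @ARGV directly; parse_arguments and usage are hypothetical names, and the option layout is only an example.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Parse a copy of the arguments: -i/-o take a value, -d and -h are flags.
# Returns a hash reference, or undef if an unknown switch was seen.
sub parse_arguments {
    my (@args) = @_;
    my %opt = ( debug => 0, help => 0 );
    GetOptionsFromArray(
        \@args,
        'infile|i=s'  => \$opt{infile},
        'outfile|o=s' => \$opt{outfile},
        'debug|d'     => \$opt{debug},
        'help|h'      => \$opt{help},
    ) or return;
    $opt{rest} = \@args;    # anything left over, e.g. file names
    return \%opt;
}

sub usage {
    return "Usage: $0 [-i infile] [-o outfile] [-d] [-h]\n";
}

my $opt = parse_arguments(@ARGV) or die usage();
if ( $opt->{help} ) {
    print usage();
    exit 0;
}
```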
Conclusion You’ve seen a lot in a short time. The key points of Perl are that: Variables consist of scalars and collections of scalars (arrays & hashes). A lot of the control structures are similar to C etc. References and subroutines. Packages and Modules. Pattern matching is very powerful. Perl is a very versatile language. You all now know enough to write useful Perl programs.
Notes: Now give the advanced material in Style, then run LAB7 MODULES_AND_SUBROUTINES_1
Style Guidelines For Perl
1 Introduction
This document presents guidelines for anyone who writes Perl scripts for design support tasks. The aim is to introduce a common style and understanding for the benefit of anyone who either writes new programs, or has to debug and/or maintain old ones.
2 Program Structure
Structure your program in the same way you would structure a C program. Have one section of code that is the equivalent of C’s main(), and unless the program is trivially small, put code into subroutines that are called from the main program body. Don’t write the top level of a program at file scope, since any variables declared there are visible in all following subroutines (even if they’re lexical, or my, variables). Instead, create the top level of your program as a code block (if you think in C terms, even label it MAIN if this helps you) and put all code there. Also, don’t use global variables at all (i.e., outside the code block), since they allow one subroutine to have side-effects on another. To achieve both of these goals, structure your code like this:
    #!/usr/local/bin/perl
    use strict;
    use warnings;
    use diagnostics;

    sub subroutine_1( $$$ );

    MAIN:
    {
        my $variable_1 = 27;
        # Program code - equivalent of C's main()
    }
    exit;

    sub subroutine_1( $$$ )
    {
        # Subroutine body - can't see $variable_1 unless it was passed as a
        # parameter in the subroutine call.
    }
use strict and use warnings are never optional, while use diagnostics gives readable error messages that are useful for new users (and old ones).
The block with the label MAIN: is where the main body of the program is written. A bare code block like this is the equivalent of a loop that runs exactly once, but all the lexical variables declared within it are restricted to its scope, i.e., subroutine_1 can’t see the values of any lexical variables like $variable_1 unless they are passed to subroutine_1 as arguments of a call to subroutine_1 (which is basically how you’d hope a program would behave). The label (MAIN:) is optional, and can be omitted. Note that subroutine_1 is declared before the main body of the program. This is only needed if the subroutine definitions follow the main program; if they precede it then the forward declarations aren’t needed, since a definition is also a declaration. Also note that subroutines can optionally be declared with prototypes (the $$$ in ( $$$ ), which here declares that the subroutine expects three scalar arguments). The check is performed at compile time, for calls compiled after the declaration, so there’s no run-time overhead for doing it. If you must use a global variable (you really shouldn’t) then make it explicit that this is what you’re doing by referring to it as a package variable, like this:
    #!/usr/local/bin/perl
    sub subroutine_1();

    $main::count = 56;

    MAIN:
    {
        $main::count = 27;
        subroutine_1();
    }
    exit;

    sub subroutine_1()
    {
        print "The value of count is $main::count\n";
    }
Here we’ve declared a global variable called $main::count (it’s a variable named $count in package main, the default package, which is why its name is $main::count). This code prints the value 27 when executed, since the initial value of 56 is overwritten in the main body of code and this is the value seen in subroutine_1 when it is executed. Note that the value of $main::count wasn’t passed to subroutine_1 as a parameter, but subroutine_1 can still see its value (it can change its value as well - this is what I mean by having a side-effect).
2.1 Should Subroutine Parameters Be Passed By Value Or Reference?
If you don’t want parameters passed to a subroutine to be changed by that subroutine, then pass them by value. This is nearly always what you want. To do this, copy all the parameters into lexical variables at the start of the subroutine, like this:
July 31, 2005
    sub subroutine_1( $$$ )
    {
        my ( $var_1 , $var_2 , $var_3 ) = @_;
        # Subroutine code goes here. $var_1 etc are private to this code
    }
This is a common Perl idiom where all the values in the @_ array are copied into lexical variables in the subroutine. This makes those variables local to the subroutine - changing them in the subroutine will NOT change them in the calling code. This is normally how you would expect programs to behave. If you do want a variable in a subroutine to be changed in the calling code, you can modify the elements of @_ directly, like this:
    sub subroutine_1( $ );

    MAIN:
    {
        my $value = 56;
        subroutine_1( $value );
        print "Value=$value\n";
    }
    exit;

    sub subroutine_1( $ )
    {
        $_[0] = 99;    # Alter the first element of the @_ array
    }
The elements of the @_ array are aliases for the variables in the calling code, so changing the value of $_[0] changes the variable $value in the example above. Therefore the value printed will be Value=99. This form is not recommended since it’s confusing and inconsistent with normal usage.
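If the called subroutine really must change the caller’s variable, a clearer alternative to modifying the elements of @_ is to pass an explicit reference, so the call site shows what may be changed. A minimal sketch (set_to_99 is a made-up name):

```perl
use strict;
use warnings;

sub set_to_99( $ );    # forward declaration so the prototype is checked

MAIN:
{
    my $value = 56;
    set_to_99( \$value );      # pass an explicit reference
    print "Value=$value\n";    # prints Value=99
}

sub set_to_99( $ )
{
    my ( $value_r ) = @_;      # copy the reference, as with any argument
    $$value_r = 99;            # dereference it to change the caller's variable
    return;
}
```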
2.2 Passing Arrays And Hashes To Subroutines
Lists of values passed to subroutines are flattened, so if you pass two lists to a subroutine, from the perspective of the subroutine itself this looks like one long list, i.e., the identity of the two lists is lost. Since this almost certainly isn’t what you want, pass the lists as references instead. This way the identities of the two (or more) lists are maintained. Here’s how to do this:
    MAIN:
    {
        my @list_1 = qw( Alpha Baker Charlie Delta );
        my @list_2 = qw( Zulu Yankee Xray Whisky );
        subroutine_1( \@list_1 , \@list_2 );
    }
The two arguments (which are themselves scalars) are references to the original lists so the subroutine can access the individual elements of the lists. Therefore the above example prints out “Baker Whisky”.
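The subroutine itself isn’t shown above. Here’s one possible body, written as a sketch: the element indices are an assumption chosen to produce “Baker Whisky”, and the return value (the total number of elements) is added only to show that each list’s size is still known inside the subroutine.

```perl
use strict;
use warnings;

sub subroutine_1( $$ );

MAIN:
{
    my @list_1 = qw( Alpha Baker Charlie Delta );
    my @list_2 = qw( Zulu Yankee Xray Whisky );
    subroutine_1( \@list_1 , \@list_2 );    # prints "Baker Whisky"
}

sub subroutine_1( $$ )
{
    my ( $list_1_r , $list_2_r ) = @_;      # two array references, identities intact
    my $second = $list_1_r->[ 1 ];          # "Baker"
    my $last   = $list_2_r->[ -1 ];         # "Whisky", the last element
    print "$second $last\n";
    return scalar( @$list_1_r ) + scalar( @$list_2_r );    # total element count
}
```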
2.3 Returning One Or More Results From A Subroutine : Part 1
Use the wantarray function to see if a subroutine was called in scalar or list context. If wantarray returns TRUE then return a list, else return a scalar. Here’s how to do this:
    sub subroutine_1();

    MAIN:
    {
        my @list   = subroutine_1();
        my $scalar = subroutine_1();
        print "@list $scalar";
    }
    exit;

    sub subroutine_1()
    {
        if ( wantarray )
        {
            return qw( one two three four five );
        }
        else
        {
            return( "once I caught a fish alive\n" );
        }
    }
The first call to subroutine_1 is in list context (the calling program expects a list to be returned). In subroutine_1 the wantarray function is evaluated, and for this first call it is TRUE, so subroutine_1 sends back a list of five things (the textual representation of the numbers one to five inclusive). The second call to subroutine_1 is in scalar context (the calling program expects a single thing to be returned). Now wantarray is FALSE, so a single thing is returned (a string consisting of the text “once I caught a fish alive”). Note that you can also return information from a subroutine that the caller will store in a hash. If so, make sure that you return an even number of scalars (each pair of scalars is used as a key/value pair in the resulting hash).
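A sketch of returning key/value pairs for the caller to store in a hash (file_summary and its keys are hypothetical):

```perl
use strict;
use warnings;

# Return key/value pairs; the caller chooses to store them in a hash.
sub file_summary {
    my (@lines) = @_;
    my $blank = grep { /^\s*$/ } @lines;    # grep in scalar context counts matches
    return (
        total => scalar(@lines),
        blank => $blank,
        other => scalar(@lines) - $blank,
    );                                       # six scalars: three key/value pairs
}

my %summary = file_summary( "a\n", "\n", "b\n" );
print "$_ => $summary{$_}\n" for sort keys %summary;
```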
2.4 Returning One Or More Results From A Subroutine : Part 2
You want to return several scalars from a subroutine. Here’s how to do it:
    MAIN:
    {
        my @values = qw( 6.32 7.88 9.54 12.83 17.99 31.36 18.25 );
        my ( $mean , $median , $mode , $variance ) = statistics( @values );
        # Code to print out results
    }
    exit;

    sub statistics
    {
        my ( $mean , $median , $mode , $variance );
        # Code to compute mean, median, mode, variance
        return( $mean , $median , $mode , $variance );
    }
The subroutine returns its four scalar results as a list, and the calling code assigns the values in that list to another four scalar variables.
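Here’s one way the body of statistics() might be filled in. It’s a sketch under stated assumptions: the variance is the population variance (dividing by N), and when several values are equally common the smallest of them is reported as the mode. sum comes from the core List::Util module.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Returns ( mean, median, mode, variance ) for a non-empty list of numbers.
# Assumptions: population variance; ties for the mode go to the smallest value.
sub statistics {
    my @values = sort { $a <=> $b } @_;
    my $n      = @values;
    die "statistics() needs at least one value\n" unless $n;

    my $mean   = sum(@values) / $n;
    my $median = $n % 2
               ? $values[ int( $n / 2 ) ]                          # odd count: middle element
               : ( $values[ $n / 2 - 1 ] + $values[ $n / 2 ] ) / 2;  # even count: average of middle two

    my %count;
    $count{$_}++ for @values;
    my ($mode) = sort { $count{$b} <=> $count{$a} || $a <=> $b } keys %count;

    my $variance = sum( map { ( $_ - $mean ) ** 2 } @values ) / $n;

    return ( $mean , $median , $mode , $variance );
}

my ( $mean , $median , $mode , $variance ) = statistics( 1, 2, 2, 3, 7 );
print "mean=$mean median=$median mode=$mode variance=$variance\n";
```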
2.5 Making The Equivalent Of C Static Variables
Sometimes you want to create a variable in a subroutine that keeps its value between subroutine calls. Here’s how to do this:
    sub count();

    MAIN:
    {
        my $tmp;
        $tmp = count();
        print "Tmp = $tmp\n";
        $tmp = count();
        print "Tmp = $tmp\n";
    }
    exit;

    BEGIN
    {
        my $count_value = 0;
        sub count()
        {
            $count_value++;
            return $count_value;
        }
    }
Place the subroutine definition(s) in a code block (the subroutines themselves are still visible from everywhere, regardless of how you “hide” them). The lexical variable $count_value is scoped to the code block it is defined in, and is therefore available to the subroutine count(). Normally a lexical variable is destroyed once its code block finishes execution, but in this case Perl arranges for it to continue to exist because something still refers to it (in technical terms, count() is a closure over $count_value, and the reference it holds stops Perl from destroying the variable). The only problem is how to get an initial value of zero into $count_value before count() is first called. This is done by placing all the code in a BEGIN block. Perl guarantees to execute each BEGIN block as soon as it has been compiled, thus ensuring that the single line of code “my $count_value = 0” is executed before any call to the subroutine is made. The above code therefore prints out Tmp = 1 followed by Tmp = 2. Of course, there’s no reason why several subroutines cannot share a variable in this way to provide a globally accessible variable that cannot suffer from unintended side-effects. Here’s how:
    sub initialize( $ );
    sub increment();
    sub decrement();

    MAIN:
    {
        my $tmp;
        initialize( 37 );
        $tmp = increment();
        print "Tmp = $tmp\n";
        $tmp = decrement();
        print "Tmp = $tmp\n";
    }
    exit;

    BEGIN
    {
        my $value = 0;
        sub initialize( $ ) { $value = shift @_; }
        sub increment()     { $value++; return $value; }
        sub decrement()     { $value--; return $value; }
    }
This is a very secure way to create something that can be accessed from anywhere in a controlled and predictable manner. The variable $value is safe from any unintended side-effects (or even intended ones) and can be initialized, incremented and decremented from anywhere (you could of course also add a read subroutine that just returns the value). We’ve almost strayed into OO land here, since we’ve created something that is encapsulated (the variable $value) and can only be accessed via subroutine calls (the equivalent of OO methods).
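Since Perl 5.10 there is also a built-in way to get a C-style static variable for a single subroutine: a state variable. This is a different technique from the BEGIN-block closure, shown here as a sketch; it needs use feature 'state' (or use 5.010), and it doesn’t cover the case where several subroutines share one variable.

```perl
use strict;
use warnings;
use feature 'state';    # requires Perl 5.10 or later

sub count {
    state $count_value = 0;    # initialised once, then kept between calls
    $count_value++;
    return $count_value;
}

my $first  = count();
my $second = count();
print "Tmp = $first\nTmp = $second\n";    # Tmp = 1, then Tmp = 2
```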
2.6 Implementing A Switch Statement
One way is to download Switch.pm from CPAN and use that, but that might not be an option for code you export to other sites. Here’s another way that’s self-contained:
    SWITCH:
    {
        if ( $condition == TRUE )
        {
            # Run some code
            last SWITCH;
        }
        if ( $some_other_condition == TRUE )
        {
            # Run some other code
            last SWITCH;
        }
        # Run some default code
    }
Here, SWITCH is a label (so each switch statement needs a different label, and this is a drawback) while last SWITCH is the equivalent of C’s break. A bare block like this behaves as a loop that runs exactly once, so next SWITCH would also just leave the block rather than repeat it (redo SWITCH is what restarts a block); using last in every clause makes the intent clearest.
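Another self-contained approach, shown here as a sketch, is a dispatch table: a hash that maps each case to a code reference, with a fallback for the default case. The command names and run_command are made up for the example.

```perl
use strict;
use warnings;

# Each case maps to a code reference; the hash lookup replaces the if/last chain.
my %dispatch = (
    start => sub { return "starting $_[0]" },
    stop  => sub { return "stopping $_[0]" },
);

sub run_command {
    my ( $command, $target ) = @_;
    my $handler = $dispatch{$command}
        || sub { return "unknown command '$command'" };    # the default case
    return $handler->($target);
}

print run_command( 'start', 'server' ), "\n";     # starting server
print run_command( 'reboot', 'server' ), "\n";    # unknown command 'reboot'
```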
2.7 Labels : Use Them
Use labels to be explicit about where the commands next and last transfer you (and goto, but you’re never going to use goto, are you!). Put the labels on the loops themselves: a label on a bare block that wraps a loop makes next leave the whole block instead of moving on to the next item.
    OUTER:
    foreach my $item ( @item_list )
    {
        INNER:
        foreach my $object ( @object_list )
        {
            # Code
            next OUTER if ( $some_condition == TRUE );
            # Code
            next INNER if ( $some_other_condition == TRUE );
        }
    }
2.8 Labels : Don’t Use Them
If you use labels it is always clear where you are transferring control to, but it is never clear at the transfer point (i.e., the actual label) where transfer of control has come from, and this makes it very hard to debug code - next and last with labels are just synonyms for goto (and you’re never going to use goto, are you!). On balance, use labels for SWITCH and one level loop operations.
3 Writing Efficient, Maintainable And Reusable Code
Package useful code into subroutines and then into modules and then share it with everyone. Install tools in: /design/rmc/tools/
and modules in: /design/rmc/tools/Perl_Modules/tool/dev/
and in both cases release them. Don’t forget to write documentation, ideally as POD (Perl has translators to generate man pages, HTML and PDF). Don’t reinvent the wheel. Since a lot of what we do involves reading and parsing files, and then writing some new file(s), use Netlist_Tools.pm in the Perl_Modules directory. These routines are debugged, work quite happily with files that are gigabytes in size, and transparently gunzip any files that are gzipped even if you don’t know they’re gzipped. Don’t reinvent the wheel. Also, before you write a mega-thingy widget that will revolutionize human-kind, look on CPAN just in case someone else has beaten you to it (they probably have)! Don’t reinvent the wheel. If you’re writing code that makes several different tests on some data, put the most common tests before the less common ones. For example, if you’re testing a string in a loop like this:
    foreach my $line ( @very_large_file )
    {
        if ( $line =~ m/^\s*\#/ )     # Lines that are comments (start with a #)
        {
            next;
        }
        if ( $line =~ m/^$/ )         # Lines that are blank
        {
            next;
        }
        if ( $line =~ m/^\s+/ )       # Lines that contain leading white-space
        {
            next;
        }
        if ( $line =~ m/^[^\s#]/ )    # Lines without leading white-space that aren't comments
        {
            # Code to process $line
            next;
        }
    }
and you run this code with a file containing 10 million lines, of which 99.99% are neither comments, nor blank, nor start with white-space, then you’ll end up executing approximately 40 million tests. If you put the bottom-most test (the test for lines without leading white-space) first, the same code executes only about 10 million tests. Reordering like this is only safe when no line can pass more than one of the tests, so check that before you move them around.
3.1 Writing Readable Code
Here’s a very good question. Why do I need to observe and adhere to standards in programming in an environment like ours? My answer to this is to give an example:
foreach $keyName (keys(%keys)) { foreach $hierName (keys(%{$keys{$keyName}{instances}})) { if(${$instances{$hierName}{type}} eq "key") { my $cellName = ${$instances{expandExpression($keyName, $hierName)}{cellName}}; if(exists($cellProperties{$cellName}{classless}{keyTerminals})) { foreach my $keyTermName (split(',', ${$cellProperties{$cellName}{classless}{keyTerminals}})) { if(exists($keys{$keyName}{instances}{$hierName}{$keyTermName})) { my $netName = expandExpression($keyName, ${$keys{$keyName}{instances}{$hierName}{$keyTermName}}); if(defined($packageTerms{$netName})) { if(!defined($ios{$netName}) || $ios{$netName} ne "-global") { if($packageTerms{$netName} eq $keyName) { keysWarn("Instance of cell with duplicated key names, cell $cellName, in $padName, duplicated key name is $keyTermName\n"); } else { keysWarn("Two or more keys connected to package terminal $netName, key $keyName$keyTermName and key ", $packageTerms{$netName}, "\n"); } } } $packageTerms{$netName} = $keyName; } } } } } } keysMessage("Checked ".scalar(keys(%packageTerms))." package terminals\n");
I’ve rendered it in a small font size to illustrate a point: the formatting has been preserved exactly as it was written, and this is a small fragment of a much larger code-base of well over 5000 lines of code just like this. And my point? I absolutely guarantee that one week after the above code was written, the original author will not know all the nuances that went into its authorship. Any debugging exercise will be very difficult for that author, let alone for someone who comes fresh to the task with responsibility to maintain this code once the originator has moved on. Therefore, style, readability and clarity matter.
3.1.1 Hints For Readable Code
Line up items so that it’s easy to spot errors. For example, a block of my declarations whose names and values are not lined up works but isn’t acceptable.
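As an illustration of the kind of alignment meant here (the variable names and values are invented for the example):

```perl
use strict;
use warnings;

# Names, = signs and values lined up in columns: a missing sigil,
# a stray character or a wrong value stands out when you scan down.
my $first_name    = 'Ada';
my $last_name     = 'Lovelace';
my $year_of_birth = 1815;
my $year_of_death = 1852;
my $age_at_death  = $year_of_death - $year_of_birth;
```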
If you’re writing a complex “if” statement then line up the brackets:
    if (    ( $day            == SUNDAY )
         && ( $full_moon      == TRUE   )
         && ( $spring_equinox == TRUE   ) )
    {
        print "It's Easter Sunday\n";
    }
Use a 2 or 4 column indent and be consistent in its usage. Put the opening curly brace on the line after a keyword, lined up with the start of the keyword. A one-line BLOCK may be put on one line, including the left and right braces:
    if ( $flag == TRUE ) { $result = PI; $next_example = FALSE; }
Don’t omit the semicolon in a one-line BLOCK even though you can (in the above example it’s the semicolon after the “E” in FALSE). At some point it’s a certainty that you’ll change that one-line block to a multi-line block by adding new commands. At that point the semicolon is needed and you’ll have to add it anyway. Don’t put space before the semicolon after a statement. Do put space both before and after a “,” when separating parameters and list items. Put space around most (all) operators. Put space around complicated subscripting code. Put blank lines between sections of code that do different things. Don’t put space between a function name and its opening parenthesis. Break long lines after an operator.
Omit redundant punctuation as long as clarity doesn't suffer.
3.2 Use Constants
If values that appear in your code are constants, define them as constants with “use constant”. It is an accepted convention that constants should appear in all UPPERCASE.
    use constant PI => 3.1415926;
    use constant E  => 2.7182818;
    use constant A  => 6.02E23;

    MAIN:
    {
        my $radius = 2.0;
        my $area   = PI * $radius * $radius;
    }
3.3 Make The Use Of References Obvious
If your code uses references, make sure that the variable names used for them are tagged with something that makes it obvious they’re references, like _r. If you do this consistently, it becomes obvious when you try to use something that is not a reference in a dereference operation. For example, in the following code it’s obvious that the dereference operator (the ->) is being used on a reference:
    my $array_r = [];    # Create a reference to an empty anonymous array
    # and then later
    $array_r->[ 56 ] = PI;
While in the following example it should be obvious that something has gone wrong, because the dereference operator is being used on something that is not a reference (the _r is missing):
    my $number = 56;
    # and then later
    $number->[ 0 ] = get_random_integer();
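If you can’t be sure whether a variable holds a reference, ref() will tell you: it returns the kind of thing referred to (for example ARRAY or HASH) and an empty string for a plain scalar. A short sketch; the eval shows that under use strict the mistaken dereference is a fatal run-time error rather than a silent one.

```perl
use strict;
use warnings;

my $array_r = [];    # reference to an empty anonymous array
my $number  = 56;    # plain scalar

# ref() returns "ARRAY", "HASH", ... for references and "" for plain scalars.
for my $thing ( $array_r, $number ) {
    if ( ref($thing) eq 'ARRAY' ) {
        $thing->[ 0 ] = 42;    # safe: it really is an array reference
    }
    else {
        print "Not an array reference: $thing\n";
    }
}

# Under use strict, dereferencing the plain number dies at run time.
my $ok = eval { $number->[ 0 ] = 1; 1 };
print "Error caught: $@" unless $ok;
```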
3.4 Don’t Use Default Values
When using a loop construct like foreach, don’t use the defaults allowed by Perl. I.e. it is allowable to say this:
    foreach ( @l )
    {
        print $_;
    }
Which doesn’t tell you much about what’s going on and why, whereas the far more readable:
    foreach my $book_title ( @library )
    {
        print "$book_title\n";
    }
tells you exactly what was/is intended. This will be clearer to others when they read your code, and clearer to you when you come back to debug your code in a year’s time.
3.5 Distinguish Between for And foreach
The Perl keywords for and foreach are synonyms, so you can use either one both to iterate through the elements of a list and to write C-style counting loops. Here are two examples of how you should use them:
    foreach my $name ( @friends )
    {
        print "I have a friend called $name\n";
    }

    for ( my $count = 0 ; $count <= 10 ; $count++ )
    {
        print "Count = $count\n";
    }
And here are two examples of how you should not use them:
    for my $name ( @friends )
    {
        print "I have a friend called $name\n";
    }

    foreach ( my $count = 0 ; $count <= 10 ; $count++ )
    {
        print "Count = $count\n";
    }
3.6 Common Sense
Use meaningful variable and subroutine names. Don’t use variables with the names $a and $b; see the man page for sort() to understand why. Declare variables using my (i.e., use lexical variables). Never use global variables, and don’t be tempted in the heat of debugging to insert just one or two to get around a problem.
Use lots of comments. You’ll be amazed how quickly you’ll forget just what it was you were trying to express in your code a day, a week, a month, a year ago. Document functions and procedures. When in doubt use parentheses. Just because you can omit them doesn’t mean you should.
If your program runs for more than a few seconds, give your users some feedback. If you’re programming a GUI in Perl/Tk, use a progress bar.
If your program is driven from the command line then always provide a -help parameter that gives users some idea of what the program does and what to type. Make invoking the program with no parameters display some help information, and give users the option to get more help with a --help parameter. Allow default options, and make sure users know what they are when they ask for help. Make error messages clear, so a user knows what to fix when things don’t run the way they expect.
Since programs are often chained together or run within a single controlling program, make sure all scripts return an error or success code. The exit code for success is always 0 (zero). If programs are designed to be chained together in a shell script, then follow the Unix philosophy of having programs that complete successfully produce no output at all (i.e., they are silent). Here’s a way of setting up and using exit codes:
    # Exit codes :
    use constant EXIT_OKAY     => 0;    # Success
    use constant EXIT_BAD_ARGS => 1;    # Failed with bad arguments

    # Later in your program
    if ( $number_of_arguments < 4 )     # Not enough arguments given !
    {
        exit( EXIT_BAD_ARGS );
    }

    # And at the end of your program
    exit( EXIT_OKAY );
Always return a value from both your program and any subroutines in that program. If a subroutine doesn’t use an explicit return statement, then the value returned is the value of the last statement evaluated. That value changes as you modify your code, and in particular, since most code is added at the end, whatever you’re currently writing will be changing what is seen by the caller. Similarly, a program that never calls exit finishes with status 0 unless it dies, so make its exit status explicit. If it’s vital that your code not return a value, because, say, you want to indicate that an error occurred but it wasn’t a fatal error, then return undef. In Perl undef is a value that represents not defined.
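One caveat with return undef: in list context it returns a one-element list, which is TRUE when assigned to an array. A bare return; gives undef in scalar context and an empty list in list context, so it reads as failure either way. A small sketch with made-up subroutine names:

```perl
use strict;
use warnings;

sub lookup_with_undef { return undef; }    # one element (undef) in list context
sub lookup_with_bare  { return; }          # undef in scalar context, () in list context

my @with_undef = lookup_with_undef();      # (undef): one element, so the array is TRUE
my @with_bare  = lookup_with_bare();       # (): no elements, so the array is FALSE

print "return undef: ", ( @with_undef ? "looks like success\n" : "failure\n" );
print "bare return:  ", ( @with_bare  ? "looks like success\n" : "failure\n" );
```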
When you write modules, remember that a module must always return a TRUE value, so the last line of a module should look like this:
    1;
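For reference, a minimal module might look like the sketch below. The module name Text_Tools and its trim function are hypothetical; it uses the core Exporter module (importing its import method needs Exporter 5.57 or later) and includes some POD, as recommended earlier.

```perl
package Text_Tools;          # hypothetical module; would live in Text_Tools.pm

use strict;
use warnings;
use Exporter qw(import);     # gives this package an import() method

our @EXPORT_OK = qw(trim);   # callers ask for it: use Text_Tools qw(trim);

=head1 NAME

Text_Tools - small text helpers (illustrative example)

=head2 trim( $string )

Returns $string with leading and trailing white-space removed.

=cut

sub trim
{
    my ( $string ) = @_;
    $string =~ s/^\s+//;
    $string =~ s/\s+$//;
    return $string;
}

1;    # a module must end by returning a true value
```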
If you cut-and-paste code, then that code belongs in a subroutine.
4 Testing
If your code is destined to be used by others then you must test it. In particular keep a directory or folder with files that are read by your code, and write some scripts to run common cases. When you add new features or debug problems, make sure all the old tests are run so that you can prove that the modifications or additions haven’t caused unintended side-effects that cause old code to stop working correctly (in computer science parlance this is called regression testing).
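A regression test script can be as small as the sketch below, using the core Test::More module. The function under test, strip_comments, is a made-up example; in practice you would load your own module and read your saved test files instead of the inline data.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical function under test: drops lines whose first non-blank character is #.
sub strip_comments { return grep { !/^\s*#/ } @_; }

# Each case: [ description, input lines, expected output lines ].
my @cases = (
    [ 'plain lines kept',         [ "a\n", "b\n" ],      [ "a\n", "b\n" ] ],
    [ 'comment dropped',          [ "# c\n", "a\n" ],    [ "a\n" ] ],
    [ 'indented comment dropped', [ "   # c\n", "a\n" ], [ "a\n" ] ],
    [ 'hash mid-line kept',       [ "a # c\n" ],         [ "a # c\n" ] ],
);

for my $case (@cases) {
    my ( $name, $input, $expected ) = @$case;
    is_deeply( [ strip_comments(@$input) ], $expected, $name );
}

done_testing();
```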
5 Traps For The Unwary (Or, Things That Catch Everyone Out Eventually)
Remember to use == for numeric tests and eq for string tests. Don’t fall into the C trap of using = (assignment) when you mean == (comparison). Remember not to use = when you mean =~. Always start your Perl code with this:
    use warnings;
    use strict;
    use diagnostics;
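A quick demonstration of the difference between numeric and string comparison (the values are arbitrary):

```perl
use strict;
use warnings;

my $left  = "10";
my $right = "10.0";

# Numeric comparison converts both sides to numbers: 10 == 10, so this is true.
my $same_number = ( $left == $right ) ? 1 : 0;

# String comparison compares characters: "10" and "10.0" differ, so this is false.
my $same_string = ( $left eq $right ) ? 1 : 0;

# =~ binds a string to a pattern; a plain = would assign instead of matching.
my $has_dot = ( $right =~ m/\./ ) ? 1 : 0;

print "== : $same_number, eq : $same_string, =~ : $has_dot\n";    # == : 1, eq : 0, =~ : 1
```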
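All arrays count from 0, not 1. An array of size 20 has elements [0] to [19] inclusive; there isn’t an array item [20], and $#array (the index of the last element) is 19. Hashes have no order, so you can’t rely on the order in which a hash’s entries come back if you loop over it directly with for or foreach, and you can’t index into a hash with []. If you need to iterate over a hash, use keys, values or each (and sort the keys if you need a predictable order).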
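A short sketch of both points: $#items gives the last valid index, and the hash is walked with sort keys (for a predictable order) and with each (the names and ages are invented).

```perl
use strict;
use warnings;

my @items      = ( 'a' .. 't' );    # 20 elements
my $last_index = $#items;           # 19: the highest valid index

my %age = ( alice => 31, bob => 27, carol => 45 );

# keys() returns the keys in no particular order, so sort them for stable output.
my @report;
foreach my $name ( sort keys %age ) {
    push @report, "$name is $age{$name}";
}

# each() walks over key/value pairs, again in no particular order.
my $total = 0;
while ( my ( $name, $years ) = each %age ) {
    $total += $years;
}

print scalar(@items), " items, last index $last_index\n";
print "$_\n" for @report;
print "Total age: $total\n";
```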
6 Some Common Tasks And Possible Solutions
There are many things that occur over and over again in Perl programming. Here are some simple solutions to some of those tasks.
6.1 Adding A Command Line To A Program À La Unix
First solution: use the standard Getopt::Std module (which provides getopts) or Getopt::Long. The advantage is that it is all completely written for you. The disadvantage is that if it doesn’t do exactly what you want, then you either alter it or live with it. Second solution: write your own routine. Here’s a template for it:
sub Parse_Command_Line_Arguments( $ ) { my ( $arguments_r ) = @_; my $usage