Perl Programming

Programming With Perl An Introduction September 2005

Notes:

Structure Of This Course

This course is split into three parts:

  1. Introduction (~0.5 hours)
  2. Course material (~9.0 hours)
  3. Labs/Tutorials/Exercises (~2.5 hours)

  • The goal is to cover 75% of the Perl language.
  • All of the material in this course comes from “Programming Perl 3rd edition” and the Perl Cookbook.
  • If there’s anything which is not clear then ask as we go.
  • One thing I would like from this course is feedback, so fill in the course feedback form before you leave tomorrow.

An assumption is that everyone has some programming experience. This course isn’t going to teach programming. Some parts of Perl are not going to be covered - Ties and DBM, Formats, Many system functions. This is all reference material which you can find in any of the standard texts - or in the man pages.

Agenda - Day 1/2

Day 1

  09:00 - 09:30  Introduction
  09:30 - 11:15  Perl
  11:15 - 11:30  Break
  11:30 - 12:30  Perl
  12:30 - 13:00  Break for lunch
  13:00 - 14:30  Perl
  14:30 - 14:45  Break
  14:45 - 16:00  Perl

Day 2

  09:00 - 09:30  Recap of day 1
  09:30 - 11:15  Perl
  11:15 - 11:30  Break
  11:30 - 12:30  Perl
  12:30 - 13:00  Break for lunch
  13:00 - 14:30  Perl
  14:30 - 14:45  Break
  14:45 - 15:30  Perl
  Conclusions, discussion, questions, feedback

The course agenda uses the same times on each of the two days. Labs and exercises happen as we go. If there are any problems or questions you wish to raise, just ask.

Each day runs from 09:00 to 16:00 with 30 minutes for lunch and a 15-minute break in both the morning and the afternoon. The agenda is flexible. If there are specific areas which I haven’t covered in which you have an interest, then ask. There are lots of labs and exercises - most are small to start with and get more detailed as we get to the end of the course. By the time we get to the end of the overview everyone should be capable of writing simple scripts which manipulate files and do simple pattern matching and substitution. We will be largely learning by example - lots of the examples in this course come from the Perl Cookbook.

The Pursuit Of Happiness (Or The Hard Sell)

  • Perl is a language for getting your job done.
  • Designed to make easy jobs easy without making hard jobs impossible.
  • What are the easy jobs?
      - Manipulate numbers & text & files & directories & computers & networks.
      - You want to be able to run external programs & scan their output.
      - It should be easy to develop, modify & debug your own programs.
  • Perl is a glue language.
  • Perl is especially popular with web programmers and developers.
      - But only because they discovered it first.
  • We will look at Perl from a viewpoint of helping in areas of:
      - Design.
      - Programming.
      - Verification.
      - Documentation/Reporting.
      - Data analysis.
  • Perl is an ideal language for data manipulation.


What Is Perl?

  • To those who like it: Practical Extraction and Report Language.
  • To those who love it: Pathologically Eclectic Rubbish Lister.
  • Above all Perl is:
      - Free.
      - Easy to use.
      - Capable of “One-Liners” or whole projects.
  • But be careful:
      - You can write rubbish software in any language.
  • If you’ve programmed before in Basic, C, C++, Pascal, awk, Python, English then you’ll probably feel comfortable with Perl.

Any language can be used to write code which is not maintainable. Perl isn’t an exception to that rule. We will look at three different styles of programming:

  1. Flat programming - simple scripts.
  2. Procedural programming - larger programs based on procedures and control structures and simple data structures.
  3. One-liners.

As part of the course notes there is a style guide for Perl. Follow it (or something like it). Some of the things we won’t have time to cover on this course are OO Perl and Advanced Data Structures. If this is something which interests you, then let me know since a follow-up course is possible/likely.

The History Of Perl

  • Larry Wall is the creator of Perl.
  • Perl has been around since 1987 (Perl 1).
  • 1988 sees Perl 2, 1989 sees Perl 3.
  • 1991 sees “Programming Perl”, 1st edition, and Perl 4. The Internet explodes into growth.
  • 1994 sees Perl 5.
  • 2000 - Perl 6 is announced.
  • 2005 onwards - Perl 6 - we’re still waiting.


More About Perl

  • Perl is a rich language:
      - Perl is modularly extensible.
      - You can rapidly design, program, debug & deploy applications.
      - You can extend the functionality of those applications as needed.
      - You can embed Perl in other languages.
      - You can embed other languages in Perl.
      - You can write Object-Oriented Perl.
  • A misconception: Perl is interpreted and so it’s slow!
      - Perl compiles to an intermediate format (like Java bytecode or Pascal P-Code).
      - Once it is compiled it is passed to the interpreter for execution.
      - Hence: you can write faster code in C but you can write code faster in Perl.
  • Great solutions come from using pre-built Perl modules written in C:
      - C speed.
      - Perl’s convenience and flexibility.

For embedding the choice of language is C since Perl is written in C.

How To Get Perl

  • Unix:
      - Available on-site. See:
            /pd/perl/5.005_503/bin/perl
            /pd/perl/5.8.6/bin/perl
            /usr/local/bin/perl
  • Windows:
      - ActiveState Perl (version 5.8.6) from www.activestate.com
  • Linux:
      - Included as part of all standard Linux distributions (version 5.8.6)
  • Mac OS X:
      - Included as part of OS X (version 5.8.1 on OS X 10.3.9)

We have various versions on site - recommend that we use 5.8.x. Perl Tk is available on-site in version 5.8.x.

Places To Get Useful Information - I

  • Internet:
      - www.perl.com (The Perl homepage)
      - www.perl.org (The Perl mongers homepage)
      - www.oreilly.com
      - search.cpan.org (Go here to find Perl modules)
  • comp.lang.perl newsgroup hierarchy:
      - comp.lang.perl.misc
      - comp.lang.perl.moderated
      - comp.lang.perl.modules
      - comp.lang.perl.tk
  • man perl from a Unix command line:
      - Gives all the Perl help topics
  • Ask

All the news groups listed above are available in this building. Perl is probably the most widely used and understood programming language in Bristol. People can always come and ask me a question if they have a problem.

Places To Get Useful Information - II

  • Books:
      - Programming Perl (3rd edition)
        Larry Wall, Tom Christiansen & Jon Orwant - ISBN 0-596-00027-8
      - Learning Perl (3rd edition)
        Randal Schwartz & Tom Phoenix - ISBN 0-596-00132-0
      - Perl Cookbook (2nd edition)
        Tom Christiansen & Nathan Torkington - ISBN 0-596-00313-7
      - Mastering Algorithms With Perl
        Jon Orwant, Jarkko Hietaniemi & John Macdonald - ISBN 1-56592-398-7
      - Advanced Perl Programming
        Sriram Srinivasan - ISBN 1-56592-220-4

If you only buy one book make it the camel book (A.K.A. Programming Perl), followed by the Perl Cookbook. If you do buy Programming Perl make sure it’s the 3rd edition and NOT the 2nd edition. There are two Perl in 21 Days books, one of which is available on-line at the CR&D bookshelf web-site. (The on-line version can be found in the tutorial areas as a series of PDF files.) Since a lot of this course is going to be Perl by example, I’ve placed a few programs into the various tutorial areas which can all be used (and reused) as you wish. There’s also a copy of a Perl module (Netlist_Functions.pm) which contains a lot of useful functions which can be imported into your own programs. Hey, why bother programming when you can steal! (This really is the philosophy you should be adopting in your own work.)

(Some of) The Perl Manpages

  Manpage     Covers
  perl        What perl manpages are available
  perldata    Data types
  perlsyn     Syntax
  perlop      Operators and precedence
  perlre      Regular expressions
  perlvar     Predefined variables
  perlsub     Subroutines
  perlfunc    Built-in functions
  perlmod     How to make modules work
  perlref     References
  perlobj     Objects
  perlipc     Inter-process communications
  perlrun     How to run Perl commands, plus switches
  perldebug   Debugging
  perldiag    Diagnostic messages

(More About) The Perl Manpages

  • See also: perlfaq1 to perlfaq9
  • As of Perl version 5.6.1 you can search individual Perl manpages by using the name of the manpage as a command and passing a Perl regular expression as the search pattern.
  • Examples:
        perlop comma
        perlfunc split
        perlvar ARG
        perldiag 'assigned to typeglob'
  • When you don’t know where something is in the documentation, search all the FAQs:
        perlfaq round

Some Terminology

  • Idiomatic Perl:
      - Widespread and accepted ways of doing certain things in Perl. These two statements do the same thing:

            if ( $variable != 56 ) { print "Your variable did not equal $variable\n"; }
            print "Your variable did not equal $variable\n" unless ( $variable == 56 );

  • Interpolation:
      - Replacing a variable with the variable’s value.
  • Regexps:
      - Regular expressions.
  • CPAN:
      - The Comprehensive Perl Archive Network. The place to go to get modules written and contributed by other Perl programmers.
      - Don’t reinvent the wheel, or if you do then make sure it’s a better wheel.
      - Share code within your office/group/site/business unit.

Idiomatic Perl is one of the most confusing bits of Perl since there are so many different ways of doing things. This can be both useful (you can program in the way which suits you) and a drawback (reading other people’s code isn’t always easy). TMTOWTDI - There’s More Than One Way To Do It - is the Perl motto. Interpolation will be mentioned a lot by people who use Perl a lot - it’s just a fancy computer science term. Regexps - these are not exactly the same as regular expressions in other UNIX applications - so be careful. CPAN - pretty light on EDA type code. Maybe we should start a forum!

Account Details

  • There are six user accounts: user1 to user6
  • Password for each account is: ________
  • Each area holds:
      - Copies of all the course material as .pdf files.
      - Tutorial areas for all the labs.
      - A “How To” guide.
      - A document on “Perl Style”.
      - A list of some common regexps.
      - Issues 1 and 2 of the Perl Review (as .pdf files).


A Standard Header

This works in Bristol.

    #!/usr/local/bin/perl

    # Preamble
    use strict;
    use warnings;
    use diagnostics;

    # Some standard modules
    use Carp;
    use Cwd;
    use Config;

    # Extend lib path
    use lib ( "/design/rmc/tools/Perl_Modules/tool/current/" );
    use lib ( "/design/rmc/tools/Perl_Modules/tool/current/OS_SPECIFIC/$Config{archname}" );

    # Current directory
    use FindBin qw( $Bin );
    use lib $Bin;

    # Site specific
    use Netlist_Tools;

There are other binary invocations that use “eval” with some “magic”.

PREVIEW - Examples Of sprintf()

  Field   Meaning
  %%      A percent sign
  %c      A character with the given number
  %s      A string
  %d      A signed integer, in decimal
  %u      An unsigned integer, in decimal
  %o      An unsigned integer, in octal
  %x      An unsigned integer, in hexadecimal
  %e      A floating-point number, in scientific notation
  %f      A floating-point number, in fixed decimal notation
  %g      A floating-point number, in %e or %f notation

See Chapter 29 (pages 797 to 799) of Programming Perl, 3rd edition. Be careful - sprintf() in Perl does its own formatting - it is NOT calling the underlying sprintf() function in the C library.

PREVIEW - Examples Of sprintf()

In addition to the formats on the previous slide, Perl also supports the following conversions.

  Field   Meaning
  %X      Like %x, but using uppercase characters
  %E      Like %e, but using uppercase “E”
  %G      Like %g, but using uppercase “E” if applicable
  %b      An unsigned integer, in binary
  %p      A pointer (the Perl value’s address in hexadecimal)
  %n      A special: stores the number of characters output so far into the next variable in the argument list

For compatibility, Perl also supports these conversions:

  %i      A synonym for %d
  %D      A synonym for %ld
  %U      A synonym for %lu
  %O      A synonym for %lo
  %F      A synonym for %f

PREVIEW - Examples Of sprintf()

Perl allows the following flags between the % and the conversion character.

  Flag      Meaning
  space     Prefix positive number with a space
  +         Prefix positive number with a plus sign
  -         Left-justify within field
  0         Use zeroes, not spaces, to right-justify
  #         Prefix non-zero octal with “0”, non-zero hex with “0x”
  number    Minimum field width
  .number   “Precision”: digits after the decimal point for floating-point numbers, maximum length for a string, minimum length for an integer
  l         Interpret integer as C type long or unsigned long
  h         Interpret integer as C type short or unsigned short (if no flags are supplied, interpret integer as C type int or unsigned)

See Chapter 29 (pages 797 to 799) of Programming Perl, 3rd edition.
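To make the conversion and flag tables concrete, here is a minimal sketch that combines a few of them. The variable names and sample values are made up for illustration; only sprintf() itself comes from the slides.

```perl
use strict;
use warnings;

# Each line pairs one conversion with one flag from the tables.
my $padded  = sprintf "%05d",  42;        # "0" flag plus minimum field width 5
my $left    = sprintf "%-6s|", "ab";      # "-" flag: left-justify in 6 columns
my $fixed   = sprintf "%.2f",  3.14159;   # precision: 2 digits after the point
my $hex     = sprintf "%#x",   255;       # "#" flag: prefix hex with 0x
my $binary  = sprintf "%b",    10;        # unsigned integer in binary
my $percent = sprintf "%d%%",  75;        # %% gives a literal percent sign

print "$_\n" for $padded, $left, $fixed, $hex, $binary, $percent;
```

sprintf() returns the formatted string instead of printing it, so it is handy for building report lines before deciding where they go.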

PREVIEW - Examples Of chop() And chomp()

Remember, chop is indiscriminate: it always removes something, so you’re supposed to know that the last character on a line is “\n”.

    @lines = `cat myfile`;
    chop @lines;
    chop($cwd = `pwd`);
    chop($answer = <STDIN>);
    $answer = chop($tmp = <STDIN>);    # WRONG - what is in $answer?
    $last_char = chop($var);

chomp is more discriminating: it will only remove the last character if it’s a “\n”. You could also do s/\n$//; which is explicit.

    while (<>) {
        chomp;                  # avoid \n on last field
        @array = split /:/;
        ...
    }

You almost always want to use chomp() and not chop(). chop() always returns the character it removes. If you chop() a list, then every item in the list is chopped. The thing which ends up in $answer in the question on the slide is the character which was removed from the string $tmp. The thing you probably wanted was $tmp. chomp() is discriminating: by default it removes the last character on a line only if that character is “\n”, but the default can be overridden. The character (or string) which is removed is that contained in the Perl variable $/. So chomp() can remove any arbitrary length string from the end of an input string. chomp() returns the number of characters it deleted - not the characters themselves.
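The difference in return values is easy to check. This is a small sketch with made-up strings; the "END" record terminator is an assumption chosen only to show $/ being overridden.

```perl
use strict;
use warnings;

# chomp() removes a trailing newline and returns how many characters went.
my $with_newline    = "camel\n";
my $chomp_count     = chomp($with_newline);     # removes "\n"

# chomp() on a string without a newline removes nothing.
my $without_newline = "camel";
my $second_count    = chomp($without_newline);

# chop() always removes the last character and returns that character.
my $chopped   = "camel";
my $last_char = chop($chopped);

# Overriding $/ (locally) makes chomp() strip a different terminator.
my $record = "field1:field2END";
{
    local $/ = "END";
    chomp($record);
}

print "$with_newline $chomp_count $second_count $chopped $last_char $record\n";
```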

PREVIEW - Examples Of hex() And oct()

    $number = hex("ffff12c0");
    sprintf "%lx", $number;        # (That's an ell, not a one.)

sprintf uses the same conventions as C’s sprintf.

    perl -e 'print 0xffdc;'

A neat command line alternative when you need a quick conversion.

    $val = oct $val if $val =~ /^0/;

Does $val start with a “0”? The test also matches the “0x” and “0b” prefixes, which oct() handles as well.

    $perms = (stat("filename"))[2] & 07777;
    $oct_perms = sprintf "%lo", $perms;

Note that you can always set the value of any variable with a hex value just by doing this:

    $h_number = 0xffdd;
    print $h_number;

The hex() function interprets a string as a hex number; it is not meant for numeric literals. If the string begins with “0x”, the prefix is ignored. To do a reverse conversion use sprintf() as shown. Hex strings can only represent integers. Strings which would cause integer overflow will trigger a warning. oct() will interpret a string as an octal value. If the string starts with “0” it will be interpreted as octal. If the string starts with “0x” it will be interpreted as a hex value. If it begins with “0b” it will be interpreted as a binary value. Try this:

    perl -e 'print 0b11001001;'

Is anyone (apart from me) sad enough to know from what 80’s/90’s TV series this was an episode title?
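As a quick check of the rules described above, here is a short sketch of conversions in both directions. The sample strings are illustrative only.

```perl
use strict;
use warnings;

# String to number.
my $from_hex    = hex("ff");        # plain hex digits
my $from_0x     = hex("0xff");      # a leading "0x" is ignored by hex()
my $from_oct    = oct("755");       # octal digits
my $oct_as_hex  = oct("0xff");      # oct() switches to hex on "0x"
my $oct_as_bin  = oct("0b101");     # oct() switches to binary on "0b"

# Number back to string, using sprintf().
my $back_to_hex = sprintf "%x", 255;
my $back_to_oct = sprintf "%o", 493;

print "$from_hex $from_0x $from_oct $oct_as_hex $oct_as_bin ",
      "$back_to_hex $back_to_oct\n";
```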

Programming With Perl September 2005


Getting Started

  • For many programming tasks you’d like a language in which you can say:

        print "Hello World!\n";

    and expect the language to do just that.
  • Perl is such a language.
  • Some important points …
      - This course is an overview.
      - We’re going to cover a lot of Perl very quickly and there will be lots of examples.
      - There are many slides in this course which have this symbol in the top left corner of the slide. All such slides are gathered together into a single document called “How-to.pdf” in your labs and exercises directory.

This is a minimal (and complete) Perl program, but it illustrates some important points.

  1. You don’t have to say much before you say what you want to say.
  2. You don’t have to say much after you’ve said what you want to say either.

Unlike many languages, Perl thinks it’s okay that you just fall off the end of your program. You may use the exit() function to end a program (actually, you should use the exit() function to end a program) just as you may force yourself to pre-declare variables before you use them (actually …). It’s up to you! Here are a few important points:

  1. The \n at the end of the print statement is a newline.
  2. All statements are terminated by a semi-colon.

LAB1 - HELLO_1

Variables, Arrays & Lists, Hashes


Variables And Their Syntax

  • A variable is a handy place to keep something:
      - A place with a name.
      - Might be private or public.
      - Might be temporary or permanent.
      (This is what computer scientists call scope.)
  • We’ll learn about scope later (or look up my, our and local).
  • A variable is distinguished by the sort of data it holds:
      - Singular - one thing - strings and numbers.
      - Plural - many things - lists of strings or lists of numbers (or both).
  • We call a singular variable a scalar.
  • We call a plural variable an array.

These are the two fundamental data types in Perl: one of a thing, and more than one of a thing. We call a singular variable a scalar. We call a variable which contains more than one thing either an array/list or an associative array/hash.

Variables And Their Syntax

  • We can write a different version of our first example (in the getting started section) like this:

        $phrase = "Hello, world!\n";    # Set a variable.
        print $phrase;                  # Print the variable.

  • We didn’t have to predefine what type of variable $phrase was.
  • The $ character tells Perl that phrase is a scalar.
  • Perl has some other variable types with names like hash and handle and typeglob.

Later we’ll see that it is a good idea to force yourself to predefine variables before you use them (using my()). Hash and handle we’ll cover later. Typeglob won’t be covered in this course.

LAB1 - HELLO_2
LAB1 - HELLO_3
LAB1 - HELLO_4

Variables And Their Syntax

  Type         Character   Example     Is a name for:
  Scalar       $           $pounds     An individual value (number or string)
  Array        @           @large      A list of values keyed by number
  Hash         %           %interest   A group of values keyed by a string
  Subroutine   &           &how        A callable chunk of Perl code
  Typeglob     *           *struck     Everything named struck

Tips: The $ for scalar is a stylized S. The @ for array is a stylized A. Sadly the analogy breaks down after that. We’ll cover subroutines in detail later in the course. Typeglob won’t be covered in this course.

Variables And Their Syntax

  Construct             Meaning
  $days                 Simple scalar value of $days
  $days[28]             29th element of @days
  $days{'Feb'}          "Feb" value from hash %days

  Construct             Meaning
  @days                 Array containing ($days[0], ..., $days[n])
  @days[3,4,5]          Array slice containing ($days[3],$days[4],$days[5])
  @days[3..5]           Array slice containing ($days[3],$days[4],$days[5])
  @days{'Jan','Feb'}    Hash slice containing ($days{'Jan'},$days{'Feb'})

Quiz: What’s the value in $days after this has run?

    my @days = qw( Monday Tuesday Wednesday Thursday Friday Saturday Sunday );
    my $days = @days;

Review: Scalars store a single value - all scalars are prefixed by $. Arrays store many values. Arrays start with @ or %. @ arrays are accessed by index - % arrays (hashes) are accessed by a string. Note that the range operator (..) has made an appearance. So 1 .. 20 will give you all the integers between 1 and 20 inclusive. We’ll talk more about the range operator later. In the quiz example we’ve introduced a lot of new stuff. qw (think of this as quoteword) lets you use barewords to create lists. This whole example is an illustration of context - the value of $days after the example has run is ?

Numeric Literals

    $x = 12345;            # integer
    $x = 12345.67;         # floating point
    $x = 6.02e23;          # scientific notation
    $x = 4_294_967_296;    # underline for legibility
    $x = 0377;             # octal
    $x = 0xffff;           # hexadecimal
    $x = 0b11000000;       # binary

You can’t use “,” in numbers since in Perl the , is an operator - so we use _ instead. Octal numbers are prefixed with 0 (that’s zero). Hex numbers are prefixed 0x (that’s zero x). Binary numbers are prefixed by 0b (that’s zero b).
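A quick sketch that confirms the different notations all produce ordinary numbers. The variable names are illustrative.

```perl
use strict;
use warnings;

my $big    = 4_294_967_296;   # underscores are ignored by Perl
my $octal  = 0377;            # leading zero means octal
my $hex    = 0xffff;          # leading 0x means hexadecimal
my $binary = 0b11000000;      # leading 0b means binary

# Printing always shows the decimal value.
print "$big $octal $hex $binary\n";
```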

Variable Types - Scalars

  • Scalars are assigned a new value with the = operator.
  • Scalar variables can be:
      - Integers.
      - Floating-point numbers.
      - Strings.
      - References to other variables (think C pointers).
      - Objects.
  • Double quote marks "" do variable interpolation and backslash interpolation.
      - Substitution and turning "\n" into a newline.
  • Single quotes '' suppress interpolation.
  • Backticks `` will execute an external program and return the output in a string.

The “=“ symbol does assignment. Be careful because the “==“ symbol is used for equality. At some point in your life you’ll accidentally confuse the two. Double quotes do variable and backslash interpolation - Interpolation is a fancy computer-science name for replacing a variable with the contents of that variable. Single quotes suppress interpolation. Backticks (the ones which lean towards the left) will execute an external program and return its output to you in the form of a string.
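The difference between the two kinds of quotes can be seen in a couple of lines. This is a minimal sketch; $tool and its value are made up for the example.

```perl
use strict;
use warnings;

my $tool = "Perl";                     # hypothetical example value

my $double = "Written in $tool\n";     # $tool and \n are both interpolated
my $single = 'Written in $tool\n';     # both stay exactly as typed

print $double;
print $single, "\n";
```

The single-quoted string still contains a dollar sign, the word tool, a backslash and the letter n.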

Variable Types - Scalars

    $answer   = 42;                   # an integer
    $pi       = 3.14159265;           # a "real" number
    $avocados = 6.02e23;              # scientific notation
    $pet      = "Camel";              # string
    $sign     = "I love my $pet";     # string with interpolation
    $cost     = 'It costs $100';      # string without interpolation
    $thence   = $whence;              # another variable's value
    $salsa    = $moles * $avocados;   # a gastrochemical expression
    $exit     = system("vi $file");   # numeric status of a command
    $cwd      = `pwd`;                # string output from a command

  • Scalars can also hold references to data structures, subroutines and objects.

    $ary = \@myarray;                 # reference to a named array
    $hsh = \%myhash;                  # reference to a named hash
    $sub = \&mysub;                   # reference to a named subroutine

    $ary = [1,2,3,4,5];               # reference to an unnamed array
    $hsh = {Na => 19, Cl => 35};      # reference to an unnamed hash
    $sub = sub { print $state };      # reference to an unnamed subroutine

    $fido = new Camel "Amelia";       # ref to an object

Variable interpolation:

    $pet = "Camel";
    $sign = "I love my $pet";
    print $sign;

What do you think this will print out?

References will be covered extensively when we get to the in-depth look at Perl. References are the key to writing efficient Perl code with subroutines, and the only way to do OO programming. In the example $hsh = {Na => 19, Cl => 35}; the => behaves like a comma “,” (it also quotes a bareword on its left, which is why Na and Cl need no quotes). It is a convenience which lets us see easily where the keys and where the values are (often known as syntactic sugar).

Variable Types - Scalars

  • If you use a variable which has never been assigned a value then:
      - The uninitialized variable springs into existence.
      - It is created with the undefined value (undef), which behaves as 0 in numeric context or "" in string context.
  • Depending on how you use them variables will be interpreted as:
      - Strings.
      - Numbers.
      - True or False, i.e. boolean.
  • Context - suppose you said this:

        $camels = '123';
        print $camels + 1, "\n";

Question: What do you think is printed in the example shown?

Answer: $camels is a string containing the text ‘123’. When Perl tries to add 1 to a string it first converts the string containing the text ‘123’ into the number 123. It then adds 1 and (hopefully) gets 124. This is then converted back into a string containing the text ‘124’ which is then printed. A newline is then printed.

LAB2 - VARIABLES1
LAB2 - VARIABLES2
LAB2 - VARIABLES3
LAB2 - VARIABLES4_A
LAB2 - VARIABLES4_B
PRINTF and SPRINTF and CHOP and CHOMP
LAB2 - VARIABLES5_A, _B, _C
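The same scalar can be pushed into numeric, string or boolean context just by the operator applied to it. A small sketch, with illustrative variable names:

```perl
use strict;
use warnings;

my $camels = '123';                  # starts life as a string

my $sum    = $camels + 1;            # + forces numeric context
my $joined = $camels . 1;            # . forces string context

# In boolean context only a few values are false, among them "" and "0".
my $zero_string  = "0"   ? "true" : "false";
my $empty_string = ""    ? "true" : "false";
my $zero_point   = "0.0" ? "true" : "false";   # a non-empty string that is not "0"

print "$sum $joined $zero_string $empty_string $zero_point\n";
```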

Variable Types - Arrays And Hashes

  • Some kinds of variables hold multiple values:
      - Arrays.
      - Hashes.
  • Like scalars, arrays and hashes spring into existence with nothing in them.
  • When you assign to them they supply a list context. (We’ll look at this later.)
  • Arrays and hashes differ from each other:
      - Use an array to look up something by number. Arrays are always denoted with the “@” symbol - but it’s the whole array.
      - Use a hash to look up something by name. Hashes are always denoted with the “%” symbol - but it’s the whole hash.
  • What’s the difference between a list and an array?

Arrays are also called lists - the distinction is blurred - when an array is used with subscripts it’s generally regarded as an array, when it’s used as an ordered list and used with push() pop() shift() and unshift() it’s generally regarded as a list. It also depends upon context as well as how you think about a particular problem. TMTOWTDI.

Variable Types - Arrays

  • An array is an ordered list accessed by a scalar’s position in the list.

        @home = ("couch", "chair", "table", "stove");
        ($potato, $lift, $tennis, $pipe) = @home;
        ($alpha,$omega) = ($omega,$alpha);

        $home[0] = "couch";
        $home[1] = "chair";
        $home[2] = "table";
        $home[3] = "stove";

An array is an ordered list accessed by a scalar’s position in the list. The list can contain numbers, strings, or a mixture of both. It can also contain references to variables and references to objects or references to other arrays or references to other hashes. To assign a list value to an array you simply group the values together with “(“ and “)”. If you use @home in a list context (on the right side of a list assignment) you’ll get the list back. So you could set 4 scalar variables as shown. List assignments happen in parallel so you can swap two scalar variables as shown in the third example. Arrays are 0 based (as in C) so while the list contains 4 elements the elements are numbered 0 to 3. Array subscripts are enclosed in “[“ and “]” so an individual element is referred to as $home[n]. Since the element is a scalar (a single thing) it is preceded by $.

Variable Types - Arrays

  • Examples:

    1: @stuff = ("one", "two", "three");
    2: $stuff = ("one", "two", "three");
    3: @stuff = ("one", "two", "three");
       $stuff = @stuff;
    4: @x = (@stuff,@nonsense,funkshun());
    5: @releases = (
           "alpha",
           "beta",
           "gamma",
       );
    6: @froots = qw(
           apple       banana      carambola
           coconut     guava       kumquat
           mandarin    nectarine   peach
           pear        persimmon   plum
       );

Review: an array variable is able to store a series of values with each uniquely identified by an integer known as its index. The contents of an array are accessed collectively by giving the array name prefixed by an @.

    @dwarfs = ("Happy", "Sleepy", "Grumpy", "Dopey", "Sneezy", "Bashful", "Doc");
    @deadly_sins = ("Gluttony", "Sloth", "Anger", "Envy", "Lust", "Greed", "Pride");
    print "@dwarfs never commit @deadly_sins\n";

In the examples shown:

  1: The array contains three items.
  2: What does $stuff contain?
  3: What does $stuff contain?
  4: What does @x contain?
  5: What does that last “,” do?
  6: But look, we can do away with “,” entirely as long as the list items do not contain white-space.

List Assignment

  • Examples:

    1: my ($a, $b, $c) = (1, 2, 3);
    2: ($map{red}, $map{green}, $map{blue}) = (0xff0000, 0x00ff00, 0x0000ff);
    3: my ($dev, $ino, undef, undef, $uid, $gid) = stat($file);
    4: my ($a, $b, @rest) = split;
       my ($a, $b, %rest) = @arg_list;
    5: while (($login, $password) = getpwent) {
           if (crypt($login, $password) eq $password) {
               print "$login has an insecure password!\n";
           }
       }

       @days + 0;        # implicitly force @days into a scalar context
       scalar(@days)     # explicitly force @days into a scalar context

1: Parallel assignment of three scalars.
2: Parallel assignment of three scalars - which are values in a hash. (Hash elements cannot be declared with my, so there is no my here.)
3: If you don’t want some of the things returned in a list, throw them away by putting undef in their place.
4: Here we take $a and $b from the list and then the rest of the list goes into @rest. Here’s an important principle - the first array or hash in the list gets everything else in the list! In the next example $a and $b get the first two values from @arg_list and then the hash %rest gets everything else. There’s an issue here concerning how many items are left in the list before it’s assigned to the hash %rest: the number of remaining items needs to be a multiple of 2.

The last two examples show how you can force things into scalar context - the scalar() function is one way.
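The rules above can be tried out in a few lines. This is a sketch with made-up data; the names @colours, @rest and so on are illustrative only.

```perl
use strict;
use warnings;

# The first array on the left swallows everything that is left over.
my ($first, $second, @rest) = qw( a b c d e );

# undef on the left side discards the value in that position.
my ($x, undef, $z) = (10, 20, 30);

my @colours = qw( red green blue );

my $count    = @colours;            # scalar context: number of elements
my ($only)   = @colours;            # list context: first element
my $explicit = scalar(@colours);    # scalar context, forced explicitly

print "$first $second (@rest) $x $z $count $only $explicit\n";
```

The parentheses around $only are what make the assignment a list assignment; without them the array is evaluated in scalar context.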

List And Array Examples

  • Examples:

    # Stat returns list value.
    $modification_time = (stat($file))[9];

    # SYNTAX ERROR HERE.
    $modification_time = stat($file)[9];    # OOPS, FORGOT ()

    # Find a hex digit.
    $hexdigit = ('a','b','c','d','e','f')[$digit-10];

    # Get multiple values as a slice.
    ($day, $month, $year) = (localtime)[3,4,5];

Note: lists grow dynamically, so you can have a 4 element list like this:

    my @list = qw( fred barney wilma betty );

and say this:

    $list[656] = "dino";

And Perl will create all the intervening array slots for you (they will all have the value undef). If you create a big array and you’d later like to delete it (to save on memory perhaps) then you can do this:

    my @big_array = ();            # create the array
    @big_array = <SOME_FILE>;      # load a ton of stuff into it
    undef @big_array;              # delete the array

Note that @big_array = undef; does not do this: it leaves a one-element array whose only element is undef.

If you want to remove all the entries in an array without undef’ing it and then recreating it, then just do this:

    @my_array = ();

The same works for hashes as well - to empty a hash just do this:

    %my_hash = ();
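Here is a short sketch of growing and emptying an array. It also shows that assigning undef to an array (@array = undef;) does not empty it, whereas undef @array; does. All names and values are illustrative.

```perl
use strict;
use warnings;

my @list = qw( fred barney wilma betty );
$list[9] = "dino";                        # slots 4..8 are created as undef
my $grown_size  = scalar @list;
my $gap_defined = defined $list[5];       # false: the slot holds undef

@list = ();                               # empty the array
my $emptied_size = scalar @list;

my @assigned = (1, 2, 3);
@assigned = undef;                        # a ONE-element list holding undef
my $assigned_size = scalar @assigned;

my @freed = (1, 2, 3);
undef @freed;                             # really discards the contents
my $freed_size = scalar @freed;

print "$grown_size $emptied_size $assigned_size $freed_size\n";
```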

Variable Types - Arrays

  • Since arrays are ordered you can do useful operations on them such as:
      - Stack operations: push() and pop()
      - shift()
      - unshift()

    (Diagram: shift and unshift work at the start of the array; push and pop work at the end.)

  • Example:

    @home = ( "go", "where", "no", "one", "has", "gone" );
    push( @home, "before" );
    unshift( @home, "boldly" );
    unshift( @home, "To" );
    $first = shift( @home );
    $last = pop( @home );
    print "First = $first and last = $last\n";

Perl regards an array as an ordered list. The end of the array (i.e. the right-hand part of the list) is considered the top of the stack. push() and pop() work on the top of the stack. shift() and unshift() work on the other end of the stack. shift() takes one element from the start of a list, unshift() puts a new element at the start of the list. What do you think is printed on the last line of the example?

What does the list @home contain once the example has been run?

How Do I … Specify A List In A Program?

  • You want to include a list in your program.

A comma separated list:

    @a = ("quick", "brown", "fox");

Use qw() if you have a lot of single-word elements:

    @a = qw( Why are you bugging me? );

Use something like this if you want to read a list from a file:

    @bigarray = ();
    open(DATA, "< mydatafile") or die "Couldn't read from datafile: $!\n";
    while (<DATA>) {
        chomp;
        push(@bigarray, $_);
    }

Use the quoting operators. These two lines are equivalent; q() is the same as single quotes:

    $banner = 'The Mines of Moria';
    $banner = q(The Mines of Moria);

Use the quoting operators. These lines are equivalent; qq() is the same as double quotes:

    $name   = "Gandalf";
    $banner = "Speak, $name, and enter!";
    $banner = qq(Speak, $name, and welcome!);

More info: See The Perl Cookbook, section 4.1 Page 91.

How Do I … Specify A List In A Program?

  • You want to include a list in your program.

Backticks and qx():

    $his_host   = 'www.perl.com';
    $host_info  = `nslookup $his_host`;    # expand Perl variable

    $perl_info  = qx(ps $$);               # that's Perl's $$
    $shell_info = qx'ps $$';               # that's the new shell's $$

These 3 are identical:

    @banner = ('Costs', 'only', '$4.95');
    @banner = qw(Costs only $4.95);
    @banner = split(' ', 'Costs only $4.95');

Different quoting character:

    @banner = qw|The vertical bar (\|) looks and behaves like a pipe.|;

More info: See The Perl Cookbook, section 4.1 Page 91. qx() and backticks behave the same way: both interpolate Perl variables before the command is run. If you don’t want Perl variables to be expanded then you can use a single-quote delimiter on qx() to stop this. q(), qq() and qx() quote single strings. qw() quotes a list of single word strings by splitting its argument on whitespace without variable interpolation. If you don’t want to change the quoting character, use a backslash to escape the delimiter in the string.
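The quoting operators can be compared side by side without running any external command. A minimal sketch, reusing the slide's Gandalf and $4.95 examples; the variable names are illustrative.

```perl
use strict;
use warnings;

my $name = "Gandalf";

my $plain  = q(Speak, $name, and enter!);    # like single quotes: no interpolation
my $filled = qq(Speak, $name, and enter!);   # like double quotes: interpolates

# qw() splits on whitespace and does not interpolate, so $4.95 survives.
my @quoted = qw(Costs only $4.95);
my @split  = split(' ', 'Costs only $4.95');

print "$plain\n$filled\n";
print join("|", @quoted), "\n";
```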

How Do I … Change The Size Of an Array?
 You want to enlarge or truncate an array.

# grow or shrink @ARRAY
$#ARRAY = $NEW_LAST_ELEMENT_INDEX_NUMBER;
$ARRAY[$NEW_LAST_ELEMENT_INDEX_NUMBER] = $VALUE;

$#ARRAY is the index of the last element in @ARRAY. If you assign it a number smaller than its current value then the array is truncated. Truncated elements are lost. If you assign it a number bigger than its current value then the array grows. All new elements have the value undef. $#ARRAY is not equal to scalar(@ARRAY): because indexes start at 0, it is one less than the number of elements.

More info: See The Perl Cookbook, section 4.3 Page 95.

Solution: Assign to $#ARRAY
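A minimal sketch of truncating and growing an array through $#ARRAY. The array name and contents are hypothetical.

```perl
use strict;
use warnings;

my @people = qw(Crosby Stills Nash Young);   # hypothetical data

my $last_index = $#people;          # 3: index of the last element
my $count      = scalar(@people);   # 4: number of elements

$#people = 1;                       # truncate: only indexes 0 and 1 remain
my @after_truncate = @people;       # ("Crosby", "Stills")

$#people = 3;                       # grow again: the new elements are undef
my $new_is_undef = !defined($people[3]);

print "last index $last_index, count $count, kept: @after_truncate\n";
```

Note that growing the array does not bring the truncated values back.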

How Do I … Swap Values Without Using A Temporary Variable?  You want to exchange the values of two variables, but don’t want to use a temporary variable. ($VAR1, $VAR2) = ($VAR2, $VAR1);

Solution

Normally you would do something like this (say in C):

$temp = $a;
$a    = $b;
$b    = $temp;

($alpha, $beta, $production) = qw(January March August);
# move beta to alpha,
# move production to beta,
# move alpha to production
($alpha, $beta, $production) = ($beta, $production, $alpha);

You can swap more than two things at a time

More info: See The Perl Cookbook, section 1.3 Page 8. Most programming languages require you to use a temporary variable when swapping the values of two variables. Perl, however, tracks both sides of the assignment and guarantees that you won’t accidentally clobber any of your values. This lets you eliminate the temporary variable. You can also exchange more than two variables at once.

How Do I … Append One Array To Another?  You want to join two arrays together by adding all the items of one to the end of the other. push(@ARRAY1, @ARRAY2);

Solution: Use push()

@ARRAY1 = (@ARRAY1, @ARRAY2);

Solution: List flattening

@members = ("Time", "Flies");
@initiates = ("An", "Arrow");
push(@members, @initiates);
# @members is now ("Time", "Flies", "An", "Arrow")

Add new elements into a list using splice():

@members = ("Time", "Flies");     # start again from the original list
splice(@members, 2, 0, "Like", @initiates);
print "@members\n";
splice(@members, 0, 1, "Fruit");
splice(@members, -2, 2, "A", "Banana");
print "@members\n";

This is output:

Time Flies Like An Arrow
Fruit Flies Like A Banana

More info: See The Perl Cookbook, section 4.9 Page 108. push() is optimised for appending one array to another. If you use list flattening, beware that it takes more memory and is slower. If you want to insert elements of one array into the middle of another, use splice(). The splice() function: We’ve already seen push, pop, shift and unshift. They are all special cases of a more general function called splice(). The splice function takes up to four arguments: an array to be modified, the index at which it is to be modified, the number of elements to be removed (starting at the index specified in the previous argument), and a list of extra elements to be inserted at the index (after the previous elements are removed). The function returns a list of the elements which are removed.
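The relationship between splice() and the four simpler functions can be sketched like this. The data is made up; each splice() call is paired with the call it imitates.

```perl
use strict;
use warnings;

my @list = (1, 2, 3);                    # hypothetical data

splice(@list, scalar(@list), 0, 4);      # same effect as push(@list, 4)
my ($popped)  = splice(@list, -1, 1);    # same effect as pop(@list)
splice(@list, 0, 0, 0);                  # same effect as unshift(@list, 0)
my ($shifted) = splice(@list, 0, 1);     # same effect as shift(@list)

# splice returns the removed elements as a list.
my @removed = splice(@list, 1, 1, "two", "2b");   # replace one element with two

print "list=@list removed=@removed\n";
```

In everyday code the named functions are easier to read; splice() is the one to reach for when the change is in the middle of the array.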

List Flattening
 Contrary to what you might expect:

@virtues = ( "Faith" , "Hope" , ( "Love" , "Charity" ) );

 This doesn’t produce a hierarchical list of three elements where the third element is itself a two-element list.
 Each element of a list must be a scalar, not another list.
 The above example is actually the same as:

@virtues = ( "Faith" , "Hope" , "Love" , "Charity" );

 It is easy to make a hierarchical list in Perl - see references.
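As a preview of the references mentioned above, here is a small sketch contrasting a flattened list with a nested one. The variable names are chosen for illustration only.

```perl
use strict;
use warnings;

# Parentheses flatten: this is a four-element list.
my @flat = ("Faith", "Hope", ("Love", "Charity"));

# Square brackets build an anonymous array and return a reference to it,
# so the third element here is a single scalar pointing at two values.
my @nested = ("Faith", "Hope", ["Love", "Charity"]);

my $flat_count   = scalar(@flat);      # 4
my $nested_count = scalar(@nested);    # 3
my $inner        = $nested[2][1];      # "Charity"

print "flat: $flat_count, nested: $nested_count, inner: $inner\n";
```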

LAB3 - ARRAYS_1 LAB3 - ARRAYS_2 LAB3 - ARRAYS_3 LAB3 - ARRAYS_4 LAB3 - ARRAYS_5

Pick Your Own Quotes

Customary   Generic   Meaning                 Interpolates
''          q//       Literal string          No
""          qq//      Literal string          Yes
``          qx//      Command execution       Yes
()          qw//      Word list               No
//          m//       Pattern match           Yes
s///        s///      Pattern substitution    Yes
y///        tr///     Character translation   No
""          qr//      Regular expression      Yes

$single = q!I said, "You said, 'She said it.'"!; $double = qq(Can't we get some "good" $variable?);

Some of these forms are syntactic sugar which saves you from putting lots of escapes in strings (which might be confusing and lead to mistakes). In the first example we’ve used ! as the quote mark, which means we can freely use " and ' in the text string we wish to build. We could have used our normal quotes and escaped the " and ' quotes inside the string, but it would have been very hard to read. Any character in a string which might otherwise be interpreted as a controlling character can always be included in the string by escaping it - i.e. if we want to put a " in a double-quoted string, we can always do this by writing it inside the string as \". A \ followed by a punctuation character stands for that character itself (letters are different: \n and \t, for example, have special meanings in double-quoted strings).

Variables Types - Hashes    

 Hashes are arrays accessed by a string.
 Hashes are also called associative arrays.
 push() and pop() and shift() and unshift() have no meaning for hashes.
 A hash has no beginning and no end.
 The % character is used to mark hash names.

[Figure: the array @home, with four numbered elements holding Couch, Chair, Table and Stove, beside the hash %longday, whose keys Sat, Tue, Mon, Sun, Thu, Fri and Wed map to Saturday, Tuesday, Monday, Sunday, Thursday, Friday and Wednesday.]

Hash keys are not automatically implied by their position. In fact the concept of position has no meaning for a hash. (As we will see later, this means that you can’t step through a hash by position; to visit everything in a hash you loop over its keys.) You must supply a key as well as a value when populating a hash. You can assign a list to a hash (just like an array) but pairs of items from the list will be interpreted as key/value pairs in the hash. So you can say this:

@list = ( "Sat" , "Saturday" , "Sun" , "Sunday" , ... , "Fri" , "Friday" );
%hash = @list;

This is the same as:

%hash = ( "Sat" => "Saturday" , "Sun" => "Sunday" , ... , "Fri" => "Friday" );

Variables Types - Hashes  %longday could be declared like this: %longday = ("Sun", "Sunday", "Mon", "Monday", "Tue", "Tuesday", "Wed", "Wednesday", "Thu", "Thursday", "Fri", "Friday", "Sat", "Saturday");

 This is hard to read, so Perl provides => as an alternative to the comma.

%longday = (
    "Sun" => "Sunday",
    "Mon" => "Monday",
    "Tue" => "Tuesday",
    "Wed" => "Wednesday",
    "Thu" => "Thursday",
    "Fri" => "Friday",
    "Sat" => "Saturday",
);

As in the example from the previous slide - suppose you wanted to translate abbreviated day names to their corresponding full names. You could write the list assignment as shown in the top box. This is visually noisy, so Perl provides the => operator (a synonym for the comma) so that with a bit of creative formatting the same statement can be written as shown in the second example. Remember - hashes have no order to them - all accessing is done via the keys. Do not expect a loop over a hash to visit the values in the order you entered them.

Variables Types - Hashes
 Hashes are still an array full of scalars.
 Select an individual hash element using { and }.
 Example - the value associated with "Wed" in our example is: $longday{"Wed"};
 Note we’re dealing with a scalar value so there’s a $ on the front, not a %.
 Example: Suppose we have a hash called %wife.

$wife{"Tony"} = "Cherie";

$          A scalar so $
wife       The name of the hash
{ }        Since this is a hash we need { and }
"Tony"     The key
"Cherie"   The value

You can assign a list to a hash - see our previous examples - each pair of items in the list is taken as (respectively) a key and a value. You can assign a hash to a list. If you do then it’ll convert the hash into a list of key/value pairs. Often we use the keys() function to extract a list of just the keys. This list will also be unordered (the keys won’t come back in the same order that they were entered into the original hash) but it can be sorted using sort(). Remember a single element of a hash is still a scalar - so it is always prefixed by a $ and not a %. The % refers to the whole hash and not to individual elements. You also need to use “{“ and “}”. It is generally true that things don’t come back out of a hash in the same order that they go in (if, say, you get all the keys back out with the keys() function). Do not try to use push(), pop(), shift() or unshift() with hashes. They don’t work; remember, position in a hash has no meaning.
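A short sketch of the keys() plus sort() idiom mentioned above. The hash is a cut-down version of %longday, and the @lines array is introduced here purely for illustration.

```perl
use strict;
use warnings;

my %longday = (Sat => "Saturday", Sun => "Sunday", Mon => "Monday");

# keys() returns the keys in no particular order; sort() makes the
# order predictable.
my @lines;
foreach my $abbr (sort keys %longday) {
    push @lines, "$abbr is short for $longday{$abbr}";
}
print "$_\n" for @lines;
```

The loop runs over a list of keys, not over the hash itself, which is why sorting is possible.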

Functions Which Work With Hashes
 A limited set of functions work with Perl hashes:
 keys     List all the keys in a hash
 values   List all the values in a hash
 each     Used to iterate key/value pairs
 exists   Tells you whether a hash key exists
 delete   Deletes a hash key/value pair.

my @keys = keys( %my_hash );
my @values = values( %my_hash );

while ( my ( $key , $value ) = each %my_hash ) {
    print $key . " " . $value . "\n";
}

In the same way that an array can be released with undef, so can a hash. To do this, write: undef %my_hash; (Do not write %my_hash = undef; since that assigns a one-element list and leaves a hash with a single empty-string key.) If however you just want to remove all the entries in the hash, then just do this: %my_hash = (); i.e. assign the empty list to the hash.
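A minimal sketch of delete and of emptying a hash, using made-up data.

```perl
use strict;
use warnings;

my %age = (Nat => 24, Jules => 25, Josh => 17);   # hypothetical data

my $removed     = delete $age{Josh};    # delete returns the removed value
my $still_there = exists $age{Josh};    # false: the key has gone
my $left        = scalar(keys %age);    # 2 entries remain

%age = ();                              # empty the hash but keep the variable
my $count = scalar(keys %age);          # 0

print "removed $removed, $left left, then $count\n";
```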

How Do I … Create A Hash?
 You want to create and populate a hash with key/value pairs.

Solution: A hash can be initialised with a list, where each pair of values in the list is interpreted as a key/value pair.

%age = ( "Nat", 24, "Jules", 25, "Josh", 17 );

This is the same as above:

$age{"Nat"} = 24;
$age{"Jules"} = 25;
$age{"Josh"} = 17;

You can also use the comma operator => to initialise a hash like this:

%food_color = (
    "Apple"  => "red",
    "Banana" => "yellow",
    "Lemon"  => "yellow",
    "Carrot" => "orange"
);

The => operator automatically quotes a bare word on its left, so you can omit the quotes on the keys:

%food_color = (
    Apple  => "red",
    Banana => "yellow",
    Lemon  => "yellow",
    Carrot => "orange"
);

More info: See The Perl Cookbook, section 5.0 Page 129. Solution: assign a list of pairs of items to the hash. You can also use the => operator to do the same thing - it is visually easier to see what is happening and where the key/value pairs are located in the list. Using => will automatically quote a bare word on its left. Single-word hash keys are also automatically quoted, so you can write $hash{"somekey"} as $hash{somekey}. Hashes are stored in an order which is convenient for the implementation of hashes, which means that the extraction order is not the same as the insertion order.

How Do I … Add An Element To A Hash?  You need to add an element to a hash. $HASH{$KEY} = $VALUE;

Solution: Simply add a new entry like this

More info: See The Perl Cookbook, section 5.1 Page 130. Solving this problem is easy - just add any new entry as shown. Perl will take care of all memory management for you, and just as with arrays and lists, you don’t need to worry about overflow. If you use undef as a hash key it will be turned into the empty string "". If you try to get a value for a key which isn’t in the hash you’ll also get undef, so you can’t simply use if ($hash{key}) to see if a key exists. You need to use exists($hash{key}) to test whether the key is in the hash, defined($hash{key}) to see whether its value is or is not undef, and if ($hash{key}) to test the value for true or false.
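The three tests described above give different answers for different kinds of value. The following sketch uses a made-up %stock hash, and a %report hash introduced only to collect the results.

```perl
use strict;
use warnings;

# Hypothetical data: a true value, a false value, an undef value,
# and (by omission) a key that is not in the hash at all.
my %stock = (apples => 3, pears => 0, plums => undef);

my %report;
for my $fruit (qw(apples pears plums cherries)) {
    my $exists  = exists  $stock{$fruit} ? "yes" : "no";
    my $defined = defined $stock{$fruit} ? "yes" : "no";
    my $true    = $stock{$fruit}         ? "yes" : "no";
    $report{$fruit} = "exists=$exists defined=$defined true=$true";
    print "$fruit: $report{$fruit}\n";
}
```

Only exists() can tell the difference between plums (present, but undef) and cherries (not present).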

Hashes
 Remember - a hash is just an array where things are looked up by name.
 If you assign a list to a hash - pairs of items become key/value associations.

%map = ('red',0xff0000,'green',0x00ff00,'blue',0x0000ff);

%map = ();            # clear the hash first
$map{red} = 0xff0000;
$map{green} = 0x00ff00;
$map{blue} = 0x0000ff;

%map = (
    red   => 0xff0000,
    green => 0x00ff00,
    blue  => 0x0000ff,
);

$field = radio_group(
    NAME      => 'animals',
    VALUES    => ['camel', 'llama', 'ram', 'wolf'],
    DEFAULT   => 'camel',
    LINEBREAK => 'true',
    LABELS    => \%animal_names,
);

The => operator has the nice side effect of quoting a bare word on its left, so we can leave the quotes off red, green, blue in the third example. The value on the right of => will still need quotes if it is a character string. The last example uses named parameters to invoke a complex function. A hash is initialised in some order, but the values generally don’t come back out in the order they went in. Don’t rely on scalar( %hash ) (or on %hash in scalar context) to find out how many things are in the hash; on the Perl versions this course was written for it does not return a simple count. If you want to know that, use: scalar( keys( %hash ) ); or scalar( values( %hash ) );

LAB4 - HASH_1 LAB4 - HASH_2 LAB4 - HASH_3 LAB4 - HASH_4 LAB4 - HASH_5

Array And Hash Slices
 Slicing an array (note: [ and ]):

print $tragedy[3] , $tragedy[4] , $tragedy[5];
print @tragedy[3,4,5];

These are equivalent.

 Slicing a hash (note: { and }):

print ($sound{cat} , $sound{goldfish} , $sound{dog} , $sound{whale} );
print @sound{ "cat" , "goldfish" , "dog" , "whale" };

These are equivalent.

Slicing an array: The things in the array slice are not copies - they are the same elements. So assigning to the array slice is also assigning to the original array elements. (The same is also true for a hash slice). The slice is a list (hence the @) and the brackets are [ and ]. Slicing a hash: The values() function returns hash values in an apparently random order, so to create a list of values from a hash with a specific order we often have to do something similar to what is shown in the example. Instead of putting a single key in the curly braces, we put a list of keys in the curly braces. The slice is a list (hence the @ and NOT a $ or a %) and the brackets are { and }.
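The point that a slice refers to the original elements can be sketched as follows. The contents of @tragedy and %sound are invented for this illustration.

```perl
use strict;
use warnings;

my @tragedy = qw(zero one two three four five);      # hypothetical data
my %sound   = (cat => "meow", dog => "woof");

# Assigning to a slice changes the original elements.
@tragedy[3, 4] = ("THREE", "FOUR");

# A hash slice can set several keys at once ...
@sound{"goldfish", "whale"} = ("blub", "song");

# ... and read values back in an order of your choosing.
my @chosen = @sound{qw(dog cat whale)};

print "@tragedy\n@chosen\n";
```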

Scalar And List Context
 Examples:

$x         = funkshun();    # scalar context
$x[1]      = funkshun();    # scalar context
$x{"ray"}  = funkshun();    # scalar context

@x         = funkshun();    # list context
@x[1]      = funkshun();    # list context
@x{"ray"}  = funkshun();    # list context
%x         = funkshun();    # list context

($x,$y,$z) = funkshun();    # list context
($x)       = funkshun();    # list context

my $x      = funkshun();    # scalar context
my @x      = funkshun();    # list context
my %x      = funkshun();    # list context
my ($x)    = funkshun();    # list context

Funkshun() should always figure out what it is supposed to return.

The first three examples are all evaluated in scalar context. The second set of examples are all evaluated in list context - even if the assignment only picks out a single value from such a list. The rules don’t change when using my to force ourselves to declare variables. A well designed function can figure out what context it’s been called in (using wantarray) and return what is appropriate. The wantarray function is used like this:

if (wantarray) { return @an_array; } else { return $a_scalar; }
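A minimal sketch of a context-aware function. The words() function is hypothetical and exists only to show wantarray at work.

```perl
use strict;
use warnings;

# Hypothetical function: returns every word in list context,
# or just the number of words in scalar context.
sub words {
    my @found = split ' ', $_[0];
    return wantarray ? @found : scalar(@found);
}

my @all    = words("the quick brown fox");   # list context: four words
my $count  = words("the quick brown fox");   # scalar context: 4
my ($head) = words("the quick brown fox");   # list context: "the"

print "all=@all count=$count head=$head\n";
```

The parentheses around $head are what make the last call a list assignment.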

Variables Types - Simple Data Structures
 Arrays and Hashes are simple, flat data structures.
 How do we build more complex data structures?
 Here’s the wrong way and the right way to do it:

$wife{"Jacob"} = ("Leah", "Rachel", "Bilhah", "Zilpah");   # WRONG
$wife{"Jacob"} = ["Leah", "Rachel", "Bilhah", "Zilpah"];   # RIGHT

 Once this is done you can refer to individual elements like this:

$wife{"Jacob"}[0] = "Leah";
$wife{"Jacob"}[1] = "Rachel";
$wife{"Jacob"}[2] = "Bilhah";
$wife{"Jacob"}[3] = "Zilpah";

Sometimes you need to build not-so-lovely and not-so-simple data structures. Perl lets you do this by pretending that complicated values are really simple ones. We want $wife{"Jacob"} to refer to a single thing (it’s a scalar) so it must hold a Perl reference, and a reference to an array is created using [ and ] and not ( and ). We are telling Perl to pretend that a whole list is in fact a scalar. The statement creates an anonymous array (i.e. an array without a name) and puts a reference to it into the hash element $wife{"Jacob"}. This is how Perl deals with both multi-dimensional arrays and nested data structures. You can see in the second example how this looks like a multi-dimensional array with one string subscript and one numeric subscript. We’ll discuss this in more detail tomorrow … This example (and the one on the following page) are here to demonstrate that making complex data structures is easy.

Variables Types - Simple Data Structures
 Example:

$kids_of_wife{"Jacob"} = {
    "Leah"   => ["Reuben","Simeon","Levi","Judah","Issachar","Zebulun"],
    "Rachel" => ["Joseph","Benjamin"],
    "Bilhah" => ["Dan","Naphtali"],
    "Zilpah" => ["Gad","Asher"],
};

$kids_of_wife{"Jacob"}{"Leah"}[0]   = "Reuben";
$kids_of_wife{"Jacob"}{"Leah"}[1]   = "Simeon";
$kids_of_wife{"Jacob"}{"Leah"}[2]   = "Levi";
$kids_of_wife{"Jacob"}{"Leah"}[3]   = "Judah";
$kids_of_wife{"Jacob"}{"Leah"}[4]   = "Issachar";
$kids_of_wife{"Jacob"}{"Leah"}[5]   = "Zebulun";
$kids_of_wife{"Jacob"}{"Rachel"}[0] = "Joseph";
$kids_of_wife{"Jacob"}{"Rachel"}[1] = "Benjamin";
$kids_of_wife{"Jacob"}{"Bilhah"}[0] = "Dan";
$kids_of_wife{"Jacob"}{"Bilhah"}[1] = "Naphtali";
$kids_of_wife{"Jacob"}{"Zilpah"}[0] = "Gad";
$kids_of_wife{"Jacob"}{"Zilpah"}[1] = "Asher";

Suppose we not only wanted to know the names of Jacob’s wives, but also the names of all the sons of all his wives. In this case we want to treat a hash as a scalar - we use { and } for that. Now we have an array in a hash in a hash. Adding another level to a nested data structure is like adding another dimension to a multi-dimensional array. The important point is that Perl lets you pretend that something which is complex is a simple scalar. Perl’s whole object oriented structure is built upon this kind of encapsulation. Again, we’ll discuss this in detail tomorrow.
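Reading a nested structure back out needs a little extra syntax. The sketch below uses a shortened copy of the structure above; the @report array is introduced only to hold the output, and the dereferencing syntax is a preview of material covered later.

```perl
use strict;
use warnings;

my %kids_of_wife;
$kids_of_wife{"Jacob"} = {
    "Rachel" => ["Joseph", "Benjamin"],
    "Bilhah" => ["Dan", "Naphtali"],
};

# $kids_of_wife{"Jacob"} holds a hash reference; %{ ... } turns it back
# into a hash, and @{ ... } does the same for each array reference.
my @report;
foreach my $wife (sort keys %{ $kids_of_wife{"Jacob"} }) {
    my @sons = @{ $kids_of_wife{"Jacob"}{$wife} };
    push @report, "$wife: " . join(", ", @sons);
}
print "$_\n" for @report;
```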

Variable Types - Packages
 Why use packages?
 Use other people’s code.
 Lets us split up our own code into manageable units.
 Is the basis for the whole of Perl’s OO system.
 Ensures that our code (subroutine & variable names) does not clash with imported code.

# This file is Matrix.pm
...
sub print_me {
    # Code to print out a matrix
}

# This file is Solve.pm
# This is our code
...
use Matrix;
sub print_me {
    # Code to print out an equation
}

Packages are a way of splitting up your code. They are roughly equivalent to C/Spice/Verilog .include statements. Suppose we pick up Matrix.pm from somewhere - it has a subroutine called print_me. We import Matrix.pm and we also have a subroutine called print_me which does something completely different. When we want to call print_me, which subroutine do we call?

Variables Types - Packages
 Suppose you want to talk about matrices.
 You would start off by saying this in Matrix.pm:

package Matrix;

 The effect of this is that from this point onwards any global name in Matrix.pm will be prefixed by Matrix::
 So if you say:

package Matrix;
$result = &print_me();

 Then the real name of $result is $Matrix::result and the real name of &print_me() is &Matrix::print_me()

In computer-science, and in Perl, each of these packages establishes a “namespace”. You can have as many namespaces as you want but you’re only ever in one at a time. If we don’t use a package declaration in our program then the default name is “main::”. This means that the previous example will work, since print_me() in Matrix.pm is really &Matrix::print_me() while print_me() in Solve.pm is really &main::print_me(). {We would be better off in Solve.pm using a declaration like package Solve; - what would the &print_me() subroutine be called then?} Code which is brought into a program like this with a use command is also called a module. The standard is to name the module with the same name as the package it contains (with an initial uppercase letter) and with a .pm filename suffix. Thus the code for package Matrix; would be contained in a file called Matrix.pm. The nice thing about Perl is that there are a *lot* of packages “out there” that you can use to solve all sorts of problems.
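The namespace idea can be tried out in a single file. This sketch puts both packages in one script purely for convenience (the course text keeps them in separate .pm files), and the subroutine bodies are placeholders.

```perl
use strict;
use warnings;

package Matrix;          # everything below lives in the Matrix:: namespace
sub print_me { return "printing a matrix" }

package Solve;           # a second, separate namespace
sub print_me { return "printing an equation" }

package main;            # back to the default namespace

# The fully qualified names say exactly which subroutine is wanted.
my $from_matrix = Matrix::print_me();
my $from_solve  = Solve::print_me();
print "$from_matrix\n$from_solve\n";
```

Two subroutines with the same short name coexist because their full names differ.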

Variables Types - Pragmas
 In the previous section we used the “use” command to load in some new code (a module).
 Some of the built-in modules in Perl don’t add code. Rather they change the way that the language behaves.
 These special modules are called pragmas.
 Example: use strict;

Pragmas change the way the language works. In the example shown, it tightens up on some of the rules which Perl uses by default and requires the programmer to be explicit. This example requires that you declare all your variable names before you use them - this is usually a good thing - see the section on style in about five minutes time.
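The effect of use strict can be observed without stopping the program. This sketch compiles a line of code at run time with a string eval so that the compile-time error can be captured; the variable names are hypothetical.

```perl
use strict;
use warnings;

my $total = 56;                  # declared with my: accepted

# eval with a string compiles code at run time, so the error that strict
# raises for an undeclared variable can be caught and inspected.
my $result = eval q{ $undeclared = 1; 1 };
my $error  = $@;

print "strict said: $error" unless defined $result;
```

In a normal script the same mistake would be reported when the program is first compiled, before any of it runs.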

How Do I … Round Floating-Point Numbers?
 You want to round a floating-point number to a certain number of decimal places.

$rounded = sprintf("%FORMATf", $unrounded);

General solution - use sprintf (or printf).

$a = 0.255;
$b = sprintf("%.2f", $a);
print "Unrounded: $a\nRounded: $b\n";
printf "Unrounded: $a\nRounded: %.2f\n", $a;

Unrounded: 0.255
Rounded: 0.26
Unrounded: 0.255
Rounded: 0.26

More info: See The Perl Cookbook, section 2.4 Page 46. The “f” format in sprintf lets you specify how many decimal places the argument should be rounded to. Roughly speaking, Perl looks at the next digit in the number and rounds up if it is 5 or greater, or down otherwise. Because numbers are stored in binary, a value that looks as if it ends in exactly 5 may actually be stored as slightly more or slightly less, so the result can occasionally surprise you.
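The rounding rule is applied to the number as stored, not as typed. The sketch below uses 2.675, a value chosen for this illustration because its nearest binary representation is slightly smaller than the decimal text suggests; the comments assume the usual IEEE-754 double-precision numbers.

```perl
use strict;
use warnings;

# 2.675 cannot be stored exactly; the nearest double is slightly below it,
# so it rounds down even though the decimal text ends in 5.
my $surprise = sprintf("%.2f", 2.675);

# Showing more digits reveals the value that is actually stored.
my $stored   = sprintf("%.20f", 2.675);

print "rounded: $surprise\nstored:  $stored\n";
```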

How Do I … Compare Floating-Point Numbers?
 You want to compare floating-point numbers to know if they’re equal to a certain level of significance.

# equal(NUM1, NUM2, ACCURACY) : returns true if NUM1 and NUM2 are
# equal to ACCURACY significant digits (the %g format counts
# significant digits; use %f instead to count decimal places)
sub equal {
    my ($A, $B, $dp) = @_;
    return sprintf("%.${dp}g", $A) eq sprintf("%.${dp}g", $B);
}

More info: See The Perl Cookbook, section 2.2 Page 45. Floating-point arithmetic isn’t precise, so you should never do a direct comparison using “==“. The solution is to turn the floating-point numbers into strings using sprintf and then compare those strings. Alternatively use a large multiplier on both numbers (like 1000000), turn that result into an integer and then use “==“, but this demands that you have some idea of the magnitude of the numbers before you start. If the number of decimal places is fixed, this makes the latter solution easier.
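Another common approach, different from the sprintf solution above, is to compare the difference between the two numbers against a small tolerance. The nearly_equal() helper and the tolerance value are assumptions made for this sketch, not part of the course material.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the two numbers differ by less than $tolerance.
sub nearly_equal {
    my ($x, $y, $tolerance) = @_;
    return abs($x - $y) < $tolerance;
}

my $sum = 0.1 + 0.2;     # usually not exactly 0.3 once stored in binary

my $direct = ($sum == 0.3)                 ? "equal" : "different";
my $fuzzy  = nearly_equal($sum, 0.3, 1e-9) ? "equal" : "different";
print "direct: $direct, with tolerance: $fuzzy\n";
```

As with the multiplier approach, choosing the tolerance requires some idea of the magnitude of the numbers involved.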

How Do I … Convert Binary And Decimal Numbers?
 You have an integer whose binary representation you would like to print out, or a binary number which you would like to print as an integer.

sub dec2bin {
    my $str = unpack("B32", pack("N", shift));
    $str =~ s/^0+(?=\d)//;    # otherwise you'll get leading zeros
    return $str;
}

sub bin2dec {
    return unpack("N", pack("B32", substr("0" x 32 . shift, -32)));
}

$num = bin2dec('0110110');    # $num is 54
$binstr = dec2bin(54);        # $binstr is 110110

More info: See The Perl Cookbook, section 2.3 Page 48. Older versions of Perl can’t solve either problem with sprintf since it has no “print in binary” format, so we use pack and unpack for manipulating strings of data. (From Perl 5.6 onwards, sprintf does have a %b format and oct() accepts a string with a 0b prefix.) Both the pack and unpack functions take a template argument which specifies what they should do with their other arguments.
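On Perl 5.6 or later the same conversions can be written with built-in features. This is a minimal sketch using the values from the example above.

```perl
use strict;
use warnings;

my $binstr = sprintf("%b", 54);      # decimal to binary string
my $num    = oct("0b0110110");       # oct() understands a 0b prefix

print "$binstr $num\n";
```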

How Do I … Control Case?
 A string in uppercase needs converting to lowercase, or vice-versa.

Obey the language environment:

use locale;                  # needed in 5.004 or above

Use functions:

$big = uc($little);          # "bo peep" -> "BO PEEP"
$little = lc($big);          # "JOHN"    -> "john"

Use string escapes (note: uppercase U & L transform the whole word):

$big = "\U$little";          # "bo peep" -> "BO PEEP"
$little = "\L$big";          # "JOHN"    -> "john"

Use string escapes (note: lowercase u & l transform just the first letter of a word):

$big = "\u$little";          # "bo"      -> "Bo"
$little = "\l$big";          # "BoPeep"  -> "boPeep"

# You can do case insensitive string comparisons like this: if (uc($a) eq uc($b)) { print "a and b are the same\n"; }

More info: See The Perl Cookbook, section 1.9 Page 19. The two ways of doing the conversions (functions and string escapes) look different, but do the same thing. You can set the case of either the first character or the whole word. The use locale directive tells the Perl case conversion functions and pattern matching engine to respect your language environment, allowing for languages with umlauts, accent marks, cedillas and other diacritics used in many languages. You can also use the case conversion functions and pattern matching to do case insensitive string comparisons.

How Do I … Find Out Today’s Date?
 You need to find out the year, month and day values for today’s date.

($day, $month, $year) = (localtime)[3,4,5];
printf("The current date is %04d %02d %02d\n", $year+1900, $month+1, $day);
# prints - The current date is 2005 08 08
# Could also have been written - ($day, $month, $year) = (localtime)[3..5];

This is an object-oriented version of localtime():

use Time::localtime;
$tm = localtime;
($DAY, $MONTH, $YEAR) = ($tm->mday, $tm->mon, $tm->year);

More info: See The Perl Cookbook, section 3.1 Page 73. Solution - use localtime() and extract the information you want from the list it returns. Or, use Time::localtime, which overrides localtime() to return a Time::tm object. You can then use the method calls of that object to get the values you want. In both versions the month counts from 0 and the year counts from 1900, which is why the printf adds 1 and 1900.

Style, File Handles & Operators


Running Perl Programs And Scripts
 If you’re doing something simple - this will work:

% perl -e 'print "Hello World!\n";'

 For longer scripts put the code into a file and say this:

% perl grading

 The most convenient way is to make the file executable and ensure this line is at the top of the file:

#!/usr/local/bin/perl -w

% grading

% at the start of the lines above is the Unix shell prompt.
% perl -e : You’re basically trying to cram everything onto one line.
% perl grading : Feed the program explicitly to Perl.
% grading : Let the shell call Perl to run the script.

Useful tip - never just use this at the top of your file to invoke Perl:

#!/usr/local/bin/perl

But rather use this instead:

#!/usr/local/bin/perl -w

This will turn on lots of warning messages.

Good Programming Practice

#!/usr/local/bin/perl -w
use lib "/a/unix/path/to/my/Perl/Modules";

# Pull in some modules
use strict;
use Netlist_Functions;

# Define a constant
use constant PI => 3.141592653589793;

# Create some variables
my @args = ();
my $flag = 1;    # true; a bare TRUE is not allowed under "use strict"

# ALL YOUR PROGRAM CODE GOES HERE

exit 0;

# Put all your subroutines here

A more extensive version of this template can be found in the tutorial area and in your notes. Note: Once you “use strict;” all your variables will have to be declared like this: my $variable; or my $variable = 56; You’ll get compile time errors if you don’t use my. With -w, Perl will also warn you about global names which appear only once, which usually points to a typing mistake. For any programs other than one-liners, ALWAYS use a methodology like this - it will save you lots of time in debugging applications. We’ll talk more about strict later.

Style Guidelines
 See the separate document provided with the course notes.
 Here’s a brief summary:
 Enable warnings with “#!/usr/local/bin/perl -w” or use warnings;
 Use “use strict;”
 Use “==” for numeric tests and eq for string tests.
 Don’t confuse “==” and “=”.
 Don’t confuse “=” and “=~”.
 Use a consistent indent when writing code.
 Use consistent bracket matching.
 Never, ever use “goto”.
 Don’t use printf when print will do - which is nearly always.
 Use comments - lots of comments.
 Document your code.

Note that there’s a complete style guide included in the course notes. There’s also a separate style presentation later in the course.

Filehandles
 A filehandle is a name given to a file, device, socket or pipe.
 Filehandles hide the complexity of buffering from your program.
 They also provide a symbolic name.
 You create a filehandle using the open() function.
 open() needs two parameters:
 The filehandle.
 A filename.
 STDIN, STDOUT and STDERR are predefined for you.
 You also need to specify the behavior of the open() function.


Filehandles
 Using open()

open(SESAME, "filename")                # read from existing file
open(SESAME, "<filename")               # (same thing, explicitly)
open(SESAME, ">filename")               # create file and write to it
open(SESAME, ">>filename")              # append to existing file
open(SESAME, "| output-pipe-command")   # set up an output filter
open(SESAME, "input-pipe-command |")    # set up an input filter

print STDOUT "Enter a number: ";           # ask for a number
$number = <STDIN>;                         # input the number
print STDOUT "The number is $number.\n";   # print the number

chop($number = <STDIN>);                   # input number and remove newline

$number = <STDIN>;                         # input number
chop($number);                             # remove newline

You can use open to create filehandles for a variety of purposes (input, output, piping). Once opened the filehandle can be used to access the file or device until it is closed with … Using open with the same filehandle again will close the first filehandle. Once a file is open it can be read from using the line reading operator <>. An empty <> reads from the files named on the command line, or from STDIN if there are none. What is STDOUT doing with the print statement in the second example? Since it’s the default - you don’t need it. The last two examples do the same thing - you’ll most frequently see the first - this is one of Perl’s common idioms. Note that when you do use a filehandle with a print statement, there’s no “,” between the filehandle and the text.
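The write-then-read cycle can be sketched as follows. This sketch departs from the slide in three labelled ways: it uses the three-argument form of open with a lexical filehandle (available from Perl 5.6), it uses chomp rather than chop so that only a trailing newline is removed, and it gets a scratch filename from the core File::Temp module.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A scratch file to work with; the name is generated for us.
my ($tmp_fh, $filename) = tempfile(UNLINK => 1);
close($tmp_fh);

# Three-argument open: the mode is kept separate from the filename.
open(my $out, ">", $filename) or die "Can't write $filename: $!";
print $out "first line\n", "second line\n";   # no comma after the filehandle
close($out) or die "Can't close $filename: $!";

open(my $in, "<", $filename) or die "Can't read $filename: $!";
my @lines = <$in>;            # list context: read every line
close($in);
chomp(@lines);                # remove the trailing newlines

print scalar(@lines), " lines read\n";
```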

How Do I … Process All The Files In A Directory
 You want to do something to each file in a particular directory.

Solution: Use opendir to open the directory and readdir to retrieve all the filenames

opendir(DIR, $dirname) or die "can't opendir $dirname: $!";
while (defined($file = readdir(DIR))) {
    # do something with "$dirname/$file"
}
closedir(DIR);

Example: Read all the files and add on the directory path at the front of the filenames

$dir = "/usr/local/bin";
print "Text files in $dir are:\n";
opendir(BIN, $dir) or die "Can't open $dir: $!";
while( defined ($file = readdir BIN) ) {
    print "$file\n" if -T "$dir/$file";
}
closedir(BIN);

More info: See The Perl Cookbook, section 9.5 Page 318. The opendir, readdir and closedir functions operate on directories the same way that open, close and <> operate on files. Both use handles, but the handles used by the directory functions are different from those used by files. In scalar context readdir returns the next filename from a directory until it runs out of names, at which point it returns undef. In list context it returns the rest of the filenames in a directory or an empty list if there are no filenames left.
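The list-context behaviour of readdir described above can be sketched like this. The sketch assumes the current directory is readable and uses a lexical directory handle rather than the bareword handles shown on the slide.

```perl
use strict;
use warnings;

my $dirname = ".";    # assumption: the current directory is readable

opendir(my $dh, $dirname) or die "can't opendir $dirname: $!";
# List context: readdir returns all the remaining names at once.
# The grep drops the "." and ".." entries.
my @names = grep { $_ ne "." && $_ ne ".." } readdir($dh);
closedir($dh);

# Remember to put the directory back on the front before testing a name.
my @plain_files = grep { -f "$dirname/$_" } @names;
print scalar(@plain_files), " plain files in $dirname\n";
```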

Operators - Arithmetic

Example    Name             Result
$a + $b    Addition         Sum of $a and $b
$a * $b    Multiplication   Product of $a and $b
$a % $b    Modulus          Remainder of $a divided by $b
$a ** $b   Exponentiation   $a to the power $b

You can work out subtraction and division for yourself. You can always use ( and ) to force the order of evaluation you want.

Operators - String
 There is an addition operator for strings that performs concatenation.
 Perl uses .

$a = 123;
$b = 456;
print $a + $b;     # prints 579
print $a . $b;     # prints 123456

 There’s also a “multiply” operator for strings, called the repeat operator.

$a = 123;
$b = 3;
print $a * $b;     # prints 369
print $a x $b;     # prints 123123123

Note in the above how Perl is converting from numbers to strings as needed. String concatenation is also implied in interpolation which occurs in double-quoted strings.

Operators - String
 The following three statements all print the same thing.

print $a . ' is equal to ' . $b . ".\n";    # dot operator
print $a, ' is equal to ', $b, ".\n";       # list
print "$a is equal to $b.\n";               # interpolation

Of the three different ways of printing shown above, interpolation is the easiest to understand.

Operators - Assignment
 Assignment:

$a = $b;
$a = $b + 5;
$a = $a * 3;
$a *= 3;

$line .= "\n";     # Append newline to $line.
$fill x= 80;       # Make string $fill into 80 repeats of itself.
$val ||= "2";      # Set $val to 2 if it isn't already "true".

$a = $b = $c = 0;  # C programmers will be familiar with this
($temp -= 32) *= 5/9;
chop($number = <STDIN>);

The first three assignments are hopefully obvious. $a *= 3 and the three commented examples use the op= syntax, which works for most of Perl’s binary operators.

Operators - Unary Arithmetic
 Can use something like $variable += 1 as shorthand.
 Perl also has autoincrement and autodecrement operators.

Example      Name            Result
++$a, $a++   Autoincrement   Add 1 to $a
--$a, $a--   Autodecrement   Subtract 1 from $a

 If you place the operator in front of the variable it is known as pre-increment or pre-decrement.
 The value is changed before it is used.
 If you place the operator after the variable it is known as post-increment or post-decrement.
 The value is changed after it is used.

If you’ve used C before, this is exactly the same as pre/post increment/decrement in that language.

$count = 3;
$limit = $count++;
print "Count=$count and Limit=$limit\n";

Count=4 and Limit=3

or

$count = 3;
$limit = ++$count;
print "Count=$count and Limit=$limit\n";

Count=4 and Limit=4

Operators - Unary Arithmetic
 Example:
    $a = 5;       # $a is assigned 5
    $b = ++$a;    # $b is assigned the incremented value of $a, 6
    $c = $a--;    # $c is assigned 6, then $a is decremented to 5

Operators - Logical
 Also known as short-circuit operators.
 Allow the program to make decisions without using lots of "if" statements.

    Example      Name   Result
    $a && $b     And    $a if $a is false, $b otherwise
    $a || $b     Or     $a if $a is true, $b otherwise
    ! $a         Not    True if $a is not true
    $a and $b    And    $a if $a is false, $b otherwise
    $a or $b     Or     $a if $a is true, $b otherwise
    not $a       Not    True if $a is not true
    $a xor $b    Xor    True if $a or $b is true, but not both

    open(GRADES, "grades") or die "Can't open file grades: $!\n";

Called short-circuit operators because they skip the evaluation of rightward arguments once they have enough information to decide an overall result. The bottom example is from our grading program. Perl tries to open the file called "grades". If it succeeds then the program continues with the statements which follow this line, otherwise Perl issues an error message via the die() function and stops. Note that this code is easy on the eye: the important thing which the line is trying to do is the first thing on the line, and secondary actions are off to the right of the code.
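The table says that && and || return one of their operands rather than a plain true/false. A minimal sketch of that, and of the right-hand side being skipped, follows; the variable names and the noisy() helper are made up for illustration.

```perl
use strict;
use warnings;

my ($empty, $zero, $five) = ('', 0, 5);

my $name  = $empty || 'anonymous';    # $empty is false, so we get 'anonymous'
my $both  = $five  && 'yes';          # $five is true, so we get 'yes'
my $first = $zero  && 'never used';   # $zero is false, so we get 0

# Hypothetical helper which counts how often it is called.
my $calls = 0;
sub noisy { $calls++; return 1 }

my $r1 = $five || noisy();   # left side already true: noisy() is not called
my $r2 = $zero && noisy();   # left side already false: noisy() is not called
my $r3 = $zero || noisy();   # left side false: noisy() is called once

print "name=$name both=$both first=$first calls=$calls\n";
```

This is why open(...) or die ... works: die is only evaluated when open returns a false value.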

Operators - Numeric And String Comparison
 There are two sets of operators - one for numbers and one for strings.

    Comparison               Numeric   String   Return Value
    Equal                    ==        eq       True if $a is equal to $b
    Not equal                !=        ne       True if $a is not equal to $b
    Less than                <         lt       True if $a is less than $b
    Greater than             >         gt       True if $a is greater than $b
    Less than or equal       <=        le       True if $a is not greater than $b
    Greater than or equal    >=        ge       True if $a is not less than $b
    Comparison               <=>       cmp      0 if equal, 1 if $a greater, -1 if $b greater
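Choosing the wrong set of operators is a common mistake, so here is a small sketch of how the same values compare numerically and as strings. The values are arbitrary examples.

```perl
use strict;
use warnings;

my ($x, $y) = ('10', '10.0');

my $num_equal = ($x == $y) ? 1 : 0;   # compared as numbers: 10 and 10
my $str_equal = ($x eq $y) ? 1 : 0;   # compared as strings: "10" and "10.0"

my $num_order = 9 <=> 10;             # -1: 9 is numerically less than 10
my $str_order = '9' cmp '10';         #  1: as text, "9" sorts after "10"

print "num_equal=$num_equal str_equal=$str_equal\n";
print "num_order=$num_order str_order=$str_order\n";
```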


Operators - File Test
 File test operators let you find out information about files before you blindly muck about with them.
 Here are a few of the file test operators.

    Example   Name        Result
    -e $a     Exists      True if the file named in $a exists
    -r $a     Readable    True if the file named in $a is readable
    -w $a     Writable    True if the file named in $a is writable
    -d $a     Directory   True if the file named in $a is a directory
    -f $a     File        True if the file named in $a is a regular file
    -T $a     Text file   True if the file named in $a is a text file

    -e "/usr/bin/perl" or warn "Perl is improperly installed\n";
    -f "/vmlinuz" and print "I see you are a friend of Linus\n";

There are a lot more operators not listed - see the Perl man pages or Programming Perl etc.

More On Input Operators
 The command input operator ``. (Also known as backtick or qx//).
 The most heavily used input operator is <> (also called the diamond operator).
 Examples:
    while (defined($_ = <STDIN>)) { print $_; }   # the longest way
    while ($_ = <STDIN>) { print; }               # explicitly to $_
    while (<STDIN>) { print; }                    # the short way
    for (;<STDIN>;) { print; }                    # while loop in disguise
    print $_ while defined($_ = <STDIN>);         # long statement modifier
    print while $_ = <STDIN>;                     # explicitly to $_
    print while <STDIN>;                          # short statement modifier
All of these lines are equivalent.
 $_ is the default variable which is used implicitly (when you're not explicit).

You can use the backtick operator to execute any system command like this:
    $info = `finger $user`;   # Or - qx/finger $user/;
The command undergoes variable interpolation, so $user is replaced by a real user name. The command is then passed to the shell, and all output from the command is passed back to Perl and put into the variable $info. The numeric status of the command is stored in the Perl variable $?. If you need to pass a $ symbol through to the shell then you'll need to escape it with \. Unescaped, like the $user in our example, it is seen by Perl and not the shell.
Be careful how you use <>. What you get depends on what you assign to:
    $one_line  = <MYFILE>;   # Get one line
    @all_lines = <MYFILE>;   # Get all lines - are you sure?
If you use <> without a file handle, Perl reads from the files named on the command line (in @ARGV), or from STDIN if there are none. So when no file names are given:
    $input = <STDIN>;
and
    $input = <>;
both do the same thing: read a line of input from STDIN. You can use this to advantage with Perl one-liners where STDIN is actually a pipe from a shell command like this (the $ is the shell prompt):
    $ cat myfile.pl | perl -e 'while (<>) { print if m/^\s*sub/; }'

A Special Case Of Using <>
 Normally when you use the <> operator, you use it like this:
    my $line = <STDIN>;   # Assign explicitly to a variable
 There is one case where assignment is automatic:
 The <> operator is the only thing inside the conditional of a while() loop.
 If it is, then the input is assigned to $_.
 Used in writing Perl One-Liners.

This,
    while (<>) {
        ...   # code for each line
    }
does exactly the same as this:
    @ARGV = ('-') unless @ARGV;   # assume STDIN if empty
    while (@ARGV) {
        $ARGV = shift @ARGV;      # shorten @ARGV each time
        if (!open(ARGV, $ARGV)) {
            warn "Can't open $ARGV: $!\n";
            next;
        }
        while (<ARGV>) {
            ...   # code for each line
        }
    }

Remember, this special "magic" requires that the <> operator is the only thing inside the condition of the while loop. If you use the <> operator anywhere else you must assign the result explicitly if you want to keep the value.
LAB5 - FILES_1 LAB5 - FILES_2 LAB5 - FILES_3

The Range Operator ..
 Examples:
    1: for (101 .. 200) { print; }             # prints 101102...199200
    2: @foo = @foo[0 .. $#foo];                # an expensive no-op
    3: @foo = @foo[ -5 .. -1];                 # slice last 5 items
    4: @alphabet = ('A' .. 'Z');
    5: $hexdigit = (0 .. 9, 'a' .. 'f')[$num & 15];
    6: @z2 = ('01' .. '31'); print $z2[$mday];
    7: @combos = ('aa' .. 'zz');
    8: @bigcombos = ('aaaaaa' .. 'zzzzzz');    # Ay caramba - you'd better have a lot of memory

1: Uses $_ as the default value of the loop.
2: $#foo is the index of the last item in @foo - this is true for all arrays.
3: Using a negative subscript on an array counts backwards from the end of the array.
If the left value is greater than the right value in a .. expression then an empty list is returned. If what you really wanted was to count backwards then do this:
    for (reverse 27 .. 56) { print; }   # prints 565554 ... 2827
4: When used with strings we get some magic - this gives all the uppercase letters in the English alphabet.
In scalar context the .. operator behaves differently: it is false as long as its left operand is false. Once the left operand is true the .. operator stays true until the right operand is true, after which it becomes false again.
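The list-context behaviour described in these notes can be checked with a short sketch like the one below. The array names are invented for the example.

```perl
use strict;
use warnings;

my @digits   = (1 .. 5);             # 1 2 3 4 5
my @empty    = (5 .. 1);             # left > right gives an empty list
my @down     = reverse(1 .. 5);      # 5 4 3 2 1
my @letters  = ('a' .. 'e');         # a b c d e
my @last_two = @letters[-2 .. -1];   # d e

print "digits: @digits\n";
print "empty has ", scalar(@empty), " elements\n";
print "down: @down\n";
print "last two letters: @last_two\n";
```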

The Conditional Operator ?:
 Just like the C version.
 Is a trinary operator - its two parts separate three expressions like this: condition ? then : else
 Examples:
    $a = $ok ? $b : $c;   # get a scalar
    @a = $ok ? @b : @c;   # get an array
    $a = $ok ? @b : @c;   # get a count of an array's elements
    printf "I have %d camel%s.\n", $n, $n == 1 ? "" : "s";

What this says is this (for the first example): look at the value of $ok - if it's true then $a = $b; otherwise $a = $c.
Example:
    $result = ( $count == 10 ) ? 88 : 99;
Here ( $count == 10 ) is the 1st expression, 88 is the 2nd expression and 99 is the 3rd expression.
The condition part is always evaluated in scalar context - for Truth or Falsity.
Question: In the example - what will the value of $result be if $count is 12?

How Do I … Establish Default Values?
 You would like to give a default value to a variable, but only if it doesn't already have one.
    $a = $b || $c;                # use $b if $b is true, else $c
    $x ||= $y;                    # set $x to $y unless $x is already true
    $a = defined($b) ? $b : $c;   # use $b if $b is defined, else $c

    $foo = $bar || "DEFAULT VALUE";
    $dir = shift(@ARGV) || "/tmp";
    $dir = defined($ARGV[0]) ? shift(@ARGV) : "/tmp";
    $dir = @ARGV ? $ARGV[0] : "/tmp";
    $count{ $shell || "/bin/sh" }++;

More info: See The Perl Cookbook, section 1.2 Page 6.
The difference between the two types of solution is what they test for - something being defined, or something being true. Three values are defined but false: 0, "0" and "". If a variable already holds one of those values and you want to keep it, then || won't work. Test with defined() instead.
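A minimal sketch of the difference between testing for truth and testing for definedness, assuming a made-up setting called $retries whose legitimate value is 0:

```perl
use strict;
use warnings;

my $retries = 0;   # a deliberate, defined value which happens to be false

my $with_or      = $retries || 3;                      # 3: the 0 is lost
my $with_defined = defined($retries) ? $retries : 3;   # 0: the 0 is kept

my $unset;         # never assigned, so undef
my $fallback = defined($unset) ? $unset : 3;           # 3: default applied

print "with_or=$with_or with_defined=$with_defined fallback=$fallback\n";
```

Perl 5.10 and later also provide a // (defined-or) operator which does the defined() test in one step. It is newer than the books this course is based on.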

How Do I … Establish Default Values?
 You would like to give a default value to a variable, but only if it doesn't already have one.
    # find the user name on Unix systems
    $user = $ENV{USER}
         || $ENV{LOGNAME}
         || getlogin()
         || (getpwuid($<))[0]
         || "Unknown uid number $<";
The first expression which is true is the result which is assigned to $user.

    $starting_point ||= "Greenwich";
    @a = @b unless @a;     # copy only if empty
    @a = @b ? @b : @c;     # assign @b if nonempty, else @c

More info: See The Perl Cookbook, section 1.2 Page 6.
LAB5 - FILE_4

Control Structures


Control Structures - Truth
 We've seen that some operators return a true or false value.
 Here are the rules for the values a scalar can hold.
    1. Any string is true except for "" and "0".
    2. Any number is true except for 0.
    3. Any reference is true regardless of what it refers to.
    4. Any undefined value is false.

    0            # would become the string "0", so false.
    1            # would become the string "1", so true.
    10 - 10      # 10-10 is 0, would convert to string "0", so false.
    0.00         # equals 0, would convert to string "0", so false.
    "0"          # the string "0", so false.
    ""           # a null string, so false.
    "0.00"       # the string "0.00", neither "" nor "0", so true!
    "0.00" + 0   # the number 0 (coerced by the +), so false.
    \$a          # a reference to $a, so true, even if $a is false.
    undef()      # a function returning the undefined value, so false.
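The rules can be tried out directly. The sketch below runs a handful of the values from the slide through a simple true/false test; the labels and the %verdict hash are only there for the demonstration.

```perl
use strict;
use warnings;

my $x = 0;   # any variable will do: only the reference to it is tested
my @tests = (
    [ '0 (number)',      0          ],
    [ '"0" (string)',    '0'        ],
    [ '"" (empty)',      ''         ],
    [ '"0.00" (string)', '0.00'     ],
    [ '"0.00" + 0',      '0.00' + 0 ],
    [ 'reference',       \$x        ],
    [ 'undef',           undef      ],
);

my %verdict;
for my $test (@tests) {
    my ($label, $value) = @$test;
    $verdict{$label} = $value ? 'true' : 'false';
    printf "%-16s is %s\n", $label, $verdict{$label};
}
```

The only entries which come out true are the string "0.00" and the reference.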


Loop Statements
    LABEL while (EXPR) BLOCK
    LABEL while (EXPR) BLOCK continue BLOCK
    LABEL until (EXPR) BLOCK
    LABEL until (EXPR) BLOCK continue BLOCK
    LABEL for (EXPR; EXPR; EXPR) BLOCK
    LABEL foreach (LIST) BLOCK
    LABEL foreach var (LIST) BLOCK
    LABEL foreach var (LIST) BLOCK continue BLOCK
    LABEL BLOCK
    LABEL BLOCK continue BLOCK
Continue BLOCKs are always optional.
LABELs are always optional.

All these statements have an optional LABEL. The while statements execute as long as EXPR is true. If while is replaced with until, then the sense of the test is reversed. Note that unlike some languages which have do - until loops, in Perl the until test is made at the start of the loop and not the end. It is customary to make the LABEL name be all uppercase. The while and until statement can have an optional continue block. This block is executed every time the block is continued either by falling off the end of the first block or by an explicit next (a loop-control operator which goes to the next iteration of the loop).

Loop Control
 We've already seen that a loop can have a label.
 It's used with the loop control operators next, last, redo.
 The label names the loop as a whole - not the top of the loop.
 The loop control operator doesn't "go to" the label.
 The syntax for the loop control operators is this:
     last LABEL
     next LABEL
     redo LABEL
 The last operator immediately exits the loop - any continue block is not executed.
 The next operator skips the rest of the current iteration and starts the next one. If there's a continue block then it is executed first.
 The redo operator restarts the loop block without evaluating the condition again. Any continue block is not executed.

The LABEL is optional - if it's missing then last, next or redo applies to the innermost enclosing loop. But if you want to jump out of nested loops then the LABEL is needed. Even though I've talked about continue blocks a lot - not many people use them.

Loop Control - An Example
    LABEL: while (EXPR) {
        # Code
        if (CONDITION) { redo; }   # restart the block, no test, no continue
        # Code
        if (CONDITION) { next; }   # run the continue block, then test again
        # Code
        if (CONDITION) { last; }   # leave the loop, no continue
        # Code
    }
    continue {
        # Code
    }
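Here is a runnable sketch of the same skeleton which records, in order, what each loop control operator does. It uses a foreach loop (which also accepts a continue block) so that redo re-runs the same element. The ITEM label, @log and $retried are names invented for the example.

```perl
use strict;
use warnings;

my @log;           # records what happened, in order
my $retried = 0;   # makes sure we only redo once

ITEM: foreach my $n (1 .. 6) {
    if ($n == 2) {
        push @log, "skip $n";
        next ITEM;            # continue block runs, then the next element
    }
    if ($n == 3 && !$retried) {
        $retried = 1;
        push @log, "retry $n";
        redo ITEM;            # same element again, continue block skipped
    }
    if ($n == 5) {
        push @log, "stop at $n";
        last ITEM;            # loop ends, continue block skipped
    }
    push @log, "work $n";
}
continue {
    push @log, "continue";
}

print join(", ", @log), "\n";
```

Reading the printed log against the rules on the previous slide shows that "continue" appears after normal passes and after next, but not after redo or last.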


Compound Statements - If And Unless
 A sequence of statements is called a BLOCK.
 Compound statements are built from expressions and BLOCKs.
 Blocks are always surrounded by { and }.
    if (EXPR) BLOCK
    if (EXPR) BLOCK else BLOCK
    if (EXPR) BLOCK elsif (EXPR) BLOCK ..
    if (EXPR) BLOCK elsif (EXPR) BLOCK .. else BLOCK

    unless (EXPR) BLOCK
    unless (EXPR) BLOCK else BLOCK
    unless (EXPR) BLOCK elsif (EXPR) BLOCK ..
    unless (EXPR) BLOCK elsif (EXPR) BLOCK .. else BLOCK

Note: it’s elsif NOT elseif. unless simply reverses the true/false value of if. Note that unless also works with else and elsif. There’s no such thing as elseunless.

Compound Statements - If And Unless
 Examples:
    unless ($x == 1) ...
    if ($x != 1) ...
    if (!($x == 1)) ...
These all do the same thing. TMTOWTDI (There's More Than One Way To Do It).

    if ((my $colour = <STDIN>) =~ /red/i) {
        $value = 0xff0000;
    }
    elsif ($colour =~ /green/i) {
        $value = 0x00ff00;
    }
    elsif ($colour =~ /blue/i) {
        $value = 0x0000ff;
    }
    else {
        warn "unknown RGB component $colour, using black instead\n";
        $value = 0x000000;
    }


Compound Statements - If And Unless
 Examples:
    unless (open(FOO, $foo)) { die "Can't open $foo: $!" }
    if (!open(FOO, $foo))    { die "Can't open $foo: $!" }

    die "Can't open $foo: $!" unless open(FOO, $foo);
    die "Can't open $foo: $!" if !open(FOO, $foo);

    open(FOO, $foo) || die "Can't open $foo: $!";
    open FOO, $foo  or die "Can't open $foo: $!";

I tend to prefer this:
    chdir $dir      or die "chdir $dir: $!";
    open FOO, $file or die "open $file: $!";
    @lines = <FOO>  or die "$file is empty?";
    close FOO       or die "close $file: $!";   # $! is the error code

In the preferred example there's no if and no unless - we're relying on the short-circuit evaluation. $! is the error set by the operating system when a call such as open, chdir or close fails (lots of other system operations set it too).

Control Structures - If And Unless
 Examples:
    if ($debug_level > 0) {
        # Something has gone wrong. Tell the user.
        print "Debug: Danger, Will Robinson, danger!\n";
    }

    if ($city eq "New York") {
        print "New York is northeast of Washington, D.C.\n";
    }
    elsif ($city eq "Chicago") {
        print "Chicago is northwest of Washington, D.C.\n";
    }
    else {
        print "I don't know where $city is, sorry.\n";
    }

    unless ($destination eq $home) {
        print "I'm not going home.\n";
    }

Note - if has else and elsif. unless does not have an elseunless.

Control Structures - If And Unless
 More examples - compare with the previous page:
    print "Danger, Will Robinson, danger!\n" if ($debug_level > 0);
    print "I'm not going home.\n" unless ( $destination eq $home );

Another example of idiomatic Perl. You’ll see the interchangeability of statements like this a lot.

Control Structures - While And Until
 Perl has four main looping constructs, while & until and for & foreach.
 While & until act like if and unless except that they loop repeatedly.
    1. First the condition is checked.
    2. If the condition is met, that is the condition is:
        1. True for the while loop.
        2. False for an until loop.
    3. Then the block of code is executed.

    while ($tickets_sold < 10000) {
        $available = 10000 - $tickets_sold;
        print "$available tickets are available. How many would you like: ";
        $purchase = <STDIN>;
        chomp($purchase);
        $tickets_sold += $purchase;
    }

    while ( $line = <GRADES> ) { ...

Note: If the original condition is never met then the loop is never entered. Make sure if you intend to leave the loop at some point that you have some code in the loop which changes the variable which keeps you going through the loop. The bottom example assigns the next line from the GRADES file to the variable $line and returns the value of the line so the condition of the while statement can be evaluated for truth. You might wonder if Perl will exit prematurely when it sees blank lines in the file - the answer is it won’t because a blank line is a “\n” or newline character and this is not false. When we do reach the end of the file the line input operator returns the value undef, which always evaluates to false and so at this point the loop does terminate. There’s no need for an explicit test because the input operator is set up to work smoothly in a conditional context.

While Loops

    while (my $line = <STDIN>) {
        $line = lc $line;
    }
    continue {
        print $line;   # still visible
    }
    # $line now out of scope here

A variable declared local to the while loop (here done with my $line) exists only inside the loop. If you want $line to be visible after the loop has ended then declare the variable before the loop begins. We’ll discuss scope shortly. Also, the use of a continue block here is redundant - we could have easily put all the statements in the continue block inside the main while loop. We’ll also discuss last,next and redo shortly.

Control Structures - While And Until
 You will often see command line arguments processed like this:
    while (@ARGV) {
        process(shift @ARGV);
    }

The shift operator removes one element from the argument list each time through the loop and sends it to a subroutine for processing (here called process()).

Control Structures - For And Foreach
 Examples:
    for ($sold = 0; $sold < 10000; $sold += $purchase) {
        $available = 10000 - $sold;
        print "$available tickets are available. How many would you like: ";
        $purchase = <STDIN>;
        chomp($purchase);
    }

    foreach $user (@users) {
        if (-f "$home{$user}/.nexrc") {
            print "$user is cool... they use a perl-aware vi!\n";
        }
    }

    foreach $key (sort keys %hash) {...   # Common Perl idiom for getting the keys from a hash.

The for loop takes three expressions. An initial expression - set only once, a condition to be tested every time the loop is executed and an expression to modify the loop variable. The foreach loop is used to iterate through the contents of an array. The foreach loop treats the expression in ( and ) as a list (this is list context) always - even if there’s only one element in the list. Then each element is aliased to the loop variable in turn - IMPORTANT - MODIFYING THE LOOP VARIABLE ALSO MODIFIES THE ORIGINAL ARRAY.

For Loops
 The for loop has three expressions:
    1. An expression which initializes the loop.
    2. A condition which will keep the loop executing, and
    3. An expression which re-initializes the loop.
 All three expressions are optional - the ";" are not.
 If the condition is missing, it is treated as always true.
 So these are equivalent:
    LABEL: for (my $i = 1; $i <= 10; $i++) {
        ...
    }

    {
        my $i = 1;
        LABEL: while ($i <= 10) {
            ...
        }
        continue {
            $i++;
        }
    }

For Loop Examples
 Examples:
    for ($i = 0, $bit = 1; $i < 32; $i++, $bit <<= 1) {
        print "Bit $i is set\n" if $mask & $bit;
    }
    # the values in $i and $bit persist past the loop

    for (my ($i, $bit) = (0, 1); $i < 32; $i++, $bit <<= 1) {
        print "Bit $i is set\n" if $mask & $bit;
    }
    # loop's versions of $i and $bit now out of scope

You can do more than one thing in each of the three parts of the loop. The <<= 1 part of the loop shifts the value of $bit 1 bit to the left, so $bit must start at 1 and then tests each bit of $mask in turn.
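To see the two-variable loop at work, here is a small sketch with a made-up 8-bit mask. It collects the bit numbers instead of printing them one by one.

```perl
use strict;
use warnings;

my $mask = 0b1010_0001;   # hypothetical mask: bits 0, 5 and 7 are set
my @set_bits;

# $bit starts at 1 and is shifted left (doubled) on every pass.
for (my ($i, $bit) = (0, 1); $i < 8; $i++, $bit <<= 1) {
    push @set_bits, $i if $mask & $bit;
}

print "Bits set: @set_bits\n";   # prints "Bits set: 0 5 7"
```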

Foreach Examples
 Examples:
    $sum = 0;
    foreach $value (@array) { $sum += $value }

    for $count (10,9,8,7,6,5,4,3,2,1,'BOOM') {   # do a countdown
        print "$count\n";
        sleep(1);
    }

    for (reverse 'BOOM', 1 .. 10) {              # same thing
        print "$_\n";
        sleep(1);
    }

    for $field (split /:/, $data) {              # any LIST expression
        print "Field contains: `$field'\n";
    }

    foreach $key (sort keys %hash) {
        print "$key => $hash{$key}\n";
    }
This is the usual way to get all of the keys out of a hash.

With foreach there isn’t any way to know where you are in a list (unless you decide to keep track of it yourself with counters etc.) If the list contains modifiable values (i.e. variables, not constants), then you can modify those variables by modifying the variable inside the loop. The variable in the loop is an alias for the variable in the list.

Foreach Examples
 Examples:
    foreach $pay (@salaries) {     # grant 50% raises
        $pay *= 1.50;              # works for me!
    }

    for (@christmas, @easter) {    # change menu
        s/ham/turkey/;
    }

    s/ham/turkey/ for @christmas, @easter;   # same thing

    for ($scalar, @array, values %hash) {
        s/^\s+//;                  # strip leading whitespace
        s/\s+$//;                  # strip trailing whitespace
    }

On the last slide we said that the variable inside the loop in a foreach loop was an implicit alias for the variable in the list which is passed to foreach. So when we alter the variable in the loop ($pay in the top example) we’re actually altering the variable in the list which we are reading through.
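A short sketch of the aliasing, plus one way to avoid it when you want the original left alone. The array names are invented for the example.

```perl
use strict;
use warnings;

my @prices = (10, 20, 30);

# $p is an alias for each element, so the array itself is changed.
foreach my $p (@prices) {
    $p *= 2;
}

# To leave the original alone, take a copy and loop over the copy.
my @original = (1, 2, 3);
my @doubled  = @original;
$_ *= 2 for @doubled;

print "prices: @prices\n";                            # 20 40 60
print "original: @original, doubled: @doubled\n";     # 1 2 3, 2 4 6
```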

Control Structures - Breaking Out - Next & Last
 It's not unusual to have special cases in loops.
 Next skips to the end of the loop and forces the next iteration.
 Last skips to the end of the loop and exits the loop.
 Example:
    foreach $user (@users) {
        if ($user eq "root" or $user eq "lp") {
            next;
        }
        if ($user eq "special") {
            print "Found the special account.\n";
            # do some processing
            last;
        }
    }


Control Structures - Breaking Out - Next & Last
 It's possible to break out of nested loops by labeling your loops and specifying which loop you want to break out of.
    LINE: while ($line = <ARTICLE>) {    # LINE: is a label
        last LINE if $line eq "\n";      # stop on first blank line
        next LINE if $line =~ /^#/;      # skip comment lines
        # your ad here
    }
Would anyone care to speculate on what this piece of code does?


Case Statements
 Perl doesn't have a case statement.
But it's simple to build one.
    SWITCH: {
        if (/^abc/) { $abc = 1; last SWITCH; }
        if (/^def/) { $def = 1; last SWITCH; }
        if (/^xyz/) { $xyz = 1; last SWITCH; }
        $nothing = 1;
    }
OR
    SWITCH: {
        /^abc/ && do { $abc = 1; last SWITCH; };
        /^def/ && do { $def = 1; last SWITCH; };
        /^xyz/ && do { $xyz = 1; last SWITCH; };
        $nothing = 1;
    }

Perl doesn’t have a case/switch structure since it is so easy to build one. The SWITCH is a label (remember the convention that all labels are in upper-case), and not some Perl keyword we haven’t discussed yet. We haven’t covered do (it’s on the next page), but think of it as a dummy keyword which enables a statement (the bit between { and }) to be written. All three lines in the second statement are using short-circuit evaluation. The first thing on the line (reading from left to right) which is false makes the whole line false and all the statements following are not evaluated. Remember: in short-circuit evaluation it’s the first thing which is false in an && statement and the first thing which is true in an || statement which controls the flow of the program. It’s important to remember that once a short-circuit evaluation has enough information to determine truth/falsity, then none of the other possible clauses are evaluated. If those other clauses also do assignment then those assignments won’t happen.
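The idiom matches against $_, so one way to use it on an ordinary variable is to wrap it in a one-element for loop. The sketch below does that inside a hypothetical classify() helper; the helper and its return values are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: classify one word using the SWITCH idiom.
sub classify {
    my ($word) = @_;
    my $kind;
    for ($word) {                  # makes $_ an alias for $word
        SWITCH: {
            /^abc/ && do { $kind = 'abc'; last SWITCH; };
            /^def/ && do { $kind = 'def'; last SWITCH; };
            $kind = 'nothing';     # default case: nothing matched
        }
    }
    return $kind;
}

my @kinds;
push @kinds, classify($_) for qw(abcde defgh xyz);
print "@kinds\n";   # prints "abc def nothing"
```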

The do (BLOCK) Construct
    # process to place all LFSR stage results in a single file
    while (<INFILE>) {    # INFILE is a placeholder name for the input filehandle
        /LFSR\s\=\s(\w+)/ && do {
            print LFSRFILE "$1\n"
        };
        $lastfile = $1;
    }

This is a way of grouping a lot of statements into a single block.

The do BLOCK executes a sequence of statements in the BLOCK and returns the value of the last expression evaluated in the BLOCK. It can be modified with a while or an until statement modifier. If so then Perl executes the BLOCK before it tests the loop condition. The do BLOCK itself does not count as a loop, so the loop control statements next, last, redo cannot be used to leave or restart the BLOCK.
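Two of the points in these notes, the return value and the "body first, test second" behaviour with a modifier, can be seen in this small sketch. All variable names are invented.

```perl
use strict;
use warnings;

# do BLOCK yields the value of the last expression evaluated in it.
my $n = 7;
my $parity = do {
    my $rem = $n % 2;
    $rem ? 'odd' : 'even';
};

# With a while modifier the body runs once before the first test.
my $count  = 10;
my $passes = 0;
do {
    $passes++;
    $count++;
} while ($count < 10);   # false straight away, yet the body has run once

print "parity=$parity passes=$passes count=$count\n";
```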

The do (FILE) Construct
    # read in config files: system first, then user
    for $file ("/design/C6RAM/defaults/defaults.rc", "$ENV{HOME}/.someprogrc") {
        unless ($return = do $file) {
            warn "couldn't parse $file: $@" if $@;
            warn "couldn't do $file: $!"    unless defined $return;
            warn "couldn't run $file"       unless $return;
        }
    }
If do can read the file but can't compile it, it returns undef and sets an error message in $@.
If do can't read the file it returns undef and sets $! to the error.
If the file compiles and runs, the value returned is the value of the last expression evaluated.

The do FILE form uses the value of FILE as a filename and executes the contents of the file as a Perl script. Its use is to include subroutines from a Perl subroutine library, but it has been superseded by use. It is still useful for loading things like configuration data into your program as shown in the example. If the file can be read but doesn't compile then an error is set in $@. If the file can't be read then an error is set in $!

Goto
 Perl does support goto - so that's at least one thing they got wrong then!
 You can:
     goto LABEL
     goto Expression
     goto &name (subroutine)

    goto(("FOO", "BAR", "GLARCH")[$i]);     # hope 0 <= i < 3

    @loop_label = qw/FOO BAR GLARCH/;
    goto $loop_label[rand @loop_label];     # random teleport

How Do I … Do Something With Every Element In A List?
 You want to repeat a procedure for every element in a list.
Solution: Use a foreach loop
    foreach $item (LIST) {
        # do something with $item
    }

    foreach $user (@bad_users) {
        complain($user);
    }

Sometimes you need to use a function to generate the list needed by foreach
    foreach $var (sort keys %ENV) {
        print "$var=$ENV{$var}\n";
    }

    foreach $user (@all_users) {
        $disk_space = get_usage($user);
        if ($disk_space > $MAX_QUOTA) {
            complain($user);
        }
    }

The code in the loop can call last to jump out of the loop, next to move on to the next element, or redo to jump back to the first statement inside the block.

More info: See The Perl Cookbook, section 4.4 Page 97. The variable set to each value in the list is called the loop iterator. If no variable is supplied then the global variable $_ will be used. $_ is the default variable used in many of Perl’s string, list and file functions.

How Do I … Do Something With Every Element In A List?
 You want to repeat a procedure for every element in a list.
    while (<FH>) {            # $_ is set to the line just read
        chomp;                # $_ has a trailing \n removed, if it had one
        foreach (split) {     # $_ is split on whitespace into a list
                              # then $_ is set to each chunk in turn
            $_ = reverse;     # the characters in $_ are reversed
            print;            # $_ is printed
        }
    }
Perl's $_ value is preserved through any nested foreach loops.

    foreach my $item (@array) {
        print "i = $item\n";
    }
To be sure of what is happening it is always better to declare and use your own lexical variable.

    @array = (1,2,3);
    foreach $item (@array) {
        $item--;
    }
    print "@array";   # prints: 0 1 2
The foreach construct has another feature: each time through the loop the iterator variable is an alias, not a copy.

More info: See The Perl Cookbook, section 4.4 Page 97. IMPORTANT NOTE: The top example works the way we might hope for. The value of $_ in the while loop is preserved when the foreach loop is executed. However, if the while loop had been the inner loop then BAD THINGS would have happened since the while construct clobbers the value of the global $_ (i.e. it doesn't localize it). Consider this to be a bug or a feature - either way it's an accident waiting to happen. See the full explanation on page 99 of the Perl Cookbook. I would always recommend using lexical variables. These are localized at their point of declaration and the risk of side-effects is much reduced. Also note that with a foreach loop, the loop iterator is not a copy of the variable from the list, it actually is the variable in the list - change the variable and it changes in the list. This is important - it's not a copy, it's an alias.

How Do I … Find Elements In One List But Not In Another?
 You want to find the elements which are in one list but not in another.
    # assume @A and @B are already loaded
    %seen = ();     # lookup table to test membership of B
    @aonly = ();    # answer

    # build lookup table
    foreach $item (@B) { $seen{$item} = 1 }

    # find only elements in @A and not in @B
    foreach $item (@A) {
        unless ($seen{$item}) {
            # it's not in %seen, so add to @aonly
            push(@aonly, $item);
        }
    }
Straight-forward version

More info: See The Perl Cookbook, section 4.7 Page 104. Solution: Build a hash of the keys in @B to use as a lookup table. Then iterate through @A looking to see if the item in @A is in the lookup table. If it is then it's in both @A and @B. If it's not then it's in @A but not in @B.

How Do I … Find Elements In One List But Not In Another?
 You want to find the elements which are in one list but not in another.
    my %seen;     # lookup table
    my @aonly;    # answer

    # build lookup table
    @seen{@B} = ();

    foreach $item (@A) {
        push(@aonly, $item) unless exists $seen{$item};
    }
Different (idiomatic) version

More info: See The Perl Cookbook, section 4.7 Page 104. The two different answers vary in how they build the hash. The first (previous slide) iterates over @B. This one uses a hash slice. A hash slice assigns to several keys at once. Instead of this:
    $hash{"key1"} = 1;
    $hash{"key2"} = 2;
you can write the equivalent:
    @hash{"key1", "key2"} = (1,2);
The list in {} holds the keys while the list on the right holds the values. In this second example we say this:
    @seen{@B} = ();
This uses the items in @B as keys for %seen, setting each to undef (because the list on the right is empty). We later check for the existence of the key - not the logical truth or the definedness of the value.
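A self-contained run of the hash-slice version, with two small made-up lists so the result can be checked by eye:

```perl
use strict;
use warnings;

my @A = qw(apple banana cherry date);
my @B = qw(banana date fig);

my %seen;
@seen{@B} = ();    # hash slice: every item of @B becomes a key (value undef)

my @aonly;
foreach my $item (@A) {
    push @aonly, $item unless exists $seen{$item};
}

print "Only in A: @aonly\n";   # prints "Only in A: apple cherry"
```

Because the values are undef, testing with exists matters here: a plain truth test on $seen{$item} would treat every key as missing.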

How Do I … Extract Unique Elements From A List?
 You want to remove duplicate elements from a list.
Solution: Use a hash to record the values and then keys() to extract the values
    %seen = ();
    @uniq = ();
    foreach $item (@list) {
        unless ($seen{$item}) {
            # if we get here, we have not seen it before
            $seen{$item} = 1;
            push(@uniq, $item);
        }
    }

Same as above but faster
    %seen = ();
    foreach $item (@list) {
        push(@uniq, $item) unless $seen{$item}++;
    }

Same as above but different
    %seen = ();
    foreach $item (@list) {
        $seen{$item}++;
    }
    @uniq = keys %seen;

More info: See The Perl Cookbook, section 4.6 Page 102. Solution: Use a hash to record which items have been seen and then use keys on the hash to extract them. Warning: using a hash like this can use up a lot of memory, and the keys function returns the keys in an apparently random order (not the insertion order). If the order matters then use one of the first two versions, which build @uniq in the order the items were first seen.
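The ordering point in the warning is easiest to see side by side. This sketch uses a small invented list and sorts the keys so that the second result is repeatable.

```perl
use strict;
use warnings;

my @list = qw(b a b c a d);

my %seen;
my @uniq;
foreach my $item (@list) {
    # $seen{$item}++ yields 0 (false) the first time an item appears
    push @uniq, $item unless $seen{$item}++;
}

my @sorted = sort keys %seen;   # sorted, because keys has no fixed order

print "first-seen order: @uniq\n";     # b a c d
print "sorted keys:      @sorted\n";   # a b c d
```

As a bonus, %seen now holds a count of how many times each item occurred.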

How Do I … Extract Unique Elements From A List?
 You want to remove duplicate elements from a list.
    # generate a list of users logged in, removing duplicates
    %ucnt = ();
    for (`who`) {
        s/\s.*\n//;     # kill from first space till end-of-line, yielding username
        $ucnt{$_}++;    # record the presence of this user
    }
    # extract and print unique keys
    @users = sort keys %ucnt;
    print "users logged in: @users\n";

More info: See The Perl Cookbook, section 4.6 Page 102.

How Do I … Reverse An Array?
 You want to reverse an array.
Solution: Use the reverse() function
    # reverse @ARRAY into @REVERSED
    @REVERSED = reverse @ARRAY;
Solution: Use a for loop
    for ($i = $#ARRAY; $i >= 0; $i--) {
        # do something with $ARRAY[$i]
    }

    # two-step: sort then reverse
    @ascending = sort { $a cmp $b } @users;
    @descending = reverse @ascending;
    # one-step: sort with reverse comparison
    @descending = sort { $b cmp $a } @users;

More info: See The Perl Cookbook, section 4.10 Page 109. The reverse() function reverses a list. The for loop processes the list in reverse order but keeps the list in its original order. If you use reverse() to reverse a list you just sorted then make sure it's in the order you want. The sort() function takes an optional code block which lets you replace the default alphabetic comparison with your own. This block is called each time sort() has to compare two values. The values are loaded into $a and $b which are automatically localised, so they won't interfere with any variables you already have called $a or $b. The comparison function should return a negative number if $a should appear before $b in the output list, 0 if the order doesn't matter and a positive number if $a should appear after $b in the output list. Perl has two operators that behave this way: <=> for sorting numbers in ascending order, and cmp for sorting strings in ascending alphabetic order. By default sort() uses cmp-style comparisons. Of course, you can always provide your own comparison subroutine.
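Because the default comparison is cmp-style, sorting numbers without a code block gives a surprising order. A minimal sketch with made-up numbers:

```perl
use strict;
use warnings;

my @numbers = (10, 9, 100, 1);

my @as_text    = sort @numbers;                 # default cmp: 1 10 100 9
my @ascending  = sort { $a <=> $b } @numbers;   # numeric: 1 9 10 100
my @descending = sort { $b <=> $a } @numbers;   # numeric, reversed: 100 10 9 1
my @reversed   = reverse @ascending;            # same result as @descending

print "as text:    @as_text\n";
print "ascending:  @ascending\n";
print "descending: @descending\n";
print "reversed:   @reversed\n";
```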

How Do I … Traverse A Hash?
 You want to perform an action on each entry in a hash.
Solution: Use each() with a while loop
    while (($food, $color) = each(%food_color)) {
        print "$food is $color.\n";
    }
    Banana is yellow.
    Apple is red.
    Carrot is orange.
    Lemon is yellow.

Solution: Use keys with a foreach loop
    foreach $food (keys %food_color) {
        my $color = $food_color{$food};
        print "$food is $color.\n";
    }
    Banana is yellow.
    Apple is red.
    Carrot is orange.
    Lemon is yellow.

WARNING: foreach cannot be used on a hash directly - loop over keys, values or each() instead. Nor can push(), pop(), shift() or unshift() be used on a hash.

More info: See The Perl Cookbook, section 5.4 Page 135.

How Do I … Delete Something From A Hash?  You want to remove an entry from a hash.

Solution: Use the delete() function

    # remove $KEY and its value from %HASH
    delete($HASH{$KEY});

Don’t try to delete a key by setting its value to undef. All that will do is set the key’s value to undef! The delete() function is the only way to remove a specific hash entry. Once a key is deleted it will no longer show up in the list returned by keys() or in an each() iteration, and exists() will return false for that key.

More info: See The Perl Cookbook, section 5.3 Page 133. You can’t delete a key by setting its value to undef since undef is a value which a hash can store. You must use the delete() function. If you want to clear a hash then simply assign the empty list to it, like this: %hash = ();

How Do I … Sort A Hash?  You need to work with the elements of a hash in a particular order.

Solution

    # %hash is the hash to sort
    @keys = sort { criterion() } (keys %hash);
    foreach $key (@keys) {
        $value = $hash{$key};
        # do something with $key, $value
    }

Alphabetically

    foreach $food (sort keys %food_color) {
        print "$food is $food_color{$food}.\n";
    }

Associated values

    foreach $food (sort { $food_color{$a} cmp $food_color{$b} } keys %food_color) {
        print "$food is $food_color{$food}.\n";
    }

More info: See The Perl Cookbook, section 5.9 Page 144. Solution: Get a list of keys and sort based on the ordering you want. Sort by default sorts alphabetically. The optional code block passed to sort will be called every time sort needs to compare two values in the sort function. $a and $b are localised sort variables.

How Do I … Test For The Presence Of A Key In A Hash?  You need to know if a hash has a particular key.

    %age = ();
    $age{"Toddler"}  = 3;
    $age{"Unborn"}   = 0;
    $age{"Phantasm"} = undef;

    foreach $thing ("Toddler", "Unborn", "Phantasm", "Relic") {
        print "$thing: ";
        print "Exists "  if exists  $age{$thing};
        print "Defined " if defined $age{$thing};
        print "True "    if         $age{$thing};
        print "\n";
    }

    Output                          Meaning
    Toddler: Exists Defined True    Exists, defined, true
    Unborn: Exists Defined          Exists, defined
    Phantasm: Exists                Exists
    Relic:                          None of the above

More info: See The Perl Cookbook, section 5.2 Page 131. Toddler: It exists because we gave it a value in the hash, that value is defined (3) and since it’s non-zero, it is true. Unborn: It exists because we gave it a value in the hash, that value is defined (0) and since it’s zero it is not true. Phantasm: It exists because we gave it a value in the hash, that value is undefined so it fails the defined test and since undef is false it fails the truth test as well. Relic: It doesn’t exist since we never put it into the hash. So it fails all three tests.

How Do I … Invert A Hash?  You have a hash and a value for which you want to find the corresponding key.

Solution: Use the reverse() function

    # %LOOKUP maps keys to values
    %REVERSE = reverse %LOOKUP;

    %surname = ( "Mickey" => "Mantle", "Babe" => "Ruth" );
    %first_name = reverse %surname;
    print $first_name{"Mantle"}, "\n";

    Mickey

What happens if two different keys happen to have the same value? Result - The inverted hash will only have one. For a solution to this see the “Perl Cookbook” pages 140 and 141.

More info: See The Perl Cookbook, section 5.8 Page 142. Use reverse() to create an inverted hash whose values are the original hash’s keys and whose keys are the original hash’s values. When we treat %surname as a list it becomes: ("Mickey", "Mantle", "Babe", "Ruth"), or ("Babe", "Ruth", "Mickey", "Mantle"), because we can’t predict the order in which pairs come out of hashes (each key always stays next to its value). Reversing this list (assume the first list is the one we get) gives this: ("Ruth", "Babe", "Mantle", "Mickey"). When we treat this list as a hash it becomes: ("Ruth" => "Babe", "Mantle" => "Mickey")

How Do I … Test For The Presence Of A Key In A Hash?  You need to know if a hash has a particular key.

Solution: Use the exists() function

    # does %HASH have a value for $KEY ?
    if (exists($HASH{$KEY})) {
        # it exists
    } else {
        # it doesn't
    }

    # %food_color per the introduction
    foreach $name ("Banana", "Martini") {
        if (exists $food_color{$name}) {
            print "$name is a food.\n";
        } else {
            print "$name is a drink.\n";
        }
    }

    Banana is a food.
    Martini is a drink.

More info: See The Perl Cookbook, section 5.2 Page 131. exists() checks for the existence of a key in a hash. It doesn’t say anything about the key’s value (if the key exists).

How Do I … Print A Hash?  You want to print a hash, but neither print "%hash" nor print %hash works.

Solution: Iterate using each()

    while ( ($k,$v) = each %hash ) {
        print "$k => $v\n";
    }

Solution: Use map to generate a list of strings

    print map { "$_ => $hash{$_}\n" } keys %hash;

Solution: Interpolate the hash as a list and print that

    print "@{[ %hash ]}\n";

Solution: Use a temporary array to hold the hash and print that

    {
        my @temp = %hash;
        print "@temp";
    }

You can print in key order at the cost of doing a sort()

    foreach $k (sort keys %hash) {
        print "$k => $hash{$k}\n";
    }

More info: See The Perl Cookbook, section 5.5 Page 137. The best solution is probably the first one.

How Do I … Delete Something From A Hash?  You want to remove an entry from a hash.

    # %food_colour as per Introduction
    sub print_foods {
        my @foods = keys %food_colour;
        my $food;

        print "Keys: @foods\n";
        print "Values: ";

        foreach $food (@foods) {
            my $colour = $food_colour{$food};
            if (defined $colour) {
                print "$colour ";
            } else {
                print "(undef) ";
            }
        }
        print "\n";
    }

    print "Initially:\n";
    print_foods();

    print "\nWith Banana undef\n";
    undef $food_colour{"Banana"};
    print_foods();

    print "\nWith Banana deleted\n";
    delete $food_colour{"Banana"};
    print_foods();

    Initially:
    Keys: Banana Apple Carrot Lemon
    Values: yellow red orange yellow

    With Banana undef
    Keys: Banana Apple Carrot Lemon
    Values: (undef) red orange yellow

    With Banana deleted
    Keys: Apple Carrot Lemon
    Values: red orange yellow

More info: See The Perl Cookbook, section 5.3 Page 133. You can’t delete a key by setting its value to undef since undef is a value which a hash can store. You must use the delete() function. As the example shows, setting $food_colour{"Banana"} to undef doesn’t delete the key from the hash - it only makes the value undef. delete() really does remove it from the hash. delete() can also work with a hash slice to remove multiple keys from a hash, like this: delete @food_colour{"Banana", "Apple", "Cabbage"};

How Do I … Merge Hashes?  You need to make a new hash with the entries of two existing hashes.

Solution: Treat the hashes as lists and join them as you would lists. Keys which appear in both hashes will only appear once in the final hash.

    %merged = (%A, %B);

Alternative: Loop over each hash’s elements and build a new hash.

    %merged = ();
    while ( ($k,$v) = each(%A) ) {
        $merged{$k} = $v;
    }
    while ( ($k,$v) = each(%B) ) {
        $merged{$k} = $v;
    }

More info: See The Perl Cookbook, section 5.10 Page 145.
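When a key appears in both hashes, the value from the hash listed last is the one that survives. A small sketch of this, using two hypothetical hashes (%defaults and %overrides are names invented here):

```perl
use strict;
use warnings;

# Hypothetical hashes, invented for this sketch.
my %defaults  = (colour => "red",  size  => "large");
my %overrides = (colour => "blue", shape => "round");

# Later pairs win, so values from %overrides replace those from %defaults.
my %merged = (%defaults, %overrides);

foreach my $key (sort keys %merged) {
    print "$key => $merged{$key}\n";
}
```

Putting the defaults first and the overrides second is a common way to apply user settings on top of built-in ones.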

How Do I … Traverse A Hash?  You want to perform an action on each entry in a hash.

Solution: Use each() with a while loop

    while (($key, $value) = each(%HASH)) {
        # do something with $key and $value
    }

Solution: Use keys with a foreach loop

    foreach $key (keys %HASH) {
        $value = $HASH{$key};
        # do something with $key and $value
    }

More info: See The Perl Cookbook, section 5.4 Page 135. The each() function returns a two element list (a key and its value) from the hash each time it is called. Remember, order has no meaning in hashes, so regardless of the order in which you put values into the hash, it is very unlikely that they will come back out in that same order. It is possible to retrieve items in insertion order, but that is beyond the scope of this course.

How Do I … Find The Most Common Anything?  You want to know how many times a value in an array or in a hash occurs in the array or hash.

    %count = ();
    foreach $element (@ARRAY) {
        $count{$element}++;
    }

Solution: Use a hash to count how many times each element (for an array) or key (for a hash) occurs. The foreach adds one to $count{$element} for every occurrence of $element.

More info: See The Perl Cookbook, section 5.14 Page 150.
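Once the counts are in the hash, the most common element can be picked out by sorting the keys on their counts. This is only a sketch; the @words list is made up, and ties between equally common elements are not handled.

```perl
use strict;
use warnings;

# Hypothetical input list, made up for this sketch.
my @words = qw(fish cat fish dog cat fish);

my %count;
$count{$_}++ foreach @words;

# Sort the distinct elements by descending count and take the first.
my ($most_common) = sort { $count{$b} <=> $count{$a} } keys %count;

print "$most_common occurs $count{$most_common} times\n";
```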

How Do I … Operate On A Series Of Integers?  You want to perform an operation on a series of integers between X and Y.

Range operator

    foreach ($X .. $Y) {
        # $_ is set to every integer from X to Y, inclusive
    }

Range operator

    foreach $i ($X .. $Y) {
        # $i is set to every integer from X to Y, inclusive
    }

    for ($i = $X; $i <= $Y; $i++) {
        # $i is set to every integer from X to Y, inclusive
    }

    for ($i = $X; $i <= $Y; $i += 7) {
        # $i is set to every 7th integer from X to Y (step size = 7)
    }

Remember, for and foreach are synonyms, so that gives us another 4 variations

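More info: See The Perl Cookbook, section 2.5 Page 49. Solution: use a for loop or a foreach with the range operator (..). When iterating over consecutive integers, the third method (the explicit for loop) is the most memory-efficient on old versions of Perl, where the range operator builds the whole list before the loop starts.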

Regular Expressions

Notes:

Regular Expressions  Regular expressions (a.k.a. regexes, regexps, REs) are used in: grep, awk, findstr, sed, vi and Emacs.
A regular expression is a way of describing a set of strings without saying what they all are.

    if (/Windows 95/) { print "Time to upgrade?\n" }

    s/Windows/Linux/;

Be careful - regular expressions in Perl are not identical to regular expressions in other languages. When you see something that looks like /foo/ you’re looking at a pattern match operator (the / and the /). If you can find patterns in a string then you can also replace those patterns with something else. So when you see something like s/Windows/Linux/ you’re looking at a substitution of Linux for Windows (which some people might say is a good thing)! Finally patterns can also specify where something isn’t. This is used with the split operator - see next slide.

Regular Expressions  An example of the split operator:

    ($good, $bad, $ugly) = split( /,/ , "vi,emacs,teco");

($good, $bad, $ugly) is the list which gets the results of the split operator.
/,/ is the pattern which split uses to chop up the string on its right (the comma between / and /).
"vi,emacs,teco" is the text which split operates on.

Tip - the best way to split a string which contains lots of white space:

    @words = split( /\s+/ , $line );

We haven’t covered the \s character class yet - but it stands for any white-space character. The \s+ means any string containing one or more consecutive white-space characters (it can be different numbers at different places on a line of text - the fields on which the split occurs don’t all have to be the same length).
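One detail worth knowing about the tip above: if the line starts with white space, split(/\s+/, ...) returns an empty first field, whereas the special form split(' ', ...) skips leading white space. A minimal sketch, with a made-up $line:

```perl
use strict;
use warnings;

my $line = "  vi   emacs\tteco";   # leading spaces, mixed separators

# /\s+/ treats the leading white space as a separator,
# so the first field returned is an empty string.
my @with_pattern = split(/\s+/, $line);

# The special case split(' ', ...) skips leading white space first.
my @with_space = split(' ', $line);

print scalar(@with_pattern), " fields with /\\s+/\n";
print scalar(@with_space),   " fields with ' '\n";
```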

Regular Expressions  The simplest regular expressions are those which match several characters in a row:

    while ($line = <>) {
        if ($line =~ /http:/) {
            print $line;
        }
    }

This uses $_ for both the input operator and the string to search for a pattern match:

    while (<>) {
        print if /http:/;
    }

    while (<>) {
        print if /http:/;
        print if /ftp:/;
        print if /mailto:/;
        # What next?
    }

In the first example we’re looking for all lines containing /http:/ exactly. The =~ operator is called the binding operator. It’s telling Perl to look for a match in the variable $line. If we don’t use the =~ operator then Perl by default searches the system variable $_. This is a special scalar variable which is used in many places in Perl - not just pattern matching. In the second example we’re using the default value $_ (which is also set by the <> operator). In the third example we’re looking for lots of different types of links, http, ftp, mailto. What happens if this later needs to be extended. Wouldn’t it be easier to look for any number of alphabetic characters followed by a colon?

Regular Expressions  In regular expression speak that would be: /[a-zA-Z]+:/

The [ and ] define a character class. The a-z and A-Z represent all the alphabetic characters (the - means all characters between the starting and ending character inclusive). The + means “one or more of whatever is immediately in front of me”. That’s an example of a quantifier - something which says how many times something is allowed to repeat. Remember the / and / are not part of the pattern. They’re like quotes in that they contain the pattern but are not part of it.

Regular Expressions - Character Classes  These are some common Perl character class shortcuts.

    Name              ASCII definition    Code
    Whitespace        [ \t\n\r\f]         \s
    Word character    [a-zA-Z_0-9]        \w
    Digit             [0-9]               \d

Note that these match single characters. A \w will match a single word character - not a word. You can say \w+ to match a word.
Perl also allows negation of these classes by using the upper-case version of the code: \D matches a non-digit character, etc.
There’s one special character class, written with a “.”, that will match any character (except, by default, a newline).

Example: /a./ will match any string containing an “a” that is not the last character in a string. Why? So this will match “at” or “am” or “a!” but not “a” since there’s nothing after the “a” for the dot (any character) to match with. It’ll also match “camel” and “oasis”, but not “sheba”. It matches “caravan” on the first “a”.

Regular Expressions - Quantifiers  The character classes we’ve seen so far all match one character. You can match a word with \w+, and the “+” is one kind of quantifier. General quantifiers are like this: {min,max}

    Example     Matches
    \d{6,8}     Any number of between 6 and 8 digits
    \d{5,5}     A number of exactly 5 digits
    \d{5,}      A number of 5 digits or more
    \d{0,5}     A number of 5 digits or less

(The shorter form \d{,5} is only treated as a quantifier by Perl 5.34 and later; older Perls take it as literal text, so write \d{0,5}.)

    Code    Meaning
    +       {1,}
    *       {0,}
    ?       {0,1}

Be very, very careful using “*”. Why?

Regular Expressions - Quantifiers  Exercise: What does this do, i.e. what will be in $line after the substitution?

    $line = "Fred xxxxxxxx barney";
    $line =~ s/x*//;
    print $line;

One last thing: quantifiers apply to the immediately preceding character, so:

    /bam{2}/ will match "bamm" but not "bambam"

To apply a quantifier to more than one character, use ( and ) like this:

    /(bam){2}/ will match "bambam"

One other thing to note: by default all quantifiers in Perl are greedy - Perl will match as much as it can while still letting the overall pattern succeed.
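Greedy matching can be seen by capturing what a quantifier actually consumed. The sketch below uses a made-up string; the minimal form .+? (a ? added after the quantifier) is an extra not covered on these slides, shown only for contrast.

```perl
use strict;
use warnings;

my $text = "<b>bold</b> and <i>italic</i>";

# Greedy: .+ runs to the last ">" it can reach.
my ($greedy) = $text =~ /<(.+)>/;

# Minimal: .+? stops at the first ">" it can reach.
my ($minimal) = $text =~ /<(.+?)>/;

print "greedy:  $greedy\n";
print "minimal: $minimal\n";
```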

Regular Expressions - Anchors  Examples:

    /\bFred\b/ would match in    Answer And Reason
    "The Great Fred"             Yes
    "Fred The Great"             Yes
    "Frederick The Great"        No - Fred is not followed by a non-word character.

There are also characters for matching at:
    Start of line “^”.
    End of line “$”. (Don’t worry, Perl won’t confuse this with a variable.)

So when we said:

    next LINE if $line =~ /^#/;

What were we saying?

When you try to pattern match, Perl will try to match in every location until it succeeds. An anchor allows you to specify where a pattern can match. The special symbol \b matches on a word boundary which is defined as the “nothing” which exists between a word character “\w” and a non-word character “\W”. Answer: Go to the next iteration of the loop if the first character on a line is the “#” character. Also, when we said that the sequence \d{6,8} would match a number of between 6 and 8 digits - that wasn’t quite true, since it would also match any number containing 9 or more digits as well. To get the desired result we would have to combine quantifiers with anchors. Exercise: write a pattern which will match a number of 5 or 6 digits - but will fail to match one of more than 6 digits.
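The point about \d{6,8} also matching longer numbers can be checked directly. This sketch compares the unanchored pattern with one anchored by ^ and $; the test values are made up, and it deliberately uses the 6 to 8 digit case so the exercise is left open.

```perl
use strict;
use warnings;

# Hypothetical test values, made up for this sketch.
my @candidates = ("12345", "1234567", "123456789");

my (%unanchored, %anchored);
foreach my $number (@candidates) {
    # Unanchored: succeeds if 6 to 8 digits appear anywhere in the string.
    $unanchored{$number} = ($number =~ /\d{6,8}/)   ? "yes" : "no";
    # Anchored: the whole string must be 6 to 8 digits.
    $anchored{$number}   = ($number =~ /^\d{6,8}$/) ? "yes" : "no";
    print "$number  unanchored: $unanchored{$number}  anchored: $anchored{$number}\n";
}
```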

Regular Expressions - Back References  Use ( and ) to remember bits of patterns which match. Example:

    /\d+/
    /(\d+)/

Both these patterns match the same thing - a number. But the second one remembers what was matched.

What does this do?

    s/(\S+)\s+(\S+)/$2 $1/

When you match patterns you can use “(“ and “)” to remember the bits of a string which did match. The “(“ and “)” don’t change what matches. How you remember what was matched depends on where you want to remember it from. Inside the same pattern the bits of pattern which match are stored in variables \1 \2 \3 etc. The match from the first pair of “(“ and “)” is in \1 and so on. Outside the pattern the bits of pattern which match are stored in $1 $2 $3 etc. Be careful - once you start a new pattern match the old values of $1 $2 $3 etc. are all wiped out, so if you want to remember them long-term then copy $1 $2 $3 etc. into new variables. By the way - there’s no limit to how many bits of the pattern can be remembered, once you get to \9 or $9 Perl continues with \10 and $10 and so on. Whoops - no easy answer here this time - you’ll have to work it out.

Regular Expressions - List Processing  Examples: @array = (1 + 2, 3 - 4, 5 * 6, 7 / 8);

sort @dudes, @chicks, other();

print reverse sort map {lc} keys %hash;

($hour, $min, $sec, $ampm) = /(\d+):(\d+):(\d+) *(\w+)/;

@hmsa = /(\d+):(\d+):(\d+) *(\w+)/;

Earlier we mentioned the terms scalar and array (list) context. So far most things have been in scalar context - we’ve seen single results. Lots of Perl operators can produce either scalar results or list results. It depends on how they are used. They just “know” what is expected of them. In the first example @array is a four element list. In the second example each of @dudes, @chicks and other() returns a list, all the lists are then joined together to produce a single (big) list and that is passed to sort(). Some operators produce lists (like keys), while some consume them (like print). You can stack up several list operators in a row - see example 3. This takes all the keys from %hash, turns them all into lower-case by applying the lc operator (via map { }), passes that list to the sort function, passes the sorted list to the reverse function and then (finally) passes the reversed list to print. If you do a pattern match in list context then all the back-references are pulled out as a list - see example 4 and example 5. TMTOWTDI.
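The same pattern match behaves differently in list and scalar context. A minimal sketch, assuming a made-up time string in $time:

```perl
use strict;
use warnings;

my $time = "12:34:56 pm";   # hypothetical input

# List context: the match returns the four captured pieces.
my ($hour, $min, $sec, $ampm) = $time =~ /(\d+):(\d+):(\d+) *(\w+)/;

# Scalar context: the same match just reports success or failure.
my $matched = $time =~ /(\d+):(\d+):(\d+) *(\w+)/;

print "hour=$hour min=$min sec=$sec ampm=$ampm\n";
print "matched\n" if $matched;
```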

How Do I … Parse Comma-Separated Data?  You have a file containing comma-separated values that you need to read in, but these data fields may have quoted commas or escaped quotes in them.

    sub parse_csv {
        my $text = shift;   # record containing comma-separated values
        my @new  = ();
        push(@new, $+) while $text =~ m{
            # the first part groups the phrase inside the quotes.
            # see explanation of this pattern in MRE
            "([^\"\\]*(?:\\.[^\"\\]*)*)",?
              |  ([^,]+),?
              | ,
        }gx;
        push(@new, undef) if substr($text, -1,1) eq ',';
        return @new;        # list of values that were comma-separated
    }

This procedure is from “Mastering Regular Expressions”

    use Text::ParseWords;
    sub parse_csv {
        return quotewords(",", 0, $_[0]);
    }

Use the standard ParseWords module

More info: See The Perl Cookbook, section 1.15 Page 31. Comma-separated data sounds simple to parse, but it is actually a complex format since the fields themselves can contain commas. This makes the pattern matching solution complex and rules out a simple split /,/. Text::ParseWords hides all this complexity from you. Pass its quotewords() function two arguments and a CSV string. The first argument is the separator (in this case a comma); the second is a value which is true or false, and which controls whether the strings returned have quotes around them.
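As a small illustration of the module approach, the sketch below splits one record with the quotewords() function from Text::ParseWords. The $record string is made up; note the comma inside the quoted field.

```perl
use strict;
use warnings;
use Text::ParseWords;    # standard module; exports quotewords()

# Hypothetical record, made up for this sketch.
my $record = 'Fred,"Flintstone, F.",42';

# Second argument 0: strip the quotes from the returned fields.
my @fields = quotewords(',', 0, $record);

print scalar(@fields), " fields\n";
print "[$_]\n" foreach @fields;
```

A plain split /,/ on the same record would return four fields, cutting the quoted name in two.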

How Do I … Check If A String Is A Valid Number?  You want to check if a string contains a valid number.

General solution

    if ($string =~ /PATTERN/) {
        # is a number
    } else {
        # is not
    }

Specific solutions

    warn "has nondigits"        if     /\D/;
    warn "not a natural number" unless /^\d+$/;            # rejects -3
    warn "not an integer"       unless /^-?\d+$/;          # rejects +3
    warn "not an integer"       unless /^[+-]?\d+$/;
    warn "not a decimal number" unless /^-?\d+\.?\d*$/;    # rejects .2
    warn "not a decimal number" unless /^-?(?:\d+(?:\.\d*)?|\.\d+)$/;
    warn "not a C float"        unless /^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/;

More info: See The Perl Cookbook, section 2.1 Page 44. This is something which is common when validating input as part of a CGI script. The solution is easy as long as you can decide what you mean by a number, and can then write a regular expression (or series of expressions) to look for the pattern you desire. If numbers can have leading or trailing space then strip that space with a substitution first, like this: $probable_number =~ s/^\s+|\s+$//g;

How Do I … Copy And Substitute Simultaneously?  You want an easy way of copying and substituting at the same time when pattern matching.

You want to avoid this

    $dst = $src;
    $dst =~ s/this/that/;

So do this

    ($dst = $src) =~ s/this/that/;

    # Make All Words Title-Cased
    ($capword = $word) =~ s/(\w+)/\u\L$1/g;

    # /usr/man/man3/foo.1 changes to /usr/man/cat3/foo.1
    ($catpage = $manpage) =~ s/man(?=\d)/cat/;

    ($a = $b) =~ s/x/y/g;     # copy $b and then change $a
    $a = ($b =~ s/x/y/g);     # change $b, count goes in $a

More info: See The Perl Cookbook, section 6.1 Page 164.

How Do I … Match Only Letters When Pattern Matching?  You want to see whether a value consists of only alphabetic characters.

Use this if you don’t care about locale

    if ($var =~ /^[A-Za-z]+$/) {
        # it is purely alphabetic
    }

Use this if you do care about locale

    use locale;
    if ($var =~ /^[^\W\d_]+$/) {
        print "var is purely alphabetic\n";
    }

More info: See The Perl Cookbook, section 6.2 Page 165. The obvious way of doing this isn’t good enough in the general case since it doesn’t respect a user’s locale setting. If you need to match letters with diacritical marks, then use something like the second example which matches against a negated character class. The \w regular expression matches one alphabetic character, one numeric character or _. Therefore \W is anything that is not one of those. The negated character class [^\W\d_] specifies a character which must not be a non-word character, a digit, or an underscore. That leaves nothing but alphabetics.

How Do I … Match Only Words When Pattern Matching?  You want to pick out words from a string.

    /\S+/                # as many non-whitespace bytes as possible
    /[A-Za-z'-]+/        # as many letters, apostrophes, and hyphens
    /\b([A-Za-z]+)\b/    # usually best
    /\s([A-Za-z]+)\s/    # fails at ends or w/ punctuation

Probably what I would choose

You need to decide what you want a word to be, and then write a pattern to detect it. For example, is sheep-shearing a word? What about Shepherd’s?

More info: See The Perl Cookbook, section 6.3 Page 167. What you mean by a word varies between languages. Perl doesn’t have a built-in definition of what a word is. You must make them from character classes and quantifiers. There is no simple, straight-forward answer to this question, so be careful.

How Do I … Comment Regular Expressions?  You want to comment regular expressions.

    # Find duplicate words in paragraphs, possibly spanning line boundaries.
    # Use /x for space and comments, /i to match both `is'
    # in "Is is this ok?", and use /g to find all dups.
    $/ = "";                      # paragrep mode
    while (<>) {
        while ( m{
                    \b            # start at a word boundary
                    (\w\S+)       # find a wordish chunk
                    (
                        \s+       # separated by some whitespace
                        \1        # and that chunk again
                    ) +           # repeat ad lib
                    \b            # until another word boundary
                 }xig
              )
        {
            print "dup word '$1' at paragraph $.\n";
        }
    }

The modifiers used are xig.

More info: See The Perl Cookbook, section 6.4 Page 168. Use the /x modifier. This will cause the regular expression engine to ignore most whitespace inside a regular expression and will also allow for the insertion of comments. The allowed whitespace is space, tabs, and newlines.

How Do I … Find The Nth Occurrence Of A Match?  You want to find the Nth match in a string, not just the first one.

Example: Find the word preceding the third occurrence of “fish”. Use the /g modifier in a while loop and keep count of the number of matches.

Input: One fish two fish red fish blue fish

    $WANT = 3;
    $count = 0;
    while (/(\w+)\s+fish\b/gi) {
        if (++$count == $WANT) {
            print "The third fish is a $1 one.\n";
            # Warning: don't `last' out of this loop
        }
    }

    The third fish is a red one.

Use a repetition count and a repeated pattern

    /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;

More info: See The Perl Cookbook, section 6.5 Page 170. The /g modifier creates a progressive match which can be used in a while loop. To find the Nth match, it’s easiest to keep your own counter and then whenever you reach the count you want, do whatever is appropriate.
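A self-contained version of the counting loop, as a sketch: here the string lives in a variable ($text, a name chosen for this example) rather than $_, and the wanted word is copied out of $1 so it can be used after the loop has finished.

```perl
use strict;
use warnings;

my $text  = "One fish two fish red fish blue fish";
my $want  = 3;
my $count = 0;
my $found;

# Each pass of the loop resumes where the previous /g match ended.
while ($text =~ /(\w+)\s+fish\b/gi) {
    if (++$count == $want) {
        $found = $1;    # copy $1 before another match overwrites it
    }
}

print "Fish number $want is a $found one.\n";
```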

How Do I … Read Records With A Pattern Separator?  You want to read in records separated by a pattern.

Solution: Read in the whole file and use split().

    undef $/;
    @chunks = split(/pattern/, <>);

Create a localised copy of $/ which will be restored after the code finishes. By using split with () we also get the captured separators returned in the final array.

    # .Ch, .Se and .Ss divide chunks of STDIN
    {
        local $/ = undef;
        @chunks = split(/^\.(Ch|Se|Ss)$/m, <>);
    }
    print "I read ", scalar(@chunks), " chunks.\n";

An example: The input stream is a text file that consists of lines separated by “.Ch”, “.Se”, and “.Ss”, which are codes used in troff. We want to find the text that falls between them.

More info: See The Perl Cookbook, section 6.7 Page 176. Example 1: (Note: $/ is Perl’s input record separator). $/ cannot be a pattern - it must be a fixed string. To get round this we undefine $/ so that the next read operation gets the whole of the rest of the file. Then we split that huge string using whatever pattern we choose.

How Do I … Read A Range Of Lines?  You want to read all lines from one starting pattern to an ending pattern.

Solution: use the range operator

    while (<>) {
        if (/BEGIN PATTERN/ .. /END PATTERN/) {
            # line falls between BEGIN and END in the
            # text, inclusive.
        }
    }

Solution: use the range operator

    while (<>) {
        if ($FIRST_LINE_NUM .. $LAST_LINE_NUM) {
            # line number is between $FIRST_LINE_NUM and
            # $LAST_LINE_NUM, inclusive.
        }
    }

You don’t need to keep track of any line numbers in your code, Perl is doing it for you (in $.). Here $FIRST_LINE_NUM and $LAST_LINE_NUM stand for literal numbers such as 15 .. 17; if the numbers are held in variables, compare them with $. explicitly.

More info: See The Perl Cookbook, section 6.8 Page 177. Solution: Use the range operator (..), either with patterns or with line numbers. Here’s a very interesting Perl one-liner which makes use of this feature: perl -ne 'print if 23 .. 72' any_old_file.txt will print out just lines 23 to 72 of the file shown.
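Both forms of the range operator can be tried without an input file by reading from a string. In this sketch the text, the BEGIN/END markers and the variables $first and $last are all made up for illustration; because the line numbers are held in variables, they are compared with $. explicitly.

```perl
use strict;
use warnings;

# Hypothetical input, held in a string so the sketch is self-contained.
my $text = "intro\nBEGIN\nalpha\nbeta\nEND\noutro\n";

my (@by_pattern, @by_number);
my ($first, $last) = (3, 4);    # line numbers held in variables

open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";
while (<$fh>) {
    chomp;
    # Pattern form: true from the BEGIN line through the END line.
    push @by_pattern, $_ if /^BEGIN$/ .. /^END$/;
    # Variables are not compared with $. automatically, so compare explicitly.
    push @by_number, $_ if $. == $first .. $. == $last;
}
close($fh);

print "by pattern: @by_pattern\n";
print "by number:  @by_number\n";
```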

How Do I … Match From Where The Last Pattern Left Off?  You want to match again from where the last pattern left off.

Solution: Use a combination of the /g modifier, the \G pattern anchor and the pos function.

    while (/(\d+)/g) {
        print "Found $1\n";
    }

Use \G to anchor the next match to the end of any previous match.

    $n = "   49 here";
    $n =~ s/\G /0/g;
    print $n;

    00049 here

More info: See The Perl Cookbook, section 6.14 Page 190. If you use the /g pattern modifier, the Perl regular expression engine keeps track of its position when it finishes matching. The next time you match with /g the engine starts looking for a match from the remembered position. This lets you use a while loop to extract the information you want from the string.

How Do I … Match From Where The Last Pattern Left Off?  You want to match again from where the last pattern left off.

Find all the numbers.

    $_ = "The year 1752 lost 10 days on the 3rd of September";
    while (/(\d+)/gc) {
        print "Found number $1\n";
    }

Now find what follows the last number.

    if (/\G(\S+)/g) {
        print "Found $1 after the last number.\n";
    }

    Found number 1752
    Found number 10
    Found number 3
    Found rd after the last number.

More info: See The Perl Cookbook, section 6.14 Page 190. By default, when your match fails (say when you run out of numbers in the example above), the remembered position is reset to the start. If you don’t want this to happen because you want to carry on matching then use the /c modifier with /g. This pattern: /\G(\S+)/g will find whatever non-whitespace characters follow the last number (rd, in this case).
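The remembered position can be inspected with the pos function. The following sketch repeats the example on a named variable ($text, chosen here) and records where matching would resume; without /c on the first loop, the position would be reset and the \G match would start from the beginning of the string.

```perl
use strict;
use warnings;

my $text = "The year 1752 lost 10 days on the 3rd of September";
my @numbers;

# /c keeps the remembered position when the match finally fails.
while ($text =~ /(\d+)/gc) {
    push @numbers, $1;
}
my $resume_at = pos($text);    # offset just past the last number

# \G anchors this match at that remembered position.
my $suffix;
if ($text =~ /\G(\S+)/g) {
    $suffix = $1;
}

print "numbers: @numbers\n";
print "after the last number: $suffix\n";
```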

How Do I … Expand And Compress Tabs?  You want to convert the tabs in a string into the appropriate number of spaces, or vice-versa.

1

    while ($string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e) {
        # spin in empty loop until substitution finally fails
    }

2

    use Text::Tabs;
    @expanded_lines  = expand(@lines_with_tabs);
    @tabulated_lines = unexpand(@lines_without_tabs);

3

    while (<>) {
        1 while s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;
        print;
    }

4

    use Text::Tabs;
    $tabstop = 4;
    while (<>) { print expand($_) }

More info: See The Perl Cookbook, section 1.7 Page 15.
1. Either use a funny looking substitution.
2. Use the standard Text::Tabs module.
3. 1 while (CONDITION) is the same as while (CONDITION) { }, i.e. a loop with an empty body.
4. Use the standard Text::Tabs module.
LAB6 - REGEXP_1 LAB6 - REGEXP_2

Scope, Pragmas, Modules, Subroutines, References

Notes:

Scope  What do we mean by scope? Variables are visible from the point at which they are defined. Private versus Public:

Left example:

    foreach my $pw (@password_list) {
        my $pw_length = length( $pw );
        if ( $pw_length < 8 ) {
            print "$pw is too short\n";
        }
    }
    # $pw and $pw_length don't exist here

Right example:

    my ( $pw , $pw_length );
    foreach $pw (@password_list) {
        $pw_length = length( $pw );
        if ( $pw_length < 8 ) {
            print "$pw is too short\n";
        }
    }
    # $pw and $pw_length do exist here

Scope means whether a variable is temporary/permanent and private/public. By default (if you do nothing at all) Perl’s variables are global and permanent (Later we’ll see that these are called package variables). Makes writing short programs very easy, but they can be difficult to debug. In both cases we have forced all variables to be declared before they are used (using my) - that doesn’t affect the code. The point is that in the left example $pw and $pw_length only exist in this piece of code. In the right example the same two variables exist after the code is finished executing. Subroutine declarations are global declarations - wherever you place them they are visible to all code in your package.
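A short sketch of the same idea, under the assumption of a made-up @password_list: a variable declared with my inside the loop disappears when the loop ends, while one declared outside it survives.

```perl
use strict;
use warnings;

my @password_list = ("secret", "longer-password");   # hypothetical data

my $longest = 0;                 # declared outside the loop, so it outlives it
foreach my $pw (@password_list) {
    my $pw_length = length($pw); # private to one pass of the loop
    $longest = $pw_length if $pw_length > $longest;
}

# $pw and $pw_length are gone here; naming them would be a
# compile-time error under "use strict". $longest is still visible.
print "Longest password has $longest characters\n";
```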

Pragmas  A special kind of module that affects how your program is compiled. Invoked by a use or a no. Example:

    use strict;
    use integer;
    {
        no strict 'refs';    # allow symbolic references
        no integer;          # resume floating point arithmetic
        # ....
    }

Pragmas  use constant;

    use constant BUFFER_SIZE => 4096;
    use constant ONE_YEAR    => 365.2425 * 24 * 60 * 60;
    use constant PI          => 4 * atan2 1, 1;
    use constant DEBUGGING   => 0;
    use constant ORACLE      => '[email protected]';
    use constant USERNAME    => scalar getpwuid($<);
    use constant USERINFO    => getpwuid($<);

    sub deg2rad { PI * $_[0] / 180 }

    print "This line does nothing" unless DEBUGGING;

Written this way, each use constant line defines one constant (from Perl 5.8 onwards you can also pass a hash reference to define several at once). By convention all constants are defined in upper-case.
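A runnable sketch of constants in use. PI, DEBUGGING and deg2rad follow the slide; WEEKDAYS is a made-up list constant added here to show that a constant can also hold a list.

```perl
use strict;
use warnings;

use constant PI        => 4 * atan2(1, 1);
use constant DEBUGGING => 0;

# A list constant: the name stands for the whole list.
use constant WEEKDAYS  => qw(Mon Tue Wed Thu Fri);

sub deg2rad { return PI * $_[0] / 180 }

my $half_turn = deg2rad(180);
my @days      = (WEEKDAYS);
my $second    = (WEEKDAYS)[1];

print "debugging is off\n" unless DEBUGGING;
print "180 degrees = $half_turn radians\n";
print "second weekday: $second\n";
```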

Pragmas  use integer;

    use integer;
    $x = 10/3;    # $x is now 3, not 3.33333333333333333

    use integer;
    $x = 1.8;
    $y = $x + 1;
    $z = -1.8;

This pragma tells the compiler to use integer arithmetic only from now to the end of the enclosing block. In the second example you’ll be left with $x == 1.8, $y == 2 and $z == -1. The case for $z is special since the - sign in front of the 1.8 counts as an operation (unary minus) so the value of 1.8 is truncated to 1 before its sign bit is flipped.

Pragmas  use lib;

    #!/usr/bin/perl -w
    use lib ( "/design/analog/software/Modules" );
    use strict;
    use Carp;
    use English;
    use My_Constants;
    use Netlist_Functions;
    use Mosfet;
    use Capacitor;
    use Resistor;
    use Diode;
    use Instance;

    /design/analog/software/Modules
        My_Constants.pm
        Netlist_Functions.pm
        Mosfet.pm
        Capacitor.pm
        Resistor.pm
        Diode.pm
        Instance.pm

This is used to modify the list of places in which Perl will look to find library modules. It’s roughly equivalent to adding to your Unix $PATH variable. The strict, Carp and English modules are all standard Perl modules. Perl always knows how to find these. The modules My_Constants, Netlist_Functions, Mosfet, Capacitor, Resistor, Diode and Instance are all imported from our user defined directory. Parameters to use lib; are prepended to Perl’s search path.

Pragmas  use strict;

    use strict;            # Install all three strictures.

    use strict "vars";     # Variables must be predeclared.
    use strict "refs";     # Can't use symbolic references.
    use strict "subs";     # Bareword strings must be quoted.

    use strict;            # Install all...
    no strict "vars";      # ...then renege on one.

    use strict 'subs';
    $x = whatever;         # WRONG: bareword error!
    $x = whatever();       # This always works, though.

    sub whatever;          # Predeclare function.
    $x = whatever;         # Now it's ok.

This pragma changes what Perl considers to be legal code. Sometimes these strictures seem too strict for casual programming - until you spend an hour looking for a bug which wouldn’t have happened if you’d used this pragma. There are three things we can be strict about: subs, vars, and refs.
Strict refs: symbolic references are suspect for a lot of reasons - it’s pretty easy to use one even when you don’t mean to. With this stricture in effect you can only use real (hard) references. Symbolic references are covered at the end of the References section.
Strict vars will trigger a compile time error if you attempt to access a variable which has not met one of the following criteria:
1. Predefined by Perl itself (i.e. a built-in variable).
2. Declared with our (for a global) or my (for a lexical).
3. Imported from another package.
4. Fully qualified using its package name and the :: package separator.

Standard Modules
 Carp     - Report errors from a user’s perspective.
 Cwd      - Finds the current working directory.
 English  - Allows use of English variable names.
 Exporter - Determines what a module exports.

 There are lots of other modules - see Chapter 32 of “Programming Perl”.

Carp lets you report errors from the perspective of a user, so if a user fails to use your modules correctly, the error messages will show up not as problems in your code (which of course you’ve thoroughly debugged), but in the user’s code. In other words this is a blame shifter.
Cwd is a module which lets you find out the current working directory. On Unix it may not seem too useful since you can always use $cwd = `pwd`; However, Cwd is guaranteed to work on all systems where Perl is installed, even when they don’t have a shell command which will let them do $cwd = `pwd`;
English lets you use English names instead of the standard Perl names for built-in variables.
Exporter is used with modules to determine what subroutines can be seen from the outside of the module.
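As a rough illustration of the first three modules working together, here is a minimal sketch. The square_root() helper and its error message are invented for the example; croak(), getcwd() and the English variable names come from the standard modules.

```perl
use strict;
use warnings;
use Carp;                            # croak(): like die, blamed on the caller
use Cwd;                             # getcwd(): portable current directory
use English qw( -no_match_vars );    # $PROGRAM_NAME, $PROCESS_ID, $EVAL_ERROR

# Hypothetical helper that refuses bad input.
sub square_root {
    my ($n) = @_;
    croak "square_root() needs a non-negative number"
        unless defined $n && $n >= 0;
    return sqrt $n;
}

my $cwd = getcwd();
print "Running $PROGRAM_NAME (pid $PROCESS_ID) in $cwd\n";

# Trap the error so the script carries on; $EVAL_ERROR is the English name for $@.
my $ok = eval { square_root(-1); 1 };
print "Caught: $EVAL_ERROR" unless $ok;
```

The file and line number in the caught message point at the call to square_root(), not at the croak inside it.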

Subroutines
 Syntax:
 To declare a named subroutine without defining it do one of these.
sub NAME
sub NAME PROTO
sub NAME ATTRS
sub NAME PROTO ATTRS

 To declare and define a named subroutine, add a BLOCK:
sub NAME BLOCK
sub NAME PROTO BLOCK
sub NAME ATTRS BLOCK
sub NAME PROTO ATTRS BLOCK

This all looks pretty complicated - but this is normally how we do things.

sub say_hello
{
    print "Hello world.\n";
}
say_hello();

A subroutine is a small self-contained sub-program. It is invoked by its name, it may have arguments passed to it and it can return a scalar or a list value. It’s defined using the sub keyword followed by the subroutine code in {}. Subroutines can be defined anywhere in your program, loaded in from other files via do, require or use, or generated at run time with eval. You can call a subroutine directly, indirectly through a variable containing either its name or a reference to the subroutine, or through an object, letting the object determine which subroutine should really be called. To create an anonymous subroutine just leave out the name. PROTO and ATTRS stand for prototype and attributes respectively - they’re not so important. NAME and BLOCK are the essential parts, even in the forms where one of them is missing. For forms without the name you need to have some way to call the subroutine, so do this: $subref = sub BLOCK; And then later on you can say: &$subref;
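The direct and indirect calling styles mentioned above might look like this. It is a sketch only; say_hello() returns its text here rather than printing it so the results can be compared.

```perl
use strict;
use warnings;

sub say_hello { return "Hello world."; }

my $direct   = say_hello();      # called directly by name
my $subref   = \&say_hello;      # hard reference to a named subroutine
my $via_ref  = $subref->();      # called through the reference
my $anon     = sub { return "Hello from a nameless sub."; };
my $via_anon = &$anon();         # older spelling of $anon->()

print "$direct\n$via_ref\n$via_anon\n";
```

Note that &$subref; with no parentheses passes the caller's current @_ along, whereas $subref->() passes an empty list.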

Subroutines
 The function return causes execution of the subroutine to finish.
 The value specified after the return is returned as the result.
 Using a return statement is optional (but it shouldn’t be).
 If one isn’t used, then the value returned is the value of the last expression evaluated.
@sorted = dictionary_order( "eat" , "at" , "Joes" );
@sorted = dictionary_order( @unsorted );
@sorted = dictionary_order( @sheep , @goats , "shepherd" , $goatherd );

sub get_next { return <>; }

prompt();               # always okay since ()
$next = get_next();     # always okay since ()

prompt;                 # error - hasn't seen definition yet
$next = get_next;       # okay: get_next definition already seen

sub prompt { print "next> "; }

Just as in previous examples, the lists passed to a subroutine are all flattened. So the third call to dictionary_order would contain the contents of the array @sheep, followed by the contents of the array @goats, the value “shepherd” and finally the scalar value stored in $goatherd. It is possible to pass two or more arrays to a subroutine and have them maintain their integrity (i.e. keep them unflattened) by passing references to them; this is covered in the References section. If the subroutine does not require arguments then it can be passed an empty argument list. The list can also be omitted completely as long as Perl already knows the name is a subroutine. Like variables, subroutines have a leading symbol which indicates what they are. The name of a subroutine is preceded by an & which may be used when calling it. It must be used when calling a subroutine in certain contexts (we’ll see these in a minute). It can’t be used when defining the subroutine however. So this won’t work:
sub &dictionary_order { return sort @_; }    # FATAL Compile Time Error
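A small sketch of the flattening described above. The array contents and the count_args() helper are made up for the illustration; dictionary_order() is the plain sort from the slide.

```perl
use strict;
use warnings;

sub dictionary_order { return sort @_; }
sub count_args       { return scalar @_; }   # hypothetical helper

my @sheep    = ( "Dolly", "Shaun" );
my @goats    = ( "Billy" );
my $goatherd = "Heidi";

# Two arrays and two scalars arrive as one flat list of five elements.
my $how_many = count_args( @sheep, @goats, "shepherd", $goatherd );
my @sorted   = dictionary_order( @sheep, @goats, "shepherd", $goatherd );

print "$how_many arguments: @sorted\n";
```

Inside the subroutine there is no way to tell where @sheep ended and @goats began.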

Other Ways To Call Subroutines
 Subroutines which have been defined earlier can be called without “(” and “)”.
sub make_sequence # from, to, step_size
{
    @list = ();
    for ( $n = $_[0] ; $n < $_[1] ; $n += $_[2] )
    {
        push @list , $n;
    }
    return @list;
}
@stepped_sequence = make_sequence $min , $max , $step_size;

&my_subroutine;    # Means my_subroutine( @_ );
my_subroutine;     # Means my_subroutine();

Arguments passed to a subroutine are available via the @_ array. Example 1: A subroutine already defined can be called without the “(“ and “)” around the argument list. Example 2: Another way to call a subroutine is to use the & prefix but without passing any arguments. In this case the subroutine has the value of the @_ array passed to it instead. This is used to call subroutines from within other subroutines. This is almost never used in new code but may be present in old code. Always use subroutines as shown in the style section of this course.
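The difference between the two call styles in Example 2 can be seen in a few lines. This is a sketch; all three subroutine names are invented for the demonstration.

```perl
use strict;
use warnings;

sub count_args     { return scalar @_; }
sub with_ampersand { return &count_args; }    # passes the caller's @_ through
sub with_parens    { return count_args(); }   # passes an empty list

my $shared = with_ampersand( "a", "b", "c" );
my $fresh  = with_parens( "a", "b", "c" );

print "with &: $shared argument(s), with (): $fresh argument(s)\n";
```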

Named Subroutine Arguments
 Suppose we had a subroutine which took a lot of arguments:
ls( "*" , "any" , 1 , 1 , 0 , 0 , "alpha" , 4 , 1 );
ls( undef, undef , 1 , 1 , undef , undef , undef , 4 , 1 );
ls( cols => 1 , pages => 4 , width => 80 );

sub ls
{
    %arg = @_;    # convert a list to a hash
    $arg{ pages } = "*" unless exists $arg{ pages };
    $arg{ cols } = 1 unless exists $arg{ cols };
    #etc
}

Example 1: You don’t want to pass 9 arguments to this subroutine when only a few are going to change. Example 2: You could arrange that passing undef as a parameter chooses a default value but we’d still have to write a long piece of code as shown. Example 3: Perl supports named parameters for arguments by passing a hash to a subroutine rather than an array. We can use the => operator to associate a name with each argument. Inside a subroutine we initialise a hash with the contents of the @_ array. This documents the call better and since the entries of a hash can be initialised in any order we don’t need to remember the order of parameters in the call.

Named Subroutine Arguments (Continued)
# Set up some defaults
%std_listing = ( cols => 2 , pages => 4 );

# Use the defaults
ls ( files => "*.txt" , %std_listing );
ls ( files => "*.log" , %std_listing );
ls ( files => "*.hlp" , %std_listing );

# Override some of the defaults
ls ( files => "*.dat" , %std_listing , cols => 8 );

In the first example we set up some default values for some arguments. In the second set of examples we use the standard set of parameters. In the third example we use a default set of arguments and then override some of that standard set as well.
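One way the receiving side could be written is sketched below. The name ls_options() and the default values are assumptions made for the example, not the course's ls(). It relies on the fact that when a key appears twice in a list assigned to a hash, the later value wins.

```perl
use strict;
use warnings;

# Hypothetical stand-in for ls(): returns the options it would use.
sub ls_options {
    my %arg = (
        files => "*", cols => 1, pages => 1, width => 80,   # assumed defaults
        @_,                            # the caller's pairs come last, so they win
    );
    return \%arg;
}

my %std_listing = ( cols => 2, pages => 4 );
my $opts = ls_options( files => "*.dat", %std_listing, cols => 8 );

print join( ", ", map { "$_=$opts->{$_}" } sort keys %$opts ), "\n";
```

The same rule explains why cols => 8 placed after %std_listing overrides the cols => 2 stored in it.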

Aliasing Of Parameters - Pass By Reference
#!/usr/bin/perl -w
use strict;
my $line = "Mary had a little";
my $animal = "lamb";
Print_Rhyme( $line , $animal );    # prints "Mary had a little lamb"
Print_Rhyme( $line , $animal );    # prints "Mary had a little dog"
exit;

sub Print_Rhyme    # Parameters passed in @_ as aliases
{
    print $_[0] . " " . $_[1] . "\n";
    $_[1] = "dog";
    return 0;
}

In this code we pass the parameters in @_ (this is always true) and use them in the subroutine as aliases. Therefore when we change the value of one or more of the parameters in the subroutine we are actually changing them in the calling code as well. So $_[1] = "dog"; has the same effect as assigning $animal = "dog"; in the calling code between the two calls to Print_Rhyme. This is nearly always *NOT WHAT YOU WANT*

Aliasing Of Parameters - Pass By Value
#!/usr/bin/perl -w
use strict;
my $line = "Mary had a little";
my $animal = "lamb";
Print_Rhyme( $line , $animal );    # prints "Mary had a little lamb"
Print_Rhyme( $line , $animal );    # prints "Mary had a little lamb"
exit;

sub Print_Rhyme
{
    my ( $line , $animal ) = @_;    # Parameters passed in @_ and copied into local variables
    print $line . " " . $animal . "\n";
    $animal = "dog";                # This change is isolated to the Print_Rhyme subroutine.
    return 0;
}

In this code we pass the parameters in @_ (this is always true) and use them in the subroutine as values by copying them into local variables. Therefore when we change the value of one or more of the parameters in the subroutine the change is restricted to the values of the local variables in the subroutine. Therefore the assignment: $animal = “dog”; has no effect on the calling code - it is localised in the Print_Rhyme subroutine. This is the way you should use subroutines.
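The two behaviours can be put side by side in one short script. This is a sketch; rename_in_place() and rename_copy() are names invented for the comparison.

```perl
use strict;
use warnings;

# Works on the alias: the caller's variable is changed.
sub rename_in_place { $_[0] = "dog"; return; }

# Works on a private copy: the caller's variable is untouched.
sub rename_copy {
    my ($animal) = @_;
    $animal = "dog";
    return $animal;
}

my $first = "lamb";
my $result = rename_copy($first);

my $second = "lamb";
rename_in_place($second);

print "after copy: $first, after alias: $second\n";
```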

A Standard Way Of Using Subroutines
sub _interpolate_value
{
    my ( $t1 , $v1 , $t2 , $v2 , $time ) = @_;
    croak( "No t1 value in Waveform::_interpolate_value()" ) unless defined( $t1 );
    croak( "No v1 value in Waveform::_interpolate_value()" ) unless defined( $v1 );
    croak( "No t2 value in Waveform::_interpolate_value()" ) unless defined( $t2 );
    croak( "No v2 value in Waveform::_interpolate_value()" ) unless defined( $v2 );
    croak( "No time in Waveform::_interpolate_value()" ) unless defined( $time );

    if ( $t1 == $time ) { return( $v1 ); }
    if ( $t2 == $time ) { return( $v2 ); }

    my $delta_t = $t2 - $t1;
    my $delta_v = $v2 - $v1;
    croak( "Error - divide by zero in Waveform::_interpolate_value()" ) if ( $delta_t == 0 );
    my $dv_by_dt = $delta_v/$delta_t;
    # Linear interpolation measured from the first point ( $t1 , $v1 ).
    my $interpolated_value = $v1 + ( $time - $t1 ) * $dv_by_dt;
    return( int $interpolated_value );
}

Elements of the @_ array are special. They are not copies of the actual arguments; they are aliases to them. If $_[0], $_[1] etc. are changed then the argument in the calling routine is changed, i.e. the parameters are passed by reference. This behavior is useful but can lead to hard to find bugs. Pass by value is the more usual form, so explicitly copy the @_ array into a new list of variables, and to be doubly safe declare those variables with my(). The above code is a fragment of an object-oriented program. The _ at the front of the subroutine name is a convention for internal subroutines in OO code - it’s a subroutine called only from within the object. croak() is a subroutine made available with: use Carp; It behaves like die() but reports the error from the perspective of the caller.

Subroutine Calling Context
 When a subroutine is called it is possible to detect whether it was expected to return a scalar, a list or nothing at all.
 The contexts in which a subroutine is called are:
ls ( @files );                   # void context: no return value expected
$listing = ls( @files );         # scalar context: scalar return value expected
@missing = ls( @files );         # list context: list return value expected
($f1 , $f2 ) = ls( @files );     # list context: list return value expected
print ( ls( @files ) );          # list context: list return value expected

The information about the calling context is obtained from the wantarray function. The function returns:
undef (false and undefined) if the subroutine was called in void context.
“” (false and defined) if the subroutine was called in scalar context.
1 (true and defined) if the subroutine was called in list context.
We could use this information to decide what value a subroutine needs to return.

Subroutine Prototypes
 Subroutines can be defined with a prototype.
 A prototype is a series of specifiers which restrict the type and number of arguments.
sub add_two_param ( $$ )
{
    return( $_[0] + $_[1] );
}
 The prototype is the ( $$ ) part.
 This restricts the arguments to be two scalars.
 But note - if you pass an array then it is evaluated in scalar context, i.e. passing two arrays means the two scalars will be the lengths of the arrays.
 See perlsub man pages.
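The coercion can be seen in a short sketch. The arrays are invented sample data, and the subroutine has to be defined before the calls for the prototype to take effect.

```perl
use strict;
use warnings;

sub add_two_param ($$) {
    return $_[0] + $_[1];
}

my @a = ( 1, 2, 3 );
my @b = ( 10, 20 );

my $sum     = add_two_param( 3, 4 );     # two ordinary scalars
my $lengths = add_two_param( @a, @b );   # each array is taken in scalar context

print "sum=$sum lengths=$lengths\n";
```

The second call adds the element counts (3 and 2), not the array contents.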

Notes:

How Do I … Access Subroutine Arguments
 You have written a function and want to access the arguments passed by its caller.
Solution:
sub hypotenuse
{
    return sqrt( ($_[0] ** 2) + ($_[1] ** 2) );
}

Invoke like this:
$diag = hypotenuse(3,4);    # $diag is 5

Better version with private variables:
sub hypotenuse
{
    my ($side1, $side2) = @_;
    return sqrt( ($side1 ** 2) + ($side2 ** 2) );
}

More info: See The Perl Cookbook, section 10.1 Page 335. All values passed as arguments are in the special array @_. So the first argument is in $_[0] and so on. The number of arguments is scalar(@_). Subroutines should always start by copying the arguments into new private variables. To return a value from a subroutine use the return function. If there is no return statement, then the value returned by the subroutine is the value of the last expression evaluated by the subroutine.

How Do I … Make Variables Private To A Function
 You want to use temporary variables in your function.
Solution: Use my to declare variables private to the subroutine.
sub somefunc
{
    my $variable;
    my ($another, @an_array, %a_hash);
    # ...
}

You can combine my variables with an assignment:
my ($name, $age) = @ARGV;
my $start = fetch_time();

my ($a, $b) = @pair;
my $c = fetch_time();
sub check_x
{
    my $x = $_[0];       # Declare some variables $x and $y
    my $y = "whatever";  # private to this function
    run_check();         # run_check() can't see $x or $y
    if ($condition)
    {
        print "got $x\n";
    }
}
However, check_x can see $a, $b and $c since they are defined in the same scope.

More info: See The Perl Cookbook, section 10.2 Page 337. $variable is only visible and accessible within the function somefunc(). When you declare several private variables in one statement you must put them inside parentheses, like this: my ($another, @an_array, %a_hash); Variables declared with my have lexical scope, which means that they only exist within a certain textual area of your code. Such a variable is destroyed when that body of code ends. Usually the body of code is a block with braces around it like this: { # Your Code Here } Since a lexical scope is usually a block you will often hear lexical variables described as being visible only within their block.

How Do I … Create Persistent Private Variables
 You want a variable to retain its value between calls to a subroutine but not to be visible outside that subroutine.
Solution: Wrap the function in another block and declare my variables in the block’s scope rather than the function’s.
{
    my $variable;
    sub mysub
    {
        # ... accessing $variable
    }
}

Use a BEGIN block if you need to perform initialisation:
BEGIN
{
    my $variable = 1;    # initial value
    sub othersub
    {
        # ... accessing $variable
    }
}

By default the initial value in $counter is undef, which is treated as zero the first time next_counter() is called:
{
    my $counter;
    sub next_counter { return ++$counter }
}

Do this to initialise to anything other than 0:
BEGIN
{
    my $counter = 42;
    sub next_counter { return ++$counter }
    sub prev_counter { return --$counter }
}

More info: See The Perl Cookbook, section 10.3 Page 339. Lexical variables don’t need to vanish when their scope ends. If something more permanent is still aware of the lexical then it will be maintained. (Perl does this by reference counting).
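A runnable sketch of the counter from the slide shows the variable surviving between calls. The @seen array is only there to collect the results.

```perl
use strict;
use warnings;

BEGIN {
    my $counter = 42;    # private to this block, but kept alive by the two subs
    sub next_counter { return ++$counter }
    sub prev_counter { return --$counter }
}

my @seen = ( next_counter(), next_counter(), prev_counter() );
print "@seen\n";
```

Nothing outside the BEGIN block can name $counter, yet both subroutines share the same copy of it.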

How Do I … Detect Return Context
 You want to return a value that depends upon the calling context.
Solution: Use wantarray()
if (wantarray())
{
    print "In list context\n";
    return @many_things;
}
elsif (defined wantarray())
{
    print "In scalar context\n";
    return $one_thing;
}
else
{
    print "In void context\n";
    return;    # nothing
}

mysub();              # void context
$a = mysub();         # scalar context
if (mysub()) { }      # scalar context
@a = mysub();         # list context
print mysub();        # list context

More info: See The Perl Cookbook, section 10.6 Page 344. wantarray() returns one of three things depending on how the function was called. A function can decide what context it was called in and then return something which is appropriate to that context. List context is indicated by a true return value. Scalar context is indicated by a false return value which is defined. Void context is indicated by an undef return value.
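The three cases can be exercised with a small script. It is a sketch; the @log array is an invention of the example so that the void-context call leaves a trace.

```perl
use strict;
use warnings;

my @log;    # records the context seen by each call

sub mysub {
    if ( wantarray() ) {
        push @log, "list";
        return ( 1, 2, 3 );
    }
    elsif ( defined wantarray() ) {
        push @log, "scalar";
        return 1;
    }
    else {
        push @log, "void";
        return;
    }
}

mysub();                   # void context
my $scalar = mysub();      # scalar context
my @list   = mysub();      # list context
if ( mysub() ) { }         # a boolean test is scalar context

print "@log\n";
```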

References
 Two kinds of references:
   Hard (real - a bit like pointers in C, C++).
   Symbolic (use the name of one thing to access some other thing).
 Allows a variable or a subroutine to be accessed indirectly.
 A reference is not a variable - it’s a means of accessing a variable.
 To create a reference we use the \ operator.
 This takes an ordinary variable and returns a reference to it, like this:
$ref_to_scalar = \$my_scalar;
$ref_to_array  = \@my_array;
$ref_to_hash   = \%my_hash;
$ref_to_sub    = \&my_sub;

We are going to discuss hard references here and symbolic references (only in passing) at the end of this section. When we say references we will always mean a hard reference. Once we have a reference, we can get at the thing it refers to by prefixing the reference (optionally in { and }) with the appropriate symbol. To refer to $my_scalar we write one of these: ${\$my_scalar}; $$ref_to_scalar; ${$ref_to_scalar}; So we can access @my_array like this: @{\@my_array}; @$ref_to_array; @{$ref_to_array}; and so on. If you prefix a reference by the wrong symbol then you’ll get an error.
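Putting the creation and dereferencing syntax together gives something like the following sketch. The sample values and the body of my_sub() are made up.

```perl
use strict;
use warnings;

my $my_scalar = "camel";
my @my_array  = ( 1, 2, 3 );
my %my_hash   = ( first => "one" );
sub my_sub { return "called with @_"; }

my $ref_to_scalar = \$my_scalar;
my $ref_to_array  = \@my_array;
my $ref_to_hash   = \%my_hash;
my $ref_to_sub    = \&my_sub;

my $copy   = $$ref_to_scalar;               # same as ${$ref_to_scalar}
my $count  = scalar @{$ref_to_array};       # number of elements in @my_array
my @keys   = keys %$ref_to_hash;
my $result = &$ref_to_sub("an argument");

# Writing through a reference changes the original variable.
$$ref_to_scalar = "llama";
push @$ref_to_array, 4;

print "$copy $count @keys | $result | $my_scalar @my_array\n";
```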

References
 Accessing the elements of an array or hash through a reference:
This is a bit messy:
$a = ${ $hash_ref }{ "first" };
${$array_ref}[0] = $h{ "first" };

But this is better:
$a = $hash_ref->{ "first" };
$array_ref->[0] = $h{ "first" };

The arrow operator takes a reference on its left and either an array index in [] or a hash key in {} on its right. It locates the array or hash that the reference refers to and then accesses the appropriate element.

References And The ref() Function
If $reference contains:                  Then ref( $reference ) returns:
A scalar value (not a reference)         "" (the empty string)
A reference to a scalar                  "SCALAR"
A reference to an array                  "ARRAY"
A reference to a hash                    "HASH"
A reference to a subroutine              "CODE"
A reference to a filehandle              "IO" or "IO::Handle"
A reference to a typeglob                "GLOB"
A reference to a precompiled pattern     "Regexp"
A reference to another reference         "REF"

Object references are missing from the above list because ref() applied to a reference to an object returns the name of the object’s class. This, of course, changes as you use different objects.

Because dereferencing a reference with the wrong prefix can cause errors it’s sometimes necessary to be able to figure out what kind of referent a specific reference is referring to. The built-in ref() function takes a scalar value and returns a description of the kind of reference it contains. If a reference is used where a string is expected then it is converted to a string made of its ref() type followed, in parentheses, by a hex number representing the internal memory address of the referent. This means that printing out a reference usually produces something like: HASH(0x10027588) If you use the ref() function on an object, the class name is returned:
my $graphics_object = Polygon->new( 0, 0, 5, 5, 10, 32, 70, 10, 12, 18 );    # Polygon coordinates
print ref( $graphics_object );    # Will print "Polygon"
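A quick way to see these values is to run ref() over one item of each kind. In this sketch describe() is a hypothetical helper, and the Polygon object is faked with bless because the real class is not shown in the course.

```perl
use strict;
use warnings;

# Hypothetical helper: names the kind of thing it was given.
sub describe {
    my ($thing) = @_;
    my $kind = ref $thing;
    return $kind eq "" ? "not a reference" : $kind;
}

my $fake_polygon = bless {}, "Polygon";    # stand-in for Polygon->new(...)

my @kinds;
foreach my $item ( 42, \"text", [ 1, 2 ], { a => 1 }, sub { 1 },
                   \\"text", qr/x/, $fake_polygon ) {
    push @kinds, describe($item);
}
print join( ", ", @kinds ), "\n";
```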

References And Anonymous Arrays
 References are useful in creating multi-dimensional arrays:
This won’t work!
@table = ( ( 1 , 2 , 3 ) ,
           ( 4 , 5 , 6 ) ,
           ( 7 , 8 , 9 ) ,
         );
It is flattened to:
@table = ( 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , 9 );

@row1 = ( 1 , 2 , 3 );
@row2 = ( 4 , 5 , 6 );
@row3 = ( 7 , 8 , 9 );
@cols = ( \@row1 , \@row2 , \@row3 );
$table = \@cols;

[Diagram: $table refers to @cols, whose three elements \@row1, \@row2 and \@row3 refer to @row1 (1, 2, 3), @row2 (4, 5, 6) and @row3 (7, 8, 9).]

The first example doesn’t work because of list flattening, so we need to use references to solve this problem. Each element in a Perl array can store a scalar, and a reference is a scalar (albeit a special kind of scalar). The bottom half of the slide shows how to set this up using references. The elements of the rows can be accessed using the arrow -> notation. $table->[1]->[2]; This means: find the array referred to by the reference in $table (i.e. @cols) and then get the element at index 1. That element stores a reference (a reference to @row2), then get the element at index 2 of that array. What’s the result? This is a popular way of creating data structures so Perl provides some simple assistance. If we place the list values in [] instead of () we create a reference to a nameless (or anonymous) array. The array is automatically initialised to the specified values.

References And Anonymous Arrays
 References are useful in creating multi-dimensional arrays:
This won’t work!
@table = ( ( 1 , 2 , 3 ) ,
           ( 4 , 5 , 6 ) ,
           ( 7 , 8 , 9 ) ,
         );

But this will!
$table = [ [ 1 , 2 , 3 ] ,
           [ 4 , 5 , 6 ] ,
           [ 7 , 8 , 9 ] ,
         ];

The bottom example is identical to the data structure we set up on the previous page except that all the internal arrays are anonymous - so there are no @cols or @row1, @row2 and @row3 arrays to access. The only access to the array elements is via the reference to the overall table. As a final piece of help, in any expression like: print $table->[$x]->[$y]; Any arrow between a closing square or curly bracket and an opening square or curly bracket can be removed. So the above can be rewritten like this: print $table->[$x][$y]; which is much neater.
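The flattened, named and anonymous versions can be compared directly. A minimal sketch, using the same numbers as the slides:

```perl
use strict;
use warnings;

# Nested parentheses are flattened into one nine-element list.
my @flat = ( ( 1, 2, 3 ), ( 4, 5, 6 ), ( 7, 8, 9 ) );

# Named rows tied together with references.
my @row1 = ( 1, 2, 3 );
my @row2 = ( 4, 5, 6 );
my @row3 = ( 7, 8, 9 );
my @cols = ( \@row1, \@row2, \@row3 );
my $table = \@cols;

# The same structure built from anonymous arrays.
my $anon = [ [ 1, 2, 3 ], [ 4, 5, 6 ], [ 7, 8, 9 ] ];

my $with_arrows    = $table->[1]->[2];   # row index 1, column index 2
my $without_arrows = $anon->[1][2];      # the arrow between ] and [ is optional

print scalar(@flat), " $with_arrows $without_arrows\n";
```

Both lookups land on the third element of the second row.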

References To Hashes
%association = ( cat => "nap" , dog => "gone" , mouse => "ball" );
$association = { cat => "nap" , dog => "gone" , mouse => "ball" };

$behave = { cat   => { nap => "lap" , eat => "meat" } ,
            dog   => { prowl => "growl" , pool => "drool" } ,
            mouse => { nibble => "cheese" } ,
          };

print "Cats eat " , $behave->{cat}->{eat};
print "Cats eat " , $behave->{cat}{eat};

Like the [] array constructor the {} hash constructor creates a reference which must be assigned to a scalar variable ($association), not to a hash (%association). Like the array reference, the values in the hash are only accessible via the hash reference: print $association->{ cat }; You can even nest hashes as well. Just like arrays, any -> between } and { can be omitted.
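The nested hash from the slide can be explored with a few lines. This is a sketch; the variable names on the left are invented.

```perl
use strict;
use warnings;

my $behave = {
    cat   => { nap    => "lap",   eat  => "meat" },
    dog   => { prowl  => "growl", pool => "drool" },
    mouse => { nibble => "cheese" },
};

my $long  = $behave->{cat}->{eat};    # every arrow written out
my $short = $behave->{cat}{eat};      # arrow between } and { omitted

# An inner hash is reached by dereferencing the stored reference.
my @dog_actions = sort keys %{ $behave->{dog} };

print "Cats eat $long / $short; dogs: @dog_actions\n";
```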

How Do I … Return More Than One Array Or Hash
 You want to return more than one array or one hash.
Solution: Return references to the hashes or arrays
($array_ref, $hash_ref) = somefunc();

sub somefunc
{
    my @array;
    my %hash;
    # ...
    return ( \@array, \%hash );
}

sub fn
{
    .....
    return (\%a, \%b, \%c);    # or
    return \(%a, %b, %c);      # same thing
}

More info: See The Perl Cookbook, section 10.9 Page 347. Just as all lists are flattened when multiple lists are passed to a function, the same happens with lists returned from functions with the return statement. Therefore to maintain the integrity of the arrays and hashes which are returned from a function, the arrays and hashes must be returned as references.

Creating Data Structures  Suppose you write this as the first line of your program: $sue{ children }->[1]->{ age } = 10;

 That’s pretty minimalist (and neat).

Perl creates a hash called %sue, gives it a new hash element indexed by the string children, points that to a newly allocated array whose second entry is made to refer to a newly allocated hash which gets an entry indexed by the string age.
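What Perl builds behind the scenes (often called autovivification) can be inspected afterwards. A minimal sketch, with %sue declared because use strict is on:

```perl
use strict;
use warnings;

my %sue;
$sue{ children }->[1]->{ age } = 10;

my $children = $sue{children};          # the newly allocated array
my $kind     = ref $children;
my $length   = scalar @$children;       # index 1 was assigned, so two slots exist
my $first    = $children->[0];          # never assigned
my $age      = $sue{children}[1]{age};

print "$kind with $length slots, age=$age\n";
```

The slot at index 0 exists but holds undef, because only index 1 was ever assigned.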

References To Subroutines
 Anonymous subroutines can be created like this:
sub { print "Hello $_[0]\n"; }

 The above is useless since there’s no way to execute the subroutine, so do this:
$sub_ref = sub { print "Hello $_[0]\n"; };

 We can then call this:
$sub_ref->( "Steve" );

Notes: The “;” at the end of the second example is required since the whole line is a statement. The third example executes the code in the subroutine reference. We need to pass a parameter to the subroutine and this is done by enclosing it between “(” and “)”.

Passing Subroutine Arguments As References
sub mysub
{
    # Arrays are references, counts are scalars
    # (Might be useful to prefix references with ref_)
    my ( $array1 , $count1 , $array2 , $count2 ) = @_;
    my $item1 = $array1->[ $count1 ];
    my $item2 = $array2->[ $count2 ];
    # Suppose $item1 = 15 and $item2 = 36
    return( $item1 , $item2 );
}

# Call the above like this (assumes arrays and counts already set up)
my ( $r1 , $r2 ) = mysub( \@array1 , $count1 , \@array2 , $count2 );
print $r1 , $r2;    # prints 15 and 36

References provide a way of passing unflattened arrays or hashes to a subroutine (remember that when we pass more than one array to a subroutine their identity is lost because of array flattening). In this code we are expecting four parameters to be passed to mysub, two arrays, and two scalars which will be interpreted as an index into those arrays. The arrays are passed by reference, the scalars by value. Note that we can return more than one value from a subroutine - in this case we return 2.

Returning Subroutine Results As References
This is an example of how not to do it. What do you think is wrong with this?
sub make_random_list
{
    # Counts are scalars
    my ( $count1 , $count2 ) = @_;
    my @new_array = ();
    foreach my $index ( $count1 .. $count2 )
    {
        $new_array[ $index ] = rand();
    }
    return( @new_array );
}

# Call the above like this:
my @big_random_array = make_random_list( 42 , 14826504 );
# Do stuff with big_random_array
print $big_random_array[ 137 ];

Subroutines can return references as well as receiving them. This example shows a subroutine which generates a large list of random numbers and then copies that list back to the code which called the subroutine. As shown above the list is copied back by value, i.e. a big copy of the list is passed back to the calling code as a large array. This means that in the program there exist two copies of the data: one copy in the array @new_array inside the subroutine, which Perl destroys once the subroutine ends and @new_array goes out of scope, and one copy brought into existence in the main program as the end of the subroutine is reached and each of the values in @new_array is copied into @big_random_array. TINTWTDI.

Returning Subroutine Results As References
This is an example of how to do it. What do you think is wrong with this?
sub make_random_list
{
    # Counts are scalars
    my ( $count1 , $count2 ) = @_;
    my @new_array = ();
    foreach my $index ( $count1 .. $count2 )
    {
        $new_array[ $index ] = rand();
    }
    return( \@new_array );
}

# Call the above like this:
my $big_random_array = make_random_list( 42 , 14826504 );
# Do stuff with big_random_array
print $big_random_array->[ 137 ];

In this code there is only ever one copy of the list - and it’s the one defined in the subroutine. When the subroutine ends, normally Perl would arrange for the list to be destroyed (since it’s local to the subroutine and it’s about to go out of scope). However, since the subroutine is passing back a reference to the array, Perl arranges for the array to remain in existence. Only when the last reference to the array ceases to exist will Perl delete the array which was defined inside the subroutine. Perl does this using a mechanism called reference counting. Basically it means that all Perl’s garbage collection is done for you. If you wanted to force Perl to delete the array created inside the subroutine (to save on memory, say) then all you need to do is: undef $big_random_array; Perl will reduce the reference count on the array, and if it reaches zero then the array created by the subroutine will be deleted. Also, since only one thing (a scalar which is a reference) is passed back from the subroutine to the calling code, it’s very quick and efficient.
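The lifetime rules can be tried out on a much smaller list. The sketch below is a scaled-down variant of make_random_list() that uses push instead of indexed assignment; the names $list and $second are made up.

```perl
use strict;
use warnings;

# Scaled-down variant: returns a reference to $last - $first + 1 random numbers.
sub make_random_list {
    my ( $first, $last ) = @_;
    my @new_array;
    push @new_array, rand() for $first .. $last;
    return \@new_array;    # only the reference is copied back
}

my $list   = make_random_list( 1, 5 );
my $second = $list;        # a second reference to the same array

undef $list;               # one reference gone, the array survives
my $still_there = scalar @$second;

undef $second;             # last reference gone: Perl can free the array
print "elements while referenced: $still_there\n";
```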

Symbolic References
 Examples:
$name = "bam";
$$name = 1;          # Sets $bam
$name->[0] = 4;      # Sets the first element of @bam
$name->{X} = "Y";    # Sets the X element of %bam to Y
@$name = ();         # Clears @bam
keys %$name;         # Yields the keys of %bam
&$name;              # Calls &bam

With symbolic references Perl is using the value of one variable as the name of another variable. This can be error prone and confusing, so I tend not to use this type of reference. You can force Perl to make all of the above examples into errors by using: use strict; which I would recommend. If you then have a desperate need to use a symbolic reference for a while you can always countermand the stricture with: no strict 'refs';

Packages
sub call
{
    ( $sub_ref , @args ) = @_;
    $sub_ref->( @args );
}

package phone;
sub call
{
    if ( dial() ) { talk(); }
}

package poker;
sub call
{
    $pot = 21;
    deal();
}

This defines three completely distinct subroutines named call. The first is in the main namespace. The second is in the phone namespace. The third is in the poker namespace.

If we do this, which call are we calling?
package main;
call( $ref , @args );

We would all like to use popular variable names like $count, $filename, $i. If we did this there wouldn’t be any way to use other people’s code, since they would have used the same variable names. Perl solves this problem by assigning each named variable and each named subroutine to a particular family, known as a package. Each package maintains its own symbol table or namespace. So two different packages may each have different variables and subroutines with identical names in their own namespace. By default Perl assumes that code is written in the namespace of the main package (which is called, appropriately enough, “main”). You can change that default by using the package keyword. A package declaration changes the namespace until another package declaration is made or until the end of the current enclosing block, eval, subroutine or file. See example: The example defines three subroutines called “call” in three different packages. The first, since no package has been named yet, is in the main package. If we wanted to call one of the other subroutines called call, we could either switch to the package or we can call the subroutine version explicitly by prefixing the subroutine name by the package name like this: poker::call();
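A cut-down version of the example shows which call is chosen. This is a sketch in which each subroutine simply returns its own fully qualified name.

```perl
use strict;
use warnings;

sub call { return "main::call" }     # no package named yet, so this is main

package phone;
sub call { return "phone::call" }

package poker;
sub call { return "poker::call" }

package main;
my @results = ( call(), phone::call(), poker::call(), main::call() );
print "@results\n";
```

The unqualified call() resolves to the version in whichever package is current at that point in the file, here main.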

Package Variables
 Perl variables come in two flavours:
   Package variables.
   Lexical variables.
 Package variables belong to a particular package.
 These are the standard, no-preparation-necessary, instant variables we all use most of the time.
package html;
$i = 56;

for ( $i = 0 ; $i < 100 ; $i++ )
{
    print "$i\n";          # Prints 0 .. 99
}

for ( $i = 0 ; $i < 100 ; $i++ )
{
    print "$html::i\n";    # Prints ???
}

$i is created the first time it is referenced. Because it is a package variable rather than a lexical variable it belongs to the current package and exists until the end of the program. We can force the use of a variable in another package by prefixing the name of the variable with the name of the package followed by a ::

Lexical Variables
 Lexical variables:
 Lexical variables are declared explicitly with the keyword my.
package main;
my $i;                              # A lexical variable
for ( $i = 0 ; $i < 100 ; $i++ )
{
    my $time = localtime();         # A lexical variable
    print "$i at time=$time\n";
}

Lexical variables differ from package variables in three ways:
1. They don’t belong to any package, so you can’t prefix them with a package name.
2. They can only be accessed within the physical boundaries of the code block or file scope in which they are declared. In the code shown, the variable $time is only accessible to code physically located in the for loop and not to code appearing before or after the loop.
3. They usually cease to exist each time the program leaves the code block in which they were declared. In the example the variable $time ceases to exist at the end of each iteration of the for loop (it is recreated at the beginning of each iteration of the loop).

Modules
  Modules are the re-use part of Perl.
  A Perl module is a text file with a suffix .pm containing some Perl code.
  It’s placed in a “standard” place.
  You can add to the “standard” places with a use lib statement.
  When the compiler encounters a use statement in a program it searches through the standard directories, locates the file, and loads the code.
  Modules come in two flavours:
      Traditional - Interface available by exporting symbols.
      Object Oriented - Interface available by method calls.
  When you have created a module you can control what is visible to a user with the Exporter module. See the example at the end of this section.

The easiest way to see how to use modules is by example. An example of exporting a module interface with symbols follows on the next slide. An example of exporting a module’s interface with method calls will be shown when we come to Object Oriented Perl. (Generally, object-oriented modules export nothing, since the whole idea of methods is that Perl finds them for you automatically based on the type of the object.)

An Example Of Building A Module
  To build a module called Bestiary, create a file called Bestiary.pm that looks like this:

package Bestiary;
require Exporter;

our @ISA       = qw(Exporter);
our @EXPORT    = qw(camel);      # Symbols to be exported by default
our @EXPORT_OK = qw($weight);    # Symbols to be exported on request
our $VERSION   = 1.00;           # Version number

### Include your variables and functions here
sub camel { print "One-hump dromedary" }
$weight = 1024;

1;                               # This is very important

In the example a program can now do this:
use Bestiary;
to be able to access the camel function (but not the weight variable), and:
use Bestiary qw( camel $weight );
to access both the function and the variable. When you use a module, the module usually makes some variables or functions available to your program - some symbols are exported from the module. Most modules use Exporter to do this. When a module is loaded it must return a TRUE value to indicate that the loading was successful. This is usually done by ending the file with a true value, as the 1; on the last line of the example does.

An Example Of Building A Module

require Exporter;
our @ISA = ("Exporter");

These two lines make the module inherit from the Exporter class (described in object-oriented Perl). Bestiary can now export symbols into other packages with lines like this.

our @EXPORT      = qw($camel %wolf ram);        # Export by default
our @EXPORT_OK   = qw(leopard @llama $emu);     # Export by request
our %EXPORT_TAGS = (                            # Export as group
    camelids => [qw($camel @llama)],
    critters => [qw(ram $camel %wolf)],
);

You can include any of these statements to import symbols from the Bestiary module.

use Bestiary;                            # Import @EXPORT symbols
use Bestiary ();                         # Import nothing
use Bestiary qw(ram @llama);             # Import the ram function and @llama array
use Bestiary qw(:camelids);              # Import $camel and @llama
use Bestiary qw(:DEFAULT);               # Import @EXPORT symbols
use Bestiary qw(/am/);                   # Import $camel, @llama, and ram
use Bestiary qw(/^\$/);                  # Import all scalars
use Bestiary qw(:critters !ram);         # Import the critters, but exclude ram
use Bestiary qw(:critters !:camelids);   # Import critters, but no camelids

The first two lines make the module inherit from the Exporter class. The second set of lines tells Bestiary what it is allowed to export into packages which use it. Any of the third set of lines can be used in a program which uses Bestiary, to determine what is and what is not imported into the current package. Leaving a symbol off the export lists does not render that symbol inaccessible to the program using the module. The program will always be able to access the contents of the module’s package by fully qualifying the name, like this: $Bestiary::number_of_lambs;
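As a rough single-file sketch of the export mechanism, the module can be written inline and its import method called directly. Normally the first block would live in Bestiary.pm and the program would simply say use Bestiary qw(camel $weight); putting the package inline and calling import from a BEGIN block is an assumption made here so that the example runs on its own. Unlike the slide, camel returns its string rather than printing it.

```perl
use strict;
use warnings;

# The module, written inline so the sketch runs as a single file.
# In real use this block would live in Bestiary.pm.
BEGIN {
    package Bestiary;
    require Exporter;
    our @ISA       = qw(Exporter);
    our @EXPORT    = qw(camel);      # exported by default
    our @EXPORT_OK = qw($weight);    # exported on request
    our $weight    = 1024;
    sub camel { return "One-hump dromedary" }
}

package main;

# Roughly what "use Bestiary qw(camel $weight);" does for a real .pm file.
BEGIN { Bestiary->import( qw(camel $weight) ); }

print camel(), "\n";                  # imported function
print "weight = $weight\n";           # imported variable
print "also: $Bestiary::weight\n";    # the fully qualified name always works
```

The last line shows the point made in the notes: a symbol is reachable by its fully qualified name whether or not it was imported.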

POD, Special Variables, Internal Perl Functions Command Line Switches, Perl One-liners

Notes:

POD
  Perl supports a simple mark-up language called POD - Plain Old Documentation.
  You can embed POD in any sort of file - including Perl scripts/programs.
  Perl simply skips over the POD when compiling.
  The Perl lexer starts skipping when it sees a line beginning with an = sign and an identifier.

=head1 Here There Be Dragons!

All of the text from here until the lexer sees =cut, will be ignored.

=item snazzle

The snazzle() function will behave in the most spectacular form possible

=cut

sub snazzle {
    my $arg = shift;
    ....
}

If you ever download CPAN modules you’ll find that a lot of them have POD documentation included within the code. This is confusing at first, until you realise that the compiler just skips over all the POD. Perl ships with tools to convert files containing POD into various printable file formats:

pod2text File.pm | more
pod2man File.pm | nroff -man | more
or
pod2man File.pm | troff -man -Tps -t > tmppage.ps
ghostview tmppage.ps
pod2html File.pm > tmppage.html

For a complete overview of POD see Chapter 26 of Programming Perl 3rd edition. Look at Mosfet.pm in the Examples/OO_Code area. Also see Mosfet.pod_text, Mosfet.man, Mosfet.postscript and Mosfet.html in the same area.

Some Special Variables

use English;       Short name   What it does
@ARG               @_           Argument list passed to subroutine
$ARG               $_           Default input and search pattern
%ENV                            Hash containing your current environment variables
$LIST_SEPARATOR    $"           Separator used when a list is interpolated into a string. Defaults to a space
$MATCH             $&           The string matched in the last successful pattern
$POSTMATCH         $'           The string following what was last matched
$PREMATCH          $`           The string preceding what was last matched
$OS_ERROR          $!           Error value from the last failed system call (also available as $ERRNO)
STDERR                          Special filehandle for standard error in any package
STDIN                           Special filehandle for standard input in any package
STDOUT                          Special filehandle for standard output in any package

This is not an exhaustive list - see Chapter 28 of Programming Perl, 3rd edition. Items without a short name don’t need the use English; line.

Some Perl Functions (By Category)
  Scalar manipulation:
      chomp, chop, hex, lc, length, oct, reverse, sprintf, substr, tr///, uc, y///.
  Regular expressions:
      m//, s///, split.
  Numeric functions:
      abs, atan2, cos, exp, hex, int, log, oct, rand, sin, sqrt, srand.
  Array processing:
      pop, push, shift, unshift.
  Hash processing:
      delete, each, exists, keys, values.
  Filehandles, files and directories:
      chdir, chmod, chown, chroot, link, mkdir, open, opendir, rename, rmdir, stat, umask, unlink, utime.

Notes:

Some Perl Functions (By Category)
  Flow of program control:
      continue, die, eval, exit, goto, last, next, redo, return, sub, wantarray.
  Miscellaneous:
      defined, eval, scalar, undef.
  Process and process groups:
      alarm, exec, fork, kill, pipe, setpriority, sleep, system, wait, waitpid.
  Library modules:
      import, package, require, use.
  Classes and objects:
      bless, package, ref, use.
  Time:
      gmtime, localtime, time.

There are also extensive categories for: 1. Low-level socket access. 2. Inter-process communication. 3. Fetching user and group information. 4. Fetching network information.

Examples Of chop() And chomp()

Remember, chop is indiscriminate, it always removes something, so you’re supposed to know that the last character on a line is “\n”.

@lines = `cat myfile`;
chop @lines;
chop($cwd = `pwd`);
chop($answer = <STDIN>);
$answer = chop($tmp = <STDIN>);    # WRONG - What is in $answer?
$last_char = chop($var);

chomp is more discriminating, it will only remove the last character if it’s a “\n”. You could also do s/\n$//; which is explicit.

while (<PASSWD>) {
    chomp;                  # avoid \n on last field
    @array = split /:/;
    ...
}

You almost always want to use chomp() and not chop(). chop() always removes the last character and returns it. If you chop() a list, then every item in the list is chopped. The thing which ends up in $answer in the question on the slide is the character which was removed from the string $tmp. The thing you probably wanted was $tmp. chomp() is more discriminating: by default it removes the last character on a line only if that character is “\n”, but the default can be overridden. What chomp() actually removes is the string contained in the Perl variable $/, so it can remove a string of any length from the end of an input string. chomp() returns the number of characters it deleted - not the characters themselves.
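The return values and the role of $/ are easy to confuse, so here is a small sketch that exercises each case described above. The strings and variable names are invented for the illustration.

```perl
use strict;
use warnings;

my $line = "camel\n";
my $copy = $line;

my $removed = chomp($line);   # returns the number of characters removed
my $char    = chop($copy);    # returns the character that was removed

my $bare = "camel";
chomp($bare);                 # no trailing newline, so nothing changes

my $chopped = "camel";
chop($chopped);               # chop always removes the last character

my $record = "alpha;;";
{
    local $/ = ";;";          # chomp removes whatever string $/ holds
    chomp($record);
}

print "$line $removed\n";     # prints: camel 1
print "$bare $chopped\n";     # prints: camel came
print "$record\n";            # prints: alpha
```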

Examples Of hex() And oct()

$number = hex("ffff12c0");
sprintf "%lx", $number;            # (That's an ell, not a one.)

sprintf uses the same conventions as C’s sprintf.

perl -e 'print 0xffdc;'

A neat command line alternative when you need a quick conversion.

$val = oct $val if $val =~ /^0/;

Does $val start with an “0” (as opposed to “0x” or “0b”).

$perms = (stat("filename"))[2] & 07777;
$oct_perms = sprintf "%lo", $perms;

Note that you can always set the value of any variable from a hex literal just by doing this: $h_number = 0xffdd; print $h_number; The hex() function interprets a string as a hex number; it is not needed for numeric literals. If the string begins with “0x”, this is ignored. To do a reverse conversion use sprintf() as shown. Hex strings can only represent integers. Strings which would cause integer overflow will trigger a warning. oct() will interpret a string as an octal value. If the string starts with “0” it will be interpreted as octal. If the string starts with “0x” it will be interpreted as a hex value. If it begins with “0b” it will be interpreted as a binary value. Try this: perl -e 'print 0b11001001;' # Is anyone (apart from me) sad enough to know from what 80’s/90’s TV series this was an episode title.
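The conversions in both directions can be summarised in a short sketch; the variable names are illustrative only.

```perl
use strict;
use warnings;

# String to number.
my $from_hex = hex("ff");          # 255
my $with_0x  = hex("0xff");        # 255, the leading 0x is ignored
my $octal    = oct("0755");        # 493
my $via_oct  = oct("0xff");        # 255, oct() recognises the 0x prefix
my $binary   = oct("0b1010");      # 10

# Number back to string, using sprintf.
my $back_hex = sprintf "%x", $from_hex;   # "ff"
my $back_oct = sprintf "%o", $octal;      # "755"
my $back_bin = sprintf "%b", $binary;     # "1010"

print "$from_hex $with_0x $octal $via_oct $binary\n";   # prints: 255 255 493 255 10
print "$back_hex $back_oct $back_bin\n";                # prints: ff 755 1010
```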

Examples Of sprintf()

Field   Meaning
%%      A percent sign
%c      A character with the given number
%s      A string
%d      A signed integer, in decimal
%u      An unsigned integer, in decimal
%o      An unsigned integer, in octal
%x      An unsigned integer, in hexadecimal
%e      A floating-point number, in scientific notation
%f      A floating-point number, in fixed decimal notation
%g      A floating-point number, in %e or %f notation

See Chapter 29 (pages 797 to 799) of Programming Perl, 3rd edition. Be careful - sprintf() in Perl does its own formatting - it is NOT calling the underlying sprintf() function in the C library.

Examples Of sprintf()

In addition to the formats on the previous slide, Perl also supports the following conversions.

Field   Meaning
%X      Like %x, but using uppercase characters
%E      Like %e, but using uppercase “E”
%G      Like %g, but using uppercase “E” if applicable
%b      An unsigned integer, in binary
%p      A pointer (the Perl value’s address in hexadecimal)
%n      A special: stores the number of characters output so far into the next variable in the argument list.

For compatibility, Perl also supports these conversions:
%i - a synonym for %d
%D - a synonym for %ld
%U - a synonym for %lu
%O - a synonym for %lo
%F - a synonym for %f

Examples Of sprintf()

Perl allows the following flags between the % and the conversion character.

Flag      Meaning
space     Prefix positive number with a space
+         Prefix positive number with a plus sign
-         Left-justify within field
0         Use zeroes, not spaces, to right-justify
#         Prefix non-zero octal with “0”, non-zero hex with “0x”
number    Minimum field width
.number   “Precision”: digits after the decimal point for floating-point numbers, maximum length for a string, minimum length for an integer.
l         Interpret integer as a C type long or unsigned long
h         Interpret integer as C type short or unsigned short (if no flags are supplied interpret integer as C type int or unsigned int)

See Chapter 29 (pages 797 to 799) of Programming Perl, 3rd edition.
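A handful of the conversions and flags from the tables, shown together as a quick sketch. The values are arbitrary and the variable names are only there to label each case.

```perl
use strict;
use warnings;

my $padded  = sprintf "%05d",  42;         # "00042"   zero-padded to width 5
my $left    = sprintf "%-6s|", "ab";       # "ab    |" left-justified in width 6
my $signed  = sprintf "%+d",   7;          # "+7"      explicit plus sign
my $float   = sprintf "%.2f",  3.14159;    # "3.14"    two digits after the point
my $clipped = sprintf "%.3s",  "camel";    # "cam"     precision truncates a string
my $hex     = sprintf "%#x",   255;        # "0xff"    # adds the 0x prefix
my $octal   = sprintf "%#o",   8;          # "010"     # adds the leading 0
my $percent = sprintf "%d%%",  75;         # "75%"     literal percent sign

print join( " ", $padded, $left, $signed, $float,
                 $clipped, $hex, $octal, $percent ), "\n";
```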

Examples Of split()

@chars  = split //,  $word;
@fields = split /:/, $line;
@words  = split " ", $paragraph;
@lines  = split /^/, $buffer;

Question: What does this produce?

print join ':', split / */, 'hi there';

($login, $passwd, $remainder) = split /:/, $_, 3;

split /([-,])/, "1-10,20";     # Produces the list (1, '-', 10, ',', 20)

split /(-)|(,)/, "1-10,20";    # Produces the list (1, '-', undef, 10, undef, ',', 20)

$string = join(' ', split(' ', $string));

Syntax:
split /PATTERN/ , EXPR , LIMIT
split /PATTERN/ , EXPR
split /PATTERN/
split

split() scans a string and splits it into sub-strings, returning the resulting list in list context, or the count of sub-strings in scalar context. The separator is determined by pattern matching using the regular expression given as part of the split() function - so the separators need not be the same size, and need not be the same string, on every match. Normally the separators are not returned (but if the pattern contains () then the substring matched by each pair of () IS included in the resulting list, interspersed with the fields which are normally returned). If more than one pair of () is used then one substring is returned for each pair (some may be undef, so be careful). If the pattern doesn’t match at all then split() returns the original string. If a limit is supplied then Perl will not split the string into more than that number of fields. If no string is supplied then Perl uses “$_”. If no pattern is supplied, or the pattern is the literal space “ ”, then the function splits on whitespace, /\s+/, after skipping any leading whitespace.
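The effect of LIMIT, of capturing parentheses and of the special " " pattern can be seen side by side in a short sketch. The sample strings are invented for the illustration.

```perl
use strict;
use warnings;

my $entry = "root:x:0:0:root:/root:/bin/sh";

my @fields   = split /:/, $entry;        # 7 fields
my @three    = split /:/, $entry, 3;     # at most 3 fields; the rest stays joined
my @with_sep = split /([-,])/, "1-10,20";             # separators are kept
my @words    = split ' ', "  the quick  brown fox ";  # leading whitespace skipped

# Normalise runs of whitespace to single spaces.
my $tidy = join ' ', split ' ', "  too   many    spaces  ";

print scalar(@fields), "\n";             # prints: 7
print "$three[2]\n";                     # prints: 0:0:root:/root:/bin/sh
print join( "|", @with_sep ), "\n";      # prints: 1|-|10|,|20
print join( "|", @words ), "\n";         # prints: the|quick|brown|fox
print "$tidy\n";                         # prints: too many spaces
```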

Examples Of split()

open PASSWD, '/etc/passwd';
while (<PASSWD>) {
    chomp;        # remove trailing newline
    ($login, $passwd, $uid, $gid, $gcos, $home, $shell) = split /:/;
    ...
}

while (<>) {
    foreach $word (split) {
        $count{$word}++;
    }
}

Both examples make use of defaults. In both cases the input text is extracted with the <> operator and thus the splitting occurs on “$_”. In the second case split() is passed no string (so it uses “$_”) and no pattern (so it strips all leading whitespace and then splits on whitespace).

Examples Of stat() And unlink()

($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$size,
 $atime,$mtime,$ctime,$blksize,$blocks) = stat $filename;

if (-x $file and ($d) = stat(_) and $d < 0) {
    print "$file is executable NFS file\n";
}

$mode = (stat($filename))[2];
printf "Permissions are %04o\n", $mode & 07777;

use File::stat;
$sb = stat($filename);
printf "File is %s, size is %s, perm %04o, mtime %s\n",
    $filename, $sb->size, $sb->mode & 07777,
    scalar localtime $sb->mtime;

$count = unlink 'file1' , 'file2' , 'file3';
unlink @victims;

The stat() function returns a 13-element list giving statistics for a file. If a particular field isn’t supported on a file system then the corresponding entry will be zero. See page 801 of “Programming Perl, 3rd edition” for more details. The File::stat module provides a convenient, by-name access mechanism. The unlink() function is used to delete a list of files. The function returns the number of files which were successfully deleted. BE CAREFUL - this is ‘rm’ in disguise.

gmtime And localtime

#  0    1    2     3     4    5     6     7     8
($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = gmtime;
$london_month = (qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec))[(gmtime)[4]];

#  0    1    2     3     4    5     6     7     8
($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime;
$thisday = (qw(Sun Mon Tue Wed Thu Fri Sat))[(localtime)[6]];

perl -le 'print scalar localtime'

All elements of the lists returned by gmtime() and localtime() are numeric, so January is month 0, Sunday is day 0. $year is the number of years since 1900.
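Because the month and year are offsets rather than calendar values, a formatted date needs $mon + 1 and $year + 1900. A small sketch follows; the subroutine name stamp is made up, and gmtime is used with fixed epoch values so that the output does not depend on the local time zone.

```perl
use strict;
use warnings;

# Formats an epoch time (seconds since 1970) as "Day YYYY-MM-DD hh:mm:ss" in GMT.
sub stamp {
    my ($epoch) = @_;
    my ( $sec, $min, $hour, $mday, $mon, $year, $wday ) = gmtime($epoch);
    my @days = qw(Sun Mon Tue Wed Thu Fri Sat);
    return sprintf "%s %04d-%02d-%02d %02d:%02d:%02d",
        $days[$wday],      # Sunday is day 0
        $year + 1900,      # $year counts from 1900
        $mon + 1,          # January is month 0
        $mday, $hour, $min, $sec;
}

print stamp(0), "\n";              # prints: Thu 1970-01-01 00:00:00
print stamp( 31 * 86_400 ), "\n";  # prints: Sun 1970-02-01 00:00:00
```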

system And exec And ``

@args = ("command", "arg1", "arg2");
system(@args) == 0 or die "system @args failed: $?";
# If the program succeeds - then life goes on

@args = ("command", "arg1", "arg2");
exec(@args);
# You will never get here.

my $current_directory = `pwd`;

This example uses backticks to capture the output of the “pwd” command. Why is this a bad example?

The system() and exec() functions execute any program on your system for you. system() returns that program’s exit status - not the program’s output. To capture the output from a program you must use backticks or qx//. The difference between the two functions is that system() will do a fork first and then wait for the executed program to finish. That is, it runs your program for you and returns when it is done. exec() replaces your running program with the new one, so it never returns if the replacement succeeds (which makes the return of the exit status a bit redundant). See “Programming Perl”, 3rd Edition, page 811, for more details. In the last example on the slide we use backticks to figure out what our current directory is. This is an example of how you can capture the output of an external program - a bad example, because if you put this script on your web page and someone downloads it, they will find it doesn’t run if their system doesn’t have a pwd command.
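One way round the portability problem is the standard Cwd module, which finds the current directory without depending on an external pwd command. The sketch below also shows how the value returned by system() relates to the child’s exit code. The helper name exit_code is invented for the example, and $^X (the path of the running perl) is used as the child command only so that the example has something to run on any system.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);

# Runs a command in list form (no shell involved) and returns its exit code,
# or -1 if the command could not be started at all.
sub exit_code {
    my @cmd    = @_;
    my $status = system(@cmd);
    return -1 if $status == -1;
    return $status >> 8;          # the exit code is in the high byte of the status
}

my $good = exit_code( $^X, "-e", "exit 0" );
my $bad  = exit_code( $^X, "-e", "exit 3" );

# Portable replacement for `pwd`.
my $current_directory = getcwd();

print "good=$good bad=$bad\n";    # prints: good=0 bad=3
print "cwd=$current_directory\n";
```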

Command Line Switches And Writing Perl One-Liners
  The -e switch allows you to write scripts directly on the command line.

perl -e 'print "Hello World\n";'

  Perl programs can receive input from either:
      Standard input or named files.

cat myfile | perl -e 'while(<>){ print unless /^\s*#/; }'
perl -e 'while (<>){ print unless /^\s*#/; }' < myfile
perl -e 'while(<>){ print unless /^\s*#/; }' myfile

      The @ARGV array.

perl -e 'print "@ARGV\n";' alpha.doc beta.txt gamma.eps

Perl one-liners fit the whole of a Perl program onto one line (a command line). See the accompanying article in the second edition of the Perl Review (contained as a .pdf file in the Examples directory). Also see the whole of Chapter 19 of “Programming Perl”, 3rd edition, Pages 486-503 inclusive. The first example is something you’ve already seen. In the second example the pipe operator | takes the output of cat and makes it the standard input to the Perl program. The diamond operator <> takes lines from standard input, so this example prints the contents of the file “myfile” except for the lines rejected by the pattern match (which throws away comment lines, that is, lines whose first non-blank character is a #). The third example does the same as the second but uses the file redirection operator (<). The fourth example uses the fact that the diamond operator will also open and read a file named on the command line, so this example is exactly equivalent to both examples 2 and 3. The last example prints out this: alpha.doc beta.txt gamma.eps

Perl Command Line Switches Useful For One-Liners

Switch        Effect
-e            Used to enter one or more lines of a script.
-i            Specifies that files processed by <> are to be edited in place.
-iEXTENSION   Specifies that files processed by <> are to be edited in place, with the original file kept as a backup named using EXTENSION.
-mMODULE      Loads MODULE before running your script, as if you had executed use MODULE (); (use -MMODULE for a plain use MODULE;).
-n            Causes Perl to assume a loop around your code which makes it iterate over filename arguments. Lines are not printed. See Example.
-p            Causes Perl to assume a loop around your code which makes it iterate over filename arguments. Each line is printed after your code has run. See Example.

Use the -i option with care. It renames the input file, opens an output file with the original name and then selects that output file for all print, printf and write statements. If you use only the -i option then NO BACKUP COPY OF YOUR ORIGINAL FILE IS MADE. The original file will be overwritten. If you do specify EXTENSION then the original file is backed up, using the extension to supply a new name. Here’s an example: perl -p -i'.orig' -e 's/foo/bar/' xyz # Note that the -p option has not yet been discussed. This will rename the file xyz to xyz.orig as a backup, open a new version of xyz for output and run the substitution on the original file contents, placing the result of the substitutions into the new file (still called xyz).

An Example Of A Perl One-Liner

This,

#!/usr/bin/perl
$extension = '.orig';
LINE: while (<>) {
    if ($ARGV ne $oldargv) {
        if ($extension !~ /\*/) {
            $backup = $ARGV . $extension;
        }
        else {
            ($backup = $extension) =~ s/\*/$ARGV/g;
        }
        unless (rename($ARGV, $backup)) {
            warn "cannot rename $ARGV to $backup: $!\n";
            close ARGV;
            next;
        }
        open(ARGVOUT, ">$ARGV");
        select(ARGVOUT);
        $oldargv = $ARGV;
    }
    s/foo/bar/;
}
continue {
    print;    # this prints to original filename
}
select(STDOUT);

Does exactly the same as this.

perl -p -i'.orig' -e 's/foo/bar/' xyz

The example from the previous slide is expanded here as the minimum needed to replace the functionality of the one-liner.

The Perl -n And -p Command Line Switches
  The -n switch causes Perl to assume the following loop around your script, which makes it iterate over the filename arguments much as sed -n or awk do.

LINE:
while (<>) {
    ...    # your script goes here
}

  The -p switch causes Perl to assume the following loop around your script, which makes it iterate over the filename arguments much as sed does.

LINE:
while (<>) {
    ...    # your script goes here
}
continue {
    print or die "-p destination: $!\n";
}

In both cases you can use LINE as a loop label from within your script, even though you can’t actually see it in your file. With the -n switch, lines are not printed by default. With the -p switch, lines are printed automatically. In both cases BEGIN and END blocks may be used to capture control before or after the implicit loop - just like awk.

Other Perl Command Line Switches

Switch   Effect
-c       Causes Perl to check the syntax of the script and then exit without executing what has just been compiled.
-d       Runs the script under control of the Perl debugger.
-h       Prints a summary of Perl’s command line options.
-T       Turns on “taint” checks - an extra form of security useful for running CGI scripts.
-v       Prints the version number and patch level of the Perl executable.
-w       Prints warnings about variables which are used only once, and variables which are used before being set. See Chapter 33 of “Programming Perl” 3rd edition.

We will discuss the Perl debugger later. Everyone should always run Perl with the -w option, either on the command line as here, or more generally as part of the #! line: #!/usr/local/bin/perl -w There are many more command line switches than those listed. See the whole of Chapter 19 of “Programming Perl”, 3rd edition for a complete description.

Command Line Arguments etc.

Item     Description
ARGV     The special filehandle that iterates over command line filenames in @ARGV.
$ARGV    Contains the name of the current file when reading from the ARGV handle using <>.
@ARGV    The array containing the command-line arguments intended for the script. $#ARGV is the number of arguments minus one. $ARGV[0] is the first argument, not the command name. Use scalar @ARGV for the number of program arguments.
@ARG     Within a subroutine, this array holds the argument list passed to that subroutine.
@_       Within a subroutine, this array holds the argument list passed to that subroutine.

Notes:

Adding Command Line Arguments To Your Own Programs
  There are two options:
      Use one of the standard Getopt modules (Getopt::Std provides getopts(), Getopt::Long handles long option names).
      Write your own code - like this:

# Assumes "use Carp;" for croak() and a TRUE constant,
# e.g. "use constant TRUE => 1;", elsewhere in the program.
sub Process_Command_Line_Arguments
{
    my ( $ref_arguments ) = @_;
    my $numargs = @$ref_arguments;

    # Process all arguments
    my $next_arg;
    while ( $numargs-- )
    {
        $next_arg = shift( @$ref_arguments );
        SWITCH:
        {
            if ( $next_arg =~ m/^\-i/i ) { $main::infile  = shift( @$ref_arguments ); $numargs-- ; last SWITCH; }
            if ( $next_arg =~ m/^\-o/i ) { $main::outfile = shift( @$ref_arguments ); $numargs-- ; last SWITCH; }
            if ( $next_arg =~ m/^\-d/i ) { $main::debug   = TRUE; last SWITCH; }
            if ( $next_arg =~ m/^\-/i  ) { croak( "Unknown command line switch $next_arg" ); }
        }
    }
    return TRUE;
}

Note that the input arguments are passed via a reference. You should also include some code to look for something like -h or -help, print out something useful and then exit the program.
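For comparison, here is a rough sketch of the same three switches handled with the standard Getopt::Long module. The subroutine name parse_args, the option-hash keys and the usage message are assumptions made for the illustration; GetOptionsFromArray is used instead of GetOptions so that the arguments can be passed in by reference, as in the hand-written version.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Parses -i FILE, -o FILE, -d and -h/-help from the array referenced by $args.
# Returns a reference to a hash of the options found.
sub parse_args {
    my ($args) = @_;
    my %opt = ( debug => 0 );
    GetOptionsFromArray(
        $args,
        'i=s'    => \$opt{infile},     # -i takes a string value
        'o=s'    => \$opt{outfile},    # -o takes a string value
        'd'      => \$opt{debug},      # -d is a simple flag
        'help|h' => \$opt{help},       # -help or -h
    ) or die "Usage: $0 [-i infile] [-o outfile] [-d] [-h]\n";
    return \%opt;
}

my $opt = parse_args( [ qw(-i in.txt -o out.txt -d) ] );
print "$opt->{infile} $opt->{outfile} $opt->{debug}\n";   # prints: in.txt out.txt 1
```

In a real program the call would normally be parse_args( \@ARGV ), followed by a check of $opt->{help} to print a usage message and exit.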

Conclusion
  You’ve seen a lot in a short time.
  The key points of Perl are that:
      Variables consist of scalars and collections of scalars (arrays & hashes).
      A lot of the control structures are similar to C etc.
      References and subroutines.
      Packages and Modules.
      Pattern matching is very powerful.
  Perl is a very versatile language.
  You all now know enough to write useful Perl programs.

Notes: Now give the advanced material in Style, then run LAB7 MODULES_AND_SUBROUTINES_1

Style Guidelines For Perl

1 Introduction

This document presents guidelines for anyone who writes Perl scripts for design support tasks. The aim is to introduce a common style and understanding for the benefit of anyone who either writes new programs, or has to debug and/or maintain old ones.

2 Program Structure

Structure your program in the same way you would structure a C program. Have one section of code that is the equivalent of C’s main(), and as long as the total program size is anything other than trivially small, put code into subroutines that are called from the main program body. Don’t structure the top-level of a program in file-scope, since any variables declared there are visible in all following subroutines (even if they’re lexical, or my, variables). Instead create the top-level of your program as a code block (if you like to think in C terms, even label it as MAIN if this helps you) and put all code there. Also, don’t use global variables at all (i.e., outside the code block), since this allows variables to have side-effects in different subroutines. To achieve both of these features structure your code like this:

#!/usr/local/bin/perl
use strict;
use warnings;
use diagnostics;

sub subroutine_1( $$$ );

MAIN:
{
    my $variable_1 = 27;
    # Program code - equivalent of C's main()
}
exit;

sub subroutine_1( $$$ )
{
    # Subroutine body - can't see $variable_1 unless it was passed as a
    # parameter in the subroutine call.
}

use strict and use warnings are never optional, while use diagnostics gives readable error messages that are useful for new users (and old ones).

The block with the label MAIN: is where the main body of the program is written. A code block like this is the equivalent of a loop that runs exactly once, but has the feature that all the lexical variables declared within its scope are restricted to that scope, i.e., subroutine_1 can’t see the values of any lexical variables like $variable_1 unless they are passed to subroutine_1 as arguments in the call (which is basically how you’d hope a program would behave). Also note that the label (MAIN:) is optional, and can be omitted. Note that subroutine_1 is declared before the main body of the program. This is only needed if the subroutine definitions follow the main program - if they precede it then the forward declarations aren’t needed, since the definition is also the declaration. Also note that subroutines can optionally be declared with prototypes (the $$$ in ( $$$ ), which here declares that the subroutine is expecting three scalar arguments). This check is performed at compile time so there’s no run-time overhead for doing this. If you must use a global variable (you really shouldn’t) then make it explicit that this is what you’re doing by referring to it as a package variable like this:

#!/usr/local/bin/perl
sub subroutine_1();

$main::count = 56;

MAIN:
{
    $main::count = 27;
    subroutine_1();
}
exit;

sub subroutine_1()
{
    print "The value of count is $main::count\n";
}

Here we’ve declared a global variable called $main::count (it’s a variable named $count in package main, the default package, which is why its name is $main::count). This code prints the value 27 when executed, since the initial value of 56 is overwritten in the main body of code and this is the value seen in subroutine_1 when it is executed. Note that the value of $main::count wasn’t passed to subroutine_1 as a parameter, but subroutine_1 can still see its value (it can change its value as well - this is what I mean by having a side-effect).

2.1 Should Subroutine Parameters Be Passed By Value Or Reference?

If you don’t want parameters passed in subroutines to be changed by the subroutine, then pass parameters by value. This is nearly always what you want. To do this copy all the parameters to the subroutine into lexical variables at the start of the subroutine like this:

2 / 18

July 31, 2005

sub subroutine_1( $$$ )
{
    my ( $var_1 , $var_2 , $var_3 ) = @_;
    # Subroutine code goes here. $var_1 etc are private to this code
}

This is a common Perl idiom where all the values from the @_ array are copied into lexical variables in the subroutine. This makes those variables local to the subroutine - changing them in the subroutine will NOT change them in the calling code. This is normally how you would expect programs to behave. If you do want a variable in a subroutine to be changed in the calling code then pass the variables to the subroutine by reference instead. This is done like this:

MAIN:
{
    my $a = 56;
    subroutine_1( $a );
    print "A=$a\n";
}
exit;

sub subroutine_1( $ )
{
    $_[0] = 99;    # Alter the first element of the @_ array
}

The elements of the @_ array are aliases for the variables in the calling code, so changing the value of $_[0] will change the variable $a in the example above. Therefore the value printed will be A=99. This form is not recommended since it’s confusing and inconsistent with normal usage.
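The two behaviours, plus the more explicit alternative of passing a real reference, can be compared directly. The subroutine names below are invented for the sketch.

```perl
use strict;
use warnings;

# Copies its argument, so the caller's variable is untouched.
sub by_value {
    my ($n) = @_;
    $n = 99;
    return $n;
}

# Writes through @_, which changes the caller's variable.
sub by_alias {
    $_[0] = 99;
    return;
}

# Takes an explicit reference, which makes the intent visible at the call site.
sub by_reference {
    my ($ref) = @_;
    $$ref = 99;
    return;
}

my ( $x, $y, $z ) = ( 56, 56, 56 );
by_value($x);
by_alias($y);
by_reference( \$z );

print "$x $y $z\n";    # prints: 56 99 99
```

If a subroutine really must change its caller’s variable, the explicit reference form at least shows the reader of the calling code that something may be modified.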

2.2 Passing Arrays And Hashes To Subroutines

Lists of values are flattened when passed to subroutines, so if you pass two lists to a subroutine, from the perspective of the subroutine itself this looks like one long list, i.e., the identity of the two lists is lost. Since this almost certainly isn’t what you want to achieve, pass the lists as references instead. This way the identities of the two (or more) lists are maintained. Here’s how to do this:

MAIN:
{
    my @list_1 = qw( Alpha Baker Charlie Delta );
    my @list_2 = qw( Zulu Yankee Xray Whisky );
    subroutine_1( \@list_1 , \@list_2 );
}
exit;

sub subroutine_1( $$ )
{
    my ( $list_1_r , $list_2_r ) = @_;
    print $list_1_r->[ 1 ] , " " , $list_2_r->[ 3 ] , "\n";
}

The two arguments (which are themselves scalars) are references to the original lists so the subroutine can access the individual elements of the lists. Therefore the above example prints out “Baker Whisky”.

2.3 Returning One Or More Results From A Subroutine : Part 1

Use the wantarray function to see if a subroutine was called in scalar or list context. If the wantarray function returns TRUE then return a list, else return a scalar. Here’s how to do this:

MAIN:
{
    my @list = subroutine_1();
    my $scalar = subroutine_1();
    print "@list $scalar";
}
exit;

sub subroutine_1()
{
    if ( wantarray )
    {
        return qw( one two three four five );
    }
    else
    {
        return( "once I caught a fish alive\n" );
    }
}

The first call to subroutine_1 is in list context (the calling program expects a list to be returned). In subroutine_1 the wantarray function is evaluated and for this first call it will be TRUE, therefore subroutine_1 sends back a list of five things (the textual representation of the numbers one to five inclusive). The second call to subroutine_1 is in scalar context (the calling program expects a single thing to be returned). Now when the wantarray function is evaluated it is FALSE, so a single thing is returned (a string consisting of the text “once I caught a fish alive”). Note that you can also return information from a subroutine that is expected to be interpreted as a hash. In that case you should make sure that you return an even number of scalars (each pair of scalars will be used as a key/value pair in the resulting hash).


2.4 Returning One Or More Results From A Subroutine : Part 2

You want to return several scalars from a subroutine. Here’s how to do it:

MAIN:
{
    my @values = qw ( 6.32 7.88 9.54 12.83 17.99 31.36 18.25 );
    my ( $mean , $median , $mode , $variance ) = statistics( @values );
    # Code to print out results
}
exit;

sub statistics
{
    # Code to compute mean, median, mode, variance
    return( $mean , $median , $mode , $variance );
}

We arrange for the subroutine to return four scalar values as a list, and for the receiving code to assign the values in that list to four scalar variables.
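To make the pattern concrete, here is one possible body for such a subroutine. It is a sketch only: it computes the mean, median and population variance, leaves out the mode for brevity, and the sample data is invented.

```perl
use strict;
use warnings;

# Returns ( mean, median, population variance ) for a list of numbers,
# or an empty list if no numbers were supplied.
sub statistics {
    my @values = sort { $a <=> $b } @_;
    my $n      = @values;
    return unless $n;

    my $sum = 0;
    $sum += $_ for @values;
    my $mean = $sum / $n;

    # Middle value, or the average of the two middle values.
    my $mid    = int( $n / 2 );
    my $median = $n % 2
        ? $values[$mid]
        : ( $values[ $mid - 1 ] + $values[$mid] ) / 2;

    my $squares = 0;
    $squares += ( $_ - $mean )**2 for @values;
    my $variance = $squares / $n;

    return ( $mean, $median, $variance );
}

my ( $mean, $median, $variance ) = statistics( 2, 4, 4, 4, 5, 5, 7, 9 );
printf "%.2f %.2f %.2f\n", $mean, $median, $variance;   # prints: 5.00 4.50 4.00
```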

2.5 Making The Equivalent Of C Static Variables

Sometimes you want to be able to create a variable in a subroutine that will maintain its value between subroutine calls. Here’s how to do this:

MAIN:
{
    my $tmp;
    $tmp = count();
    print "Tmp = $tmp\n";
    $tmp = count();
    print "Tmp = $tmp\n";
}
exit;

BEGIN
{
    my $count_value = 0;
    sub count()
    {
        $count_value++;
        return $count_value;
    }
}

Place the subroutine definition(s) in a code block (subroutines are visible from everywhere regardless of how you “hide” them). The lexical variable $count_value is locally scoped to the code block it’s defined in and is therefore available to the subroutine count(). However, while normally a lexical variable will be destroyed once a code block finishes execution, in this case the compiler arranges for it to continue to exist since something is still referring to it (in technical terms the subroutine count() has incremented $count_value’s reference count, and that stops Perl from destroying it). The only problem is how to get an initial value of zero into $count_value. This is done by placing all the code in a BEGIN block. Perl guarantees to execute all BEGIN blocks as soon as they are compiled, thus ensuring that the single line of code “my $count_value = 0” is executed before any call to the subroutine is made. The above code therefore prints out Tmp = 1 followed by Tmp = 2.

Of course, there’s no reason why several subroutines cannot share a variable in this way to provide a globally accessed variable that cannot suffer from unintended side-effects. Here’s how:

    MAIN:
    {
        my $tmp;
        initialize( 37 );
        $tmp = increment();
        print "Tmp = $tmp\n";
        $tmp = decrement();
        print "Tmp = $tmp\n";
    }
    exit;

    BEGIN
    {
        my $value = 0;
        sub initialize( $ ) { $value = shift @_; }
        sub increment()     { $value++; return $value; }
        sub decrement()     { $value--; return $value; }
    }

This is a very secure way to create something that can be accessed from anywhere in a controlled and predictable manner. The variable $value is secure from any unintended side-effects (or even intended ones) and can be initialized/incremented/decremented from anywhere (you could of course also add a read subroutine to just return the value). We’ve almost strayed into OO land here since we’ve created something that is encapsulated (the variable value) and can only be accessed via subroutine calls (equivalent of OO methods).
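A related sketch, under the assumption that you want more than one such encapsulated value: instead of a BEGIN block, a subroutine can build the private variable and hand back the subroutines that access it. The name make_counter and the key names are hypothetical. Each call creates a fresh $value that only its own three anonymous subroutines can reach, including the read subroutine suggested above.

```perl
use strict;
use warnings;

# Hypothetical helper: every call creates a new, private counter.
sub make_counter
{
    my ( $start ) = @_;
    my $value = $start;       # visible only to the three subs below

    my %access = (
        increment => sub { $value++; return $value; } ,
        decrement => sub { $value--; return $value; } ,
        read      => sub { return $value; } ,
    );
    return \%access;
}

my $counter_r = make_counter( 37 );
$counter_r->{increment}->();
$counter_r->{increment}->();
$counter_r->{decrement}->();
print "Value = " , $counter_r->{read}->() , "\n";
```

The variable survives after make_counter returns for the same reason as in the BEGIN version: the anonymous subroutines still refer to it.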


2.6 Implementing A Switch Statement
One way is to download Switch.pm from CPAN and use that, but that might not be an option for code you export to other sites. Here’s another way that’s self-contained:

    SWITCH:
    {
        if ( $condition == TRUE )
        {
            # Run some code
            last SWITCH;
        }
        if ( $some_other_condition == TRUE )
        {
            # Run some other code
            last SWITCH;
        }
        # Run some default code
    }

Here, SWITCH is a label on a bare block (so nested switch statements need different labels and this is a drawback), while the last SWITCH piece of code is the equivalent of C’s break. Perl treats a bare block as a loop that executes exactly once, so the loop-control commands work inside it: last and next both leave the block, and redo restarts it from the top. End every clause with last SWITCH so that the default code only runs when no clause matched.
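When the switch selects on a single string value, another self-contained option is a hash of subroutine references (often called a dispatch table). This is a sketch of that alternative, not something taken from the course text; the names %action_for and dispatch and the commands are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical dispatch table: one anonymous subroutine per case.
my %action_for = (
    start => sub { return "starting"; } ,
    stop  => sub { return "stopping"; } ,
);

sub dispatch
{
    my ( $command ) = @_;
    my $action_r = $action_for{ $command };

    # The "default" clause: no entry in the table
    return "unknown command '$command'" unless ( defined( $action_r ) );

    return $action_r->();
}

print dispatch( "start" ) , "\n";
print dispatch( "pause" ) , "\n";
```

Adding a case means adding one line to the table, and no labels are involved.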

2.7 Labels : Use Them
Use labels to be explicit about where the commands next and last transfer you (and goto, but you’re never going to use goto, are you!). Put each label directly on the loop it names, so that next OUTER really does move on to the next item of the outer loop:

    OUTER:
    foreach my $item ( @item_list )
    {
        INNER:
        foreach my $object ( @object_list )
        {
            # Code
            next OUTER if ( $some_condition == TRUE );

            # Code
            next INNER if ( $some_other_condition == TRUE );
        }
    }


2.8 Labels : Don’t Use Them
If you use labels it is always clear where you are transferring control to, but it is never clear at the transfer point (i.e., the actual label) where transfer of control has come from, and this makes it very hard to debug code: next and last with labels are just synonyms for goto (and you’re never going to use goto, are you!). On balance, use labels for SWITCH and single-level loop operations.

3 Writing Efficient, Maintainable And Reusable Code

Package useful code into subroutines and then into modules and then share it with everyone. Install tools in: /design/rmc/tools/

and modules in: /design/rmc/tools/Perl_Modules/tool/dev/

and in both cases release them. Don’t forget to write documentation, ideally as POD (Perl has translators to generate man pages, HTML and PDF).

Don’t reinvent the wheel. Since a lot of what we do involves reading and parsing files, and then writing some new file(s), use Netlist_Tools.pm in the Perl_Modules directory. These routines are debugged and work quite happily with files that are gigabytes in size, and they’ll transparently gunzip any files that are gzipped even if you don’t know they’re gzipped.

Don’t reinvent the wheel. Also, before you write a mega-thingy widget that will revolutionize human-kind, look on CPAN just in case someone else has beaten you to it (they probably have)!

Don’t reinvent the wheel.

If you’re writing code that makes several different tests on some data, put the most common tests before the less common ones. For example, if you’re testing a string in a loop like this:

    foreach my $line ( @very_large_file )
    {
        if ( $line =~ m/^\s*\#/ )    # Lines that are comments (start with a #)
        {
            next;
        }
        if ( $line =~ m/^$/ )        # Lines that are blank
        {
            next;
        }
        if ( $line =~ m/^\s+/ )      # Lines that contain leading white-space
        {
            next;
        }
        if ( $line =~ m/^\S+/ )      # Lines without leading white-space
        {
            # Code to process $line
            next;
        }
    }


and you run this code with a file containing 10 million lines, of which 99.99% are neither comments, nor blank, nor indented with leading white-space, then you’ll end up executing approximately 40 million tests. If you put the bottom-most test (the test for lines without leading white-space) first, the same code executes about 10 million tests. Note that a comment line beginning with # also matches m/^\S+/, so once that test is moved to the top it has to exclude comments, for example by becoming m/^[^\s#]/.
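The reordering described above might look like the following sketch. The sample lines and the @data_lines list are invented for the example; the first test uses m/^[^\s#]/ so that it accepts only lines whose first character is neither white-space nor #.

```perl
use strict;
use warnings;

# Made-up sample input: a comment, a blank line, an indented line, two data lines
my @lines = ( "# a comment" , "" , "  indented" , "net1 net2" , "net3 net4" );
my @data_lines;

foreach my $line ( @lines )
{
    if ( $line =~ m/^[^\s#]/ )          # Most common case first: data lines
    {
        push( @data_lines , $line );    # Stand-in for "code to process $line"
        next;
    }
    next if ( $line =~ m/^\s*\#/ );     # Lines that are comments
    next if ( $line =~ m/^$/ );         # Lines that are blank
    next if ( $line =~ m/^\s+/ );       # Lines with leading white-space
}

print scalar( @data_lines ) , " data lines\n";
```

For the common case only one pattern match is executed per line; the rarer cases still pay for the earlier tests.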

3.1 Writing Readable Code Here’s a very good question. Why do I need to observe and adhere to standards in programming in an environment like ours? My answer to this is to give an example: foreach $keyName (keys(%keys)) { foreach $hierName (keys(%{$keys{$keyName}{instances}})) { if(${$instances{$hierName}{type}} eq "key") { my $cellName = ${$instances{expandExpression($keyName, $hierName)}{cellName}}; if(exists($cellProperties{$cellName}{classless}{keyTerminals})) { foreach my $keyTermName (split(',', ${$cellProperties{$cellName}{classless}{keyTerminals}})) { if(exists($keys{$keyName}{instances}{$hierName}{$keyTermName})) { my $netName = expandExpression($keyName, ${$keys{$keyName}{instances}{$hierName}{$keyTermName}}); if(defined($packageTerms{$netName})) { if(!defined($ios{$netName}) || $ios{$netName} ne "-global") { if($packageTerms{$netName} eq $keyName) { keysWarn("Instance of cell with duplicated key names, cell $cellName, in $padName, duplicated key name is $keyTermName\n"); } else { keysWarn("Two or more keys connected to package terminal $netName, key $keyName$keyTermName and key ", $packageTerms{$netName}, "\n"); } } } $packageTerms{$netName} = $keyName; } } } } } } keysMessage("Checked ".scalar(keys(%packageTerms))." package terminals\n");

I’ve rendered it in a small font size to illustrate a point: the formatting has been preserved exactly as it was written, and this is a small fragment of a much larger code-base of well over 5000 lines of code just like this. And my point? I absolutely guarantee that one week after the above code was written, the original author will not remember all the nuances that went into its authorship. Any debugging exercise will be very difficult for that author, let alone for someone who comes fresh to the task with responsibility for maintaining this code once the originator has moved on. Therefore, style, readability and clarity matter.

3.1.1 Hints For Readable Code
Line up items so that it’s easy to spot errors. For example, this works but isn’t acceptable:

    my $lef_filename = undef;
    my $log_filename = undef;
    my $default_log_filename = "lefPortStrip.log";
    my $pin_names_r = [];
    my $layer_names_r = [];

    run_lef_import( $lef_filename , $log_filename , $default_log_filename , $pin_names_r , $layer_names_r );


but this is:

    my $lef_filename         = undef;
    my $log_filename         = undef;
    my $default_log_filename = "lefPortStrip.log";
    my $pin_names_r          = [];
    my $layer_names_r        = [];

    run_lef_import( $lef_filename ,
                    $log_filename ,
                    $default_log_filename ,
                    $pin_names_r ,
                    $layer_names_r );

If you’re writing a complex “if” statement then line up the brackets:

    if ( ( $day            == SUNDAY ) &&
         ( $full_moon      == TRUE   ) &&
         ( $spring_equinox == TRUE   ) )
    {
        print "It's Easter Sunday\n";
    }

Use a 2 or 4 column indent and be consistent in its usage. Put the opening curly brace on the line after a keyword and lined up with the start of the keyword. A one-line BLOCK may be put on one line, including left- and right-brace.

    if ( $flag == TRUE )
    {
        $result       = PI;
        $next_example = FALSE;
    }

Don’t omit the semicolon in a one-line BLOCK even though you can (in the above example it’s the semicolon after the “E” in FALSE). At some point it’s a certainty that you’ll change that one-line block to a multi-line block by adding new commands. At that point the semicolon is needed and you’ll have to add it anyway.
Don’t put space before the semicolon after a statement.
Do put space both before and after a “,” when separating parameters and list items.
Put space around most (all) operators.
Put space around complicated subscripting code.
Put blank lines between sections of code that do different things.
Don’t put space between a function name and its opening parenthesis.
Break long lines after an operator.
Omit redundant punctuation as long as clarity doesn’t suffer.

3.2 Use Constants
If values appear in code that are constants, define them as constants with “use constant”. It is an accepted convention that constants should appear in all UPPERCASE.

    use constant PI => 3.1415926;
    use constant E  => 2.7182818;
    use constant A  => 6.02E23;

    MAIN:
    {
        my $radius = 2.0;
        my $area   = PI * $radius * $radius;
    }

3.3 Make The Use Of References Obvious
If your code uses references, make sure that the variable names that are used are tagged with something that makes it obvious they’re references, like _r. If you do this consistently it then becomes obvious when you try to use something that is/is not a reference in a dereference operation. For example, in the following code it’s obvious that you should only be using the dereference operator (the ->) on a reference.

    my $array_r = [];    # Create a reference to an empty list
    # and then later
    $array_r->[ 56 ] = PI;

While in the following example it should be obvious that something has gone wrong because the dereference operator is not being used on a reference (the _r is missing).

    my $number = 56;
    # and then later
    $number->[ 0 ] = get_random_integer();

3.4 Don’t Use Default Values
When using a loop construct like foreach, don’t use the defaults allowed by Perl. I.e. it is allowable to say this:

    foreach ( @l )
    {
        print $_;
    }

which doesn’t tell you much about what’s going on and why, whereas the far more readable:

    foreach my $book_title ( @library )
    {
        print "$book_title\n";
    }

tells you exactly what was/is intended. This will be clearer to others when they read your code and will be clearer to you when you come back to debug your code in a year’s time.

3.5 Distinguish Between for And foreach
The Perl keywords for and foreach are synonyms, so you can use either one to index through lists or index through values. Here are two examples of how you should use them:

    foreach my $name ( @friends )
    {
        print "I have a friend called $name\n";
    }

    for ( my $count = 0 ; $count <= 10 ; $count++ )
    {
        print "Count = $count\n";
    }

And here are two examples of how you should not use them:

    for my $name ( @friends )
    {
        print "I have a friend called $name\n";
    }

    foreach ( my $count = 0 ; $count <= 10 ; $count++ )
    {
        print "Count = $count\n";
    }

3.6 Common Sense
Use meaningful variable and subroutine names. Don’t use variables with the names $a and $b. See the man page for sort() to understand why.
Declare variables using my (i.e., use lexical variables). Never use global variables and don’t be tempted in the heat of debugging to insert just one or two to get around a problem.
Use lots of comments. You’ll be amazed how quickly you’ll forget just what it was you were trying to express in your code a day, a week, a month, a year ago. Document functions and procedures.
When in doubt use parentheses. Just because you can omit them doesn’t mean you should omit them.
If your program is running for more than a few seconds, give your users some feedback. If you’re programming a GUI in Perl/Tk, use a progress bar.
If your program is a command line driven program then always program a -help parameter to give users some idea of what the program does and what to type. Make the invocation of the program with no parameters display some help information. Give a user the option to get more help with a -help parameter.
Allow default options. Make sure a user knows what they are when he/she asks for help.
Make error messages clear so a user knows what to fix when things don’t run the way they expect.
Since many programs are often chained together or are run within a single controlling program, make sure all scripts return an error or success code. The exit code for success is always 0 (zero). If programs are designed to be chained together in a shell script, then follow the Unix philosophy of having programs that complete successfully return no output at all (i.e., they are silent). Here’s a way of setting up and using exit codes:

    # Exit codes :
    use constant EXIT_OKAY     => 0;    # Success
    use constant EXIT_BAD_ARGS => 1;    # Failed with bad arguments

    # Later in your program
    if ( $number_of_arguments < 4 )     # Not enough arguments given !
    {
        exit( EXIT_BAD_ARGS );
    }

    # And at the end of your program
    exit( EXIT_OKAY );

Always return a value from both your program and any subroutines in that program. If you don’t use an explicit return statement then the value returned is the result of the last statement evaluated. This will change as you modify your code, and in particular, since most code is added at the end of a program, the return value from what you’re currently writing will be changing what is seen by whatever wrapper is running your code. If it’s vital that your code not return a value, because, say, you want to indicate that an error occurred but it wasn’t a fatal error, then return undef. In Perl undef is a value that represents not defined.

When you write modules, remember that a module must always return a value of TRUE, so the last line of a module should look like this:

    1;

If you cut-and-paste code, then that code belongs in a subroutine.

4 Testing

If your code is destined to be used by others then you must test it. In particular keep a directory or folder with files that are read by your code, and write some scripts to run common cases. When you add new features or debug problems, make sure all the old tests are run so that you can prove that the modifications or additions haven’t caused unintended side-effects that cause old code to stop working correctly (in computer science parlance this is called regression testing).
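One way to keep such tests repeatable is the Test::More module that ships with Perl. The sketch below is illustrative only: normalise_line is a made-up routine standing in for “your code”, and the three checks stand in for the common cases you would keep in your test directory.

```perl
use strict;
use warnings;
use Test::More tests => 3;

# Hypothetical routine under test: tidy the white-space on one line
sub normalise_line
{
    my ( $line ) = @_;
    $line =~ s/^\s+//;      # Remove leading white-space
    $line =~ s/\s+$//;      # Remove trailing white-space
    $line =~ s/\s+/ /g;     # Squeeze runs of white-space to one space
    return $line;
}

# Each check records an input, the expected result and a description
is( normalise_line( "  a   b  " ) , "a b"  , "leading, trailing and repeated spaces" );
is( normalise_line( "" )          , ""     , "empty line stays empty" );
is( normalise_line( "word" )      , "word" , "clean line is unchanged" );
```

Re-running a script like this after every change is the regression test: a failing line names the case that stopped working.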

5 Traps For The Unwary (Or, Things That Catch Everyone Out Eventually)

Remember to use == for numeric tests and eq for string tests.
Don’t fall into the C trap of using = (assignment) when you mean == (comparison).
Remember not to use = when you mean =~.
Always start your Perl code with this:

    use warnings;
    use strict;
    use diagnostics;

All arrays count from 0, not 1. An array of size 20 has elements [0] to [19] inclusive. There isn’t an array item [20].
Hashes have no order, so you can’t rely on for or foreach visiting the entries of a hash in any particular sequence, and you can’t index into a hash with []. To iterate over a hash use keys, values or each (and sort the keys if you need a repeatable order).
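A few of these traps can be demonstrated in a handful of lines. The variable names and data below are invented for the illustration.

```perl
use strict;
use warnings;

# == compares as numbers, eq compares as text
my $left  = "1.0";
my $right = "1";
my $numerically_equal = ( $left == $right ) ? 1 : 0;
my $stringwise_equal  = ( $left eq $right ) ? 1 : 0;
print "== says $numerically_equal, eq says $stringwise_equal\n";

# Arrays count from zero: 20 elements, last index 19
my @slots = ( 1 .. 20 );
print "Last index is " , $#slots , "\n";

# Hashes have no order of their own: iterate over the sorted keys
my %age_of = ( alice => 31 , bob => 27 , carol => 45 );
my @names  = sort keys %age_of;
foreach my $name ( @names )
{
    print "$name is $age_of{$name}\n";
}
```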

6 Some Common Tasks And Possible Solutions

There are many things that occur over and over again in Perl programming. Here are some simple solutions to some of those tasks.

6.1 Adding A Command Line To A Program Ala Unix
First solution: Use one of the standard Perl Getopt modules (Getopt::Std, which provides getopts, or Getopt::Long). The advantage of these modules is that the work is all completely written for you. The disadvantage is that if a module doesn’t do exactly what you want, then you either alter it or live with it.
Second solution: Write your own routine. Here’s a template for it:


    # croak comes from the Carp module (use Carp;). TRUE and FALSE are
    # assumed to be constants defined elsewhere in the program.
    sub Parse_Command_Line_Arguments( $ )
    {
        my ( $arguments_r ) = @_;

        my $usage   = "my_prog -input -output [-print_flag]";
        my $numargs = @$arguments_r;

        foreach my $argument ( @$arguments_r )
        {
            if ( $argument =~ m/^\-h(elp)?$/i )    # Help requested
            {
                print( "\nUsage: $usage\n\n" );
                exit 0;
            }
        }

        if ( $numargs < 1 )                        # No arguments given
        {
            print( "\nUsage: $usage\n" );
            print( "\nUse my_prog -h to get more help\n\n" );
            exit 0;
        }

        my $next_arg        = undef;
        my $input_filename  = undef;
        my $output_filename = undef;
        my $print_flag      = FALSE;

        # Process all arguments. Looping on the list itself (rather than on a
        # counter) stops cleanly even if a switch is missing its value.
        while ( @$arguments_r )
        {
            $next_arg = shift( @$arguments_r );
            SWITCH:
            {
                if ( $next_arg =~ m/^\-input/i )
                {
                    $input_filename = shift( @$arguments_r );
                    last SWITCH;
                }
                if ( $next_arg =~ m/^\-output/i )
                {
                    $output_filename = shift( @$arguments_r );
                    last SWITCH;
                }
                if ( $next_arg =~ m/^\-print_flag/i )
                {
                    $print_flag = TRUE;
                    last SWITCH;
                }
                if ( $next_arg =~ m/^\-/i )
                {
                    croak( "Unknown command line switch $next_arg" );
                }
            }
        }

        return ( $input_filename , $output_filename , $print_flag );
    }

You can then call this routine like this:

    my ( $input_filename , $output_filename , $print_flag ) = Parse_Command_Line_Arguments( \@ARGV );

    croak( "Missing input filename" )  unless defined $input_filename;
    croak( "Missing output filename" ) unless defined $output_filename;
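For comparison, the first solution can be sketched with the standard Getopt::Long module. The subroutine name parse_arguments and the example argument list are made up; the option names mirror the hand-written template. GetOptionsFromArray is used so that the routine takes a reference to a list, as the template does, rather than reading @ARGV directly.

```perl
use strict;
use warnings;
use Getopt::Long qw( GetOptionsFromArray );

# Hypothetical equivalent of the hand-written parser
sub parse_arguments
{
    my ( $arguments_r ) = @_;

    my $input_filename  = undef;
    my $output_filename = undef;
    my $print_flag      = 0;

    my $parsed_ok = GetOptionsFromArray(
        $arguments_r ,
        'input=s'    => \$input_filename ,     # -input takes a string value
        'output=s'   => \$output_filename ,    # -output takes a string value
        'print_flag' => \$print_flag ,         # -print_flag is a simple switch
    );

    return unless ( $parsed_ok );              # Unknown switch or missing value
    return ( $input_filename , $output_filename , $print_flag );
}

my @example_args = ( '-input' , 'in.txt' , '-output' , 'out.txt' , '-print_flag' );
my ( $in , $out , $flag ) = parse_arguments( \@example_args );
print "in=$in out=$out flag=$flag\n";
```

The trade-off is the one described above: much less code to maintain, in exchange for accepting the module’s conventions.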

6.2 Loading And Parsing A File
You want to load and loop through all the lines of a file performing some programming tasks on some or all of the lines. You then want to write out a new file containing whatever manipulations you’ve done. Here’s a common way to do this:

    #!/usr/local/bin/perl
    use strict;
    use warnings;
    use diagnostics;
    use Carp;
    use Cwd;
    use Config;
    use lib ( "/design/rmc/tools/Perl_Modules/tool/current/" );
    use lib ( "/design/rmc/tools/Perl_Modules/tool/current/OS_SPECIFIC/$Config{archname}" );
    use FindBin qw( $Bin );
    use lib $Bin;
    use Netlist_Tools;

    MAIN:
    {
        my $file_r = Read_File( "BigFile.txt" );
        foreach my $line ( @$file_r )
        {
            # Fiddle with the line
        }
        Write_File( "NewFile.txt" , $file_r );
        exit 0;
    }

This code will load one of (in this order) BigFile.txt, BigFile.txt.gz, BigFile.txt.gzip. If you specify an output filename in Write_File that is suffixed in either .gz or .gzip then the file will be compressed (with gzip) before it is written. A major advantage of Read_File is that not only will it transparently read in the file via gzip if necessary, all the lines are then formatted so that every line is in a list that can be iterated, and every line is guaranteed to have no white-space before the first non-white-space character. There will also be no white-space at the end of the line and all “words” on a line will be separated by exactly one space.

If, alternatively, you want to create a new file based on some or all of the contents of an input file, you can re-write the body of the code in the previous program like this:

    use Netlist_Tools;

    MAIN:
    {
        my $in_file_r  = Read_File( "BigFile.txt" );
        my $out_file_r = [];
        foreach my $line ( @$in_file_r )
        {
            # Inspect the line, generate new information from it. Write the
            # new information into a list like this:
            push @$out_file_r , "new stuff";    # Don't add \n at the ends of lines
        }
        Write_File( "NewFile.txt" , $out_file_r );
        exit 0;
    }

This will write out the contents of a list (@$out_file_r) which you build up piece-meal based on some or all of what you read from the original input file.
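Netlist_Tools is a site-specific module, so for readers without access to it here is a plain-Perl sketch of the line formatting described above. It is not the module’s actual code: read_normalised_lines is a hypothetical stand-in that leaves out the gzip handling and simply returns a reference to a list of tidied lines.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Read_File (no gzip support)
sub read_normalised_lines
{
    my ( $filename ) = @_;

    open( my $handle , '<' , $filename ) or die "Cannot open $filename: $!";
    my @lines = <$handle>;
    close( $handle );

    foreach my $line ( @lines )
    {
        $line =~ s/^\s+//;      # No white-space before the first word
        $line =~ s/\s+$//;      # No white-space (or newline) at the end
        $line =~ s/\s+/ /g;     # Exactly one space between words
    }

    return \@lines;
}
```

A caller would then loop over @$file_r exactly as in the programs above. Reading the whole file into a list is the simplest approach, and is an assumption made here for brevity.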

6.3 Interacting With The LSF Queuing Mechanism
The LSF queuing system allows CPU intensive jobs to use the shared CPU resource of most of the machines in this building. Here’s how to interface to that queuing system while limiting yourself to a predetermined number of jobs and adding new jobs to the queue as old jobs complete:

    # This example shows how to use ELDO for which we have 4 licenses. We'll
    # limit ourselves to use 2 of them. While we're limiting ourselves here
    # because of scarce license resource, the same code can be used to stop
    # queues being flooded with jobs that are pending but consuming queue
    # slots (and making yourself pretty damn unpopular).

    my $running_jobs = 0;
    my $jobs_limit   = 2;

    # The list of files is abbreviated here (. . .); list every file in full.
    foreach my $file ( qw( File_1.cir File_2.cir . . . File_98.cir File_99.cir ) )
    {
        # Test the queue. The first line of bjobs output is a header.
        my $bjobs_output = `bjobs -q linux 2>&1`;
        my @bjobs_lines  = split( /\n/ , $bjobs_output );
        $running_jobs    = scalar( @bjobs_lines ) - 1;

        if ( $running_jobs < $jobs_limit )
        {
            my $command = "bsub -q linux \'eldo -nomail -queue -stver $file\'";
            system( $command );
            print "\nJob $command submitted\n";
        }
        else
        {
            print ".";
            sleep 10;
            redo;    # Try the same file again once the queue has drained
        }
    }
    exit;


Useful regular expressions

Roman numbers
    m/^m*(d?c{0,3}|c[dm])(l?x{0,3}|x[lc])(v?i{0,3}|i[vx])$/i
Swap first two words
    s/(\S+)(\s+)(\S+)/$3$2$1/
Keyword = Value
    m/(\w+)\s*=\s*(.*)\s*$/    # keyword is $1, value is $2
Line of at least 80 characters
    m/.{80,}/
MM/DD/YY HH:MM:SS
    m|(\d+)/(\d+)/(\d+) (\d+):(\d+):(\d+)|
Changing directories
    s(/usr/bin)(/usr/local/bin)g
Expanding %7E (hex) escapes
    s/%([0-9A-Fa-f][0-9A-Fa-f])/chr hex $1/ge
Deleting C comments (imperfectly)
    s{
        /\*     # Match the opening delimiter
        .*?     # Match a minimal number of characters
        \*/     # Match the closing delimiter
    } []gsx;
Removing leading and trailing whitespace
    s/^\s+//; s/\s+$//;
Turning \ followed by n into a real newline
    s/\\n/\n/g;
Removing package portion of fully qualified symbols
    s/^.*:://
IP address
    m/^([01]?\d\d|2[0-4]\d|25[0-5])\.([01]?\d\d|2[0-4]\d|25[0-5])\.([01]?\d\d|2[0-4]\d|25[0-5])\.([01]?\d\d|2[0-4]\d|25[0-5])$/;
Removing leading path from filename
    s(^.*/)()
Extracting columns setting from TERMCAP
    $cols = ( ($ENV{TERMCAP} || " ") =~ m/:co#(\d+):/ ) ? $1 : 80;
Removing directory components from program name and arguments
    $name = join(" ", map { s,^\S+/,,; $_ } ($0, @ARGV));
Checking your operating system
    die "This isn't Linux" unless $^O =~ m/linux/i;
Joining continuation lines in multiline string
    s/\n\s+/ /g
Extracting all numbers from a string
    @nums = m/(\d+\.?\d*|\.\d+)/g;
Finding all-caps words
    @capwords = m/(\b[^\Wa-z0-9_]+\b)/g;
Finding all-lowercase words
    @lowords = m/(\b[^\WA-Z0-9_]+\b)/g;
Finding initial-caps word
    @icwords = m/(\b[^\Wa-z0-9_][^\WA-Z0-9_]*\b)/;
Finding links in simple HTML
    @links = m/<A[^>]+?HREF\s*=\s*["']?([^'" >]+?)[ '"]?>/sig;
Finding middle initial in $_
    $initial = m/^\S+\s+(\S)\S*\s+\S/ ? $1 : "";
Changing inch marks to quotes
    s/"([^"]*)"/``$1''/g
Extracting sentences (two spaces required)
    {
        local $/ = "";
        while (<>)
        {
            s/\n/ /g;
            s/ {3,}/  /g;
            push @sentences, m/(\S.*?[!?.])(?=  |\Z)/g;
        }
    }
YYYY-MM-DD
    m/(\d{4})-(\d\d)-(\d\d)/    # YYYY in $1, MM in $2, DD in $3
North American telephone numbers
    m/ ^
        (?:
            1 \s (?: \d\d\d \s)?     # 1, or 1 and area code
          |                          # ... or ...
            \(\d\d\d\) \s            # area code with parens
          |                          # ... or ...
            (?: \+\d\d?\d? \s)?      # optional +country code
            \d\d\d ([\s\-])          # and area code
        )
        \d\d\d (\s|\1)               # prefix (and area code)
        \d\d\d\d                     # exchange
      $
    /x
Exclamations
    m/\boh\s+my\s+gh?o(d(dess(es)?|s?)|odness|sh)\b/i
Extracting lines regardless of line terminator
    push(@lines, $1) while ($input =~ s/^([^\012\015]*)(\012\015?|\015\012?)//);
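As a brief usage sketch, here are two of the patterns from the list above wrapped in small subroutines. The subroutine names and the sample strings are invented for the example; the patterns themselves are the “Keyword = Value” and “YYYY-MM-DD” entries.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the "Keyword = Value" pattern
sub parse_setting
{
    my ( $line ) = @_;
    if ( $line =~ m/(\w+)\s*=\s*(.*)\s*$/ )
    {
        return ( $1 , $2 );     # keyword, value
    }
    return;                     # no match: empty list
}

# Hypothetical wrapper around the "YYYY-MM-DD" pattern
sub parse_date
{
    my ( $text ) = @_;
    if ( $text =~ m/(\d{4})-(\d\d)-(\d\d)/ )
    {
        return ( $1 , $2 , $3 );    # year, month, day
    }
    return;
}

my ( $keyword , $value )     = parse_setting( "timeout = 30" );
my ( $year , $month , $day ) = parse_date( "released 2005-07-31" );
print "$keyword is $value; year $year, month $month, day $day\n";
```

Copying $1, $2 and so on into named variables straight after the match keeps them safe from being overwritten by the next successful match.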

Pattern Matching Operators
    =~    matches, or, contains
    !~    does not match, or, does not contain

The m// Operator (Matching)
    EXPR =~ m/PATTERN/cgimosx    search string in EXPR for PATTERN
    EXPR =~ /PATTERN/cgimosx     as above (the m is optional with / delimiters)
    EXPR =~ ?PATTERN?cgimosx     as above but once only match
    EXPR =~ m'PATTERN'cgimosx    as above, no variable interpolation
    m/PATTERN/cgimosx            search in $_
    /PATTERN/cgimosx             search in $_
    ?PATTERN?cgimosx             search in $_
  MODIFIERS
    /i     ignore alphabetic case
    /m     let ^ and $ match next to embedded \n
    /s     let . match newline and ignore deprecated $*
    /x     ignore (most) whitespace and permit comments in pattern
    /o     compile pattern once only
    /g     globally find all matches
    /cg    allow continued search after failed /g match

The s/// Operator (Substitution)
    LVALUE =~ s/PATTERN/REPLACEMENT/egimosx
    s/PATTERN/REPLACEMENT/egimosx
  MODIFIERS
    /i     ignore alphabetic case (when matching)
    /m     let ^ and $ match next to embedded \n
    /s     let . match newline and ignore deprecated $*
    /x     ignore (most) whitespace and permit comments in pattern
    /o     compile pattern once only
    /g     replace globally, that is, all occurrences
    /e     evaluate the right side as an expression

The tr/// Operator (Transliteration)
    LVALUE =~ tr/SEARCHLIST/REPLACEMENTLIST/cds
    tr/SEARCHLIST/REPLACEMENTLIST/cds
    y/// is a synonym for tr///
  MODIFIERS
    /c     Complement SEARCHLIST
    /d     Delete found but unreplaced characters
    /s     Squash duplicate replaced characters

Extended Regex Sequences
    (?#...)             comment, discard
    (?:...)             cluster-only parentheses, no capturing
    (?imsx-imsx)        enable/disable pattern modifiers
    (?imsx-imsx:...)    cluster-only parentheses plus modifiers
    (?=...)             true if lookahead assertion succeeds
    (?!...)             true if lookahead assertion fails
    (?<=...)            true if lookbehind assertion succeeds
    (?<!...)            true if lookbehind assertion fails
    (?>...)             match nonbacktracking subpattern
    (?{...})            execute embedded Perl code
    (??{...})           match regex from embedded Perl code
    (?(...)...|...)     match with if-then-else pattern
    (?(...)...)         match with if-then pattern

Classic Character Classes
    \d    digit             [0-9]
    \D    non-digit         [^0-9]
    \s    whitespace        [ \t\n\r\f]
    \S    non-whitespace    [^ \t\n\r\f]
    \w    word              [a-zA-Z0-9_]
    \W    non-word          [^a-zA-Z0-9_]

Meta-characters
    \ | ( ) [ { ^ $ * + ? .
    \...         de-meta next meta, or, meta next non-meta character
    ...|...      alternation (match one of many)
    (...)        grouping (treat as a unit). Patterns are stored in $1, $2, etc. after match (\1 \2 inside match)
    (?:PATTERN)  group/cluster but don't capture
    [...]        character class (match one character from a set)
    ^            true at beginning of string (or with /m after newline)
    \A           true at beginning of string
    .            match one character (except newline, normally)
    $            true at end of string (or with /m before newline)
    \z           true at end of string
    \Z           true before newline at end of string, otherwise at end of string
    \b           match at word boundary (i.e. \w\W or \W\w)
    \B           match at not a word boundary (i.e. \w\w or \W\W)
    \G           continue from where the last match ended

Regex Quantifiers
    *             match 0 or more times (maximal)
    +             match 1 or more times (maximal)
    ?             match 1 or 0 times (maximal)
    {COUNT}       match exactly COUNT times
    {MIN,}        match at least MIN times (maximal)
    {MIN,MAX}     match at least MIN but not more than MAX times (maximal)
    *?            match 0 or more times (minimal)
    +?            match 1 or more times (minimal)
    ??            match 0 or 1 time (minimal)
    {MIN,}?       match at least MIN times (minimal)
    {MIN,MAX}?    match at least MIN but not more than MAX times (minimal)

REGULAR EXPRESSION & (S)PRINTF SUMMARY
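The difference between maximal and minimal quantifiers is easiest to see side by side. In this small sketch (sample text invented) the same string is matched with + and with +?.

```perl
use strict;
use warnings;

my $text = "<b>bold</b>";

# Maximal (greedy): .+ grabs as much as it can before the final >
my ( $maximal ) = ( $text =~ m/<(.+)>/ );

# Minimal: .+? stops at the first > it can
my ( $minimal ) = ( $text =~ m/<(.+?)>/ );

print "maximal: $maximal\n";
print "minimal: $minimal\n";
```

The maximal version captures b>bold</b and the minimal version captures just b.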

Regex Metasymbols
    \0      the null character
    \NNN    octal character
    \n      nth captured string (n is a digit, e.g. \1)
    \a      alarm character
    \cX     control X
    \C      match C char
    \e      ASCII esc
    \E      end case (\L\U or \Q)
    \f      form-feed
    \l      lowercase character
    \L      lowercase until \E
    \n      newline
    \Q      de-meta until \E
    \r      return
    \t      TAB
    \u      titlecase character
    \U      uppercase until \E
    \x      match hex character

Regexp Grabbag
    assign and substitute in one go             ($copy = $original) =~ s/this/that/;
    swap two words                              s/(\S+)(\s+)(\S+)/$3$2$1/;
    keyword = value                             m/(\w+)\s*=\s*(.*)\s*$/;
    line of at least N characters               m/.{80,}/;
    changing directories                        s(/usr/bin)(/usr/local/bin)g;
    remove & compress whitespace                s/^\s+//; s/\s+$//; s/\s+/ /g;
    turn \ followed by n into a real newline    s/\\n/\n/g;
    integer                                     $arg =~ m/^\d+$/;
    integer + suffix                            $arg =~ m/^\d+[munpfa]$/;
    float                                       $arg =~ m/^([+-]?)(\d+)?\.\d\d*([Ee]([+-]?\d+))?$/;
    float + suffix                              $arg =~ m/^([+-]?)(\d+)?\.\d\d*([Ee]([+-]?\d+))?[munpfa]$/;
    filename                                    $arg =~ m/[A-Za-z_][\d\w_\.]*$/;
    (suffixes m, u, n, p, f, a stand for milli, micro, nano, pico, femto, atto)

printf/sprintf
    %%    percent sign
    %c    character
    %s    string
    %d    signed dec int
    %u    unsigned dec int
    %o    unsigned oct int
    %x    unsigned hex int
    %e    float scientific
    %f    float decimal
    %g    float %e or %f
    %X    like %x but UC
    %E    like %e uses 'E'
    %G    like %g uses 'G'
    %b    unsigned binary
    %p    a pointer
    %n    perl specific
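A few of the printf/sprintf conversions in use, as a quick illustration. The variable names and values are arbitrary.

```perl
use strict;
use warnings;

my $fixed   = sprintf( "%.2f" , 3.14159 );            # two decimal places
my $hex     = sprintf( "%x" , 255 );                  # lowercase hexadecimal
my $HEX     = sprintf( "%X" , 255 );                  # uppercase hexadecimal
my $binary  = sprintf( "%b" , 5 );                    # binary
my $padded  = sprintf( "%5d|%-5s|" , 42 , "ab" );     # right- and left-justified
my $percent = sprintf( "%d%%" , 75 );                 # a literal percent sign

print "$fixed $hex $HEX $binary $padded $percent\n";
```

This prints 3.14 ff FF 101, then 42 right-justified in five columns and ab left-justified in five columns (each followed by |), then 75%.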

Stepping and Running
    s          - single step (single step subroutines too)
    s EXPR     - single step an expression (inc. subroutines & functions)
    n          - single step (don't single step subroutines)
    n EXPR     - single step an expression (not subroutines & functions)
    <ENTER>    - repeat previous s or n command
    .          - set internal debugger pointer to last line executed and print line
    r          - continue until the currently executing subroutine returns

Breakpoints
    b                                - set a breakpoint on the line about to execute
    b LINE [b 73]                    - set a breakpoint before LINE
    b CONDITION [b $x>10]            - set breakpoint on next line with condition
    b LINE CONDITION [b 40 $a>12]    - set a breakpoint on LINE with condition
    b SUBNAME [b Load_File]          - set breakpoint before first line of subroutine
    b SUBNAME CONDITION              - set breakpoint before first line of subroutine with condition
    b postpone SUBNAME               - set breakpoint at first line of subroutine after compilation
    b postpone SUBNAME CONDITION     - set breakpoint at first line of subroutine after compilation with condition
    b compile SUBNAME                - set breakpoint on first statement to be executed after SUBNAME is compiled
    b load FILENAME                  - set breakpoint on first executed line in file
    d                                - delete breakpoint on the line about to execute
    d LINE [d 224]                   - delete breakpoint on line LINE
    D                                - delete all breakpoints
    L                                - list all breakpoints and actions
    c                                - continue execution
    c LINE [c 76]                    - continue execution (set one-time breakpoint on line LINE)

Tracing
    T             - produce a stack backtrace
    t             - trace the program
    t EXPR        - trace an expression
    W             - delete all watchpoints
    W EXPR        - add expression as global watchpoint
    p             - print
    p EXPR        - print an expression
    x             - pretty-print (will recursively print data structures)
    x EXPR        - pretty-print an expression
    V             - display all variables in current package
    V PKG         - display all variables in a package
    V PKG VARS    - display named variables in a package
    X             - same as V in CURRENTPACKAGE
    X VARS        - same as V in CURRENTPACKAGE
    H             - show all commands
    H -NUMBER     - show last NUMBER commands

Always: use warnings; use strict; use diagnostics;

Actions and Command Execution
    a                 - delete action on current line
    a COMMAND         - add action to current line
    a LINE            - delete action on line
    a LINE COMMAND    - add action to line
    A                 - delete all actions
    <                 - delete all actions before prompt
    < ?               - show action before prompt
    < EXPR            - add action before prompt
    << EXPR           - add another action before prompt
    >                 - delete all actions after prompt
    > ?               - show action after prompt
    > EXPR            - add action after prompt
    >> EXPR           - add another action after prompt
    {                 - like < but a debugger command
    { ?               - like < ? but a debugger command
    { COMMAND         - like < COMMAND but a debugger command
    {{ COMMAND        - like << COMMAND but a debugger command
    !                 - repeat previous command
    ! NUMBER          - repeat a numbered command
    ! -NUMBER         - repeat command counting backward
    ! PATTERN         - repeat command containing PATTERN
    !! CMD            - run external command in sub-process
    |                 - pipe external command to $ENV{PAGER}
    | DBCMD           - pipe debugger command DBCMD
    || PERLCMD        - pipe perl command PERLCMD

Locating Code
    l             - list the next few lines of code
    l LINE        - list code from line LINE
    l MIN+INCR    - list INCR+1 lines of code from line MIN
    l MIN-MAX     - list code from lines MIN to MAX
    -             - list a previous few lines
    w             - list a window (a few lines) around the current line
    w LINE        - list a window (a few lines) around line LINE
    f FILENAME    - view a different program or eval statement
    /PATTERN/     - search forward for PATTERN. / repeats previous search
    ?PATTERN?     - search backward for PATTERN. ? repeats previous search
    S             - list all subroutine names
    S PATTERN     - list all subroutine names matching PATTERN
    S !PATTERN    - list all subroutine names not matching PATTERN

Miscellaneous Commands
    q or ^D                - quit the debugger
    R                      - restart the debugger
    = or = ALIAS           - list all aliases or a named ALIAS
    = ALIAS VALUE          - create an alias
    man or man MANPAGE     - show man page for man or a named MANPAGE
    O                      - show all options
    O OPTION               - set listed options to 1
    O OPTION?              - show listed options
    O OPTION=VALUE         - set an option to a value

Pattern Matching Operators
=~ matches, or, contains
!~ does not match, or, does not contain

The m// Operator (Matching)
EXPR =~ m/PATTERN/cgimosx - search string in EXPR for PATTERN
EXPR =~ /PATTERN/cgimosx - as above
EXPR =~ ?PATTERN?cgimosx - as above but once only match
m/PATTERN/cgimosx - search in $_
/PATTERN/cgimosx - search in $_
?PATTERN?cgimosx - search in $_
With ' as the delimiter (m'PATTERN') there is no variable interpolation.

MODIFIERS
/i ignore alphabetic case
/m let ^ and $ match next to embedded \n
/s let . match newline and ignore deprecated $*
/x ignore (most) whitespace and permit comments in pattern
/o compile pattern once only
/g globally find all matches
/cg allow continued search after failed /g match

The s/// Operator (Substitution)
LVALUE =~ s/PATTERN/REPLACEMENT/egimosx
s/PATTERN/REPLACEMENT/egimosx

MODIFIERS
/i ignore alphabetic case (when matching)
/m let ^ and $ match next to embedded \n
/s let . match newline and ignore deprecated $*
/x ignore (most) whitespace and permit comments in pattern
/o compile pattern once only
/g replace globally, that is, all occurrences
/e evaluate the right side as an expression

The tr/// Operator (Transliteration)
LVALUE =~ tr/SEARCHLIST/REPLACEMENTLIST/cds
tr/SEARCHLIST/REPLACEMENTLIST/cds
y/// is a synonym for tr///

MODIFIERS
/c Complement SEARCHLIST
/d Delete found but unreplaced characters
/s Squash duplicate replaced characters

Extended Regex Sequences
(?#...) comment, discard
(?:...) cluster-only parentheses, no capturing
(?imsx-imsx) enable/disable pattern modifiers
(?imsx-imsx:...) cluster-only parentheses plus modifiers
(?=...) true if lookahead assertion succeeds
(?!...) true if lookahead assertion fails
(?<=...) true if lookbehind assertion succeeds
(?<!...) true if lookbehind assertion fails
(?>...) match nonbacktracking subpattern
(?{...}) execute embedded Perl code
(??{...}) match regex from embedded Perl code
(?(...)...|...) match with if-then-else pattern
(?(...)...) match with if-then pattern

Classic Character Classes
\d digit [0-9]
\D non-digit [^0-9]
\s whitespace [ \t\n\r\f]
\S non-whitespace [^ \t\n\r\f]
\w word [a-zA-Z0-9_]
\W non-word [^a-zA-Z0-9_]

Meta-characters
\ | ( ) [ { ^ $ * + ? .
\... de-meta next meta, or, meta next non-meta character
...|... alternation (match one of many)
(...) grouping (treat as a unit). Matched text is stored in $1, $2, etc. after match (\1 \2 inside match)
(?:PATTERN) group/cluster but don't capture
[...] character class (match one character from a set)
^ true at beginning of string (or with /m after newline)
\A true at beginning of string
. match one character (except newline, normally)
$ true at end of string (or with /m before newline)
\z true at end of string
\Z true before newline at end of string, otherwise at end of string
\b match at word boundary (i.e. \w\W or \W\w)
\B match at not a word boundary (i.e. \w\w or \W\W)
\G continue from where the last match ended

Regex Quantifiers
* match 0 or more times (maximal)
+ match 1 or more times (maximal)
? match 1 or 0 times (maximal)
{COUNT} match exactly COUNT times
{MIN,} match at least MIN times (maximal)
{MIN,MAX} match at least MIN but not more than MAX times (maximal)
*? match 0 or more times (minimal)
+? match 1 or more times (minimal)
?? match 0 or 1 time (minimal)
{MIN,}? match at least MIN times (minimal)
{MIN,MAX}? match at least MIN but not more than MAX times (minimal)

REGULAR EXPRESSION & (S)PRINTF SUMMARY

Regexp Grabbag
assign and substitute in one go:       ($copy = $original) =~ s/this/that/;
swap two words:                        s/(\S+)(\s+)(\S+)/$3$2$1/;
keyword = value:                       m/(\w+)\s*=\s*(.*)\s*$/;
line of at least N characters (N=80):  m/.{80,}/;
changing directories:                  s(/usr/bin)(/usr/local/bin)g;
remove & compress whitespace:          s/^\s+//; s/\s+$//; s/\s+/ /g;
turn \ followed by n into a real newline:  s/\\n/\n/g;
integer:                               $arg =~ m/^\d+$/;
integer + suffix:                      $arg =~ m/^\d+[munpfa]$/;
float:                                 $arg =~ m/^([+-]?)(\d+)?\.\d\d*([Ee]([+-]?\d+))?$/;
float + suffix:                        $arg =~ m/^([+-]?)(\d+)?\.\d\d*([Ee]([+-]?\d+))?[munpfa]$/;
filename:                              $arg =~ m/[A-Za-z_][\d\w_\.]*$/;
(suffixes m, u, n, p, f, a = milli, micro, nano, pico, femto, atto)

Regex Metasymbols
\0 the null character
\NNN octal character
\n nth captured string (n is a digit, e.g. \1)
\a alarm character
\cX control X
\C match C char
\e ASCII esc
\E end case (\L\U or \Q)
\f form-feed
\l lowercase next character
\L lowercase until \E
\n newline
\Q de-meta until \E
\r return
\t TAB
\u titlecase next character
\U uppercase until \E
\x match hex character

printf/sprintf
%% percent sign
%c character
%s string
%d signed dec int
%u unsigned dec int
%o unsigned oct int
%x unsigned hex int
%e float scientific
%f float decimal
%g float %e or %f
%X like %x but UC
%E like %e uses 'E'
%G like %g uses 'G'
%b unsigned binary
%p a pointer
%n perl specific

Stepping and Running
s - single step (single step subroutines too)
s EXPR - single step an expression (inc. subroutines & functions)
n - single step (don't single step subroutines)
n EXPR - single step an expression (not subroutines & functions)
<ENTER> - repeat previous s or n command
. - set internal debugger pointer to last line executed and print line
r - continue until the currently executing subroutine returns

Breakpoints
b - set a breakpoint on the line about to execute
b LINE [b 73] - set a breakpoint before LINE
b CONDITION [b $x>10] - set breakpoint on next line with condition
b LINE CONDITION [b 40 $a>12] - set a breakpoint on LINE with condition
b SUBNAME [b Load_File] - set breakpoint before first line of subroutine
b SUBNAME CONDITION - set breakpoint before first line of subroutine with condition
b postpone SUBNAME - set breakpoint at first line of subroutine after compilation
b postpone SUBNAME CONDITION - set breakpoint at first line of subroutine after compilation with condition
b compile SUBNAME - set breakpoint on first statement to be executed after SUBNAME is compiled
b load FILENAME - set breakpoint on first executed line in file
d - delete breakpoint on the line about to execute
d LINE [d 224] - delete breakpoint on line LINE
D - delete all breakpoints
L - list all breakpoints and actions
c - continue execution
c LINE [c 76] - continue execution (set one-time breakpoint on line LINE)

Tracing
T - produce a stack backtrace
t - trace the program
t EXPR - trace an expression
W - delete all watchpoints
W EXPR - add expression as global watchpoint
p - print
p EXPR - print an expression
x - pretty-print (will recursively print data structures)
x EXPR - pretty-print an expression
V - display all variables in current package
V PKG - display all variables in a package
V PKG VARS - display named variables in a package
X - same as V in CURRENTPACKAGE
X VARS - same as V in CURRENTPACKAGE
H - show all commands
H -NUMBER - show last NUMBER commands


Advanced Perl Style September 2005

A Standard Header
 This works in Bristol.

    #!/usr/local/bin/perl
    # Preamble
    use strict;
    use warnings;
    use diagnostics;

    # Some standard modules
    use Carp;
    use Cwd;
    use Config;

    # Extend lib path
    use lib ( "/design/rmc/tools/Perl_Modules/tool/current/" );
    use lib ( "/design/rmc/tools/Perl_Modules/tool/current/OS_SPECIFIC/$Config{archname}" );

    # Current directory
    use FindBin qw( $Bin );
    use lib $Bin;

    # Site specific
    use Netlist_Tools;

 There are other binary invocations that use "eval" with some "magic".

The magic #! line works for all machines on-site, regardless of whether they are SunOS (Solaris) or Linux based. We always use strict and warnings. Diagnostics are useful for less experienced programmers; if the pragma is omitted from the script it can be enabled on the command line with -Mdiagnostics. Carp is the standard blame shifter (it makes errors show up in client code rather than in your code). Cwd is a platform independent way of finding the current working directory. Config is used to allow programs to transparently load precompiled code (C, C++ etc.) on different binary platforms. FindBin allows a program to find out from what directory it is being run and to add that directory to Perl's path. Netlist_Tools are site specific tools.

Program Structure
 Structure your program in the same way you would structure a C program.

    #!/usr/local/bin/perl
    # Standard Header
    use strict;
    use warnings;
    use diagnostics;

    # Forward Declarations
    sub subroutine_1( $$$ );

    # Main Program
    MAIN:
    {
        my $variable_1 = 27;
        # Program code - equivalent of C's main()
    }
    # Exit
    exit;

    # Subroutines
    sub subroutine_1( $$$ )
    {
        # Subroutine body - can't see $variable_1 unless it was passed as a
        # parameter in the subroutine call.
    }

By placing all the code for your program into subroutines and one top-level code block (here called "Main Program"), we can enforce the scope of all variable declarations and reduce or eliminate side-effects. Note that the top-level code block is headed by a label (MAIN:) but this is optional, and the name of the block can be anything (I've called it MAIN to lull C programmers into a false sense of security).

If You Must Use Global Variables
 Make all global variables package variables.

    #!/usr/local/bin/perl
    # Standard Header
    use strict;
    use warnings;
    use diagnostics;

    # Forward Declarations
    sub subroutine_1();

    # Global Declarations
    $main::count = 56;

    MAIN:
    {
        $main::count = 27;          # Write
        subroutine_1();
    }
    exit;

    sub subroutine_1()
    {
        print "The value of count is $main::count\n";   # Read
    }

There really isn’t any good reason to use global variables in the sense shown above. The problem is that the global variable is seen by all the subroutines that follow it because its scope is file scope. Therefore any subroutine can modify it and cause other subroutines that also see the variable to change their behavior - this isn’t usually what is intended.

Subroutine Parameters - I
 Three choices: Pass by value, by reference, or by value in a hash (next slide).

The Right Way (Value)

    sub subroutine_1( $$$ )
    {
        my ( $var_1 , $var_2 , $var_3 ) = @_;
        # Subroutine code goes here. $var_1 etc are private to this code
    }

The Wrong Way (Reference)

    MAIN:
    {
        my $a = 56;
        subroutine_1( $a );
        print "A=$a\n";
    }
    exit;

    sub subroutine_1( $ )
    {
        $_[0] = 99;     # Alter the first element of the @_ array
    }

Of the ways to pass parameters to a subroutine, the best way (the correct way) is to pass them by value. This is done by copying all the parameters into local variables (lexical variables) at the start of the subroutine. Make this the first thing that any subroutine does. If you then change the value of any of these variables, it doesn't affect the value of the corresponding variable in the code that called your subroutine. If you do want to change the value of one of the input parameters then you can pass by reference (option 1), or you can return a new value for the variable as a return value from the subroutine and assign it back to the corresponding variable in the calling code (option 2). Option 1 corresponds to the way you might choose to do this in C. Option 2 is the correct way to do this in Perl. Note: option 1 and option 2 DO NOT refer to the two sections of code above.
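
The difference between copying @_ and writing to it directly can be seen in a small runnable sketch. The subroutine and variable names here (double_by_value, double_in_place, $width, $height) are made up for illustration and are not part of the course material.

```perl
use strict;
use warnings;

# Pass by value: copy @_ into lexicals first, so the caller's data is safe.
sub double_by_value
{
    my ( $number ) = @_;
    $number = $number * 2;          # Changes the private copy only
    return $number;
}

# Aliasing: writing to $_[0] changes the caller's variable directly.
sub double_in_place
{
    $_[0] = $_[0] * 2;
    return;
}

my $width  = 21;
my $result = double_by_value( $width );
print "width=$width result=$result\n";      # width is still 21

$width = double_by_value( $width );          # "Option 2": assign the result back
print "width=$width\n";

my $height = 5;
double_in_place( $height );                  # The caller's variable is modified
print "height=$height\n";
```

The second call shows the "return a new value and assign it back" style: the change to $width is visible at the call site, whereas the change to $height is hidden inside the subroutine.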

Subroutine Parameters - II
 Pass by value in a hash.
 Replaces ordering (bad) by naming (good).

    sub format_line
    {
        my ( $args_r ) = @_;

        # Default
        $args_r->{ justify } = 0 unless( exists( $args_r->{ justify } ) );

        my $gap   = $args_r->{ cols } - length $args_r->{ text };
        my $left  = $args_r->{ justify } ? int( $gap / 2 ) : 0;
        my $right = $gap - $left;

        return $args_r->{ filler } x $left . $args_r->{ text } . $args_r->{ filler } x $right;
    }

    # Then later . . .
    foreach my $line ( @lines )
    {
        # The { } creates a reference to a hash
        $line = format_line( { text => $line , cols => 20 , justify => 1 , filler => SPACE } );
    }

If we pass values in a hash then we replace positional information by name information. We can optionally set up default values in the subroutine so that any missing parameters do not cause the subroutine to fail. See next slide.
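
As a minimal sketch of named parameters with defaults, the example below merges the caller's hash over a hash of default values. The subroutine name pad_text and the default values are assumptions chosen for illustration; they are not taken from the course code.

```perl
use strict;
use warnings;

# Hypothetical example: pad text to a fixed width, optionally centred.
sub pad_text
{
    my ( $args_r ) = @_;

    # Defaults come first; anything the caller supplies overrides them.
    my %args = ( cols => 10 , justify => 0 , filler => ' ' , %$args_r );

    my $gap = $args{ cols } - length $args{ text };
    $gap = 0 if ( $gap < 0 );
    my $left  = $args{ justify } ? int( $gap / 2 ) : 0;
    my $right = $gap - $left;

    return ( $args{ filler } x $left ) . $args{ text } . ( $args{ filler } x $right );
}

# Named arguments can be given in any order, and some can be left out.
print '[' , pad_text( { filler => '.' , text => 'Perl' , justify => 1 , cols => 8 } ) , "]\n";
print '[' , pad_text( { text => 'Perl' } ) , "]\n";
```

Because the defaults are merged into a private copy (%args), the caller's hash is not modified.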

Passing Arrays And Hashes To Subroutines
 Pass arrays and hashes as references:

    MAIN:
    {
        my @list_1 = qw( Alpha Baker Charlie Delta );
        my @list_2 = qw( Zulu Yankee Xray Whisky );
        subroutine_1( \@list_1 , \@list_2 );     # Pass As References
    }
    exit;

    sub subroutine_1( $$ )
    {
        my ( $list_1_r , $list_2_r ) = @_;       # Copy To Lexicals
        print $list_1_r->[ 1 ] , " " , $list_2_r->[ 3 ] , "\n";   # Use As References
    }

Note a subtlety: We're passing references here to make our program fast. If the lists that the references point to are large, then we don't end up copying those large lists via the stack. We localise the references into subroutine_1 with the my statement, but we can still change any value in the lists that the references point to, simply by putting $list_1_r->[ 1 ] on the left-hand side of an assignment. In this respect we've exactly emulated C where we've called a subroutine with a const pointer - you can't change the pointer but you can change the thing it's pointing at. We've also violated our "Option 2" rule from 2 slides back ("Subroutine Parameters - I").
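
A short sketch makes the subtlety concrete: the reference is copied, but the list it points to is shared with the caller. The subroutine name rename_second is hypothetical.

```perl
use strict;
use warnings;

# The reference is copied into a lexical, but it still points at the caller's array.
sub rename_second( $$ )
{
    my ( $list_r , $new_name ) = @_;
    $list_r->[ 1 ] = $new_name;             # Changes the caller's list
    return scalar @$list_r;
}

my @list_1 = qw( Alpha Baker Charlie Delta );
my $count  = rename_second( \@list_1 , 'Bravo' );
print "$count items: @list_1\n";
```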

Returning Results From Subroutines - I
 Subroutines can decide what to return based on context:

    MAIN:
    {
        my @list   = subroutine_1();
        my $scalar = subroutine_1();
        print "@list $scalar";
    }
    exit;

    sub subroutine_1()
    {
        if ( wantarray )
        {
            return qw( one two three four five );
        }
        else
        {
            return( "once I caught a fish alive\n" );
        }
    }

Subroutines can return data in context, that is, subroutines can be made to know how they were called: in list context or in scalar context.
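
A minimal runnable sketch of context-sensitive return values follows. The subroutine name timestamp_parts and its fixed date are invented for the example.

```perl
use strict;
use warnings;

sub timestamp_parts
{
    my @parts = ( 2005 , 9 , 30 );              # Hypothetical fixed date
    return wantarray ? @parts                     # List context: the parts
                     : join( '-' , @parts );      # Scalar context: one string
}

my @list   = timestamp_parts();     # Called in list context
my $scalar = timestamp_parts();     # Called in scalar context
print "list=@list scalar=$scalar\n";
```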

Returning Results From Subroutines - II
 You want to return several scalars from a subroutine:

    MAIN:
    {
        my @values = qw( 6.32 7.88 9.54 12.83 17.99 31.36 18.25 );
        my ( $mean , $median , $mode , $variance ) = statistics( @values );
        # Code to print out results
    }
    exit;

    sub statistics
    {
        # Code to compute mean, median, mode, variance
        return( $mean , $median , $mode , $variance );
    }

If you want to return more than one thing from a subroutine, then return a list. You can then assign that list to another list in the calling code. Note that this can be error prone (you need to get the right number and order of variables with no language assistance). You could return a hash with named results, but you then run the risk (especially if you pass subroutine parameters in a hash as well) of turning each subroutine call into something with more overhead than code.

The Equivalent Of C Static Variables - I
 Sometimes you want to be able to create a variable in a subroutine that will maintain its value between subroutine calls. Here's how to do this:

    MAIN:
    {
        my $tmp;
        $tmp = count();
        print "Tmp = $tmp\n";
        $tmp = count();
        print "Tmp = $tmp\n";
    }
    exit;

    BEGIN
    {
        my $count_value = 0;
        sub count()
        {
            $count_value++;
            return $count_value;
        }
    }

Subroutine names are globally visible, so even though count() is buried one level down, anything that needs to call it can do so. However, with code written as shown, count() can access the variable named $count_value but nothing else in the program can. It's a lexical variable and not a package variable (so you can't say $main::count_value because that isn't the way to access this particular variable) and the fact that count() is referring to it will make sure that perl keeps its reference count non-zero (so it is persistent and exists for the lifetime of the program). As long as we make the block in which it is defined a BEGIN block, it will be initialised by Perl before any of your code starts to run.
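
The same idea as a self-contained sketch that can be run as-is. The names next_id and $next_id, and the starting value 100, are assumptions for the example.

```perl
use strict;
use warnings;

BEGIN
{
    my $next_id = 100;          # Visible only inside this block

    sub next_id
    {
        return $next_id++;      # Returns the current value, then increments it
    }
}

print next_id() , "\n";
print next_id() , "\n";

# print $next_id;              # Would not compile: $next_id is out of scope here
```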

The Equivalent Of C Static Variables - II
 Several subroutines sharing common access to provide a "global" variable that cannot suffer from unintended side-effects.

    MAIN:
    {
        my $tmp;
        initialize( 37 );
        $tmp = increment();
        print "Tmp = $tmp\n";
        $tmp = decrement();
        print "Tmp = $tmp\n";
    }
    exit;

    BEGIN
    {
        my $value = 0;
        sub initialize( $ ) { $value = shift @_; }
        sub increment()     { $value++; return $value; }
        sub decrement()     { $value--; return $value; }
    }

This is a very secure way to create something that can be accessed from anywhere in a controlled and predictable manner. The variable $value is secure from any unintended side-effects (or even intended ones) and can be initialized/incremented/decremented from anywhere (you could of course also add a read subroutine to just return the value). We’ve almost strayed into OO land here since we’ve created something that is encapsulated (the variable value) and can only be accessed via subroutine calls (equivalent of OO methods).

Implementing A SWITCH Statement
 Perl doesn't have a SWITCH statement. Here's how to code one:

    SWITCH:
    {
        if ( $condition == TRUE )
        {
            # Run some code
            last SWITCH;
        }
        if ( $some_other_condition == TRUE )
        {
            # Run some other code
            last SWITCH;
        }
        # Run some default code
    }

Here, SWITCH is a label (so each switch statement should have its own label, and this is a drawback) while the last SWITCH piece of code is the equivalent of C's break. A bare block behaves like a loop that executes exactly once, so last SWITCH leaves it. Note that next SWITCH also leaves the block rather than repeating it; it is redo SWITCH that restarts the block. Use last in every clause so that the intent is clear.
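
A runnable sketch of the labelled-block idiom, using last SWITCH to leave the block from every clause. The subroutine classify and its three categories are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical classifier built on the labelled-block idiom.
sub classify
{
    my ( $line ) = @_;
    my $kind;

    SWITCH:
    {
        if ( $line =~ m/^\s*$/ )  { $kind = 'blank';   last SWITCH; }
        if ( $line =~ m/^\s*\#/ ) { $kind = 'comment'; last SWITCH; }
        $kind = 'code';           # Default case: reached only if nothing matched
    }
    return $kind;
}

foreach my $line ( '' , '# a note' , 'my $x = 1;' )
{
    print classify( $line ) , "\n";
}
```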

Labels - Use Them/Don't Use Them
 Labels are, by convention, UPPERCASE.

    OUTER:
    foreach my $item ( @item_list )
    {
        INNER:
        foreach my $object ( @object_list )
        {
            # Code
            next OUTER if ( $some_condition == TRUE );
            # Code
            next INNER if ( $some_other_condition == TRUE );
        }
    }

Use labels to be explicit about where the commands next and last transfer you (and goto, but you’re never going to use goto, are you!). If you use labels it is always clear where you are transferring control to, but it is never clear at the transfer point (i.e., the actual label) where transfer of control has come from, and this makes it very hard to debug code – next and last with labels are just synonyms for goto (and you’re never going to use goto, are you!) On balance, use labels for SWITCH and one level loop operations.

Writing Efficient, Maintainable And Useable Code - I
 Package useful code into subroutines/modules and then share it.
 Install tools in:
   /design/rmc/tools/
 Install modules in:
   /design/rmc/tools/Perl_Modules/tool/dev/
 Add comments, lots of comments.
 Format your code so it is readable.
 Write documentation.
 Use Netlist_Tools.pm
 Look on CPAN.
 Don't reinvent the wheel.

Writing Efficient, Maintainable And Useable Code - II
 Put tests in the "right" order.

    foreach my $line ( @very_large_file )
    {
        if ( $line =~ m/^\s*\#/ )   # Lines that are comments (start with a #)
        {
            next;
        }
        if ( $line =~ m/^$/ )       # Lines that are blank
        {
            next;
        }
        if ( $line =~ m/^\s+/ )     # Lines that contain leading white-space
        {
            next;
        }
        if ( $line =~ m/^\S+/ )     # Lines without leading white-space
        {                           # The common case, so this should go first
            # Code to process $line
            next;
        }
    }

 Don’t reinvent the wheel.

If you're writing code that makes several different tests on some data, put the most common tests before the less common ones. If you run the code on the slide on a file containing 10 million lines, of which 99.99% are neither comments, nor blank, nor indented with leading white-space, then you'll end up executing approximately 40 million tests. If you put the bottom-most test (the test for lines without leading white-space) first, then this code will execute only about 10 million tests. This assumes that the tests are mutually exclusive: as written, a comment that starts in the first column also matches ^\S+, so that pattern would need to exclude # before it can safely be moved to the top.
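
The effect of test order can be measured with a small sketch that counts how many pattern tests are run. Everything here is illustrative: count_tests is a made-up helper, the sample is 1000 lines rather than 10 million, and the data pattern is written as ^[^\s\#] so that it cannot match a comment, which keeps the two orderings equivalent.

```perl
use strict;
use warnings;

# Count how many pattern tests are run over a list of lines, given the
# tests in a particular order. Each line stops at the first test that matches.
sub count_tests
{
    my ( $lines_r , $tests_r ) = @_;
    my $tests_run = 0;

    foreach my $line ( @$lines_r )
    {
        foreach my $test ( @$tests_r )
        {
            $tests_run++;
            last if ( $line =~ $test );
        }
    }
    return $tests_run;
}

my $comment = qr/^\s*\#/;
my $blank   = qr/^$/;
my $leading = qr/^\s+/;
my $data    = qr/^[^\s\#]/;      # No leading white-space and not a comment

# 1000 lines: 997 data lines plus one of each uncommon kind.
my @lines = ( ( 'net1 net2' ) x 997 , '# comment' , '' , '  indented' );

my $rare_first   = count_tests( \@lines , [ $comment , $blank , $leading , $data ] );
my $common_first = count_tests( \@lines , [ $data , $comment , $blank , $leading ] );
print "rare first: $rare_first, common first: $common_first\n";
```

With the common case tested first, the count drops from roughly four tests per line to roughly one, which is the same ratio as the 40 million versus 10 million figures above.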

Hints For Readable Code - I
 This is not the way to do it ...

    my $lef_filename = undef;
    my $log_filename = undef;
    my $default_log_filename = lefPortStrip.log";
    my $pin_names_r = [];
    my $layer_names_r = [];

    run_lef_import( $lef_filename , $log_filename , $default_log_filename ,
    pin_names_r , $layer_names_r );

 But this is ...

    my $lef_filename         = undef;
    my $log_filename         = undef;
    my $default_log_filename = "lefPortStrip.log";
    my $pin_names_r          = [];
    my $layer_names_r        = [];

    run_lef_import( $lef_filename ,
                    $log_filename ,
                    $default_log_filename ,
                    $pin_names_r ,
                    $layer_names_r );

Note how much easier it would be to spot the missing opening quote and the missing $ sign on lines 3 and 8 of the upper example.

Hints For Readable Code - II
 If you're writing a complex "if" statement, line up the brackets:

    if ( ( $day            == SUNDAY ) &&
         ( $full_moon      == TRUE   ) &&
         ( $spring_equinox == TRUE   ) )
    {
        print "It's Easter Sunday\n";
    }

 Use a 2 or 4 column indent and be consistent in its usage.
 Put the opening curly brace on the line after a keyword and lined up with the start of the keyword.
 A one-line BLOCK may be put on one line, including left- and right-brace.

    if ( $flag == TRUE ) { $result = PI; $next_example = FALSE; }

 Do put space both before and after a “,” when separating parameters and list items.  Do put space around most (all) operators.  Do put space around complicated subscripting code.

Don't forget the semicolon in the one-line block case (the semicolon after the E in FALSE). It is optional, but it shouldn't be.

Hints For Readable Code - III
 Do put blank lines between sections of code that do different things.
 Do break long lines after an operator.
 Do omit redundant punctuation as long as clarity doesn't suffer.
 Don't put space before the semicolon after a statement.
 Don't put space between a function name and its opening parenthesis.

Using Constants
 This is far more clear:

    if ( ( $day            == SUNDAY ) &&
         ( $full_moon      == TRUE   ) &&
         ( $spring_equinox == TRUE   ) )
    {
        print "It's Easter Sunday\n";
    }

 than this:

    if ( ( $day            == 6 ) &&
         ( $full_moon      == 1 ) &&
         ( $spring_equinox == 1 ) )
    {
        print "It's Easter Sunday\n";
    }

 Using constants that are all UPPERCASE is a common convention.

You declare constants with the use constant pragma: use constant CONSTANT_NAME => value; For example: use constant PI => 3.1415926535;
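
A short runnable sketch of the pragma in use. The value chosen for SUNDAY follows the numbering on the slide, and circle_area is a made-up subroutine.

```perl
use strict;
use warnings;

use constant PI     => 3.1415926535;
use constant TRUE   => 1;
use constant FALSE  => 0;
use constant SUNDAY => 6;       # Matches the numbering used on the slide

sub circle_area
{
    my ( $radius ) = @_;
    return PI * $radius * $radius;
}

my $day = 6;
print "It's Sunday\n" if ( $day == SUNDAY );
printf( "Area = %.2f\n" , circle_area( 2 ) );

# Constants do not interpolate inside strings; concatenate instead.
print "PI is " . PI . "\n";
```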

Make The Use Of References Obvious
 Tag all references with "_r":

    my $array_r = [];       # Create a reference to an empty list
    # and then later
    $array_r->[ 56 ] = PI;

 While it should be obvious that this won't work (under use strict it dies at run time, because $number is not a reference):

    my $number = 56;
    # and then later
    $number->[ 0 ] = get_random_integer();

If your code uses references, make sure that the variable names that are used are tagged with something that makes it obvious they’re references, like _r. If you do this consistently it then becomes obvious when you try to use something that is/is not a reference in a dereference operation. For example, in the code above it’s obvious that you should only be using the dereference operator (the ->) on a reference.

Avoid Using Default Values
 Avoid $_ and its cousins.
 You could program this:

    foreach ( @_ )
    {
        print;      # By default this statement will print $_
    }

 but this is far better:

    foreach my $book_title ( @library )
    {
        print "$book_title\n";
    }

When using a loop construct like foreach, don't rely on the defaults that Perl allows. It is permissible to say remarkably little, but code like the first example doesn't tell you much about what's going on or why. The second example tells you exactly what is intended. Using default values leads to concise code that can be very difficult to read (even if *you* wrote it). Don't assume that your code will be debugged by you or that the person debugging it will know what all the default values are. Keep it clear. Keep it simple. It's not the obfuscated Perl contest.

Distinguish Between For And Foreach
 The right way:

    # Use foreach with lists
    foreach my $name ( @friends )
    {
        print "I have a friend called $name\n";
    }

    # Use for with indexes
    for ( my $count = 0 ; $count <= 10 ; $count++ )
    {
        print "Count = $count\n";
    }

 The wrong way:

    for my $name ( @friends )
    {
        print "I have a friend called $name\n";
    }

    foreach ( my $count = 0 ; $count <= 10 ; $count++ )
    {
        print "Count = $count\n";
    }

The Perl keywords, for and foreach are synonyms, so you can use either one to index through lists or index through values. However, you will confuse others if you use them the wrong way around (foreach with an index or for with a list).

Use Common Sense - I
 Do use meaningful variable and subroutine names.
 Do use lexical variables.
 Do use lots of comments.
 Do document functions and procedures.
 Don't use global variables.
 When in doubt use parentheses (which is always).
 Give your users some feedback.
   If you're programming a GUI in PerlTk, use a progress bar.
 Always program a -help parameter.
   Invoking a program with no parameters should display some help information.
 Allow default options.
   Make sure a user knows what they are, when he/she asks for help.
 Make error messages clear.
 Use exit codes for chained scripts.

Use meaningful variable and subroutine names. Don’t use variables with the names $a and $b. See the man page for sort() to understand why. Name variables using my (i.e., use lexical variables). Never use global variables and don’t be tempted in the heat of debugging to insert just one or two to get around a problem. Use lots of comments. You’ll be amazed how quickly you’ll forget just what it was you were trying to express in your code a day, a week, a month, a year ago. When in doubt use parentheses. Just because you can omit them doesn’t mean you should omit them. If your program is running for more than a few seconds, give your users some feedback. If you’re programming a GUI in PerlTk, use a progress bar. If your program is a command line driven program then always program a -help parameter to give users some idea of what the program does and what to type. Make the invocation of the program with no parameters display some help information. Give a user the option to get more help with a –help parameter. Make error messages clear so a user knows what to fix when things don’t run the way they expect.

Use Common Sense - II
 Add exit codes. Here's how:

    use constant EXIT_OKAY     => 0;    # Success
    use constant EXIT_BAD_ARGS => 1;    # Failed with bad arguments

    # Later in your program
    if ( $number_of_arguments < 4 )     # Not enough arguments given !
    {
        exit( EXIT_BAD_ARGS );
    }

    # And at the end of your program
    exit( EXIT_OKAY );

 Always return values from subroutines with an explicit return statement.
   Why?
 If you ever cut-and-paste code, then that code belongs in a subroutine.
   Why?

Since many programs are often chained together or are run within a single controlling program, make sure all scripts return an error or success code. The exit code for success is always 0 (zero). If programs are designed to be chained together in a shell script, then follow the Unix philosophy of having programs that complete successfully return no output at all (i.e., they are silent). Always return a value from both your program and any subroutines in that program. If you don't use an explicit return statement then the value returned is the result of the last statement evaluated. This will change as you modify your code, and in particular, since most code is added at the end of a subroutine, the return value seen by whatever is calling your code will change as you write it. If it's vital that your code not return a value, because, say, you want to indicate that an error occurred but it wasn't a fatal error, then return undef. In Perl undef is a value that represents not defined.
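
The danger of relying on the implicit return value can be shown with a small sketch. All three subroutines (total_implicit, total_explicit, safe_divide) are hypothetical examples, not course code.

```perl
use strict;
use warnings;

# Implicit return: the result is whatever statement happened to run last.
sub total_implicit
{
    my ( @values ) = @_;
    my $total = 0;
    foreach my $value ( @values ) { $total += $value; }
    my $count = scalar @values;     # A line added later silently becomes the result
}

# Explicit return: the result does not change when code is added.
sub total_explicit
{
    my ( @values ) = @_;
    my $total = 0;
    foreach my $value ( @values ) { $total += $value; }
    my $count = scalar @values;
    return $total;
}

# Returning undef to signal a non-fatal problem.
sub safe_divide
{
    my ( $numerator , $denominator ) = @_;
    return undef if ( $denominator == 0 );
    return $numerator / $denominator;
}

print total_implicit( 10 , 20 , 30 ) , " " , total_explicit( 10 , 20 , 30 ) , "\n";
print defined( safe_divide( 1 , 0 ) ) ? "defined\n" : "undef\n";
```

The first subroutine returns the element count rather than the total, purely because of the statement that was added at the end.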

Common Traps
 Don't confuse "==" (numeric comparison) with "=" (assignment).
 Don't confuse "==" and "=~".
 Use eq for string comparison.
 Always use the standard header:

    use warnings;
    use strict;
    use diagnostics;

  although it's sometimes useful to use -Mdiagnostics on the command line.
 Arrays count from [0], not [1], so a 20 element array has elements [0] to [19].
 Hashes have no order.
   You can't iterate over them directly with foreach.
   You can't index into them with [].
   You can iterate over them with each(), keys() and values() [and sort() to impose order].
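
A few of these traps in a runnable sketch. The hash %pin_count and its contents are invented for the example.

```perl
use strict;
use warnings;

my %pin_count = ( clk => 1 , data => 8 , addr => 16 );

# keys() returns the keys in no particular order; sort() imposes one.
foreach my $name ( sort keys %pin_count )
{
    print "$name: $pin_count{$name}\n";
}

# Arrays are indexed from 0, so the last index is one less than the length.
my @letters = ( 'a' .. 't' );               # 20 elements
print "elements 0 to $#letters\n";

# Use eq for strings and == for numbers.
print "same string\n" if ( 'abc' eq 'abc' );
print "same number\n" if ( 1.0 == 1 );
```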

Object Oriented Filehandles
 It would be nice to treat filehandles like lexical variables. Here's how:

    use IO::Handle;
    use IO::File;

    MAIN:
    {
        my $logfile = "log.log";

        # Then later . . .
        # Create the filehandle: < = read, > = write, >> = append
        my $log_fh = IO::File->new( "> $logfile" )
            or die( "Couldn't open/append file $logfile" );

        # Then later (note: no comma after the filehandle)
        print $log_fh "This line's heading for the logfile\n";
        exit;
    }

The code shows how we can create a lexical variable that is a filehandle. We can pass this to any subroutine at any stack depth and print information to it as shown. Note that as with normal filehandles, there is no comma between the filehandle and the thing that is being printed to it. Unlike normal filehandles this filehandle is a lexical variable. You can explicitly close the handle with close, or, you can just let the handle go out of scope at which point it will be automatically closed. In early versions of Perl (when machine speeds were 66MHz) there was a considerable time overhead in loading the vast amount of code that is hidden behind IO::Handle and IO::File. With modern machine speeds this is no longer an issue.
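
As a sketch of passing a lexical filehandle to a subroutine, the example below writes two lines to a scratch file and reads them back. The subroutine log_message is hypothetical, and the use of File::Temp for a scratch directory is an addition for the example so that it leaves nothing behind.

```perl
use strict;
use warnings;
use IO::File;
use File::Temp qw( tempdir );
use File::Spec;

# A lexical filehandle can be passed to a subroutine like any other scalar.
sub log_message
{
    my ( $fh , $message ) = @_;
    print $fh "$message\n";                 # No comma after the filehandle
    return;
}

my $dir     = tempdir( CLEANUP => 1 );      # Scratch directory, removed at exit
my $logfile = File::Spec->catfile( $dir , 'log.log' );

my $log_fh = IO::File->new( $logfile , 'w' )
    or die( "Couldn't open $logfile for writing: $!" );
log_message( $log_fh , 'first line' );
log_message( $log_fh , 'second line' );
$log_fh->close();

my $in_fh = IO::File->new( $logfile , 'r' )
    or die( "Couldn't open $logfile for reading: $!" );
my @lines = <$in_fh>;
$in_fh->close();
print scalar( @lines ) , " lines written\n";
```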

Add A Command Line To Your Program - I
 There are two solutions:
 Use the Getopt modules:
   Getopt::Std
   Getopt::Long
   Advantage - it's written for you, you supply a template.
   Disadvantage - if it doesn't do what you want, you're stuck with it.
 Write your own:
   Advantage - it will do exactly what you want.
   Disadvantage - you have to write it, but,
   We can use (reuse) a template.
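
For comparison with the hand-written template, here is a minimal sketch of the first solution using Getopt::Long. The wrapper parse_arguments and the option names are assumptions chosen to mirror the -input, -output and -print_flag switches used in this course.

```perl
use strict;
use warnings;
use Getopt::Long;

# Hypothetical wrapper: parse a list of arguments and return the settings.
sub parse_arguments
{
    local @ARGV = @_;                        # GetOptions reads from @ARGV
    my ( $input_filename , $output_filename );
    my $print_flag = 0;

    my $ok = GetOptions(
        'input=s'    => \$input_filename ,   # Switch that takes a string value
        'output=s'   => \$output_filename ,
        'print_flag' => \$print_flag ,       # Simple on/off switch
    );
    return unless ( $ok );
    return ( $input_filename , $output_filename , $print_flag );
}

my ( $in , $out , $flag ) = parse_arguments( qw( -input a.lef -output b.lef -print_flag ) );
print "in=$in out=$out flag=$flag\n";
```

In a real program you would normally call GetOptions directly on @ARGV; the wrapper is only there so the example can be run without command line arguments.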

Add A Command Line To Your Program - II
 Here's a template:

    sub Parse_Command_Line_Arguments( $ )
    {
        my ( $arguments_r ) = @_;
        my $usage    = "my_prog -input -output [-print_flag]";
        my $numargs  = @$arguments_r;
        my $argument = undef;

        foreach $argument ( @$arguments_r )
        {
            if ( $argument =~ m/\-help/i )
            {
                # Help requested
                exit 0;
            }
        }
        if ( $numargs < 1 )
        {
            print( "\nUsage: $usage\n" );
            print( "\nUse my_prog -help to get more help\n\n" );
            exit 0;
        }

        my $next_arg        = undef;
        my $input_filename  = undef;
        my $output_filename = undef;
        my $print_flag      = FALSE;

        # Process all arguments. The "> 0" stops the loop cleanly if a
        # switch is given without its value.
        while ( $numargs-- > 0 )
        {
            $next_arg = shift( @$arguments_r );
            SWITCH:
            {
                if ( $next_arg =~ m/^\-input/i )
                {
                    $input_filename = shift( @$arguments_r );
                    $numargs--;
                    last SWITCH;
                }
                if ( $next_arg =~ m/^\-output/i )
                {
                    $output_filename = shift( @$arguments_r );
                    $numargs--;
                    last SWITCH;
                }
                if ( $next_arg =~ m/^\-print_flag/i )
                {
                    $print_flag = TRUE;
                    last SWITCH;
                }
                if ( $next_arg =~ m/^\-/i )
                {
                    croak( "Unknown command line switch $next_arg" );
                }
            }
        }
        return ( $input_filename , $output_filename , $print_flag );
    }

All of this code is in the file perl.template in the release directory of this course.

Add A Command Line To Your Program - III
 The code is then called like this:

    my ( $input_filename , $output_filename , $print_flag ) = Parse_Command_Line_Arguments( \@ARGV );
    croak ( "Missing input filename" ) unless defined $input_filename;
    croak ( "Missing output filename" ) unless defined $output_filename;

All of this code is in the file perl.template in the release directory of this course.

Parsing Files - I
 This is a standard Perl idiom for reading and re-writing a file:

    use Netlist_Tools;
    MAIN:
    {
        my $file_r = Read_File( "BigFile.txt" );
        foreach my $line ( @$file_r )
        {
            # Fiddle with the line
        }
        Write_File( "NewFile.txt" , $file_r );
        exit 0;
    }

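 Note that each time through the foreach loop, $line is an alias for the current element of @$file_r, not a copy. Any change you make to $line changes the data that Write_File later writes out.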
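The behaviour of $line inside foreach can be demonstrated with a minimal sketch. The subroutine name Uppercase_Lines is hypothetical; the aliasing itself is standard Perl behaviour.

```perl
use strict;
use warnings;

# Hypothetical helper: upper-cases every line in place.
# Takes and returns a reference to a list of lines.
sub Uppercase_Lines
{
    my ( $lines_r ) = @_;
    foreach my $line ( @$lines_r )
    {
        $line = uc $line;       # Changes the element inside @$lines_r itself
    }
    return $lines_r;
}
```

No new list is built here: assigning to $line is enough to alter the caller's data, which is why the re-write idiom can pass the same $file_r straight to Write_File.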

You want to load and loop through all the lines of a file, performing some programming task on some or all of the lines. You then want to write out a new file containing whatever manipulations you’ve done. This code will load one of (in this order) BigFile.txt, BigFile.txt.gz, BigFile.txt.gzip. If you specify an output filename in Write_File that ends in either .gz or .gzip then the file will be compressed (with gzip) before it is written. A major advantage of Read_File is that it not only reads the file transparently via gzip if necessary, it also formats the lines: every line is an element of a list that can be iterated, every line is guaranteed to have no white-space before its first non-white-space character or at its end, and all “words” on a line are separated by exactly one space. If you don’t want the formatting that Read_File imposes then use Read_File_Without_Formatting to get at the raw unaltered data. The regular expressions that find the information you’re interested in will then be more complicated, but, if the formatting is important, it will be preserved.
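The formatting described above can be approximated in a few lines. This is only a sketch of the idea, not the Netlist_Tools implementation: the name Normalise_Lines is hypothetical, gzip handling is left out, and keeping blank lines as empty strings is an assumption.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the formatting step described in the notes.
# Takes a reference to a list of raw lines, returns a reference to a new list.
sub Normalise_Lines
{
    my ( $raw_lines_r ) = @_;
    my @clean = ();
    foreach my $line ( @$raw_lines_r )
    {
        my $copy = $line;       # Work on a copy so the caller's data is untouched
        $copy =~ s/^\s+//;      # No white-space before the first character
        $copy =~ s/\s+$//;      # No white-space (or newline) at the end
        $copy =~ s/\s+/ /g;     # Exactly one space between "words"
        push @clean , $copy;
    }
    return \@clean;
}
```

With lines in this shape, a pattern such as m/^R\d+ / can rely on single spaces and does not need \s* or \s+ everywhere.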

Parsing Files - II
 This is a standard Perl idiom for reading, and then writing, a new file:

    use Netlist_Tools;
    MAIN:
    {
        my $in_file_r = Read_File( "BigFile.txt" );
        my $out_file_r = [];
        foreach my $line ( @$in_file_r )
        {
            # Inspect the line, generate new information from it. Write the
            # new information into a list like this:
            push @$out_file_r , "new stuff";    # Don't add \n at the ends of lines
        }
        Write_File( "NewFile.txt" , $out_file_r );
        exit 0;
    }

If, alternatively, you want to create a new file based on some or all of the contents of an input file, you can re-write the body of the previous program as shown on this slide. In this code we still read an input file, but, rather than altering the information in that file (and thus destroying the original), we build the contents of a new file on-the-fly and then write that to disk under a new name.
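For readers without access to Netlist_Tools, the same read, build, write pattern can be sketched with core Perl only. The subroutine name Extract_Matching_Lines and its pattern argument are hypothetical, and no gzip support or line formatting is attempted.

```perl
use strict;
use warnings;

# Hypothetical helper: copies the lines of $in_name that match $pattern
# into a new file called $out_name. Returns the number of lines written.
sub Extract_Matching_Lines
{
    my ( $in_name , $out_name , $pattern ) = @_;

    open( my $in_fh , "<" , $in_name ) or die "Cannot read $in_name: $!";
    chomp( my @in_lines = <$in_fh> );
    close $in_fh;

    my $out_file_r = [];
    foreach my $line ( @in_lines )
    {
        # Inspect the line and push new information onto the output list
        push @$out_file_r , $line if $line =~ $pattern;
    }

    open( my $out_fh , ">" , $out_name ) or die "Cannot write $out_name: $!";
    print $out_fh "$_\n" foreach @$out_file_r;
    close $out_fh or die "Cannot close $out_name: $!";
    return scalar @$out_file_r;
}
```

The input file is only ever opened for reading, so the original is left untouched.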

Interacting With A Compute Farm (The LSF Queue)
 Scheduling jobs in an efficient way:

    # This example shows how to use ELDO for which we have 4 licenses. We'll limit ourselves to use 2 of them.
    # While we're limiting ourselves here because of scarce license resource, the same code can stop queues being
    # flooded with jobs that are pending but consuming queue slots (and making yourself pretty damn unpopular).
    my ( $running_jobs , $jobs_limit ) = ( 0 , 2 );
    # The ". . ." in the list stands for the files in between
    foreach my $file ( qw( File_1.cir File_2.cir . . . File_98.cir File_99.cir ) )
    {
        # Get queue status
        my $bjobs_output = `bjobs -q linux 2>&1`;
        my @bjobs_lines = split( /\n/ , $bjobs_output );
        $running_jobs = scalar( @bjobs_lines ) - 1;
        if ( $running_jobs < $jobs_limit )
        {
            # Queue a job
            my $command = "bsub -q linux 'eldo -nomail -queue -stver $file'";
            system( $command );
            print "\nJob $command submitted\n";
            sleep 10;
        }
        else
        {
            print ".";      # Poor man's progress bar
            # Sleep, then try the same file again
            sleep 10;
            redo;
        }
    }
    exit;

The LSF queuing system allows CPU-intensive jobs to use the shared CPU resource of most of the machines in this building. The code on this slide shows how to interface to that queuing system while limiting yourself to a predetermined number of jobs, adding new jobs to the queue as old jobs complete.
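The throttling decision can be separated from the calls to bjobs and bsub so that it can be tried without a queue. This sketch follows the arithmetic on the slide and so assumes that the bjobs output is either one header line followed by one line per job, or a single message line when there are no jobs. The names Count_Jobs and Can_Submit are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: counts jobs in the text captured from bjobs.
# Assumes one header (or message) line that must not be counted.
sub Count_Jobs
{
    my ( $bjobs_output ) = @_;
    my @bjobs_lines = split( /\n/ , $bjobs_output );
    my $jobs = scalar( @bjobs_lines ) - 1;
    return $jobs < 0 ? 0 : $jobs;       # Empty output counts as no jobs
}

# Hypothetical helper: true if another job may be submitted.
sub Can_Submit
{
    my ( $bjobs_output , $jobs_limit ) = @_;
    return Count_Jobs( $bjobs_output ) < $jobs_limit ? 1 : 0;
}
```

In the loop on the slide, the test on $running_jobs would become Can_Submit( $bjobs_output , $jobs_limit ).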

Using Object Oriented (OO) Modules - I
 Using OO modules involves:
   Creating an object and setting up internal state.
   Calling methods to manipulate the object.
 Objects can be:
   Graphic objects.
   File objects.
   HTML objects.
   Database objects.
   . . .
   The list is literally endless.
 Programmers fall into two distinct groups:
   Users of OO modules.
   Writers of OO modules.

For OO modules you really do need to read the documentation and look at examples. OO programming is quite different from procedural programming. In OO you create objects and then, rather than calling functions and procedures to manipulate (potentially shared) data, you have objects send messages to each other to achieve the same thing. If you’ve never done this before it can all seem a little weird.

Using Object Oriented (OO) Modules - II
 Let’s look at an example of using colours, palettes and bitmaps:

    #!/usr/local/bin/perl
    use Bitmap;
    use Palette;
    use Colour;
    use constant BMP_HEIGHT => 100;
    MAIN:
    {
        # Start
        my $palette_r = Palette->new( "Palette_1024.pal" );
        my $colours_in_palette = $palette_r->get_palette_colour_count();
        my $bitmap_r = Bitmap->new( $colours_in_palette , BMP_HEIGHT );
        # Do work
        foreach my $x ( 0 .. ( $colours_in_palette - 1 ) )
        {
            my $colour_r = $palette_r->get_indexed_colour( $x );
            $bitmap_r->vline( 0 , ( BMP_HEIGHT - 1 ) , $x , $colour_r );
        }
        # End
        $bitmap_r->save( "Scale.bmp" );
        exit;
    }

$palette_r is a palette object. $bitmap_r is a bitmap object. You make objects do useful things by executing methods (subroutine calls), or in OO parlance, you send them messages asking/telling them what to do. So, we create a new colour palette by sending Palette the new message with the name of a file that contains a description of our colour palette. In return we get a palette object that we store in a variable called $palette_r. We can count how many colours are in the palette we’ve just created by sending the newly created $palette_r object the get_palette_colour_count() message - this returns a number, the number of colours in the palette. We next create a bitmap object by sending Bitmap the new message. The two parameters that new() requires are the X and Y dimensions of the bitmap, here the number of colours in the palette and BMP_HEIGHT.
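To see what "sending a message" looks like from the other side, here is a minimal sketch of a class that answers two of the messages used above. Tiny_Palette is a hypothetical class invented for this illustration; it is not the course's Palette module, and it takes a list of colour names rather than a palette file.

```perl
use strict;
use warnings;

package Tiny_Palette;       # Hypothetical class, for illustration only

# Constructor: stores the colours as internal state and returns an object.
sub new
{
    my ( $class , @colours ) = @_;
    my $self = { colours => [ @colours ] };
    return bless( $self , $class );
}

# Returns the number of colours held by this palette object.
sub get_palette_colour_count
{
    my ( $self ) = @_;
    return scalar @{ $self->{colours} };
}

# Returns the colour stored at position $index (counting from 0).
sub get_indexed_colour
{
    my ( $self , $index ) = @_;
    return $self->{colours}->[ $index ];
}

package main;
```

A user of the class writes my $palette_r = Tiny_Palette->new( "red" , "green" , "blue" ); and then calls $palette_r->get_palette_colour_count() without ever looking inside the hash.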

Using Object Oriented (OO) Modules - III
 Here are the results (this is Scale.bmp):
 Note how the code to generate this was:
   Brief and concise.
   Reasonably abstract. You didn’t need to know about:
     Colours or RGB-triples.
     Bitmaps or their internal representations.
     How to perform various geometrical algorithms like drawing lines.
 These are the messages a bitmap object can respond to:
   new(), get_bitmap_x(), get_bitmap_y(), get_bitmap_xy_r(), get_cliprect_x(), get_cliprect_y(), get_invert_y(), set_cliprect_x(), set_cliprect_y(), set_invert_y(), set_pixel(), hline(), vline(), line(), circle(), bitmap_sign(), rect(), filled_rect(), save(), resize().
 Have you noticed that we’re not talking in Perl any more!

Using The Debugger

This file exists as a stand-alone .PDF file in the release area for this course.

Using Regular Expressions

This file exists as a stand-alone .PDF file in the release area for this course.

Perl For Beginners - September 2005 - Course Feedback Form

Was the course content: Excellent / Good / Poor

Was the course duration: Too long / Just right / Too short

Were the course notes: Too detailed / Just right / Not detailed enough

Were the labs: Too detailed / Just right / Not detailed enough

Was the balance between lecture material and labs: Too many lectures / Just right / Too many labs

Was any material covered that you thought should have been omitted? Yes / No
If so, what?

Was any material omitted that you thought should have been covered? Yes / No
If so, what?

Would you be interested in an advanced course covering object oriented Perl? Yes / No

Any other comments

Your name (optional)
