Introduction to programming in Perl
WS 2006/07: Bioinformatics I
Nicodème Paul
[email protected]
http://www2.biozentrum.unibas.ch/personal/schwede/Teaching/BixI-WS0607/frame.htm
01-11-06

What is programming?
Programming is breaking a task into small steps (divide and conquer).
Example: compute the sum 15 + 25 + 11.
  15 + 25 = 40
  40 + 11 = 51
Programs are written in a programming language such as Fortran, Pascal, C, C++, Java, Perl, Python, ...

Program translator
[Diagram: Program -> Compiler or Interpreter -> machine code (0101011), inside a computer with a processor and memory]

What is Perl?
Perl: Practical Extraction and Report Language, by Larry Wall (1987)
• Text-processing language
• Glue language
• Very high-level language
• perl is the language compiler/interpreter program

Why do we use Perl?
• Simplicity
• Rapid prototyping
• Portability
• Widely used in Bioinformatics

A first example
#!/usr/bin/perl
# shebang line

# Pragmas
use strict;      # Restrict unsafe constructs
use warnings;    # Provide helpful diagnostics

my $number1 = 15;                     # Assign 15 to $number1
my $number2 = 25;                     # Assign 25 to $number2
my $number3 = 11;                     # Assign 11 to $number3

$number1 = $number1 + $number2;       # $number1 contains 40
$number1 = $number1 + $number3;       # $number1 contains 51
print "My result is : $number1\n";    # Print the result on the terminal

Scalar data type
$answer = 36;                    # an integer
$pi = 3.14159265;                # a real number
$avocados = 6.02e23;             # scientific notation
$language = "Perl";              # a string
$sign1 = "$language is nice";    # string with interpolation
$sign2 = '$language is nice';    # string without interpolation
Scalar = singular variable, marked by $ (like the S in Scalar)
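
The difference between the two quoting styles is easiest to see by printing both strings. The following is a minimal sketch that reuses the variable names from the slide:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $language = "Perl";
my $sign1 = "$language is nice";   # double quotes: $language is replaced by its value
my $sign2 = '$language is nice';   # single quotes: the text is kept literally

print "$sign1\n";   # prints: Perl is nice
print "$sign2\n";   # prints: $language is nice
```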

Scalar binary operators
With $u = 17, $v = 3 and $s = "Perl":

Name             Example     Result
Addition         $u + $v     17 + 3 = 20
Subtraction      $u - $v     17 - 3 = 14
Multiplication   $u * $v     17 * 3 = 51
Division         $u / $v     17 / 3 = 5.66666666667
Modulus          $u % $v     17 % 3 = 2
Exponentiation   $u ** $v    17 ** 3 = 4913
Concatenation    $s . $s     "Perl" . "Perl" = "PerlPerl"
Repetition       $s x 3      "Perl" x 3 = "PerlPerlPerl"
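
A short script can confirm some of the table entries. This is only a sketch; the result variable names ($sum, $remainder, and so on) are chosen here for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my ($u, $v, $s) = (17, 3, "Perl");

my $sum       = $u + $v;     # 20
my $remainder = $u % $v;     # 2
my $power     = $u ** $v;    # 4913
my $joined    = $s . $s;     # "PerlPerl"
my $repeated  = $s x 3;      # "PerlPerlPerl"

print "sum = $sum, remainder = $remainder, power = $power\n";
print "joined = $joined, repeated = $repeated\n";
```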

Scalar unary operators

Numbers       Strings
abs(expr)     uc(expr)
sqrt(expr)    lc(expr)
exit(expr)    chop(variable)
exp(expr)     chomp(variable)
int(expr)     reverse(expr)
log(expr)     length(expr)

For details: perldoc -f function_name
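
A few of these functions applied to a short DNA-like string, as a rough sketch. The input string and variable names are made up for illustration. Note that chomp and chop modify their argument in place, and that reverse only reverses a string when it is called in scalar context.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = "atgc\n";
chomp($line);                         # removes the trailing newline: "atgc"

my $upper    = uc($line);             # "ATGC"
my $len      = length($line);         # 4
my $reversed = scalar reverse($line); # "cgta" (scalar context reverses the string)

my $shorter = $line;
chop($shorter);                       # removes the last character: "atg"

my $whole = int(17 / 3);              # 5 (the fractional part is dropped)
my $root  = sqrt(16);                 # 4

print "$upper $len $reversed $shorter $whole $root\n";
```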

Context
$u = "12" + 5;         # 17
$u = "12john" + 5;     # 17
$u = "john12" + 5;     # 5

With warnings enabled:
use strict;
use warnings;
my $u = "john12" + 5;  # 5, with the warning: Argument "john12" isn't numeric in addition (+) at line 3
$u = "12" + 5;         # 17, no warning

Array data type

Index   Value
0       35
1       12.4
2       "bye\n"
3       1.7e23
4       'Hi'

$data[0] = 35;
$data[1] = 12.4;
$data[2] = "bye\n";
$data[3] = 1.7e23;
$data[4] = 'Hi';
@data = (35, 12.4, "bye\n", 1.7e23, 'Hi');

Array = plural variable, marked by @ (like the a in array)

Array operators
Each example starts from @let = ("J", "P", "S", "D", "C");

Name      Example                   Result
pop       $r = pop(@let)            $r = "C"; @let = ("J","P","S","D")
push      push(@let, "G")           @let = ("J","P","S","D","C","G")
shift     $r = shift(@let)          $r = "J"; @let = ("P","S","D","C")
unshift   unshift(@let, "G")        @let = ("G","J","P","S","D","C")
splice    @a = splice(@let, 1, 2)   @a = ("P","S"); @let = ("J","D","C")
join      $r = join(':', @let)      $r = "J:P:S:D:C"
scalar    $r = scalar(@let)         $r = 5
reverse   @a = reverse(@let)        @a = ("C","D","S","P","J")

For details: perldoc -f function_name
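
Unlike the table, where every row starts from the same array, a real script applies the operations one after another. The sketch below chains them and tracks the array contents in the comments. The variable names other than @let are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @let = ("J", "P", "S", "D", "C");

my $last = pop(@let);               # $last = "C",  @let = J P S D
push(@let, "G");                    #               @let = J P S D G
my $first = shift(@let);            # $first = "J", @let = P S D G
unshift(@let, "A");                 #               @let = A P S D G
my @removed = splice(@let, 1, 2);   # @removed = P S, @let = A D G

my $joined    = join(":", @let);    # "A:D:G"
my $count     = scalar(@let);       # 3
my @backwards = reverse(@let);      # G D A

print "$joined has $count elements\n";
```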

Search for a name
#!/usr/bin/perl
use strict;
use warnings;

my @names = ("John", "Peter", "Simon", "Dave", "Chris");
# random index in [0, ..., 4]; int() drops the fraction, e.g. int(2.55) = 2
my $offset = int(rand(scalar(@names)));

if ($names[$offset] eq "Simon") {    # block start for the if statement
    print "Simon is found\n";
    print "Success!\n";
}                                    # block end for the if statement
else {                               # block start for the else statement
    print "Simon is not found\n";
    print "Failed!\n";
}                                    # block end for the else statement

Comparison operators

Comparison              Numeric   String   Return value
Equal                   ==        eq       1 if $a is equal to $b, otherwise ""
Not equal               !=        ne       1 if $a is not equal to $b, otherwise ""
Less than               <         lt       1 if $a is less than $b, otherwise ""
Greater than            >         gt       1 if $a is greater than $b, otherwise ""
Less than or equal      <=        le       1 if $a is not greater than $b, otherwise ""
Greater than or equal   >=        ge       1 if $a is not less than $b, otherwise ""
Comparison              <=>       cmp      0 if $a and $b are equal, 1 if $a is greater, -1 if $b is greater

"" is the empty string.
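
Choosing the numeric or the string operator matters, because the same values can compare differently. The sketch below (variable names are illustrative) compares 10 and 9 both ways and then uses <=> and cmp inside sort, where $a and $b are the two elements being compared.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my ($x, $y) = (10, 9);

my $numeric_less = ($x <  $y) ? "yes" : "no";   # "no":  10 is not less than 9
my $string_less  = ($x lt $y) ? "yes" : "no";   # "yes": "10" sorts before "9" as text

my @numbers   = (10, 9, 100, 1);
my @by_number = sort { $a <=> $b } @numbers;    # 1 9 10 100
my @by_string = sort { $a cmp $b } @numbers;    # 1 10 100 9

print "numeric: @by_number\n";
print "string : @by_string\n";
```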

What is true or false?
• Any number is true except for 0.
• Any string is true except for "" and "0".
• Any other value is true if it converts to a true string or a true number.
• Anything that is not true is false.
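
These rules can be checked by testing a handful of values directly. In this sketch the list of candidate values is made up for illustration. Note that the strings "0.0" and "00" are true, because only the exact string "0" and the empty string are false.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @candidates = (0, 1, -1, "", "0", "0.0", "00", " ", "a");
my ($true_count, $false_count) = (0, 0);

foreach my $value (@candidates) {
    if ($value) {
        print "'$value' is true\n";
        $true_count = $true_count + 1;
    } else {
        print "'$value' is false\n";
        $false_count = $false_count + 1;
    }
}

print "$true_count true, $false_count false\n";   # 6 true, 3 false
```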

Logical operators

Example      Name   Result
$a && $b     AND    $a if $a is false, $b otherwise
$a || $b     OR     $a if $a is true, $b otherwise
! $a         NOT    True if $a is not true, false otherwise
$a and $b    AND    $a if $a is false, $b otherwise
$a or $b     OR     $a if $a is true, $b otherwise
not $a       NOT    True if $a is not true, false otherwise
$a xor $b    XOR    True if exactly one of $a and $b is true, false otherwise

Pay attention to precedence:
$xyz = $x || $y || $z
is not the same as
$xyz = $x or $y or $z
because = binds more tightly than or, so the second line is parsed as ($xyz = $x) or $y or $z.
Use parentheses!
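
The precedence difference can be observed directly. In this sketch (variable names are illustrative) the `no warnings 'void'` line only silences the warning that `use warnings` would otherwise print for the misleading statement; that warning is itself a useful hint that something is wrong.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my ($x, $y, $z) = (0, 0, 7);

# || binds more tightly than =, so the whole chain is evaluated first
my $with_symbols = $x || $y || $z;      # 7

# or binds less tightly than =, so this is ($with_words = $x) or $y or $z
my $with_words;
{
    no warnings 'void';                 # silence "Useless use ... in void context"
    $with_words = $x or $y or $z;       # $with_words is 0
}

# parentheses make the intent explicit
my $with_parens = ($x or $y or $z);     # 7

print "$with_symbols $with_words $with_parens\n";   # prints: 7 0 7
```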

Conditional statements
• Simple
  Statement if (Expression);
• Compound
  if (Expression) Block
  if (Expression) Block else Block
  if (Expression) Block elsif (Expression) Block else Block
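
A small sketch of the compound and simple forms. The sequence and the length thresholds are arbitrary values chosen for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $seq = "TATGAAA";
my $size;

# Compound form: the first true condition selects the block to run
if (length($seq) < 5) {
    $size = "short";
} elsif (length($seq) < 10) {
    $size = "medium";
} else {
    $size = "long";
}

# Simple form: the statement runs only if the expression is true
print "Sequence $seq is $size\n" if ($size ne "long");
```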

Search for a name
#!/usr/bin/perl
use strict;
use warnings;

my @names = ("John", "Peter", "Simon", "Dave", "Chris");
my $offset = int(rand(scalar(@names)));
my $count = 1;

while ($names[$offset] ne "Simon") {    # block start for the while statement
    $offset = int(rand(scalar(@names)));
    $count = $count + 1;
}                                       # block end for the while statement
print "Simon is found after $count trials\n";

Check for a name
#!/usr/bin/perl
use strict;
use warnings;

my @names = ("John", "Peter", "Simon", "Dave", "Chris");

for (my $i = 0; $i < scalar(@names); $i = $i + 1) {    # block start for the for loop
    if ($names[$i] eq "Simon") {
        print "Simon is found\n";
    }
}                                                      # end block for the for loop

Check for a name
#!/usr/bin/perl
use strict;
use warnings;

my @names = ("John", "Peter", "Simon", "Dave", "Chris");

for (my $i = 0; $i < scalar(@names); $i = $i + 1) {    # block start for the for loop
    if ($names[$i] eq "Simon") {
        print "Simon is found\n";
        last;                                          # jump outside of the loop
    }
}                                                      # end block for the for loop

Loop statements
• Simple
  Statement while (Expression);
• Compound
  while (Expression) Block
  for (Initialization; Expression; Incrementing) Block

Hashes

%names
Key     Value
John    5
Peter   3
Simon   11
Dave    1
Chris   4

$names{"John"} = 5;
$names{"Peter"} = 3;
$names{"Simon"} = 11;
$names{"Dave"} = 1;
$names{"Chris"} = 4;

Hash = key/value variable, marked by %

Check for a name
#!/usr/bin/perl
use strict;
use warnings;

my %names = ("John", 5, "Peter", 3, "Simon", 11, "Dave", 1, "Chris", 4);
my $key = "Simon";

# exists returns true if the key is in %names, otherwise false
if (exists $names{$key}) {
    print "$key is found, his value is : $names{$key}\n";
}
else {
    print "$key is not found\n";
}

Check for a name
#!/usr/bin/perl
use strict;
use warnings;

my %names = (
    "John"  => 5,
    "Peter" => 3,
    "Simon" => 11,
    "Dave"  => 0,
    "Chris" => 4
);
my $key = "Simon";

if (exists $names{$key}) {
    print "$key is found, his value is : $names{$key}\n";
}
else {
    print "$key is not found\n";
}

Hash operators

Name     Example              Result
exists   exists $hash{$key}   Returns true if $key is in %hash, otherwise it returns false
delete   delete $hash{$key}   Deletes $key => $hash{$key} from %hash
each     each %hash           Steps through a hash one key/value pair at a time
keys     keys %hash           Returns a list consisting of all the keys of %hash
values   values %hash         Returns a list consisting of all the values of %hash

For details: perldoc -f function_name
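
The hash functions in action, as a minimal sketch built on the %names hash from the earlier slides. The other variable names are illustrative. The order in which keys, values and each return entries is not specified, so the keys are sorted before printing.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %names = ("John" => 5, "Peter" => 3, "Simon" => 11, "Dave" => 1, "Chris" => 4);

delete $names{"Dave"};                          # remove one key/value pair
my $has_dave = exists $names{"Dave"} ? 1 : 0;   # 0: the key is gone

my @sorted_keys = sort keys %names;             # Chris John Peter Simon

my $total = 0;
foreach my $value (values %names) {             # 5 + 3 + 11 + 4
    $total = $total + $value;
}

# each returns one (key, value) pair per call, in no particular order
while (my ($key, $value) = each %names) {
    print "$key => $value\n";
}

print "keys: @sorted_keys, total: $total\n";
```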

Getting user input
#!/usr/bin/perl
use strict;
use warnings;

my $line;
print "Type something : ";
while ($line = <STDIN>) {    # STDIN : Standard Input
    if ($line eq "\n") {
        print "That was just a blank line\n";
    }
    else {
        print "Input : $line";
    }
    print "Type something : ";
}

Ctrl-C to exit

Reading from a file
#!/usr/bin/perl
use strict;
use warnings;

print "Enter the filename: ";
my $filename = <STDIN>;     # Read Standard Input for a filename
chomp($filename);           # Remove the end of line character
if (! (-e $filename)) {     # Test whether the file exists
    print "File not found\n";
    exit 1;
}

open(IN, $filename) || die "Could not open $filename\n";
my @names = <IN>;           # Store the content of the file in an array
close(IN);
print @names;

Input file
John
Peter
Simon
Dave
Chris

Reading from a file
#!/usr/bin/perl
use strict;
use warnings;

print "Enter the input file name : ";
my $filename = <STDIN>;     # Read Standard Input for a filename
chomp($filename);           # Remove the end of line character
if (! (-e $filename)) {
    print "File not found\n";
    exit 1;
}

my %names = ();
my ($key, $value);
my $line;
open(IN, $filename) || die "Could not open $filename\n";
while ($line = <IN>) {
    chomp($line);
    ($key, $value) = split(/\t/, $line);    # fields are separated by a tab
    $names{$key} = $value;
}
close(IN);
$, = " ";                   # It contains the separator for the print statement
print %names, "\n";

Input file : data.txt (two tab-separated columns)
John    5
Peter   3
Simon   11
Dave    1
Chris   4

Input and output functions

Name    Syntax                    Result
open    open FILEHANDLE, EXPR     Opens the file named by EXPR and associates it with FILEHANDLE
close   close FILEHANDLE          Closes the file associated with FILEHANDLE
print   print [FILEHANDLE] LIST   Prints each element of LIST to FILEHANDLE (STDOUT if omitted)

For details: perldoc -f function_name
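
The slides only read from files. Writing uses the same three functions with print directed at a filehandle. The sketch below makes two assumptions that are not from the slides: it uses the three-argument form of open with a lexical filehandle (a common alternative to the bareword IN), and it writes to a temporary file created with the core File::Temp module so that it does not touch any real file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file; it is removed automatically when the script exits
my ($tmp_fh, $filename) = tempfile(UNLINK => 1);
close($tmp_fh);

# '>' opens the file for writing
open(my $out, '>', $filename) or die "Could not open $filename for writing: $!\n";
print $out "John\t5\n";        # no comma between the filehandle and the list
print $out "Peter\t3\n";
close($out);

# '<' opens the file for reading
open(my $in, '<', $filename) or die "Could not open $filename: $!\n";
my @lines = <$in>;
close($in);

print @lines;                  # with no filehandle, print writes to STDOUT
```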

Testing files

Example        Name        Result
-e $filename   Exists      True if file named in $filename exists, otherwise false
-r $filename   Readable    True if file named in $filename is readable, otherwise false
-w $filename   Writable    True if file named in $filename is writable, otherwise false
-d $filename   Directory   True if file named in $filename is a directory, otherwise false
-f $filename   File        True if file named in $filename is a regular file, otherwise false
-T $filename   Text File   True if file named in $filename is a text file, otherwise false
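
A sketch of the file tests applied to a scratch file and a scratch directory. Both are created here with the core File::Temp module, which is an assumption of this example rather than something the slides use; the name no_such_file.txt is likewise hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile tempdir);

# A temporary file and directory, both removed when the script exits
my ($fh, $filename) = tempfile(UNLINK => 1);
print $fh "ATTGTC\n";
close($fh);
my $dirname = tempdir(CLEANUP => 1);

my $file_exists     = (-e $filename) ? 1 : 0;   # 1
my $file_is_regular = (-f $filename) ? 1 : 0;   # 1
my $file_is_dir     = (-d $filename) ? 1 : 0;   # 0
my $dir_is_dir      = (-d $dirname)  ? 1 : 0;   # 1
my $missing_exists  = (-e "$dirname/no_such_file.txt") ? 1 : 0;   # 0

print "exists=$file_exists regular=$file_is_regular dir=$file_is_dir\n";
print "directory test on directory=$dir_is_dir missing file=$missing_exists\n";
```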

Regular expressions
#!/usr/bin/perl
use strict;
use warnings;

my $filename = "data.txt";
my $line;
my %data = ();
my $key;

open(IN, $filename) || die "Could not open $filename\n";
while ($line = <IN>) {
    chomp($line);
    if ($line =~ /^>/) {    # check for ids using pattern matching
        $key = $line;
    }
    else {
        $data{$key} = $line;
    }
}
close(IN);

my @ids = keys %data;
my @sequences = values %data;
$, = " ";
print @ids, "\n", @sequences, "\n";

Input file : data.txt
>id1
ATTGTC
>id2
GGTCCT
>id3
TATGAAA
>id4
GTGTATA

Regular expressions

m// operator (matching): EXPR =~ m/PATTERN/
Searches the string in the scalar EXPR (or $_) for PATTERN. In scalar context the operator returns true (1) if successful, false ("") otherwise. In list context m// returns a list of the substrings matched by any capturing parentheses in PATTERN. PATTERN undergoes double-quote interpolation.
Example: $line = ">id1"; $line =~ /^>/

s/// operator (substitution): VAR =~ s/PATTERN/REPLACEMENT/
Searches the string in the scalar variable VAR (or $_) for PATTERN and, if found, replaces the matched substring with the REPLACEMENT text. In scalar and list context s/// returns the number of times it succeeded. Both PATTERN and REPLACEMENT undergo double-quote interpolation.
Example: $line = ">id1"; $line =~ s/>//

tr/// operator (transliteration): VAR =~ tr/SEARCHLIST/REPLACEMENTLIST/
Scans the string in the scalar variable VAR (or $_), character by character, and replaces each occurrence of a character found in SEARCHLIST with the corresponding character in REPLACEMENTLIST. In scalar and list context tr/// returns the number of characters replaced or deleted. SEARCHLIST is NOT a regular expression, and neither SEARCHLIST nor REPLACEMENTLIST undergoes full double-quote interpolation (backslash sequences are processed, but variables are not interpolated).
Example: $line = "id1"; $line =~ tr/a-z/A-Z/
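
The return values described above can be inspected by storing them. This sketch applies the three operators in turn to the id line from the examples. The names of the result variables are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = ">id1";

my $is_header = ($line =~ /^>/) ? 1 : 0;   # 1: the line starts with ">"

my $removed = ($line =~ s/>//);            # 1 substitution; $line is now "id1"

my $changed = ($line =~ tr/a-z/A-Z/);      # 2 characters replaced; $line is now "ID1"

# In list context, m// returns what the capturing parentheses matched
my ($number) = ($line =~ /(\d+)/);         # "1"

print "$line: header=$is_header removed=$removed changed=$changed number=$number\n";
```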

Regular expressions
#!/usr/bin/perl
use strict;
use warnings;

my $filename = "data.txt";
my $line;
my %data = ();
my $key;

open(IN, $filename) || die "Could not open $filename\n";
while ($line = <IN>) {
    chomp($line);
    if ($line =~ /^>/) {           # check for ids using pattern matching
        $line =~ s/>//;            # substitute > by nothing in id
        $line =~ tr/a-z/A-Z/;      # translate lower case to upper case
        $key = $line;
    }
    else {
        $data{$key} = $line;
    }
}
close(IN);

my @ids = keys %data;
my @sequences = values %data;
$, = " ";
print @ids, "\n", @sequences, "\n";

Input file : data.txt
>id1
ATTGTC
>id2
GGTCCT
>id3
TATGAAA
>id4
GTGTATA

Regular expressions

Symbol    Meaning
\...      Used to escape metacharacters (including itself) or to make the next character a metacharacter (like \s, \w, \n)
...|...   Alternation (match one or the other)
(...)     Grouping (treat as a unit)
[...]     Character class (match one character from a set)
^         True at the beginning of the string (or sometimes after any newline)
$         True at the end of the string (or sometimes before any newline)
.         Match any one character (except newline, normally)

Example: $seq =~ /AAA$/

Regular expressions

Quantifier    Meaning
*             Match 0 or more times (maximal)
+             Match 1 or more times (maximal)
?             Match 0 or 1 time (maximal)
{COUNT}       Match exactly COUNT times
{MIN,}        Match at least MIN times (maximal)
{MIN,MAX}     Match at least MIN times but not more than MAX times (maximal)
*?            Match 0 or more times (minimal)
+?            Match 1 or more times (minimal)
??            Match 0 or 1 time (minimal)
{MIN,}?       Match at least MIN times (minimal)
{MIN,MAX}?    Match at least MIN times but not more than MAX times (minimal)

Example: $seq = "TATGAAA";
$seq =~ /.*AAA$/
$seq =~ /.*A{3}$/
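
The difference between maximal and minimal quantifiers shows up in what a capturing group returns. This is a sketch using the sequence from the example; the patterns and variable names are chosen for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $seq = "TATGAAA";

my $ends_with_aaa = ($seq =~ /.*A{3}$/) ? 1 : 0;   # 1

# Maximal: .* takes as much as possible, so the match runs to the last A
my ($maximal) = ($seq =~ /(T.*A)/);                # "TATGAAA"

# Minimal: .*? takes as little as possible, so the match stops at the first A
my ($minimal) = ($seq =~ /(T.*?A)/);               # "TA"

# {2,3}: the first run of at least two A's, taking up to three of them
my ($run) = ($seq =~ /(A{2,3})/);                  # "AAA"

print "maximal=$maximal minimal=$minimal run=$run\n";
```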

Regular expressions

Symbol   Meaning                Character class
\d       Digit                  [0-9]
\D       Nondigit               [^0-9]
\s       Whitespace             [ \t\n\r\f]
\S       Nonwhitespace          [^ \t\n\r\f]
\w       Word character         [a-zA-Z0-9_]
\W       Non-(word character)   [^a-zA-Z0-9_]

Example: $id = "id2";
$id =~ /id\d+$/
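
A sketch of the character classes at work. The candidate ids and the sample strings are invented for illustration. Unlike the example above, the pattern here is also anchored with ^ so that the whole string must have the form "id" followed by digits.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @candidates = ("id2", "id42", "idX", "my id7", "id 7");
my @valid = ();

foreach my $id (@candidates) {
    if ($id =~ /^id\d+$/) {            # "id", then one or more digits, nothing else
        push(@valid, $id);
    }
}

# \s+ matches any run of whitespace, so mixed spaces and tabs split cleanly
my @fields = split(/\s+/, "John \t 5");            # ("John", "5")

# \w+ stops at the first character that is not a letter, digit or underscore
my ($word) = ("GGTCCT-extra" =~ /^(\w+)/);         # "GGTCCT"

print "valid ids: @valid\n";
print "fields: @fields, word: $word\n";
```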

Subroutines or functions
#!/usr/bin/perl
use strict;
use warnings;

my $filename1 = "data1.txt";
my $filename2 = "data2.txt";
my %data1 = get_data($filename1);    # subroutine call
my %data2 = get_data($filename2);    # subroutine call
$, = " ";
print keys %data1, "\n", values %data1, "\n";
print keys %data2, "\n", values %data2, "\n";

sub get_data {
    my $filename = shift(@_);        # first argument of the call
    my $key;
    my %tmp = ();
    open(IN, $filename) || die "Could not open $filename\n";
    while (my $line = <IN>) {
        chomp($line);
        if ($line =~ /^>/) {
            $line =~ s/>//;
            $line =~ tr/a-z/A-Z/;
            $key = $line;
        }
        else {
            $tmp{$key} = $line;
        }
    }
    close(IN);
    return %tmp;
}

Input file : data1.txt
>id1
ATTGTC
>id2
GGTCCT
>id3
TATGAAA
>id4
GTGTATA

Input file : data2.txt
>id5
ATAAAAA
>id6
GGAATTT
>id7
TATGATT
>id8
GTGTAAT

Packages

Module file : MyTools.pm
package MyTools;

sub get_data {
    my $filename = shift(@_);
    my $key;
    my %tmp = ();
    open(IN, $filename) || die "Could not open $filename\n";
    while (my $line = <IN>) {
        chomp($line);
        if ($line =~ /^>/) {
            $line =~ s/>//;
            $key = $line;
        }
        else {
            $tmp{$key} = $line;
        }
    }
    close(IN);
    return %tmp;
}
1;    # this should be your last line

Main script
#!/usr/bin/perl
use strict;
use warnings;
use MyTools;

my $filename1 = "data1.txt";
my $filename2 = "data2.txt";
my %data1 = MyTools::get_data($filename1);
my %data2 = MyTools::get_data($filename2);
$, = " ";    # set the print separator
print keys %data1, "\n", values %data1, "\n";
print keys %data2, "\n", values %data2, "\n";

Comprehensive Perl Archive Network (CPAN): http://www.cpan.org/

References
• Recommended Books – Beginner
  » "Learning Perl", 4th Edition, by Randal Schwartz, Tom Phoenix & brian d foy
  » "Beginning Perl for Bioinformatics", 1st Edition, by James Tisdall
  » "Developing Bioinformatics Computer Skills", 1st Edition, by Cynthia Gibas & Per Jambeck