Perl syntax grab bag
- Whitespace is mostly unimportant. You're free to format your programs however you wish.
- Everything from a # to the end of the line is a comment. (You can't use C/C++ style comment syntax.)
- Operators in Perl work much like their counterparts in C/C++. The perlop documentation has a full list of Perl operators. We'll discuss a number of Perl's less familiar operators and operator behavior below.
- Boolean operators like && and || return the last value evaluated, not true or false; you'll often see code like $foo = $bar || $baz;, which will set $foo to the value of $bar if it's non-false, and the value of $baz otherwise.
- for and foreach are synonyms. You can use C-style syntax in for loops (e.g. for (my $i = 0; $i < 10; $i++) { print $i; }), but you'll hardly ever see it.
- Perl's keyword for "else if" is elsif (spelled just like that).
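Here is a minimal sketch of two of the points above: || handing back a value rather than a boolean, and elsif inside a foreach-style loop. The variable names and labels are made up for illustration.

```perl
use strict;
use warnings;

# || returns the first true operand, or the last operand if none is true.
my $bar = "";            # the empty string counts as false in Perl
my $baz = "fallback";
my $foo = $bar || $baz;  # $bar is false, so $foo gets the value of $baz

# A foreach-style loop (spelled "for" here) using elsif.
my @labels;
for my $i (0 .. 3) {
    if ($i == 0) {
        push @labels, "zero";
    } elsif ($i % 2 == 0) {
        push @labels, "even";
    } else {
        push @labels, "odd";
    }
}

print "$foo: @labels\n";   # prints "fallback: zero odd even odd"
```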
All strings being equal
Perl has two equality operators, == and eq. Use == to compare numbers, and eq to compare strings.
To be more specific, == checks whether its operands are numerically equal (i.e., whether they evaluate to the same numerical value), and eq checks whether they are stringwise equal (i.e., whether they evaluate to the same string). Perl will do its best to figure out what numerical value is in a string, but a string with no numerical content will always evaluate (numerically) to zero. Importantly, this means that == can evaluate to true for strings in unexpected situations. The following code, for example, will print numerically equal (but not, of course, stringwise equal). If warnings are enabled, Perl will also complain that the strings aren't numeric.
    $foo = "blah";
    $bar = "ducks";
    print "numerically equal\n" if ($foo == $bar);
    print "stringwise equal\n" if ($foo eq $bar);
Along with eq come a whole host of operators that compare variables based on their stringwise value. Most important is ne, the stringwise equivalent of !=, but there's also lt (stringwise less than), gt (stringwise greater than), le (stringwise less-than-or-equal-to), etc. You can use these operators to compare and sort strings according to their alphabetical order.
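As a rough illustration of how the numeric and stringwise operators can disagree, here is a short sketch. The sample values are invented, and it uses cmp, Perl's three-way string comparison operator, which the text above doesn't cover.

```perl
use strict;
use warnings;

# "10" and "10.0" are the same number but different strings.
my $num_equal = ("10" == "10.0") ? 1 : 0;   # 1
my $str_equal = ("10" eq "10.0") ? 1 : 0;   # 0

# lt compares character by character, so "10" comes before "9" as a string.
my $str_less = ("10" lt "9") ? 1 : 0;       # 1
my $num_less = (10 < 9) ? 1 : 0;            # 0

# cmp returns -1, 0, or 1, which is what sort expects from its block.
my @words = sort { $a cmp $b } ("pear", "apple", "fig");

print "$num_equal $str_equal $str_less $num_less @words\n";
# prints "1 0 1 0 apple fig pear"
```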
Quote me on that
One of Perl's strengths is that it has a multitude of quoting operators. Double-quote and single-quote work as they do in PHP (the former allows for variable interpolation and the latter doesn't), but Perl supports the following operators in addition:
- The double-quote operator qq{string} is the equivalent of "string"
- The single-quote operator q{string} is the equivalent of 'string'
- The "quote words" operator qw{foo bar baz} is equivalent to writing ("foo", "bar", "baz") (this is great for creating arrays of words)
In each of the special quote operators above, the delimiter characters { and } can be replaced with any other matching pair of bracketing characters (say, ( and )), or with a single punctuation character used at both ends. The two following statements, for example, both print the string foo:

    print qq/foo/;
    print q#foo#;
(This is handy for quoting strings with characters you would otherwise have to escape in a single- or double-quoted string.)
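The following sketch shows the three quoting operators side by side, with strings chosen so that the usual quote characters would otherwise need escaping. The variable names are just for illustration.

```perl
use strict;
use warnings;

my $name = "world";

# qq interpolates like double quotes; the embedded " needs no escaping.
my $greeting = qq{Hello, "$name"!};

# q does not interpolate; the embedded ' needs no escaping.
my $literal = q{It's $name};

# qw splits on whitespace and returns a list of words.
my @animals = qw(duck goose swan);

print "$greeting\n";                       # Hello, "world"!
print "$literal\n";                        # It's $name
print scalar(@animals), " animals\n";      # 3 animals
```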
Hashes; Capturing regex matches
The following program, you_capture.pl, finds all occurrences of the word "you" and whatever word follows; it stores a count of how many times each following word occurs. In order to do this, we introduce two new features of Perl: capturing and hashes. Let's look at the source code:
    #!/usr/bin/perl
    use strict;

    my %words = ();

    while (my $line = <>) {
      chomp($line);
      if ($line =~ /[Yy]ou (\w+)/) {
        $words{$1}++;
      }
    }

    foreach my $key (keys %words) {
      print "$key: $words{$key}\n";
    }
Here's how it works. First off, we declare a hash and assign an empty list to it, which leaves the hash empty. Hashes are the Perl equivalent of PHP's associative arrays, or Python's dictionaries, or C++'s STL map. They store associations between keys and values.
The regular expression on line 8 should look comprehensible. The new syntax here is the parentheses, which tell Perl to store (or "capture") the part of the source string that matched that portion of the regular expression. You might render the regular expression in line 8 in English as "if the line contains the word 'you,' beginning with either an upper- or lowercase 'y', then match a space after that, and then remember the sequence of one or more word characters (letters, digits, or underscores) that follows."
When you use regular expressions with parentheses, a match with the binding operator =~ not only returns true if the expression matched, but also makes available the variables $1, $2, $3 and upwards, each of which contains the string captured by the corresponding set of parentheses in the regular expression (e.g., $1 contains the string captured by the first set of parentheses, $2 contains the string captured by the second set, etc.). These variables are only meaningful after a successful match.
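To make the numbering of capture variables concrete, here is a small sketch with two sets of parentheses. The sample sentence and variable names are invented for the example.

```perl
use strict;
use warnings;

my $line = "I like ducks and geese";
my ($left, $right) = ("", "");

# Two sets of parentheses give two captures: $1 and $2.
# Copy them out right away, and only if the match succeeded.
if ($line =~ /(\w+) and (\w+)/) {
    ($left, $right) = ($1, $2);
}

print "left: $left, right: $right\n";   # left: ducks, right: geese
```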
Working with hashes
You can access or modify values in a hash with the following syntax:
    $x{"y"}

... where x is the variable name of your hash and y is the key whose value you wish to access or modify. In you_capture.pl above, we're using the matched word ($1 on line 9) as the key, and incrementing the key's value each time it's matched. (In Perl, using the value of a key that hasn't yet been set isn't an error; the missing value is undefined, and the ++ operator treats an undefined value as zero. A related feature, autovivification, creates nested data structures on demand; parse_config.pl below relies on it.)
There are several strategies for looping over each key/value pair in a hash. The final foreach in you_capture.pl demonstrates one such strategy: the keys built-in function returns a list of all of the keys in a hash, which the foreach then loops over.
Further notes about hashes:
- PHP remembers in what order you put keys into an associative array, and then returns those keys in the same order (when you, e.g., loop over the array). Perl does not remember the order of keys; you'll get the keys back in an arbitrary order.
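If you need a predictable order, one common approach is to sort the keys before looping. The sketch below assumes a small hand-built hash standing in for the %words hash from you_capture.pl.

```perl
use strict;
use warnings;

# Build up counts by hand, the same way you_capture.pl does with $1.
my %words = ();
$words{"are"}++;
$words{"can"}++;
$words{"are"}++;

# keys returns the keys in arbitrary order; sort makes the order predictable.
my @report;
foreach my $key (sort keys %words) {
    push @report, "$key: $words{$key}";
}

print join("\n", @report), "\n";
# prints:
# are: 2
# can: 1
```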
Substitutions and hash literals
Next up: hash literals and substitutions. The following program, replace.pl, replaces all instances of the patterns in the keys of %patterns with the string in the corresponding values:
    #!/usr/bin/perl
    use strict;

    my %patterns = (
      '\bwoods\b' => 'ducks',
      'ee' => 'eeeeeee',
      '\.$' => '. Oh yeah, baby!'
    );

    while (my $line = <>) {
      chomp $line;
      while (my ($pat, $repl) = each %patterns) {
        $line =~ s/$pat/$repl/g;
      }
      print "$line\n";
    }
Notes:
- Hash literals look a lot like array literals, with one key difference: every other entry in the list becomes a key, with the next entry as the corresponding value. (The => operator is essentially synonymous with the comma.)
- Line 12 illustrates another way to loop over the key/value pairs in a hash. The each function returns a key/value pair from the given hash until all key/value pairs have been exhausted (at which point it returns an empty list, which ends the loop).
- each returns a list: you can assign the result of such functions to more than one variable using list syntax. (e.g. my ($first, $second, $third) = returns_list() will create variables $first, $second, and $third in lexical scope with the corresponding values from the return value of returns_list()).
- The s/x/y/ operator replaces the first occurrence of the pattern x with the string y. The final g means "global," i.e., for every instance on the given line; without the g, only the first occurrence on each line would be replaced.
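A quick sketch of the difference the g flag makes, plus list assignment from a list. The sample strings are made up; the sketch also relies on s/// returning the number of substitutions it made, which the notes above don't mention.

```perl
use strict;
use warnings;

my $first_only = "see the bees";
my $all        = "see the bees";

$first_only =~ s/ee/EE/;    # replaces only the first match
$all        =~ s/ee/EE/g;   # replaces every match

# In scalar context, s/// returns how many substitutions were made.
my $text  = "a-b-c";
my $count = ($text =~ s/-/+/g);

# List assignment: each variable takes the corresponding list element.
my ($first, $second) = ("woods", "ducks");

print "$first_only | $all | $text ($count) | $first $second\n";
# prints "sEE the bees | sEE the bEEs | a+b+c (2) | woods ducks"
```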
Putting it all together: parsing a configuration file
Let's say we had a configuration file in the following format, and we wanted to get some information out of it. Let's write a Perl program to do just that.
    # this is a test configuration file!
    # we're going to write a perl script to parse it
    # format:
    #
    # [section header]
    # key=value
    #
    # lines beginning with # are comments, and won't be interpreted.
    # anything from # to the end of the line will be ignored.
    # the '=' sign between key/value pairs can have leading/trailing whitespace.

    [widget]
    blinkiness=7
    num_buttons=6
    brand_name = Jim's Widget and Baking Supply Co.

    [zookeeper]
    is_surly=true
    raise=$1000 # our zookeeper has been very thorough!
    ducks_owned=438
    favorite_cheese=gouda

    [classic_movie]
    this=is a test
    hello=there
What we want is a data structure that will let us get to these configuration values in a hierarchical fashion. Here's parse_config.pl, designed to do just such a job. At the end of the first while, the %config hash will contain a key for each section in the config file, whose value is another hash, which has a key/value pair for each key/value pair in the configuration file.
    #!/usr/bin/perl
    use strict;

    my %config = ();
    my $current_section = "undefined";

    while (my $line = <>) {
      next if $line =~ /^#/; # skip to the next line if this is a comment
      chomp($line);
      $line =~ s/\s*#.*$//; # remove comments (and the whitespace before them) from the end of the line

      if ($line =~ /^\[(\w+)\]$/) {
        $current_section = $1;
      }

      if ($line =~ /^(\w+)\s*=\s*(.*)$/) {
        # creates/updates a reference to a hash within a hash
        $config{$current_section}{$1} = $2;
      }
    }

    while (my ($section, $pairs) = each %config) {
      print "Section: $section\n";
      # "%$pairs" below because $pairs is a reference to a hash
      while (my ($key, $val) = each %$pairs) {
        print " $key -> $val\n";
      }
    }
Notes:
- expr1 if expr2 is short for if (expr2) { expr1 }
- next is Perl-speak for continue
- Note that on line 16 we capture two things from one regex: the key ends up in $1 and the value in $2.
- On line 18, we're assigning to a hash inside another hash. There's a trick here, though: hashes can only contain scalar values. What's actually happening here is that the value for this key is a hash reference. More on this later, but it's also the reason that we need the extra % in front of the hash variable on line 25.
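The hash-within-a-hash idea can be sketched in isolation like this. The keys are borrowed from the sample configuration file; the ref built-in and the -> arrow syntax are not used in parse_config.pl and are shown here only as further ways to inspect a hash reference.

```perl
use strict;
use warnings;

my %config;

# Assigning through two levels of keys creates the inner hash on demand.
$config{widget}{blinkiness}  = 7;
$config{widget}{num_buttons} = 6;

# The value stored under "widget" is a scalar: a reference to a hash.
my $pairs = $config{widget};
my $kind  = ref($pairs);               # the string "HASH"

# Put % in front of the reference to treat it as a hash again...
my @keys = sort keys %$pairs;

# ...or use -> to look up a single key through the reference.
my $blinkiness = $pairs->{blinkiness};

print "$kind: @keys (blinkiness=$blinkiness)\n";
# prints "HASH: blinkiness num_buttons (blinkiness=7)"
```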
Mini-exercise. Make a version of replace.pl that uses a configuration file to specify patterns and replacements. You may need to investigate Perl's functions for opening and reading from file handles other than standard input (see open).
Laconic Perl
Perl has a reputation for being a very terse language, with the ability to write powerful code that doesn't use a lot of characters. There are a number of tricks that make this possible. Let's learn about some of them.
The default variable
The first is the use of $_, the so-called "default variable," which gets set inside of looping constructs like foreach. The following code, for example, will print the digits one through nine, each on a new line:
    my @numbers = (1, 2, 3, 4, 5, 6, 7, 8, 9);
    foreach (@numbers) { # note missing loop variable!
      print "$_\n";
    }
Certain built-in Perl functions will use $_ as a default argument if no other argument is present. Among these is print; the above code could be re-written as:
    my @numbers = (1, 2, 3, 4, 5, 6, 7, 8, 9);
    foreach (@numbers) { # note missing loop variable!
      print; # no argument specified, so uses $_
      print "\n";
    }
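print is not the only construct that falls back on $_. As a sketch (the word list is invented), a bare regular expression match and the built-ins uc and length all operate on $_ when given no argument:

```perl
use strict;
use warnings;

my @words = ("duck", "goose", "swan");
my @report;

foreach (@words) {
    next unless /o|a/;     # a bare regex matches against $_
    my $upper  = uc;       # uc with no argument uppercases $_
    my $length = length;   # length with no argument measures $_
    push @report, "$upper has $length letters";
}

print join("\n", @report), "\n";
# prints:
# GOOSE has 5 letters
# SWAN has 4 letters
```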
Statement modifiers
Certain conditionals and loops can be written in Perl as statement modifiers. We saw such a modifier in parse_config.pl (the next if ... line). Perl foreach and while loops support a similar syntax. In the statement to the left of a foreach modifier, $_ contains the value of the current iteration of the loop. The brief program above could be rewritten like so:
    my @numbers = (1, 2, 3, 4, 5, 6, 7, 8, 9);
    print "$_\n" foreach @numbers;
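Here is a short sketch of several statement modifiers together. The variables are invented for the example, and unless (the negated form of if) is an extra not discussed above.

```perl
use strict;
use warnings;

my @numbers = (1, 2, 3, 4, 5, 6, 7, 8, 9);

# foreach modifier: $_ holds each element in turn.
my $total = 0;
$total += $_ foreach @numbers;

# while modifier: the statement repeats as long as the condition holds.
my $countdown = 3;
my @ticks;
push @ticks, $countdown-- while $countdown > 0;

# if and unless modifiers guard a single statement.
my @notes;
push @notes, "total is large" if $total > 40;
push @notes, "countdown unfinished" unless $countdown == 0;

print "$total | @ticks | @notes\n";
# prints "45 | 3 2 1 | total is large"
```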
When reading a line with <> is the only thing in the condition of a while loop, the line is automatically assigned to $_ inside the loop. You can take advantage of this fact to write incredibly terse programs. For example, the following program could be a simple clone of the cat UNIX utility in Perl:
    #!/usr/bin/perl
    print $_ while <>;
    # or even shorter:
    # print while <>;
Grep short
Let's take a look at grep_short.pl, which does the same thing as grep.pl, but takes advantage of some of these tricks:
    #!/usr/bin/perl
    use strict;

    while (<>) {
      print if /\b[Ee]\w{4}\b/;
    }
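To see what that regular expression accepts without piping in a file, here is a sketch that applies the same pattern to a few made-up lines held in an array. It keeps any line containing a five-character word that starts with "e" or "E".

```perl
use strict;
use warnings;

# Sample input standing in for lines read from standard input.
my @lines = (
    "Every duck quacks",    # "Every" is E plus four word characters
    "an elephant sleeps",   # "elephant" is too long, so no match
    "eagle eyes",           # "eagle" matches
    "no ducks here",        # no word starts with e
);

my @matches;
foreach (@lines) {
    # Same test as grep_short.pl, applied to $_.
    push @matches, $_ if /\b[Ee]\w{4}\b/;
}

print "$_\n" foreach @matches;
# prints:
# Every duck quacks
# eagle eyes
```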