Basic Perl Usage

From SHellium Wiki

The Perl language interpreter is a common component of Linux systems. Perl makes it possible to construct working programs very quickly, which makes it a good tool for prototyping and exploratory programming. The most confusing aspect of Perl is its syntax, which is more flexible and permissive than you might expect from a typical block-structured language like C or Java. Perl shares many features with other languages, so it is fairly easy to learn for anyone with a basic knowledge of programming.

Perl has three basic data types: scalar, list, and associative array (i.e. hash table). Scalars store single data units such as numbers, characters and strings. Lists are ordered sequences of scalar values. Associative arrays are special lists containing name-value pairs, making them good for implementing look-up tables and data mappings. Perl is known as a dynamically typed language because a scalar variable may hold values of different types at different times. For example, the following snippet is correct Perl code...

$max = 24;     # integer data
$max = 24.0;   # floating point
$max = 'min';  # character string
$max = undef;  # the "undefined" value

The above code assigns the variable $max values of three different scalar data types, followed by the undefined value. Scalar variables are prefixed with '$'. Before a variable is initialized, it holds the special "undefined" value, called undef. Explicitly assigning undef to a variable discards whatever value it previously held.
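Whether a scalar currently holds a value can be checked with the built-in defined function. The following is a minimal sketch rather than part of the original example; the variables $before, $during and $after are illustrative names, and "my" is used to declare variables so that the snippet runs under "use strict".

```perl
use strict;
use warnings;

my $max;                          # declared but not initialized: holds undef
my $before = defined($max);       # false, nothing has been assigned yet

$max = 24;                        # now holds an integer
my $during = defined($max);       # true

$max = undef;                     # explicitly discard the value again
my $after = defined($max);        # false once more

# The ?: operator picks the first string if the test is true, else the second
print "before: ", ($before ? 'defined' : 'undefined'), "\n";
print "during: ", ($during ? 'defined' : 'undefined'), "\n";
print "after:  ", ($after  ? 'defined' : 'undefined'), "\n";
```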

The assignment operator '=' is also used for initializing lists. Scalar data types may be mixed in a list. For example, a list might be initialized with a student's name followed by a series of floating point numbers indicating his exam results. Note that list names are prefixed with '@'.

@results = ('Ron McStudent', 65.34, 82.0, 59.21, 70.5);

Lists are not constrained in size and may grow or shrink as needed. A list may be treated as a stack (LIFO) using the push and pop functions, which respectively add data to and remove data from the end of a list. For example, when Ron completes a new exam, his mark is added to the @results list using push.

push(@results, 67.33);

The unshift and shift functions are similar to push and pop, but they operate on the front of a list rather than the end. unshift inserts a value at the front of a list, effectively moving all existing values right one position. shift removes the first list element and returns its value. A queue data structure (FIFO) may be implemented by combining the push and shift functions.
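The four functions can be seen side by side in a short sketch. The list name @queue and the variables $first and $last are illustrative and do not appear elsewhere in this tutorial.

```perl
use strict;
use warnings;

my @queue = (65.34, 82.0);

push(@queue, 59.21);                # add to the end:      (65.34, 82.0, 59.21)
my $first = shift(@queue);          # remove from front:   returns 65.34

unshift(@queue, 'Ron McStudent');   # insert at the front: ('Ron McStudent', 82.0, 59.21)
my $last = pop(@queue);             # remove from the end: returns 59.21

print "first=$first last=$last\n";
print "items left: ", scalar(@queue), "\n";   # scalar() gives the number of elements
```

Using push together with shift, as in the first two calls, gives queue (FIFO) behaviour: the oldest value added is the first one removed.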

Lists may also be treated as arrays. When specifying an array element, the list name is prefixed by '$' and followed by an array subscript. Array subscripts use square brackets '[]'. Arrays count from zero, so the first value stored in the @results list is called $results[0]. For example, assume the @results list is initialized as shown above, and the student's name was typed incorrectly. The following code would re-assign the first element of @results.

$results[0] = 'Ren McPrudent';
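Associative arrays (hashes), the third data type mentioned at the start, use a similar element syntax. A hash name is prefixed by '%', and a single element is written with '$' and a key in curly braces '{}'. The sketch below is only an illustration: the hash %marks and the student names in it are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical look-up table mapping student names to a mark
my %marks = (
    'Ron McStudent' => 65.34,
    'Ann Example'   => 82.0,
);

$marks{'Ann Example'} = 85.5;            # re-assign the value stored under a key
$marks{'Bob Sample'}  = 70.5;            # add a new name-value pair

my $count = scalar(keys %marks);         # number of pairs in the hash
my $known = exists $marks{'Ron McStudent'} ? 'yes' : 'no';   # is this key present?

foreach my $name (sort keys %marks) {    # keys come back in no fixed order, so sort them
    print "$name: $marks{$name}\n";
}
```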

If a mark above 70% is needed to pass an exam, each exam result can be marked as a PASS or FAIL. A condition for passing the course might be that there are more exam passes than fails. To compute this, the numbers of exam passes and fails are counted and compared. A loop is convenient here for processing the @results list. It is possible to loop over each list element using the foreach statement. This construct assigns each list element in turn to a temporary variable, terminating the loop at the end of the list.

Before we begin processing the numeric data stored in @results, the student name at the front of the list is removed using the shift function. This ensures that we're dealing with a pure numeric list. A procedure for computing the final PASS or FAIL mark might look like this...

@results = ('Ron McStudent', 65.34, 82.0, 59.21, 70.5, 67.33, 93.2);
$student = shift(@results);      # remove student name from list
$passes = 0;                     # number of exams passed
$fails = 0;                      # number of exams failed
 
foreach $mark (@results) {       # sequentially store exam results in $mark variable
    if ($mark > 70.0) {          # pass only if the current mark is above 70%
        $passes++;               # count one more exam passed
    } else {
        $fails++;                # otherwise count one more exam failed
    }
}
 
if ($passes > $fails) {          # compare the number of $passes and $fails
    print($student . " PASS\n"); # print final PASS or FAIL for the course 
} else {
    print($student . " FAIL\n");
}

The most basic output statement in Perl is the print function. By default, output is printed on the standard output. The dot '.' operator is used for joining strings together. In the print statements above, the scalar $student containing the student's name is joined to a PASS or FAIL mark, followed by a newline '\n'. Scalars may also be included as part of a string, achieving a similar result to using the dot operator. To demonstrate this, the first print statement above may be rewritten as shown below.

print("$student PASS\n"); # will print 'Ron McStudent PASS'

When including scalars inside strings, make sure to use double-quoted strings. Single quotes cause Perl to interpret the string literally. Also, backslash-escaped characters such as the newline '\n' are not interpreted in single quoted strings.

print('$student PASS\n'); # INCORRECT: will print '$student PASS\n'

The increment operator '++' is shorthand for adding one to a scalar variable. In the above code, $passes++ is equivalent to $passes += 1, which is equivalent to $passes = $passes + 1. The decrement operator '--' performs a similar function, subtracting one from a scalar.

The conditional 'if' statement evaluates an expression as being true or false. In the two 'if' statements above, the greater-than operator '>' compares two numeric values. If the first operand is the greater of the two, the expression evaluates as true and the code directly after the condition is executed, otherwise the expression is false and the code following the 'else' token is executed.

The built-in list variable @ARGV is used for accessing command-line arguments. Command-line arguments are a simple but powerful input mechanism which programs may use. Slight modification to the code snippet above will transform it into a complete Perl program which reads its input from the command-line.

#!/usr/bin/perl
 
# results list now read from @ARGV
$student = shift(@ARGV);
$passes = 0;
$fails = 0;                      
 
foreach $mark (@ARGV) {          
    if ($mark > 70.0) {          
        $passes++;               
    } else {
        $fails++;
    }
}
 
if ($passes > $fails) {          
    print($student . " PASS\n"); 
} else {
    print($student . " FAIL\n");
}

Line one of the above code instructs the operating system to run the program through the Perl interpreter. '/usr/bin/perl' is the usual path for Perl on Linux systems. In order to run your program, you need to enable execute permissions on the source file, e.g. using the chmod command as below.

$ chmod 755 yourfile.pl

Once the appropriate file permissions are set you can run your program from the Linux shell, entering student results information on the command-line, for example:

$ ./yourfile.pl 'Ron McStudent' 65.34 82.0 59.21 70.5 67.33 93.2

Or, alternatively...

$ perl yourfile.pl 'Ron McStudent' 65.34 82.0 59.21 70.5 67.33 93.2
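Because command-line input comes straight from the user, a program may want to check @ARGV before using it. The following is one possible sketch, not part of the program above: the subroutine name check_args and its messages are hypothetical, and the number test accepts only plain non-negative decimals such as 70 or 65.34.

```perl
use strict;
use warnings;

# Return an error message for a bad argument list, or undef if it looks usable.
sub check_args {
    my @args = @_;
    return "usage: yourfile.pl NAME MARK [MARK ...]" if @args < 2;

    my ($name, @marks) = @args;          # first argument is the name, the rest are marks
    foreach my $mark (@marks) {
        return "not a number: $mark" unless $mark =~ /^\d+(\.\d+)?$/;
    }
    return;                              # a bare return yields undef: no problem found
}

my $error = check_args(@ARGV);
print STDERR "$error\n" if defined $error;   # a real program would probably exit here
```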

This concludes the tutorial on basic Perl usage. I hope this limited introduction will give you a start in the right direction. The reference documentation distributed with Perl provides detailed information on the complete Perl syntax and built-in functions. The Perl documentation is found at the perl.org website [1]. Have fun!

--mwb

Debugger

At a command line, type the following...

% perl -de 0

The 0 is a trivial program supplied with the -e switch; it does nothing useful and is only there so that the debugger has something to load. What you've done is enter Perl's interactive debugger. This is a really handy way to learn Perl in an environment that gives immediate feedback. It's useful for accompanying a Perl tutorial, for testing out a bit of code before writing it into your actual source code, or even for debugging something that's been giving you problems.

There are a couple of things to keep in mind that might otherwise cause confusion. Go ahead and throw all your good 'use strict' behavior out the window, because each line you enter is evaluated in its own scope, meaning that if you type...

 DB<1> my $variable = 50;

$variable will be unavailable on subsequent lines. Leaving out 'my' creates a package variable instead, which does persist from one line to the next. You'll also have to learn how to squeeze nested control structures onto the same line.
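As a rough illustration, the lines below are the kind of thing that might be typed at the debugger prompt, shown here without the DB<n> prefix. The variable names are made up for this sketch, and the same lines also run as an ordinary script as long as 'use strict' is not in effect.

```perl
$variable = 50;                 # no "my": a package variable, so it survives to later lines
print "$variable\n";            # entered as a separate line, this still sees 50

# A nested control structure squeezed onto a single line
foreach $n (1, 2, 3) { if ($n % 2) { print "$n is odd\n" } else { print "$n is even\n" } }
```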

Regular Expressions

Perl also has some of the best built-in regular expression tools, allowing the user to easily test any string against a pattern. A pattern match in Perl looks something like this:

$string =~ /<regex>/;

This statement will evaluate as true if the string in $string matches the pattern in <regex>, and false otherwise.

A substitution looks very similar:

$string =~ s/<regex for replacing>/<replacement text>/g;

This statement will replace the text matched by <regex for replacing> with <replacement text>. The left-hand side must be a variable rather than a quoted string, because the substitution modifies it in place. The g at the end tells Perl to substitute all matches of the regex. If the g is omitted, Perl will only perform the first matched substitution.
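A small runnable sketch of both operations follows. The sample string and the variable names are invented for illustration; the substitutions work on copies so that the original string is left untouched.

```perl
use strict;
use warnings;

my $string = 'the cat sat on the mat';

# Match: true because the pattern c.t (c, any character, t) occurs in the string
my $matched = ($string =~ /c.t/) ? 'yes' : 'no';

# Copy the string, then substitute without g: only the first "the" is replaced
(my $first_only = $string) =~ s/the/a/;

# Copy the string, then substitute with g: every "the" is replaced
(my $all = $string) =~ s/the/a/g;

print "$matched\n";
print "$first_only\n";      # a cat sat on the mat
print "$all\n";             # a cat sat on a mat
```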

In addition to allowing simple string comparisons, Perl has built-in counterparts to some familiar command-line tools, such as tr, string substitution (as above), and its own grep. For instance, here is an example of how you can use Perl to look through a file and pull out formatted data:

#!/usr/bin/perl -w
use strict;
 
# First we need a file handle on the file we'll be pulling the data from.
# $fh is the file handle; my_data.txt is opened in read-only mode (<).
open(my $fh, '<', 'my_data.txt') or die "Cannot open my_data.txt: $!";
 
my @ip_addresses = ();            # This is the array where we will store our data
 
# Let's say that data in the file is in the following format:
# <RANDOM JUNK> <ip address> <more junk>, where ip addresses don't span multiple lines,
# and we want to extract any ip addresses in the file.
while (my $line = <$fh>) {                                      # Read the file one line at a time
    # This regex matches four sections of between 1 and 3 digits separated by periods.
    # The parentheses denote the group captured in $1 (here, the whole address).
    if ($line =~ /(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/) {
        print "Found the ip address: $1\n";
        push(@ip_addresses, $1);                                # Add the address stored in $1 to @ip_addresses
    }
}
close($fh);
 
# Do data manipulation on your array @ip_addresses...

Grouping of patterns within the match can give you more control over what you want to extract. For example, capturing each part of the ip address in the above example as its own group makes the parts available separately:

#...(same as before)
if( $line =~ /(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})/ ) {               #Match an IP address, capturing each part separately
      print "Reversed Host: $4.$3.$2.$1\n";                                 #Print the address in reverse order!
      #...
}

The parentheses around the separate patterns then allow that information to be specifically manipulated when there is a match for the ENTIRE pattern.
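Separate groups also make it possible to check each part on its own. Since \d{1,3} happily matches values such as 300, the sketch below uses Perl's built-in grep to keep only addresses whose four parts are all 255 or less. The sample lines and the names @lines, @valid and @in_range are assumptions made for this illustration.

```perl
use strict;
use warnings;

# Hypothetical sample lines in the "<junk> <ip address> <more junk>" format
my @lines = (
    'host alpha 192.168.1.20 up',
    'host beta 10.0.300.7 up',          # 300 is too large for one part of an address
);

my @valid;
foreach my $line (@lines) {
    # In list context a successful match returns the four captured groups
    my @octets = $line =~ /(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})/;
    next unless @octets;                # no address on this line

    # grep keeps the parts for which the test is true; all four must pass
    my @in_range = grep { $_ <= 255 } @octets;
    push(@valid, join('.', @octets)) if @in_range == 4;
}

print "Valid: @valid\n";
```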

Getting comfortable with the pattern syntax and behavior is what takes the most practice. Always be careful with your choice of pattern-matching symbols (especially .). Perl's quantifiers are greedy by default, meaning they match as much of the line as they can with the given pattern, so it is usually safer to use \S+ (non-whitespace) than .+ (anything) when writing your pattern. Here is a good reference for regular expression symbols: [2] --MF
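The difference is easiest to see on a concrete line. The sketch below uses a made-up sample line and variable names; it also shows the non-greedy form .+?, which is not covered above but is a standard part of Perl's pattern syntax.

```perl
use strict;
use warnings;

my $line = 'user=alice host=example.org status=ok';

my ($greedy)    = $line =~ /host=(.+)/;     # .+ runs on to the end of the line
my ($non_space) = $line =~ /host=(\S+)/;    # \S+ stops at the first whitespace
my ($lazy)      = $line =~ /host=(.+?) /;   # .+? matches as little as possible before a space

print "$greedy\n";        # example.org status=ok
print "$non_space\n";     # example.org
print "$lazy\n";          # example.org
```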

