Perl

Introduction

Perl is a high-level, general-purpose, interpreted, dynamic programming language. It is the language that most of the PanLex tools are written in. You must have Perl version 5.18 or higher to use the PanLex tools.

Installation

Follow the instructions below for the operating system you are using. Then follow the instructions below for installing Perl modules.

Mac OS X

Already installed?

You may already have Perl installed on your computer. To check, open a command line and run perl -v, which displays the installed version of Perl. (Running perl -h lists Perl’s command-line options, -v among them.) If Perl is not installed, you will get a “command not found” error.
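The check can also be scripted. The following is a minimal sketch, assuming a POSIX shell; the variable name perl_status is illustrative and not part of any PanLex tool. It relies on Perl’s built-in $] variable, which holds the interpreter version as a number (for example, 5.018000 for version 5.18.0).

```shell
# Report whether perl is on the PATH and whether it is at least version 5.18.
if ! command -v perl >/dev/null 2>&1; then
    perl_status="missing"
elif perl -e 'exit($] >= 5.018 ? 0 : 1)'; then
    # $] is Perl's numeric version; the one-liner exits 0 when it is new enough.
    perl_status="ok"
else
    perl_status="too old"
fi
echo "perl: $perl_status"
```

A result of “missing” or “too old” means you should continue with the installation steps for your operating system.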

Command line tools

In order to install some Perl modules (those containing compiled code) or to install Perl with Perlbrew (see below), you first need to download and install the Command Line Tools for Xcode. This is an approximately 150 MB download and requires an Apple ID. To determine whether these tools are already installed, open a terminal window, type gcc at the command line, and press Return. If the result is “command not found”, you need to install the tools; if it is “no input files”, they are already installed. To start the installation, use the command:

xcode-select --install
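As an alternative to the gcc test, the check can be done non-interactively. This is a sketch under assumptions: it uses xcode-select -p, which prints the active developer directory and is assumed here to exit with a nonzero status when none is configured. The variable name tools_status is illustrative.

```shell
# Detect the Command Line Tools without triggering any installer dialog.
# On systems without xcode-select (for example Linux) this reports "not installed".
if command -v xcode-select >/dev/null 2>&1 && xcode-select -p >/dev/null 2>&1; then
    tools_status="installed"
else
    tools_status="not installed"
fi
echo "Command Line Tools: $tools_status"
```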

Installation with Perlbrew

If you need to install Perl or to update it to a more current version, you can use Perlbrew. Perlbrew is a script that will download, compile, and install any version of Perl. Steps:

  1. To install Perlbrew, open a terminal window and at the command line run curl -L https://install.perlbrew.pl | bash. Perlbrew will install itself into the perl5 directory in your home directory.
  2. Add Perlbrew’s startup script to your .bash_profile by running echo source ~/perl5/perlbrew/etc/bashrc >> ~/.bash_profile.
  3. Close the terminal window and open a new one to initialize Perlbrew.
  4. Run perlbrew install stable to compile and install the latest stable version of Perl with Perlbrew. This can take about an hour. A large log file is generated, and the command returns instructions on how to view the log file while it is being generated. At the end, you get a report on the success or failure of the installation. If it failed, there are suggestions on what to do. If compilation succeeded but testing failed, the very end of the log file reports how many tests failed. Each failure in the log file contains the string “Failed test”.
  5. Run perlbrew switch <version>, filling in the Perl version number that was installed, to switch your perl command from your system Perl to the one installed with Perlbrew.
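Step 2 appends a line to .bash_profile every time it is run, so repeating it leaves duplicate entries. The sketch below shows one way to make that step safe to repeat. The helper name add_line_once is hypothetical, and the demonstration writes to a scratch file rather than to your real profile.

```shell
# Hypothetical helper: append a line to a file only if that exact line is absent.
add_line_once() {
    line=$1
    file=$2
    touch "$file"
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Demonstrate on a scratch file instead of the real ~/.bash_profile.
profile=$(mktemp)
add_line_once 'source ~/perl5/perlbrew/etc/bashrc' "$profile"
add_line_once 'source ~/perl5/perlbrew/etc/bashrc' "$profile"   # second call is a no-op
line_count=$(grep -cxF 'source ~/perl5/perlbrew/etc/bashrc' "$profile")
echo "matching lines: $line_count"   # prints "matching lines: 1"
```

To apply it for real, pass "$HOME/.bash_profile" as the second argument in place of the scratch file.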

Windows

The best way to create a consistent development environment for using the PanLex tools on Windows is to install Cygwin. To do so, download and run the executable setup application from Cygwin’s website. Try the 64-bit version of Cygwin first; if there are any issues, you can fall back to the 32-bit version. Click through the prompts, using the default values (any mirror site should be fine). When prompted for which packages to install, be sure to select the following packages in addition to the defaults:

  • Archive: zip, unzip
  • Devel: gcc-core, gcc-g++, git, make
  • Libs: libcrypt-devel
  • Net: curl, openssl-devel
  • Perl: perl

After selecting these packages, continue the installation. The installer will download and install the packages. To install or update packages in the future, just re-run the setup application.

You can now access Perl from the Cygwin Terminal application. Note that your Cygwin home directory is located at C:\cygwin64\home\<username> in Windows (where <username> should be replaced with your actual username). You may want to create a shortcut to it on your Windows desktop. Likewise, you may want to create a symbolic link to your Windows desktop from your Cygwin home directory. You may do so by opening Cygwin Terminal and issuing the following command:

ln -s /cygdrive/c/Users/<username>/Desktop ~/Desktop

You should replace <username> with your actual username. Once you have created the symbolic link, you should be able to change from your Cygwin home directory to your Windows desktop directory with cd Desktop.
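The ln -s command fails if ~/Desktop already exists or if the Windows path is wrong. A more defensive version might look like the sketch below. It assumes that your Windows username matches the name reported by id -un inside Cygwin, which may not hold on every machine; the variable names are illustrative.

```shell
# Assumption: the Windows user name equals the Cygwin user name from id -un.
win_user=$(id -un)
target="/cygdrive/c/Users/$win_user/Desktop"
link="$HOME/Desktop"

if [ -e "$link" ] || [ -L "$link" ]; then
    # Something is already there; leave it untouched.
    link_status="already exists"
elif [ -d "$target" ]; then
    if ln -s "$target" "$link"; then
        link_status="created"
    else
        link_status="failed"
    fi
else
    # The Windows desktop directory was not found at the assumed path.
    link_status="target not found"
fi
echo "$link: $link_status"
```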

Linux/Unix

Install the packages for Perl with your distribution’s package manager, or use Perlbrew (see above under Mac OS X).

Usage

A general introduction to Perl is a good place to start. Documentation on Perl’s built-in functions is also available, as is a Perl reference chart that summarizes the language.

Special advice for writing your own custom Perl scripts for tabularization is also available.

Learning

You may get practice writing scripts in Perl by trying the following exercises.

  1. Store a string containing a sentence in a variable. Convert the sentence to all uppercase letters, and print it on the console, followed by a newline.
  2. Store a string containing a sentence in a variable. Split the string into an array, so that each word is an element of the array. Print the first element of the array on the console, followed by a newline.
  3. Modify the code in 2 to loop through the array and print each word to the console, followed by a newline.
  4. Paste some multiline text into a text file. Read the text file line by line, printing each line to the console.
  5. Modify the code in 4 to only print the lines that contain a capital letter.
  6. Modify the code in 4 to sort the words in each line alphabetically before you print them. You can do this with the functions split, sort, and join.
  7. Read a multiline text file line by line, keeping track of how many times each word occurs in the file. Then, report the statistics by printing a line for each word, containing the word and the number of times it occurs in the file (i.e., a series of lines of the form “fish 5”, “dog 11”, etc.).
  8. Modify the code in 7 to sort the output lines by word.
  9. Modify the code in 7 to sort the output lines by frequency, with the most frequent words coming first.
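Before starting the exercises, it may help to see how a script is created and run from the command line. The sketch below is not a solution to any exercise: the file name hello.pl and its contents are placeholders, and it works in a temporary directory so that nothing in your own files is overwritten.

```shell
# Create a scratch directory and write a tiny Perl script into it.
workdir=$(mktemp -d)
cat > "$workdir/hello.pl" <<'EOF'
#!/usr/bin/env perl
use strict;     # catch undeclared variables
use warnings;   # report common mistakes

# Placeholder body; replace it with your exercise solution.
my $greeting = "hello from Perl";
print "$greeting\n";
EOF

# Run the script with the perl interpreter and capture what it prints.
output=$(perl "$workdir/hello.pl")
echo "$output"
```

Starting each script with use strict and use warnings is a common convention that surfaces typos and undeclared variables early.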