Introduction; UNIX; PHP: strings

Introduction: Our Languages, Ourselves

PHP, Perl, and Python. Three programming languages with a lot in common.

They’re high-level languages with dynamic typing, known for being interpreted or compiled to byte-code at the time of execution.

They’re commonly used for building the back-end of web applications—they’re the “P” in LAMP.

They’re used in a tremendous number of software projects, ranging from enterprise-level financial software to systems programming, systems administration, and games.

They each offer unique structures and approaches for common programming tasks.

But they’re not usually covered in Computer Science curricula. This course is designed to correct that.

Goals and Expectations

In a course designed to fit three languages into just a handful of sessions, there’s an upper limit to what students can be expected to learn (or what the instructor can be expected to teach!). PHP, Perl and Python aren’t any easier to learn than any other programming languages. Don’t aim for fluency; just try to get the gist. The idea is to get your feet wet in working example code, so that you can hit the ground running when you find yourself with the opportunity to work on a larger project written in one of these languages.

This course is designed for students who already know how to program computers. We’re not going to focus on basic programming concepts.

We’re also not going to focus on the persnickety particulars of syntax. You’re entrusted with the ability to use the official documentation (and Google, and sites like Stack Overflow) to answer those kinds of questions.

Methodology

You’ll be learning by looking at small, working examples of code. Most of the code is designed to manipulate text files, and runs from the UNIX command line. The example programs will grow in complexity each day; each language will culminate in a more fully fleshed-out program. (A Markov chain generator in PHP, a context-free grammar generator in Perl, and Conway’s Game of Life in Python.)

There are three programming assignments and a final (written) exam.

Versions

We’ll be using PHP 5.2, Perl 5.10, and Python 2.6 in this course. All three of these languages have newer versions, some with significant changes:

  • PHP 5.3 introduces namespaces and closures
  • Perl 6 is a complete overhaul of the language, with new features and idioms
  • Python 3 introduces a number of syntactic changes and re-arrangements of the standard library

The versions we’re learning are slightly behind the bleeding edge, but they’re the most widely used in existing software and the most widely compatible with existing frameworks. (Google App Engine, for example, doesn’t work yet with Python 3, and there is hardly any production code written in Perl 6.)

Getting in (command) line: UNIX

The structure of non-interactive computer programs

Computer programs are made to work in many different ways. To simplify a bit, let’s say that there are two kinds of computer programs: interactive programs and non-interactive programs. If you learned to program in an environment like Processing, then you probably learned interactive programming. Programs made in this paradigm can be schematically represented like this:

  1. Do some initialization stuff.
  2. Keep doing something over and over again.
  3. Get some user input.
  4. Go to step 2.

These programs are centered around responding to user actions. But what if we don’t care about that? What if we just want to munge some data?

Most computer programs are non-interactive, meaning that they don’t respond to user input (or respond to it only indirectly). The model for the programs that we’ll be making in this class is this:

input -> munging procedure -> output

We don’t care about how our program was run, or what the user’s doing while the program is running, or even (ideally) where the input is coming from or where the output is going. We’re concerned solely with the procedure: the code that, given data as input, transforms it into output.

The UNIX Command Line

In this class, we’ll be doing most of our work from the UNIX command line. It’s not because we’re hard-asses, or bad-asses, or just asses. Some claim that the command line is “better,” or “more fundamental.” It isn’t. But it was built to do the kind of work that we’re doing in this class.

Getting started: Logging In

If you’re already familiar with UNIX and have access to a UNIX machine, then you’re good to go, and you can skip to the next section.

For the duration of this course, there is a UNIX server (well, Ubuntu Linux) available online for student use. In order to log into it, though, you’ll need an SSH client. (SSH means “secure shell”: it’s a protocol that allows you to log in to remote machines with secure data encryption.)

If you’re using OS X, then you’ve already got a good SSH client. Open Terminal (Applications > Utilities > Terminal); at the prompt, type the following:

ssh your_user_name@sandbox.decontextualize.com

(Replace your_user_name with the first letter of your first name followed by the first seven letters of your last name—for me, that would be aparrish: the “a” from Adam plus all seven letters of my last name.)

You’ll be prompted for a password, which I gave in class. Come see me if you missed it, or if you forgot your password when you changed it in class.

If you’re using Windows, you’ll need to download an SSH client. I recommend PuTTY. (See me if you need more detailed instructions for how to set this up.)

If you’re using Linux, you probably already know the drill. Open your terminal emulator and SSH to the server, using the same command for OS X given above.

When you’ve successfully reached the command line (another line!), you should see something like this:

aparrish@li99-59:~$

This is the “prompt” (because it “prompts” you to do something). It’s telling you your username, the server you’re logged into, and the current directory. More on that later.

The two keystrokes you absolutely must know:

  • Ctrl+D: “I’m done typing things in. kthxbye.”
  • Ctrl+C: “You’re doing something I don’t like. Please stop.”

Your first UNIX commands

First off, we’re going to create a directory, so that you can find it later and you don’t risk overwriting something:

$ mkdir ppp
$ cd ppp

(Don’t type the $! It’s just there to indicate that those commands should be typed at the command line.)

The mkdir command means “make directory” (“directory” is just UNIX-speak for “folder”). When you’re using the command line, there’s one directory on the machine that is considered your “current” directory, i.e., the directory you’re doing stuff in. The cd (“change directory”) command makes the directory you give it (ppp in this case) the current directory. (If you’re working in Terminal on your own Mac rather than on the sandbox server, you can probably find the new folder in the Finder right now.)

There are (broadly) two kinds of commands in UNIX: commands that work on lines of input/output, and commands that operate on files and directories. The mkdir and cd commands are examples of the latter. We’re primarily concerned with the former. Let’s start with cat:

$ cat

(Make sure to hit “return” after you type cat.) Now type. After you enter a line, cat will print out the same line. It’s the simplest text filter possible (one rule: let everything through).

When you’re done with cat, press Ctrl+D. Let’s try something more interesting, like grep:

$ grep foo

Now type some lines of text. Try typing, for example, “I like drink” and then “I like food.” The grep command only prints out lines that “match” the pattern that follows the command (foo, in this case), i.e., lines that contain that string of characters. Let’s try it again, this time with a different “pattern”:

$ grep were

Now let’s work with some actual text. Here’s William Carlos Williams’ famous poem, This is just to say:

this is just to say

i have eaten
the plums
that were in
the icebox

and which
you were probably
saving
for breakfast

forgive me
they were delicious
so sweet
and so cold

If we paste this poem into the terminal while grep were is running, grep prints only these lines:

that were in
you were probably
they were delicious
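Under the hood, grep is doing the same read-a-line, test-it, print-it routine that the PHP part of this course is built around. As a point of comparison, here is a minimal PHP sketch of the grep were filter. It assumes a literal search string (real grep treats its pattern as a regular expression), and the function name line_matches is hypothetical.

```php
<?php
// A rough PHP imitation of `grep were`: print only the lines of input
// that contain a fixed string. (Real grep treats its pattern as a
// regular expression; this sketch only looks for a literal substring.)

// Hypothetical helper: true if $needle appears anywhere in $line.
function line_matches($line, $needle) {
	// strpos returns the position of the match (possibly 0) or false,
	// so compare against false explicitly.
	return strpos($line, $needle) !== false;
}

while (($line = fgets(STDIN)) !== false) {
	$line = rtrim($line);
	if (line_matches($line, "were")) {
		echo "$line\n";
	}
}
```

The !== false comparison matters: strpos returns 0 when the string appears at the very start of a line, and 0 counts as false in a plain if test.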

The commands head and tail print out a certain number of lines from the beginning of a file and the end of a file, respectively. If you type in the following:

$ tail -4

… and then paste in the poem above and press Ctrl+D, you’d get:

forgive me
they were delicious
so sweet
and so cold
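tail has to read all of its input before it can print anything, because it can’t know which lines are the last four until the input ends (which is why you need Ctrl+D). A minimal PHP sketch of the same idea, with last_lines as a hypothetical helper name:

```php
<?php
// A rough PHP imitation of `tail -4`: remember every line, then print
// only the last four once the input has ended.

// Hypothetical helper: return the last $n elements of $lines
// (or all of them, if there are fewer than $n).
function last_lines($lines, $n) {
	if ($n <= 0) {
		return array();
	}
	return array_slice($lines, -$n);
}

$lines = array();
while (($line = fgets(STDIN)) !== false) {
	$lines[] = rtrim($line);
}
foreach (last_lines($lines, 4) as $line) {
	echo "$line\n";
}
```

A head-like version would use array_slice($lines, 0, 4) instead (although the real head can stop reading as soon as it has printed enough lines).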

Structure of UNIX commands

UNIX commands generally follow this structure:

name_of_command [options] arguments

(The “[options]” part of that schema is usually one or more characters preceded by hyphens. The -4 in tail tells it to print the last four lines; grep takes an option, -i, which tells it to be case insensitive.)

You can think of UNIX commands like commands in English, but with a funny syntax: “Fetch thoroughly my slippers!”

You can figure out which options and arguments a command supports by typing man name_of_command at the command line.

Sorting and piping

The sort command takes every line of input and prints them back out, in alphabetical order. Try:

$ sort

… paste in the poem, and hit Ctrl+D. You’ll get something like this (the output starts with a few blank lines):

and so cold
and which
for breakfast
forgive me
i have eaten
saving
so sweet
that were in
the icebox
the plums
they were delicious
this is just to say
you were probably

(why the blank lines at the beginning?)
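sort, like tail, has to collect all of its input before printing. Here is a minimal PHP sketch of the same filter, with sorted_lines as a hypothetical helper name. It compares lines byte by byte; GNU sort normally follows your locale’s collation rules, so for lines with mixed case or punctuation its order may differ from this sketch (running LC_ALL=C sort should match it).

```php
<?php
// A rough PHP imitation of `sort`: collect every line of input, sort the
// lines, then print them. Nothing is printed until the input ends.

// Hypothetical helper: return a sorted copy of an array of lines.
// (PHP passes arrays by value, so sorting $lines here doesn't change
// the caller's array.)
function sorted_lines($lines) {
	// SORT_STRING compares lines as text, byte by byte, so "10" sorts
	// before "9" instead of being compared as numbers.
	sort($lines, SORT_STRING);
	return $lines;
}

$lines = array();
while (($line = fgets(STDIN)) !== false) {
	$lines[] = rtrim($line);
}
foreach (sorted_lines($lines) as $line) {
	echo "$line\n";
}
```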

So far, we’ve just been sending these commands input (by typing, or cutting and pasting), then letting the output be printed back to the screen. UNIX provides a means by which we can send the output of one program as the input of another program. (Kind of like hooking objects up in a Max/MSP patch.) We do this using the pipe character (| … usually shift+backslash). For example:

$ grep were | sort

… takes lines from input, displays only those that contain the string “were,” and then passes them to sort, which displays those lines in alphabetical order. The output from the poem:

that were in
they were delicious
you were probably

cut and tr

The cut command breaks up a line of text into its constituent parts. For example:

$ cut -d ' ' -f 1

… prints out the first word of every line of input. Here cut gets two options, each followed by its own parameter. The -d option is followed by the “delimiter” character (i.e., what you want to split the line on; here, a space); the -f option is followed by the number of the field you want (fields are numbered starting at 1).
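In PHP, the same split-and-pick-a-field idea is usually done with explode. This sketch imitates cut -d ' ' -f 1; the helper name get_field is hypothetical, and it numbers fields from 1 to match cut.

```php
<?php
// A rough PHP imitation of `cut -d ' ' -f 1`: split each line on a
// delimiter and print one field.

// Hypothetical helper: return field number $n (counting from 1) of $line,
// or an empty string if the line doesn't have that many fields.
function get_field($line, $delimiter, $n) {
	$fields = explode($delimiter, $line);
	if ($n < 1 || $n > count($fields)) {
		return "";
	}
	return $fields[$n - 1];
}

while (($line = fgets(STDIN)) !== false) {
	$line = rtrim($line);
	echo get_field($line, " ", 1) . "\n";
}
```

One difference to be aware of: when a line contains no delimiter at all, cut prints the whole line no matter which field you ask for, while this sketch returns an empty string for any field after the first.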

The tr command “translates” a set of characters in the original line to another set of characters. The source character set is the first parameter, and the second parameter is the characters you want them to be translated to. For example:

$ tr aeiou eioua
hello there, how are you?
hillu thiri, huw eri yua?

You can specify a range of characters with a hyphen:

$ tr a-z A-Z
hello there, how are you?
HELLO THERE, HOW ARE YOU?
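PHP’s strtr function does a character-for-character translation much like tr. A minimal sketch of tr aeiou eioua as a PHP filter:

```php
<?php
// A rough PHP imitation of `tr aeiou eioua`. Given three arguments,
// strtr replaces each character found in the second argument with the
// character at the same position in the third.

while (($line = fgets(STDIN)) !== false) {
	$line = rtrim($line);
	echo strtr($line, "aeiou", "eioua") . "\n";
}
```

One difference: when the two character lists have different lengths, strtr ignores the extra characters in the longer one, whereas tr reuses the last character of its second list. So strtr($line, "aeiou", "e") only turns a into e. For tr a-z A-Z, PHP’s strtoupper does the same job on plain ASCII text.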

Multiple pipes

Of course, you can include more than one command in a “pipeline”:

$ sort | tail -6 | tr aeiou e

… which, if you send it our venerable poem, outputs the following:

thet were en
the ecebex
the plems
they were deleceees
thes es jest te sey
yee were prebebly

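What happened? The input went to sort, which sorted the lines in alphabetical order. Then tail -6 kept only the last six lines of sort’s output and sent them through to tr, which replaced every vowel with e. (When tr’s second set of characters is shorter than the first, tr reuses the last character of the second set, so all five vowels become e.) You can build pipelines as long as you like using this technique.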
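Each stage of a pipeline can also be written as a step in a single PHP script. This sketch (with a hypothetical helper name, sort_tail_tr) sorts an array of lines by byte value, keeps the last six, and translates the vowels. For the poem it should produce the same six lines as the pipeline, though sort’s locale-dependent ordering could differ from PHP’s on other input.

```php
<?php
// A rough PHP imitation of the pipeline `sort | tail -6 | tr aeiou e`,
// with each stage applied in turn to an array of lines.

// Hypothetical helper: run all three stages on an array of lines.
function sort_tail_tr($lines) {
	sort($lines, SORT_STRING);          // sort
	$lines = array_slice($lines, -6);   // tail -6
	$result = array();
	foreach ($lines as $line) {
		// tr aeiou e: strtr ignores extra characters when its two
		// character lists differ in length, so spell out all five e's.
		$result[] = strtr($line, "aeiou", "eeeee");
	}
	return $result;
}

$lines = array();
while (($line = fgets(STDIN)) !== false) {
	$lines[] = rtrim($line);
}
foreach (sort_tail_tr($lines) as $line) {
	echo "$line\n";
}
```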

Using files (“redirection”)

So far, we’ve been building “programs” that can only read from the keyboard (or from cut-and-paste) and can only send their output to the screen. What if we want to read from an existing file, and then output to a file?

No complicated code is needed. UNIX provides a method for us. It’s called “redirection.” Here’s how it works:

$ sort <this_is_just.txt

The < character means “instead of taking input from the keyboard, take input from this file.” Likewise:

$ grep were >some_file.txt

The > means “instead of sending your output to the screen, send it to this file.” You can use them both at the same time:

$ grep were <this_is_just.txt >some_file.txt

… in which case some_file.txt will end up with every line from this_is_just.txt that contains the string “were.” (If the output file doesn’t already exist, it will be created. If it does already exist, it will be overwritten, so be careful!)

Other helpful commands

man will print out a manual page for the command you give it on the command-line. The manual page will contain helpful information about how the command works, along with a list of options and parameters it supports. For example, to get more information about the tr command, type man tr.

wc -w foo will print out the number of words in the file named foo. (wc -l will count the number of lines; wc -c will count the number of bytes, which is the same as the number of characters for plain ASCII text; wc -m counts characters.)

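curl -O http://some.url/some_file.txt downloads the file at the given URL into your current directory, saving it under the last part of the URL (some_file.txt, in this case) as its file name.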

The ls command will give you a list (“ls” stands for “list”) of files in your current directory. If you give it a parameter, it will list the files in the directory you gave it. (On the sandbox server, for example, try ls /home; in Terminal on your own Mac, try ls /Users/your_user_name/Desktop.)

Type pwd to find out what your current directory is.

The cp command will make a copy (“cp” for “copy”) of a file. It takes two parameters: the first is the source file name, the second is the destination file name.

Type rm foo to delete the file named foo. (Note: this is permanent! The file won’t go to your Trash… so be careful)
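To make wc’s counts concrete, here is a PHP sketch of what it measures, assuming wc’s usual definitions: one line per newline character, a word is any run of non-whitespace characters, and the third number is a byte count (as with wc -c). The helper name count_text is hypothetical, and the output format is simplified compared to the real wc.

```php
<?php
// A rough PHP imitation of wc: count lines, words, and bytes on
// standard input.

// Hypothetical helper: return array(lines, words, bytes) for $text.
function count_text($text) {
	$lines = substr_count($text, "\n");   // one line per newline
	$trimmed = trim($text);
	if ($trimmed === "") {
		$words = 0;
	} else {
		// Split on runs of whitespace, so "so  sweet" is two words.
		$words = count(preg_split('/\s+/', $trimmed));
	}
	$bytes = strlen($text);               // strlen counts bytes
	return array($lines, $words, $bytes);
}

$text = stream_get_contents(STDIN);
list($lines, $words, $bytes) = count_text($text);
echo "$lines $words $bytes\n";
```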

Your workflow

Using an IDE isn’t out of the question for command-line programming, but you might find it a bit inconvenient: something like Eclipse is overkill for what we’re doing. Here are a few possible ways to arrange your workflow:

  • Use a text editor that works from the command line. I like vim, but I don’t recommend it for beginners (there’s a steep learning curve). The sandbox server is equipped with nano, a basic yet functional editor that should work fine for editing the simple programs we’ll write in this class.
  • Edit files locally on your computer, and use an SFTP client to sync files with the remote server. If you’re on a Mac, TextWrangler + Cyberduck is a good combination for this. If you use this method, you should also keep an SSH session logged in to the server, so you can run the programs after you’ve written them.

The sandbox server supports SFTP in addition to SSH; you can log in and transfer files with any file transfer program that supports SFTP. (See me if you’re having trouble transferring files.)

Your first PHP

Now that we’re all UNIX experts, let’s get down to business and do some programming. The first program we’re going to write is the simplest possible program that might be of use to anyone: a clone of cat. Here’s the source code (simplecat.php in the PHP examples directory, or cut-and-paste into a file named simplecat.php):

<?php

while (($line = fgets(STDIN)) !== false) {
	$line = rtrim($line);
	echo "$line\n";
}

You can run this program by using cd to change to the directory containing the file and then typing this at the command-line:

$ php simplecat.php

Anything you type will be echoed back to the terminal (press Ctrl+D when you’re done). You can use simplecat.php just like regular cat to read and write files, using redirection:

$ php simplecat.php >foo.txt
this is a test of simple cat
simple cat, simple cat, what have they been feeding to your stdin?
$ php simplecat.php <foo.txt
this is a test of simple cat
simple cat, simple cat, what have they been feeding to your stdin?
$

So, how does the program work? Let’s go over it line-by-line.

Line 1: <?php is the opening PHP tag. Anything after this tag will be interpreted as code, until either the end of the file or a matching ?> tag. The importance of this will become clear when we discuss using PHP as an HTML templating language.

Line 3: Begins a while loop. fgets is a PHP function that reads the next line of text from the file it’s given (here STDIN, i.e., standard input) and returns it, or returns false once the end of the file has been reached, which ends the loop. $line is a variable that we’re defining here to hold this value. Note that we don’t have to declare the variable before using it, nor do we need to specify its type.

Line 4: rtrim is another built-in PHP function. Given a string, it returns a copy of that string with whitespace stripped from the right side. Here, that removes the newline character that fgets leaves at the end of each line, which is why line 5 adds one back.

Line 5: echo takes a string parameter and prints it to standard output. You see a string literal here, with double quotes. Double quoted strings are interpolated: if you write a variable inside a double quoted string, PHP will replace the variable’s name with the variable’s value. Also note \n: backslashed escape characters in PHP are similar (but not identical) to escape characters in C strings.

Line 6: } closes the while loop. Just as in C, C++ or Java, multi-line blocks of code (in, e.g., a loop or conditional) are contained within curly brackets.

This structure—read a line from input, do something to the line, and spit it back out—is the basic form that most of the programs we write in this class will take.
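To see the rtrim and echo lines of simplecat.php in isolation, here is a small sketch. The string in $line is made up to stand in for what fgets would return; the point is that the newline comes along with the line, and that only double-quoted strings interpolate variables and turn \n into a newline.

```php
<?php
// fgets keeps the newline at the end of each line it reads, which is
// why simplecat.php strips it with rtrim and adds "\n" back when echoing.
$line = "a line from fgets  \n";   // pretend fgets returned this
$clean = rtrim($line);              // trailing spaces and the newline are gone

// Double-quoted strings replace $clean with its value and turn \n into a
// newline; single-quoted strings leave both alone.
echo "[$clean]\n";   // prints [a line from fgets] followed by a newline
echo '[$clean]\n';   // prints [$clean]\n literally, with no newline
echo "\n";
```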

Mini exercise 1. What else could you do with the line, instead of just printing it out? The PHP manual has a list of functions relating to strings. Use one or more of these functions to make a program that prints out every line of the input in upper case, or prints out only the first five characters of each line, or some other string manipulation of your choosing.
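One possible approach, sketched under the assumption that you keep the simplecat.php loop and change only what gets echoed: strtoupper returns an upper-cased copy of a string, and substr($line, 0, 5) returns its first five characters (or the whole line, if it’s shorter).

```php
<?php
// The simplecat.php loop, with a string function applied to each line
// before it's printed.
while (($line = fgets(STDIN)) !== false) {
	$line = rtrim($line);
	echo strtoupper($line) . "\n";        // every line in upper case
	// echo substr($line, 0, 5) . "\n";   // or: only the first five characters
}
```

Swap which echo line is commented out to try the other variation.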

Further reading and helpful links
