The source text; Making decisions about lines

How do words mean when we put them into new contexts? Under what conditions does the meaning web tear apart? What meanings can words make (or can we make of them) when we disturb their normal relation to each other? These are questions that any poet can be seen as asking; and from this point of view, someone experimenting with computer poetry is continuing an age-old project.—Charles Hartman, Virtual Muse, p. 104.

A brief tour of the literature

  • Nick Montfort, Taroko Gorge (limited syntax, words as units, carefully curated choice of words)
  • Eric Scovel, A Light Heart, Its Black Thoughts (generative procedure, derived from literature)
  • Leonard Richardson, Dog Bites Dog (web-based, depends on genre-specific conventions, humorous)
  • K. Silem Mohammad, Sonnagrams (anagrams, formal constraint but not composed with an algorithm)
  • Stribling, Krohn, Aguayo, SCIgen (context-free grammar, parody of genre)
  • Charles Bernstein, 1-100 (algorithms, numbers, performance)

For more, check out the sidebar on the syllabus.

Your workflow

Using an IDE isn’t out of the question for doing command-line Python programming, but you might find it a bit inconvenient—the likes of Eclipse and IDLE are overkill for what we’re doing. Here are a few possible ways to arrange your workflow:

  • Use a text editor that works from the command line. I like vim, but I don’t recommend it for beginners. (There’s a steep learning curve.) Most UNIX-like machines—including Macs—are equipped with nano, which is a basic yet functional editor that should work for editing simple Python programs.
  • If you’re working on a remote server, you could edit files locally on your computer, and use an SFTP client to synch files with the remote server. If you’re on a Mac, TextWrangler + CyberDuck is a good combination for this. If you’re using this method, you should have an SSH session logged in to the server as well, so you can run the programs after you’ve written them.
  • If you’re working on your local machine and you don’t want to use a command-line editor, just use the text editor of your choice and have a Terminal window open in the directory where your files are located. When you save the file in your editor, switch over to your terminal window to run it.

Finding and preprocessing text

Text is easy to find on the Internet. Project Gutenberg is a good source for text that is reasonably certain to be free of significant copyright encumbrances. I don’t care where you get your text, but I leave the responsibility for clearing any usage rights (where necessary) to you.

The programs that we will be writing in this class operate on plain text files. Word files, PDFs, RTFs, and other formatted document types won’t work. Text-based formats like HTML will work, but you’ll be working with the source code, not necessarily just the text contained therein.

There are a couple of strategies available for making something plain text. In a word processor like Word or OpenOffice, you can just save the file as plain text (usually available in the “Save As…” or “Export…” dialogs). For other programs, you can just copy and paste the text into a program like TextWrangler, which will automatically convert it to plain text. Just save the file somewhere your Python scripts can find it, and you’re ready to go. (We’ll go over an example of this in class.)

Python: The Interactive Interpreter

We’ll start our investigation of Python with the interactive interpreter, which allows you to type in Python expressions, statements, and programs and see their output as you type. To access the interactive interpreter, simply type

python

on the command line. You’ll get a few lines of version information, and then a prompt that looks like this: >>>. Now you can type in Python code, and the program will interpret and evaluate your input. Try some simple arithmetic:

>>> 9 + 5
14
>>> 42 + (3 * 6)
60
>>> 9 / 4.0
2.25

(The syntax for arithmetic and order of operations work as you’d expect if you know any C-like language—Perl, Java, C++, JavaScript, etc.)
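The decimal point in 9 / 4.0 above is doing real work. In Python 2, which these notes use, dividing one integer by another with / throws away the remainder. The sketch below sticks to operators that should behave the same in Python 2 and Python 3 (// for whole-number division, % for the remainder); the variable names are only for illustration, and print is written with parentheses so the code runs under either version.

```python
# Whole-number division: the fractional part is discarded
whole = 9 // 4
# Remainder left over after whole-number division
remainder = 9 % 4
# Making one operand a float gives the fractional result
exact = 9 / 4.0

print(whole)      # 2
print(remainder)  # 1
print(exact)      # 2.25
```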

You can also assign values to variables with the assignment operator =. The name of the variable will then refer to whatever value you assigned to it, until you quit the interpreter (or until you assign another value to it). Create some variables and use them in simple expressions:

>>> foo = 9
>>> bar = 5
>>> baz = 1.1
>>> foo + bar - baz
12.9

(Variable names can contain letters, numbers, and underscores, but must begin with a letter or underscore. Some variable names are reserved by Python; here’s the full list.)

You can assign a string to a variable using quotes (double quotes and single quotes do the same thing):

>>> message = "python"
>>> message
'python'
>>> message + " is for lovers"
'python is for lovers'

Note that you don’t have to give a variable a specific type when you create it: Python works out the type from the value you assign, and you’re free to assign a value of a different type to the same name later. Values themselves do have types, though, and if you try to combine them in unexpected ways, Python will get angry with you.

>>> message + foo
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: cannot concatenate 'str' and 'int' objects
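One way around this error is to convert the value yourself before combining. The following is a minimal sketch that reuses the message and foo values from the transcript and relies on the built-in str() and int() functions; the names combined and total are made up for illustration, and print is written with parentheses so the code should run under Python 2 or Python 3.

```python
message = "python"
foo = 9

# str() turns the integer into a string, so + can join the two strings
combined = message + str(foo)
print(combined)    # python9

# int() goes the other way, turning a string of digits into a number
total = foo + int("5")
print(total)       # 14
```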

Quitting the interactive interpreter

To quit the interactive interpreter, type exit() at the prompt, or hit Ctrl+D.

String literals

When you use quotes to put the value of a string directly into your program, you’re using a string literal. You’ll usually be using either single or double quotes, which behave (for all intents and purposes) exactly the same:

>>> "double quotes"
'double quotes'
>>> 'single quotes'
'single quotes'

There are several special characters that you can include inside your string by using escape sequences:

  • \' becomes a literal single quote (for including inside a single-quoted string)
  • \" becomes a literal double quote
  • \n becomes a new line character
  • \t becomes a tab character

Everything is an object

In Python, everything is an object, including integers, floating-point numbers, and strings. You can ask Python what type a value is like so:

>>> type(foo)
<type 'int'>
>>> type(baz)
<type 'float'>
>>> type(message)
<type 'str'>

Objects in Python have attributes and methods. An attribute is some value associated with an object; a method is some kind of behavior that the object supports.

Warning: awful example time. A “cat” object might have an attribute color that represents its color; it might have a method meow that lets you tell the cat to meow, or a method reproduce that tells the cat to make another cat.

Python lets you look inside of any object to see what methods and attributes it supports. You can do this right from the interpreter, like so:

>>> dir(message)
[... 'capitalize', 'center', 'count', 'decode', 'encode', 'endswith', 'expandtabs', ...]

What you’re seeing is a list of attributes that strings have, and methods that you can call on them. Some of these are mysterious (those beginning and ending with double underscores); others are straightforward. If you want to know more about these methods, you can use “help”:

>>> help(message.upper)
Help on built-in function upper:
 
upper(...)
   S.upper() -> string
    
   Return a copy of the string S converted to uppercase.

The syntax to call a method on an object should look familiar to Java programmers. It looks like this: object.method() (replace object with an object variable, and method with the name of the method.) Some methods accept arguments, which are separated with commas (again, just like in Java). An example:

>>> message.upper()
'PYTHON'
>>> message.center(24, '*')
'*********python*********'

Consult the Python documentation for more methods of string objects.
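A few of the other methods that appear in the dir(message) listing can be tried the same way. This is only a sketch; the variable names on the left are made up for illustration, and print is written with parentheses so the code should run under Python 2 or Python 3.

```python
message = "python"

capitalized = message.capitalize()   # first letter uppercased
p_count = message.count("p")         # how many times "p" occurs
ends_on = message.endswith("on")     # whether the string ends with "on"

print(capitalized)   # Python
print(p_count)       # 1
print(ends_on)       # True
```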

Slicing strings

Python has a powerful syntax for indexing parts of strings. (You use the same syntax for lists, which we’ll talk about next week.) Some examples:

>>> message = "bungalow"
>>> message[3]
'g'
>>> message[1:6]
'ungal'
>>> message[:3]
'bun'
>>> message[2:]
'ngalow'
>>> message[-2]
'o'
>>> message[:]
'bungalow'

Use the built-in function len (short for “length”) to determine the length of the string:

>>> len(message)
8

Or the in keyword to check if a particular character occurs within a string:

>>> 'a' in message
True
>>> 'x' in message 
False

… or to check to see if a particular substring occurs in a string:

>>> 'foo' in 'food'
True
>>> 'foo' in 'horatio'
False
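Negative numbers also work inside a slice range, and a slice that runs past the end of the string simply stops at the end. The sketch below illustrates both with the same bungalow string; the variable names are invented for illustration, and print is written with parentheses so the code should run under Python 2 or Python 3.

```python
message = "bungalow"

last_three = message[-3:]           # from the third-to-last character on
all_but_last_three = message[:-3]   # everything before that point
past_the_end = message[2:100]       # a slice may overshoot without error

print(last_three)           # low
print(all_but_last_three)   # bunga
print(past_the_end)         # ngalow
print("low" in message)     # True
```

Indexing a single position that doesn't exist, such as message[100], is different: it raises an IndexError.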

Built-in functions

Everything in Python is an object, it’s true, and therefore your programs will consist mainly of creating objects, manipulating their attributes, and calling their methods. (Python comes with a rich library of objects for nearly every programming need, and eventually we’ll learn how to create new types of objects ourselves.) But there are some important Python functions that aren’t object methods. We’ve already seen len() and type() above. Here are some more (you can see the full list here):

  • str(x), int(x), float(x): use these functions to try to convert a value x to the given type
  • abs(x) returns the absolute value of x
  • chr(x) returns the one-character string whose ASCII code is the integer x (e.g., chr(97) is 'a')
  • hasattr(x, attr) tests whether the object x has an attribute with the name given in the string attr
  • ord(x) returns the numerical value for the string of length one passed as x, according to its Unicode value (e.g., ‘a’ is 97, ‘b’ is 98, etc.).
  • pow(x, y) returns the value of x raised to the power y
  • raw_input(prompt) prompts the user for a line of input, giving the string in prompt as a prompt
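The functions in this list can be tried out directly. The following sketch leaves out raw_input, since it waits for typing and was renamed in Python 3; the variable names are made up for illustration, and print is written with parentheses so the code should run under Python 2 or Python 3.

```python
as_int = int("42") + 1        # string of digits to integer
as_float = float("2.5") * 2   # string to floating-point number
as_str = str(9) + " lives"    # number to string

print(as_int)      # 43
print(as_float)    # 5.0
print(as_str)      # 9 lives
print(abs(-7))     # 7
print(chr(97))     # a  (the character with code 97)
print(ord("a"))    # 97 (the code for the character a)
print(pow(2, 10))  # 1024
print(hasattr("python", "upper"))  # True: strings have an upper method
```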

Booleans and comparisons

Like any good programming language, Python lets you check to see if certain conditions obtain: whether something is true or false. Python represents truth with two values, True and False, two built-in values of type bool. We saw these values in the example with the in keyword above. Python additionally supports all of the comparison operators you’re familiar with from other languages, all of which compare two values and evaluate to either True or False:

>>> 4 > 3
True
>>> 5.6 < 90
True
>>> 1700 >= 1699
True
>>> 0.134 <= 1
True
>>> 5 == 5
True
>>> 7 != 900
True
>>> 5 == "5"
False

Take note especially of that last comparison: a number and a string that happens to contain that number are not equal in Python. (This is one important way in which Python and languages like Perl and PHP differ.)
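If you do want the number and the string to compare as equal, one option is to convert one of them first. This is a small sketch using the built-in int() and str() functions; the variable names are invented for illustration, and print is written with parentheses so the code should run under Python 2 or Python 3.

```python
number = 5
text = "5"

unconverted = (number == text)       # different types, so not equal
as_numbers = (number == int(text))   # compare as integers
as_strings = (str(number) == text)   # compare as strings

print(unconverted)  # False
print(as_numbers)   # True
print(as_strings)   # True
```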

Your first Python program

The interactive interpreter is great for testing out code and doing quick calculations, but there comes a time when a programmer has to actually make a program. Here’s our very first program. Open up a file named cat.py and copy and paste (or type) this in:

# our very first program
import sys
for line in sys.stdin:
	line = line.strip()
	print line

This program is a simple clone of the UNIX cat command. To run it, type this at the command line:

python cat.py <some_input_file.txt

(where some_input_file.txt is a file with some text in it.) You should see the contents of your input file printed out to the screen.

Let’s go over this program line-by-line.

1. # our very first program

Comments in Python start with a # and end with the end of the line. Python will ignore any # outside a string, and all text up until the end of the line.

2. import sys

The import keyword tells Python to load up and use some piece of external code, such as a part of the Python library. When we say import sys, we’re telling Python to load the sys module and make it available in our program under the name sys. (The sys module contains a number of helpful objects and functions for working with system-specific features—like reading from standard input.)

3. for line in sys.stdin:

This line begins a for loop. The for loop in Python “iterates over” an object. (What this means, exactly, is dependent on the object. More next week when we discuss lists.) In this case we’re iterating over sys.stdin, which is an object that represents UNIX standard input. The code indented under this line of code will be executed for every line in the input, with the variable line having the value of the current line of input.

4. line = line.strip()

Here we call the line object’s strip() method, which strips all whitespace from the beginning and end of the string. We assign the results of this back to the line variable.

5. print line

The print statement takes whatever follows it and prints it to standard output, followed by a newline character. (The newline character is why we needed to strip off the whitespace in line 4; otherwise, each line’s own trailing newline would be printed too, and we’d get an extra blank line between lines.)
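To see what strip() actually removes, it can help to look at one line by hand. The sketch below uses an invented sample line with leading spaces and a trailing newline, and the built-in repr() to make the whitespace visible. It also shows rstrip("\n"), which is not used in cat.py but is an alternative if you want to keep a line's indentation. Print is written with parentheses so the code should run under Python 2 or Python 3.

```python
# A sample line as it might arrive from standard input
line = "    And sorry I could not travel both\n"

stripped = line.strip()      # removes whitespace from both ends
trimmed = line.rstrip("\n")  # removes only the trailing newline

print(repr(line))
print(repr(stripped))
print(repr(trimmed))
```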

Python syntax in a nutshell

  • Lines that begin with # are comments. (Python will ignore anything from # to the end of the line.)
  • You don’t need to put a semicolon after every statement; you just put one statement on each line of the file.
  • Code blocks are indented—everything at the same indent level is part of the same block.
  • Long statements can be broken across multiple lines by ending each unfinished line with a backslash (\). Lists and other bracketed expressions can span multiple lines without one.
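A short sketch of the comment and line-continuation rules follows. The names total and words are invented for illustration, and print is written with parentheses so the code should run under Python 2 or Python 3.

```python
# A comment on a line of its own
total = 1 + 2 + 3 + \
        4 + 5    # the backslash above continues the statement here

# Inside brackets, a statement can span lines without a backslash
words = ["roads",
         "wood",
         "undergrowth"]

print(total)       # 15
print(len(words))  # 3
```

Note that the backslash has to be the very last character on its line; even a trailing comment after it is an error.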

Syntax of the for loop:

 for tempvar in listlikething:
   statements

… will execute statements for each element present in listlikething. Inside of the loop, tempvar is a variable that you can use to access the value of the current element of the list.
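Strings are one kind of “listlikething”: looping over a string visits it one character at a time. The following is a minimal sketch; the names letter and shout are made up for illustration, and print is written with parentheses so the code should run under Python 2 or Python 3.

```python
shout = ""
for letter in "meow":
    # letter holds the current character on each pass through the loop
    shout = shout + letter.upper() + "!"

print(shout)  # M!E!O!W!
```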

Syntax of if/elif/else:

 if expression1:
   statements
 elif expression2:
   statements
 else:
   statements

In the example above, expression1 and expression2 must be expressions that evaluate to true or false. You can have as many elif blocks as you want, and both the elif and else blocks are optional.

You can join together multiple conditions with boolean operators and and or:

  if x < 4 or x > 10:
    print "x is less than four or greater than ten"
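Putting the pieces together, here is a sketch of an if/elif/else chain followed by the and and or operators. The value of x and the names size, in_range, and outside are invented for illustration; print is written with parentheses so the code should run under Python 2 or Python 3.

```python
x = 7

# Only the first branch whose condition is true gets executed
if x < 4:
    size = "small"
elif x < 10:
    size = "medium"
else:
    size = "large"

in_range = x > 4 and x < 10   # true only when both conditions hold
outside = x < 4 or x > 10     # true when at least one condition holds

print(size)      # medium
print(in_range)  # True
print(outside)   # False
```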

Making decisions about lines

We've seen how to make a version of UNIX cat. Here's simplegrep.py, which is a program that works a lot like UNIX grep.

import sys

searchstr = "you"

for line in sys.stdin:
  line = line.strip()
  if searchstr in line:
    print line

Here we make a variable searchstr, which contains the string that we're going to look for in the source file. On line 7, we use the in operator to check to see if searchstr occurs in the current line of input. If the substring is found, we print out the line. If not, we do nothing, and move on to the next line of input.
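If you want to try the same logic without preparing an input file, one option is to let an in-memory object stand in for sys.stdin. The sketch below does that with io.StringIO from the standard library. That substitution, the sample text, and the match_count variable are all assumptions made for illustration and are not part of simplegrep.py. Print is written with parentheses so the code should run under Python 2 or Python 3.

```python
import io

# Stand-in for sys.stdin so the sketch runs without an input file
fake_stdin = io.StringIO(u"I love you\nand sorry I could not\nyou and I\n")

searchstr = "you"
match_count = 0

for line in fake_stdin:
    line = line.strip()
    if searchstr in line:
        # Same decision as simplegrep.py: print only matching lines
        print(line)
        match_count = match_count + 1

print(match_count)  # 2
```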

Mini-exercise. The simplegrep.py example will print lines in the input matching only lowercase you. Make a version of the program that will also print out the line if it matches YOU and You as well. (Try to do this without declaring more variables.)

Another mini-exercise. Make a version of simplegrep.py that only prints out those lines in the input that have a certain length. (Remember: you can use the len function to determine the length of a string.)

Finding substrings

We can use the in operator to check to see if a given substring occurs in a string. But sometimes we want to know not just whether a string occurred, but where it occurred. In order to accomplish this, we can use the find method of the string class. Here's a transcript from a session with the interactive interpreter to demonstrate:

>>> foo = "mother said there'd be days like these"
>>> "day" in foo
True
>>> foo.find("day")
23
>>> foo.find("night")
-1

As you can see, a call to the find method returns the position of a substring within another string---the substring's index. If the substring can't be found at all, the method returns -1.
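The index that find returns can be fed straight into a slice. The sketch below reuses the string from the transcript; the names position, tail, head, missing, and misleading are invented for illustration, and print is written with parentheses so the code should run under Python 2 or Python 3.

```python
foo = "mother said there'd be days like these"

position = foo.find("day")
tail = foo[position:]   # from the match to the end of the string
head = foo[:position]   # everything before the match

print(position)  # 23
print(tail)      # days like these
print(head)

# Pitfall: -1 is also a valid index (the last character), so slicing
# with the result of a failed search does not raise an error
missing = foo.find("night")
misleading = foo[missing:]
print(misleading)  # e
```

This is why it's worth checking for -1 before using the result as an index.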

The following example, forfinder.py, uses the find method to print out only those lines that contain the string " for ", and prints only the portion of the line starting at the word "for" and continuing to the end of the line. Here's the source code:

import sys

for line in sys.stdin:
  line = line.strip()
  offset = line.find(" for ")
  if offset != -1:
    print line[offset+1:]

Structurally, this example is similar to the last two examples. We're iterating over every line in the input, then making a decision about whether or not to print out that line. On line 5, we call the find method and assign its result to a variable offset. If offset is not equal to -1---in other words, if we found the substring---then we print out the line. But not all of the line: just the part of the line starting from the character after the position of the substring we found. Given an input like sonnets.txt, we might expect output like this:

for thy self to breed another thee,
for one;
for thou art much too fair
for fear to wet a widow's eye,
for still the world enjoys it;
for thy self art so unprovident.
for love of me,
for store,
for her seal, and meant thereby,
for love of you,
for a woman wert thou first created;
for women's pleasure,
for ornament doth use
...
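To see why the program prints line[offset+1:] rather than line[offset:], it can help to work through a single line by hand. The sketch below uses one line from the sonnets as sample input; the names with_space, without_space, and inside_word are invented for illustration, and print is written with parentheses so the code should run under Python 2 or Python 3.

```python
line = "Is it for fear to wet a widow's eye,"

offset = line.find(" for ")
with_space = line[offset:]         # starts at the space before "for"
without_space = line[offset + 1:]  # skips that leading space

print(offset)         # 5
print(without_space)  # for fear to wet a widow's eye,

# "for" inside a longer word is not matched, because the search
# string requires a space on both sides
inside_word = "sweet forfeit".find(" for ")
print(inside_word)    # -1
```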

Mini-exercise. In the example above, we're searching for the substring " for "---with a space before and after the word "for." This is because we only want to match the word "for," and not instances when the sequence of characters "for" occurs in the middle of a word. When we discuss regular expressions, we'll be able to make this search more fine-grained, but for now, try the following. Make a version of forfinder.py that prints out the first fifteen characters of each line that begins with the word "And."

Replacements

Another helpful method of strings: replace, which takes two arguments. It will find every instance of the first string and replace it with the second string. Here's a transcript from an interactive interpreter session:

>>> foo = "mother said there'd be days like these"
>>> foo.replace("days", "hallucinations")
"mother said there'd be hallucinations like these"
>>> foo.replace("said", "solemnly chanted")
"mother solemnly chanted there'd be days like these"
>>> foo.replace("e", "eeeee")
"motheeeeer said theeeeereeeee'd beeeee days likeeeee theeeeeseeeee"

One thing to remember about replace is that it doesn't modify the string you call it on---as you can see from the transcript above, foo retains its original value. Instead, replace returns a copy of the string with the replacement applied. If you want the replacements to apply to the original string, you need to assign the result of replace back to the variable you're performing the replacement on. The example cowboy.py demonstrates:

import sys
for line in sys.stdin:
	line = line.strip()
	line = line.replace("the ", "that dadgum ")
	line = line.replace("and ", "and, tell you what, ")
	line = line.replace(" a ", " a doggone ")
	line = line.replace(".", ". Boy, howdy.")
	line = line.replace("!", ", hoooo-weeee!")
	print line

This program takes some input text and renders it more cowboy-like. Here's the output of the program, given frost.txt as an input:

Two roads diverged in a doggone yellow wood,
And sorry I could not travel both
And be one traveler, long I stood
And looked down one as far as I could
To where it bent in that dadgum undergrowth;

Then took that dadgum other, as just as fair,
And having perhaps that dadgum better claim,
Because it was grassy and, tell you what, wanted wear;
Though as for that that dadgum passing there
Had worn them really about that dadgum same,

And both that morning equally lay
In leaves no step had trodden black. Boy, howdy.
Oh, I kept that dadgum first for another day, hoooo-weeee!
Yet knowing how way leads on to way,
I doubted if I should ever come back. Boy, howdy.

I shall be telling this with a doggone sigh
Somewhere ages and, tell you what, ages hence:
Two roads diverged in a doggone wood, and, tell you what, I—
I took that dadgum one less travelled by,
And that has made all that dadgum difference. Boy, howdy.
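Two details of replace are easy to check on a single line. The sketch below shows that the result has to be assigned back to take effect, and that replace matches sequences of characters rather than whole words. The sample strings and the names before and surprise are invented for illustration; print is written with parentheses so the code should run under Python 2 or Python 3.

```python
line = "Then took the other, as just as fair,"

# Calling replace without assigning the result leaves line untouched
line.replace("the ", "that dadgum ")
before = line
print(before)

# Assigning the result back is what changes the variable
line = line.replace("the ", "that dadgum ")
print(line)      # Then took that dadgum other, as just as fair,

# replace works on characters, so "the " is also found inside "breathe "
surprise = "breathe deep".replace("the ", "that dadgum ")
print(surprise)  # breathat dadgum deep
```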

Helpful reading
