How do words mean when we put them into new contexts? Under what conditions does the meaning web tear apart? What meanings can words make (or can we make of them) when we disturb their normal relation to each other? These are questions that any poet can be seen as asking; and from this point of view, someone experimenting with computer poetry is continuing an age-old project.—Charles Hartman, Virtual Muse, p. 104.
A brief tour of the literature
- Nick Montfort, Taroko Gorge (limited syntax, words as units, carefully curated choice of words)
- Eric Scovel, A Light Heart, Its Black Thoughts (generative procedure, derived from literature)
- Leonard Richardson, Dog Bites Dog (web-based, depends on genre-specific conventions, humorous)
- K. Silem Mohammad, Sonnagrams (anagrams, formal constraint but not composed with an algorithm)
- Stribling, Krohn, Aguayo, SCIgen (context-free grammar, parody of genre)
- Charles Bernstein, 1-100 (algorithms, numbers, performance)
For more, check out the side bar on the syllabus.
Your workflow
Using an IDE isn’t out of the question for doing command-line Python programming, but you might find it a bit inconvenient—the likes of Eclipse and IDLE are overkill for what we’re doing. Here are a few possible ways to arrange your workflow:
- Use a text editor that works from the command line. I like vim, but I don’t recommend it for beginners. (There’s a steep learning curve.) Most UNIX-like machines—including Macs—are equipped with nano, which is a basic yet functional editor that should work for editing simple Python programs.
- If you’re working on a remote server, you could edit files locally on your computer, and use an SFTP client to synch files with the remote server. If you’re on a Mac, TextWrangler + CyberDuck is a good combination for this. If you’re using this method, you should have an SSH session logged in to the server as well, so you can run the programs after you’ve written them.
- If you’re working on your local machine and you don’t want to use a command-line editor, just use the text editor of your choice and have a Terminal window open in the directory where your files are located. When you save the file in your editor, switch over to your terminal window to run it.
Finding and preprocessing text
Text is easy to find on the Internet. Project Gutenberg is a good source for text that is reasonably certain to be free of significant copyright encumbrances. I don’t care where you get your text, but I leave the responsibility for clearing up any usage rights (where necessary) up to you.
The programs that we will be writing in this class operate on plain text files. Word files, PDFs, RTFs, and other binary formats won’t work. Text-based formats like HTML will work, but you’ll be working with the source code, not necessarily just the text contained therein.
There are a couple of strategies available for making something plain text. In a word processor like Word or OpenOffice, you can just save the file as plain text (usually available in the “Save As…” or “Export…” dialogs). For other programs, you can just copy and paste the text into a program like TextWrangler, which will automatically convert it to plain text. Just save the file somewhere your Python scripts can find it, and you’re ready to go. (We’ll go over an example of this in class.)
Python: The Interactive Interpreter
We’ll start our investigation of Python with the interactive interpreter, which allows you to type in Python expressions, statements, and programs and see their output as you type. To access the interactive interpreter, simply type
python
on the command line. You’ll get a few lines of version information, and then a prompt that looks like this: >>>. Now you can type in Python code, and the interpreter will evaluate your input. Try some simple arithmetic:
>>> 9 + 5
14
>>> 42 + (3 * 6)
60
>>> 9 / 4.0
2.25
(The syntax for arithmetic and order of operations work as you’d expect if you know any C-like language—Perl, Java, C++, JavaScript, etc.)
You can also assign values to variables with the assignment operator, =. The name of the variable will then refer to whatever value you assigned to it, until you quit the interpreter (or until you assign another value to it). Create some variables and use them in simple expressions:
>>> foo = 9
>>> bar = 5
>>> baz = 1.1
>>> foo + bar - baz
12.9
(Variable names can contain letters, numbers, and underscores, but must begin with a letter or underscore. Some variable names are reserved by Python; here’s the full list.)
You can assign a string to a variable using quotes (double quotes and single quotes do the same thing):
>>> message = "python"
>>> message
'python'
>>> message + " is for lovers"
'python is for lovers'
Note that you don’t have to declare a type for a variable: Python keeps track of the type of whatever value the variable currently holds. What Python won’t do is silently convert between types. If you try to combine values of different types in ways it doesn’t support, Python will get angry with you.
>>> message + foo
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: cannot concatenate 'str' and 'int' objects
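One way around this error is to convert a value explicitly before combining it with another. The sketch below reuses the names message and foo from the transcript; the names combined and total are made up for illustration. The parentheses around print's argument keep it runnable under both the Python 2 used in these notes and Python 3.

```python
message = "python"
foo = 9

# Convert the integer to a string before concatenating.
combined = message + str(foo)
print(combined)

# Going the other way: convert a numeric string to an integer before adding.
total = int("5") + foo
print(total)
```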
Quitting the interactive interpreter
To quit the interactive interpreter, type exit() at the prompt, or hit Ctrl+D.
String literals
When you use quotes to put the value of a string directly into your program, you’re using a string literal. You’ll usually be using either single or double quotes, which behave (for all intents and purposes) exactly the same:
>>> "double quotes"
'double quotes'
>>> 'single quotes'
'single quotes'
There are several special characters that you can include inside your string by using escape characters:
- \' becomes a literal single quote (for including inside a single-quoted string)
- \" becomes a literal double quote
- \n becomes a newline character
- \t becomes a tab character
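A quick way to get a feel for these is to print a few strings that use them. This is only an illustrative sketch; the variable names are invented for the example, and print is called with parentheses so the code runs under both Python 2 and Python 3.

```python
# Each escape sequence stands for a single character in the resulting string.
quoted = 'it\'s'              # single quote inside a single-quoted string
spoken = "she said \"hi\""    # double quotes inside a double-quoted string
two_lines = "first\nsecond"   # newline between the two words
columns = "left\tright"       # tab between the two words

print(quoted)
print(spoken)
print(two_lines)
print(columns)

# The backslash and the letter after it count as one character.
print(len("\n"))
```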
Everything is an object
In Python, everything is an object, including integers, floating-point numbers, and strings. You can ask Python what type a variable's value is like so:
>>> type(foo)
<type 'int'>
>>> type(baz)
<type 'float'>
>>> type(message)
<type 'str'>
Objects in Python have attributes and methods. An attribute is some value associated with an object; a method is some kind of behavior that the object supports.
Warning: awful example time. A “cat” object might have an attribute color that represents its color; it might have a method meow that lets you tell the cat to meow, or a method reproduce that tells the cat to make another cat.
Python lets you look inside of any object to see what methods and attributes it supports. You can do this right from the interpreter, like so:
>>> dir(message)
[... 'capitalize', 'center', 'count', 'decode', 'encode', 'endswith', 'expandtabs', ...]
What you’re seeing is a list of attributes that strings have, and methods that you can call on them. Some of these are mysterious (those beginning and ending with double underscores); others are straightforward. If you want to know more about these methods, you can use “help”:
>>> help(message.upper)
Help on built-in function upper:

upper(...)
    S.upper() -> string

    Return a copy of the string S converted to uppercase.
The syntax to call a method on an object should look familiar to Java programmers. It looks like this: object.method() (replace object with an object variable, and method with the name of the method). Some methods accept arguments, which are separated with commas (again, just like in Java). An example:
>>> message.upper()
'PYTHON'
>>> message.center(24, '*')
'*********python*********'
Consult the Python documentation for more methods of string objects.
Slicing strings
Python has a powerful syntax for indexing parts of strings. (You use the same syntax for lists, which we’ll talk about next week.) Some examples:
>>> message = "bungalow"
>>> message[3]
'g'
>>> message[1:6]
'ungal'
>>> message[:3]
'bun'
>>> message[2:]
'ngalow'
>>> message[-2]
'o'
>>> message[:]
'bungalow'
Use the built-in function len (short for “length”) to determine the length of the string:
>>> len(message)
8
Or the in keyword to check if a particular character occurs within a string:
>>> 'a' in message
True
>>> 'x' in message
False
… or to check to see if a particular substring occurs in a string:
>>> 'foo' in 'food'
True
>>> 'foo' in 'horatio'
False
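Slicing, len, and in combine naturally. As a small sketch (the variable names middle, first_half, and second_half are invented here), this splits a string at its midpoint and shows that a negative index is shorthand for counting back from the length. It runs under both Python 2 and Python 3.

```python
message = "bungalow"

# Integer division gives the index of the midpoint.
middle = len(message) // 2
first_half = message[:middle]
second_half = message[middle:]
print(first_half)
print(second_half)

# message[-2] counts from the end, the same as message[len(message) - 2].
print(message[-2] == message[len(message) - 2])

# The in keyword works on any string, including a slice.
print("low" in second_half)
```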
Built-in functions
Everything in Python is an object, it’s true, and therefore your programs will consist mainly of creating objects, manipulating their properties and calling their methods. (Python comes with a rich library of objects for nearly every programming need, and eventually we’ll learn how to create new types of objects ourselves.) But there are some important Python functions that aren’t object methods. We’ve already seen len() and type() above. Here are some more (you can see the full list here):
- str(x), int(x), float(x): use these functions to try to convert a value x to the given type
- abs(x) returns the absolute value of x
- chr(x) returns the character whose ASCII code is the integer x (e.g., chr(97) is 'a')
- hasattr(x, attr) tests whether the object x has an attribute with the name given in the string attr
- ord(x) returns the numerical value for the string of length one passed as x, according to its Unicode value (e.g., ‘a’ is 97, ‘b’ is 98, etc.)
- pow(x, y) returns the value of x raised to the power y
- raw_input(prompt) prompts the user for a line of input, giving the string in prompt as a prompt
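Here is a short sketch that exercises most of the functions in this list; the variable names are invented for illustration. raw_input is left out because it waits for keyboard input (and exists only in Python 2). Everything else here runs under both Python 2 and Python 3.

```python
as_text = str(42) + "!"        # number converted to a string
as_int = int("7") + 1          # string converted to an integer
as_float = float("2.5") * 2    # string converted to a float
distance = abs(-3)             # absolute value
code = ord("a")                # numerical value of a one-character string
letter = chr(98)               # character for a numerical value
big = pow(2, 10)               # 2 raised to the power 10
can_shout = hasattr("python", "upper")  # do strings have an 'upper' attribute?

print(as_text)
print(as_int)
print(as_float)
print(distance)
print(code)
print(letter)
print(big)
print(can_shout)
```

Note that ord and chr undo each other: chr(ord("a")) gives back "a".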
Booleans and comparisons
Like any good programming language, Python lets you check to see if certain conditions obtain: whether something is true or false. Python represents truth with two built-in values of type bool, True and False. We saw these values in the example with the in keyword above. Python additionally supports all of the comparison operators you’re familiar with from other languages, all of which compare two values and evaluate to either True or False:
>>> 4 > 3
True
>>> 5.6 < 90
True
>>> 1700 >= 1699
True
>>> 0.134 <= 1
True
>>> 5 == 5
True
>>> 7 != 900
True
>>> 5 == "5"
False
Take note especially of that last comparison: a number and a string that happens to contain that number are not equal in Python. (This is one important way in which Python and languages like Perl and PHP differ.)
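If you do want a number and a string to compare as equal, convert one of them first so both sides have the same type. A minimal sketch, with invented variable names, that runs under both Python 2 and Python 3:

```python
number = 5
text = "5"

mismatched = (number == text)        # int compared with str
as_numbers = (number == int(text))   # both sides are ints
as_strings = (str(number) == text)   # both sides are strs

print(mismatched)
print(as_numbers)
print(as_strings)
```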
Your first Python program
The interactive interpreter is great for testing out code and doing quick calculations, but there comes a time when a programmer has to actually make a program. Here’s our very first program. Open up a file named cat.py and copy-and-paste (or type) this in:
# our very first program
import sys
for line in sys.stdin:
    line = line.strip()
    print line
This program is a simple clone of the UNIX cat command. To run it, type this at the command line:
python cat.py < some_input_file.txt
(where some_input_file.txt is a file with some text in it.) You should see the contents of your input file printed out to the screen.
Let’s go over this program line-by-line.
1. # our very first program
Comments in Python start with a # and end with the end of the line. Python will ignore any # outside a string, and all text up until the end of the line.
2. import sys
The import keyword tells Python to load up and use some piece of external code, such as a part of the Python library. When we say import sys, we’re telling Python to load the sys module and make it available in our program under the name sys. (The sys module contains a number of helpful objects and functions for working with system-specific features, like reading from standard input.)
3. for line in sys.stdin:
This line starts a for loop. The for loop in Python “iterates over” an object. (What this means, exactly, is dependent on the object. More next week when we discuss lists.) In this case we’re iterating over sys.stdin, which is an object that represents UNIX standard input. The code indented under this line will be executed for every line in the input, with the variable line having the value of the current line of input.
4. line = line.strip()
Here we call the line object’s strip() method, which strips all whitespace from the beginning and end of the string. We assign the result back to the line variable.
5. print line
The print statement takes whatever you give it and prints it to standard output, followed by a newline character. (Each line read from standard input still ends with its own newline character, which is why we needed to strip off the whitespace in line 4; otherwise, we’d get an extra blank line between lines.)
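To see what strip() actually removes, it helps to look at a single line with repr, which shows whitespace characters explicitly. The sample line below is made up for illustration. Note that strip() also removes leading indentation; if you want to keep that and drop only the trailing newline, the string method rstrip("\n") is one option. This sketch runs under both Python 2 and Python 3.

```python
# A line as it might arrive from standard input: indented, ending in a newline.
line = "  Two roads diverged in a yellow wood,\n"

stripped = line.strip()        # drops the leading spaces and the trailing newline
trimmed = line.rstrip("\n")    # drops only the trailing newline

print(repr(line))
print(repr(stripped))
print(repr(trimmed))
```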
Python syntax in a nutshell
- Lines that begin with # are comments. (Python will ignore anything from # to the end of the line.)
- You don’t need to put a semicolon after every statement; you just put one statement on each line of the file.
- Code blocks are indented—everything at the same indent level is part of the same block.
- Long lists can be broken up over multiple lines; for other long statements, use a backslash (\) at the end of a line to continue the statement on the next line.
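As a small illustration of the last point (the names total and words are invented for the example), here is a statement continued with a backslash, followed by a list broken across lines, which needs no backslash because it sits inside brackets. The sketch runs under both Python 2 and Python 3.

```python
# The backslash must be the very last character on the line.
total = 1 + 2 + 3 + \
        4 + 5 + 6
print(total)

# Anything inside brackets can span several lines without a backslash.
words = [
    "two",
    "roads",
    "diverged",
]
print(len(words))
```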
Syntax of the for loop:
for tempvar in listlikething:
    statements
… will execute statements for each element present in listlikething. Inside of the loop, tempvar is a variable that you can use to access the value of the current element of the list.
Syntax of if/elif/else:
if expression1:
    statements
elif expression2:
    statements
else:
    statements
In the example above, expression1 and expression2 must be expressions that evaluate to true or false. You can have as many elif blocks as you want, and both the elif and else blocks are optional.
You can join together multiple conditions with the boolean operators and and or:
if x > 4 or x < 10:
    print "x is greater than four or less than ten"
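One thing worth noticing: or is true when either side is true, so x > 4 or x < 10 holds for every number. To test whether a value falls inside a range, and is the operator you want. The sketch below puts for, if/elif/else, and and together; the sample values and the variable name label are made up for illustration, and the code runs under both Python 2 and Python 3.

```python
for x in [2, 7, 12]:
    if x > 4 and x < 10:
        label = "between four and ten"
    elif x <= 4:
        label = "four or less"
    else:
        label = "ten or more"
    # Build one string so print behaves the same in Python 2 and 3.
    print(str(x) + ": " + label)
```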
Making decisions about lines
We've seen how to make a version of UNIX cat. Here's simplegrep.py, which is a program that works a lot like UNIX grep.
import sys

searchstr = "you"

for line in sys.stdin:
    line = line.strip()
    if searchstr in line:
        print line
Here we make a variable searchstr, which contains the string that we're going to look for in the source file. On line 7, we use the in operator to check to see if searchstr occurs in the current line of input. If the substring is found, we print out the line. If not, we do nothing, and move on to the next line of input.
Mini-exercise. The simplegrep.py example will print lines in the input matching only lowercase you. Make a version of the program that will also print out the line if it matches YOU and You as well. (Try to do this without declaring more variables.)
Another mini-exercise. Make a version of simplegrep.py that only prints out those lines in the input that have a certain length. (Remember: you can use the len function to determine the length of a string.)
Finding substrings
We can use the in operator to check to see if a given substring occurs in a string. But sometimes we want to know not just whether a string occurred, but where it occurred. In order to accomplish this, we can use the find method of the string class. Here's a transcript from a session with the interactive interpreter to demonstrate:
>>> foo = "mother said there'd be days like these"
>>> "day" in foo
True
>>> foo.find("day")
23
>>> foo.find("night")
-1
As you can see, a call to the find method returns the position of a substring within another string: the substring's index. If the substring can't be found at all, the method returns -1.
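The index that find returns can go straight into a slice. The sketch below reuses the string from the transcript; the names offset, before, after, and missing are invented for illustration. It also shows why checking for -1 matters: used as a slice index, -1 means "the last character", so skipping the check quietly gives you the wrong result rather than an error. The code runs under both Python 2 and Python 3.

```python
foo = "mother said there'd be days like these"

offset = foo.find("day")
if offset != -1:
    before = foo[:offset]   # everything up to the match
    after = foo[offset:]    # the match and everything following it
    print(before)
    print(after)

# When the substring is absent, find returns -1 ...
missing = foo.find("night")
print(missing)
# ... and slicing from -1 gives just the last character of the string.
print(foo[missing:])
```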
The following example, forfinder.py, uses the find method to print out only those lines that contain the string for, and only prints the portion of the line including and directly following that string. Here's the source code:
import sys

for line in sys.stdin:
    line = line.strip()
    offset = line.find(" for ")
    if offset != -1:
        print line[offset+1:]
Structurally, this example is similar to the last two examples. We're iterating over every line in the input, then making a decision about whether or not to print out that line. On line 5, we call the find method and assign its result to a variable offset. If offset is not equal to -1 (in other words, if we found the substring), then we print out the line. But not all of the line: just the part starting from the character after the position of the substring we found. (The substring we search for begins with a space, so offset+1 skips that space and starts at the word itself.) Given an input like sonnets.txt, we might expect output like this:
for thy self to breed another thee,
for one;
for thou art much too fair
for fear to wet a widow's eye,
for still the world enjoys it;
for thy self art so unprovident.
for love of me,
for store,
for her seal, and meant thereby,
for love of you,
for a woman wert thou first created;
for women's pleasure,
for ornament doth use
...
Mini-exercise. In the example above, we're searching for the substring " for ", with a space before and after the word "for." This is because we only want to match the word "for," and not instances when the sequence of characters "for" occurs in the middle of a word. When we discuss regular expressions, we'll be able to make this search more fine-grained, but for now, try the following. Make a version of forfinder.py that prints out the first fifteen characters of each line that begins with the word "And."
Replacements
Another helpful method of strings: replace, which takes two arguments. It will find every instance of the first string and replace it with the second string. Here's a transcript from an interactive interpreter session:
>>> foo = "mother said there'd be days like these"
>>> foo.replace("days", "hallucinations")
"mother said there'd be hallucinations like these"
>>> foo.replace("said", "solemnly chanted")
"mother solemnly chanted there'd be days like these"
>>> foo.replace("e", "eeeee")
"motheeeeer said theeeeereeeee'd beeeee days likeeeee theeeeeseeeee"
One thing to remember about replace is that it doesn't modify the string you call it on: as you can see from the transcript above, foo retains its original value. Instead, replace returns a copy of the string with the replacement applied. If you want the replacements to apply to the original string, you need to assign the result of replace back to the variable you're performing the replacement on. The example cowboy.py demonstrates:
import sys

for line in sys.stdin:
    line = line.strip()
    line = line.replace("the ", "that dadgum ")
    line = line.replace("and ", "and, tell you what, ")
    line = line.replace(" a ", " a doggone ")
    line = line.replace(".", ". Boy, howdy.")
    line = line.replace("!", ", hoooo-weeee!")
    print line
This program takes some input text and renders it more cowboy-like. Here's the output of the program, given frost.txt as an input:
Two roads diverged in a doggone yellow wood,
And sorry I could not travel both
And be one traveler, long I stood
And looked down one as far as I could
To where it bent in that dadgum undergrowth;
Then took that dadgum other, as just as fair,
And having perhaps that dadgum better claim,
Because it was grassy and, tell you what, wanted wear;
Though as for that that dadgum passing there
Had worn them really about that dadgum same,
And both that morning equally lay
In leaves no step had trodden black. Boy, howdy.
Oh, I kept that dadgum first for another day, hoooo-weeee!
Yet knowing how way leads on to way,
I doubted if I should ever come back. Boy, howdy.
I shall be telling this with a doggone sigh
Somewhere ages and, tell you what, ages hence:
Two roads diverged in a doggone wood, and, tell you what, I—
I took that dadgum one less travelled by,
And that has made all that dadgum difference. Boy, howdy.
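A few properties of replace are easy to check on a single string. The sketch below uses a made-up sample line and invented variable names. It shows that calling replace without assigning the result changes nothing, that an optional third argument limits how many replacements are made, and that replace matches raw character sequences, so a pattern like "the " will also match at the end of a longer word. The code runs under both Python 2 and Python 3.

```python
line = "the cat and the hat."

# The result of this call is thrown away, so line is unchanged.
line.replace("the ", "that dadgum ")
print(line)

# Assigning the result keeps the modified copy.
changed = line.replace("the ", "that dadgum ")
print(changed)

# An optional third argument limits the number of replacements.
first_only = line.replace("the ", "that dadgum ", 1)
print(first_only)

# replace knows nothing about words: "the " also matches inside "bathe ".
inside_word = "bathe in the river".replace("the ", "that dadgum ")
print(inside_word)
```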
Helpful reading
- Using the Python interpreter, An informal introduction to Python, and More control flow tools. These sections of the official Python tutorial cover a lot of the same material that we covered in class.
- From How to Think Like a Computer Scientist: Chapter 1: The Way of the Program, Chapter 2: Variables, Expressions, and Statements, Chapter 4: Conditionals and Chapter 7: Strings. You may find it helpful to do the exercises.
- Methods of string objects from the official Python documentation.