Getting In Line

What’s this course about?

Programming: Taking your Java skills to the next level, introducing tools and algorithms that will make your life easier (or at least more interesting)
Data: how it’s stored in the computer, how to effectively manipulate it
Specifically, data as text. This is a class about procedural poetics.

What isn’t it about?

Typography and layout. We’re concerned with the bits and bytes, not how they’re presented.
Semantics. We’re concerned with the bits and bytes, not what they mean.
PHP.

The structure of non-interactive computer programs

In ICM, you learned a certain type of programming: interactive programming. The programs you made in that class could be schematically represented thusly:

Do some initialization stuff.
Keep doing something over and over again.
Get some user input.
Go to step 2.

These programs are centered around responding to user actions. But what if we don’t care about that? What if we just want to munge some data?

Most computer programs are non-interactive, meaning that they don’t respond to user input (or respond to it only indirectly). The model for the programs that we’ll be making in this class (up until week 10) is this:

input -> mungeing procedure -> output

We don’t care about how our program was run, or what the user’s doing while the program is running, or even (ideally) where the input is coming from or where the output is going. We’re concerned solely with the procedure: the code that, given data as input, transforms it into output.

Appropriative Poetics: Cut-ups

From a poetic standpoint, we’re doing “appropriative process writing”: writing algorithms that take existing texts and mess them up, recombining them with themselves or other texts, juxtaposing their elements in unexpected (or completely expected) ways. Much of the conceptual material of the course focuses on this idea.

Units: Character, Line, File

Our process, however, needs a unit to operate on. Text can be divided into any number of different, overlapping units (document, page, section, subsection, chapter, clause, sentence, ascender, descender, act, stanza, syllable, foot…) but only some of these are easy for computers to work with. (It’s harder than you think to teach a computer what a “sentence” is, for example.)

The two most obvious units of text in a computer are:

the character, i.e., the byte (or series of bytes) that represents a single element of written language (e.g., A through Z in English, any one of many glyphs in Chinese…)
the file, i.e., an ordered collection of characters

Somewhere in between these two is the line, a formal unit of text that has been part of written language from the beginning. (Cuneiform with lines.) The line arises in written text because writing transcribes speech, which is a one-dimensional medium, onto two-dimensional surfaces (paper, clay, stone…). Line breaks in text are, fundamentally, a way of using up all of the space allotted.

But line breaks also serve syntactic, semantic, and metrical functions, as in poetry:

this is just to say

i have eaten
the plums
that were in
the icebox

and which
you were probably
saving
for breakfast

forgive me
they were delicious
so sweet
and so cold

In computer text, the line is often used as a “record marker.” This is how a text file can be used as a rudimentary database. (Take these NBA stats, for example.) When your Arduino sends a new line character after a group of data, that’s exactly what it’s doing.

Perhaps because of these parallelisms (text layout/poetic structure/database structure), many programs that operate on text use the line as their fundamental unit–especially those in UNIX (coming right up). The programs that we write in this class will do the same.

The UNIX Command Line

In this class, we’ll be doing most of our work from the UNIX command line. It’s not because we’re hard-asses, or bad-asses, or just asses. Some claim that the command line is “better,” or “more fundamental.” It isn’t. But it was built to do the kind of work that we’re doing in this class.

Getting started: Logging In

ITP-specific instructions here. If you’re using OS X, you just need to open Terminal (Applications > Utilities > Terminal). If you’re using Linux, just open your favorite terminal emulator. If you’re on Windows, you’ll need PuTTY.

When you’ve successfully reached the command line (another line!), you should see something like this:

[ap1607@itp ~]$

This is the “prompt” (because it “prompts” you to do something). It’s telling you your username, the server you’re logged into, and the current directory. More on that later.

The two keystrokes you absolutely must know:

Ctrl+D: “I’m done typing things in. kthxbye.”
Ctrl+C: “You’re doing something I don’t like. Please stop.”

Your first UNIX commands

First off, we’re going to create a directory, so that you can find it later and you don’t risk overwriting something:

$ mkdir a2z $ cd a2z

(don’t type the $! That’s just there to indicate that those commands should be typed at the command line.)

The mkdir command means “make directory”–”directory” is just UNIX speak for “folder.” (You can probably find the folder in the Finder right now, if you’re on OS X.) When you’re using the command line, there’s one directory on your machine that is considered your “current” directory, i.e., the directory you’re doing stuff in. The cd (“change directory”) command makes the directory you give it (a2z in this case) the current directory.

There are (broadly) two kinds of commands in UNIX: commands that work on lines of input/output, and commands that operate on files and directories. The mkdir and cd commands are examples of the latter. We’re primarily concerned with the former. Let’s start with cat:

$ cat

(Make sure to hit “return” after you type cat.) Now type. After you enter a line, cat will print out the same line. It’s the simplest text filter possible (one rule: let everything through).

When you’re done with cat, press Ctrl+D. Let’s try something more interesting, like grep:

$ grep foo

Now type some lines of text. Try typing, for example, “I like drink” and then “I like food.” The grep command only prints out lines that “match” the string of characters that follow the command (foo, in this case). Let’s try it again, this time with a different “pattern”:

$ grep were

If we cut and paste the poem above (“This is just to say”) into the terminal application the resulting output would look like this:

that were in you were probably they were delicious

The commands head and tail print out a certain number of lines from the beginning of a file and the end of a file, respectively. If you type in the following:

$ tail -4

… and then paste in the poem above, you’d get:

forgive me they were delicious so sweet and so cold

Structure of UNIX commands

UNIX commands generally follow this structure:

name_of_command [options] arguments

(The “[options]” part of that schema is usually one or more characters preceded by hyphens. The -4 in tail tells it to print the last four lines; grep takes an option, -i, which tells it to be case insensitive.)

You can think of UNIX commands like commands in English, but with a funny syntax: “Fetch thoroughly my slippers!”

You can figure out which options and arguments a command supports by typing man name_of_command at the command line.

Sorting and piping

The sort command takes every line of input and prints them back out, in alphabetical order. Try:

$ sort

… paste in the poem, and hit Ctrl+D. You’ll get something like:

and so cold and which for breakfast forgive me i have eaten saving so sweet that were in the icebox the plums they were delicious this is just to say you were probably

(why the blank lines at the beginning?)

So far, we’ve just been sending these commands input (by typing, or cutting and pasting), then letting the output be printed back to the screen. UNIX provides a means by which we can send the output of one program as the input of another program. (Kind of like hooking objects up in a Max/MSP patch.) We do this using the pipe character (| … usually shift+backslash). For example:

$ grep were | sort

… takes lines from input, displays only those that contain the string “were,” and then passes them to sort, which displays those lines in alphabetical order. The output from the poem:

that were in they were delicious you were probably

cut and tr

The cut command breaks up a line of text into its constituent parts. For example:

$ cut -d ' ' -f 1

… prints out the first word of all lines of input. The cut command takes two options, both of which themselves have parameters. The -d option is followed by the “delimiter” string (i.e., what you want to split the line on); the -f option is followed by which field you want.

The tr command “translates” a set of characters in the original line to another set of characters. The source character set is the first parameter, and the second parameter is the characters you want them to be translated to. For example:

$ tr aeiou eioua hello there, how are you? hillu thiri, huw eri yua?

You can specify a range of characters with a hyphen:

$ tr a-z A-Z hello there, how are you? HELLO THERE, HOW ARE YOU?

Multiple pipes

Of course, you can include more than one command in a “pipeline”:

$ sort | tail -6 | tr aeiou e

… which, if you send it our venerable poem, outputs the following:

thet were en the ecebex the plems they were deleceees thes es jest te sey yee were prebebly

What happened? The input went to sort, which sorted the lines in alphabetical order. Then tail -6 grabbed only the last six lines of the output of sort, which sent those lines through to tr. (You can build pipelines of infinite length using this technique.)

Using files (“redirection”)

So far, we’ve been building “programs” that can only read from the keyboard (or from cut-and-paste) and can only send their output to the screen. What if we want to read from an existing file, and then output to a file?

No complicated code is needed. UNIX provides a method for us. It’s called “redirection.” Here’s how it works:

$ sort <this_is_just.txt

The < character means “instead of taking input from the keyboard, take input from this file.” Likewise:

$ grep were >some_file.txt

The > means “instead of sending your output to the screen, send it to this file.” You can use them both at the same time:

$ grep were <this_is_just.txt >some_file.txt

… in which case some_file.txt will end up with every line from this_is_just.txt that contains the string “were.” (If the output file doesn’t already exist, it will be created. If it does already exist, it will be overwritten, so be careful!)

Other helpful commands

wc -w foo will print out the number of words in the file named foo. (wc -l will count the number of lines; wc -c will count the number of characters.)

curl -O http://some.url/ downloads the file at the given URL into your current directory.

(We’ll go into more detail with these next week.)

The ls command will give you a list (“ls” stands for “list”) of files in your current directory. If you give it a parameter, it will give you a listing of the files in the directory you gave to it. (On OS X, for example, try ls /Users/your_user_name/Desktop)

Type pwd to find out what your current directory is.

The cp command will make a copy (“cp” for “copy”) of a file. It takes two parameters: the first is the source file name, the second is the destination file name.

Type rm foo to delete the file named foo. (Note: this is permanent! The file won’t go to your Trash… so be careful)

Reading for next week

Glazier, L.P. Grep: A Grammar. In Witz 7.1 (Spring 1999). Examples of Glazier’s “grep works” can be found here.
An Annotated History of Character Codes
(optional) Stephenson, N. In the Beginning was the Command Line.

Assignment #1

Use a combination of the UNIX commands discussed in class (along with any other commands that you discover) to compose a text. Your “source code” for this assignment will simply consist of what you executed on the command line. Indicate what kind of source text the “program” expects, and give an example of what text it generates. Use man to discover command line options that you might not have known about (grep -i is a good one).

2 comments

Comments feed for this article

Trackback link: http://www.decontextualize.com/teaching/a2z/getting-in-line/trackback/

Pingback from Lehrblogger :: Programming A to Z - Assignment #1 on January 26, 2009 at 1:35 am
Pingback from Do Not Read » Blog Archive » A2Z erste assignment - infantile text munging using UNIX code on January 27, 2009 at 12:44 am

decontextualize