Introduction and UNIX tutorial

The structure of non-interactive computer programs

Computer programs are made to work in many different ways. To simplify a bit, let’s say that there are two kinds of computer programs: interactive programs and non-interactive programs. If you learned to program in an environment like Processing, then you probably learned interactive programming. Programs made in this paradigm can be schematically represented like this:

1. Do some initialization stuff.
2. Keep doing something over and over again.
3. Get some user input.
4. Go to step 2.

These programs are centered around responding to user actions. But what if we don’t care about that? What if we just want to munge some data?

Most computer programs are non-interactive, meaning that they don’t respond to user input (or respond to it only indirectly). The model for the programs that we’ll be making in this class is this:

input -> munging procedure -> output

We don’t care about how our program was run, or what the user’s doing while the program is running, or even (ideally) where the input is coming from or where the output is going. We’re concerned solely with the procedure: the code that, given data as input, transforms it into output.

Reading and Writing Electronic Text

People read texts for all manner of reasons: to extract meaning, for entertainment, for literary or historical analysis, to pass time on the subway. Texts themselves are structured to afford different ways of reading: our method of reading a poem is very different from the way we read a Wikipedia page, which is in turn very different from the way we use a dictionary.

Texts, of course, are also subject to different kinds of readings: readings that subvert the intended purpose of the text. Remixing, collage, literary analysis, text analysis and visualization are all examples of ways of “reading” that go against the grain. One of the premises of this course is that reading is an active pursuit, one that takes cunning, wits, and creativity.

Electronic text (text on a computer) has its own set of unique affordances and limitations. One important affordance of electronic text is that it can be manipulated procedurally: we can write computer programs that read the text. This course is about writing those computer programs, and exploring the technical, creative, and critical potential of those programs to give texts interesting readings.

But we’re also interested in the other side of the coin: writing electronic text. It’s possible to read an analog text without leaving a trace, but computer programs almost always leave behind an artifact as a consequence of a reading: munged output, a spreadsheet of values, a visualization. This course is also about those artifacts. We’re attempting to answer questions like these: How can we use computers to compose text? Does it even make sense to think of textual composition as a procedure? What are the qualities of procedurally generated text, and how does it differ (if at all) from text composed in other ways?

Appropriative Poetics: Cut-ups

From a poetic standpoint, we’re doing “appropriative process writing”: writing algorithms that take existing texts and mess them up, recombining them with themselves or other texts, juxtaposing their elements in unexpected (or completely expected) ways. Much of the conceptual material of the course focuses on this idea.

Units: Character, Line, File

Our process, however, needs a unit to operate on. Text can be divided into any number of different, overlapping units (document, page, section, subsection, chapter, clause, sentence, ascender, descender, act, stanza, syllable, foot…) but only some of these are easy for computers to work with. (It’s harder than you think to teach a computer what a “sentence” is, for example.)

The two most obvious units of text in a computer are:

the character, i.e., the byte (or series of bytes) that represents a single element of written language (e.g., A through Z in English, any one of many glyphs in Chinese…)
the file, i.e., an ordered collection of characters

Somewhere in between these two is the line, a formal unit of text that has been part of written language from the beginning. (Cuneiform with lines.) The line arises in written text because writing transcribes speech, which is a one-dimensional medium, onto two-dimensional surfaces (paper, clay, stone…). Line breaks in text are, fundamentally, a way of using up all of the space allotted.

But line breaks also serve syntactic, semantic, and metrical functions, as in poetry:

SEA ROSE

Rose, harsh rose,
marred and with stint of petals,
meagre flower, thin,
spare of leaf,

more precious
than a wet rose
single on a stem —
you are caught in the drift.

Stunted, with small leaf,
you are flung on the sand,
you are lifted
in the crisp sand
that drives in the wind.

Can the spice-rose
drip such acrid fragrance
hardened in a leaf?

This is “Sea Rose” by H.D. (Previously I’d used William Carlos Williams’ “This is just to say” as the main example text for this course, but that poem is still under copyright. Sad.)

In computer text, the line is often used as a “record marker.” This is how a text file can be used as a rudimentary database. (Take these NBA stats, for example.) When your Arduino sends a newline character after a group of data, that’s exactly what it’s doing.

Perhaps because of these parallels (text layout/poetic structure/database structure), many programs that operate on text use the line as their fundamental unit, especially those in UNIX (coming right up). The programs that we write in this class will do the same.

The UNIX Command Line

In this class, we’ll be doing most of our work from the UNIX command line. It’s not because we’re hard-asses, or bad-asses, or just asses. Some claim that the command line is “better,” or “more fundamental.” It isn’t. But it was built to do the kind of work that we’re doing in this class.

Getting started: Logging In

If you’re already familiar with UNIX and have access to a UNIX machine, then you’re good to go, and you can skip to the next section. If you’re using OS X, congratulations! You are running a UNIX machine. Simply open up Terminal (Applications > Utilities > Terminal), and you’re at the command line. Same goes for Linux; if you’re running Linux, you probably already know how to get to the command line. If not, the instructions vary by distribution. (Come see me if you’re having trouble.)

If you aren’t using a machine that already has a UNIX command line, or if you’d prefer to use a different machine for your course work, you can use a third-party service to get a “shell” account, or run a virtual server in the cloud. If you use DreamHost, you may already have a shell account. You might try EC2, Linode or Webfaction. Digital Ocean has a good tutorial on how to set up a virtual Linux server on their service.

When you’ve successfully reached the command line (another line!), you should see something like this:

-bash-4.1$

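This is the “prompt” (because it “prompts” you to do something). Depending on how your system is set up, the prompt may show your username, the name of the machine you’re logged into, and your current directory; the minimal prompt above shows only the name and version of the shell (bash 4.1). More on that later.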

The two keystrokes you absolutely must know:

Ctrl+D: “I’m done typing things in. kthxbye.”
Ctrl+C: “You’re doing something I don’t like. Please stop.”

Your first UNIX commands

First off, we’re going to create a directory, so that you can find it later and you don’t risk overwriting something:

$ mkdir rwet
$ cd rwet

(Don’t type the $! That’s just there to indicate that those commands should be typed at the command line.)

The mkdir command means “make directory”; “directory” is just UNIX speak for “folder.” (You can probably find the folder in the Finder right now, if you’re on OS X.) When you’re using the command line, there’s one directory on your machine that is considered your “current” directory, i.e., the directory you’re doing stuff in. The cd (“change directory”) command makes the directory you give it (rwet in this case) the current directory.
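If you want to try these two commands without cluttering your home directory, here is a minimal sketch that runs them inside a throwaway scratch directory. Creating that scratch directory with mktemp -d is an assumption of this sketch, not something the tutorial requires; pwd (covered further down) confirms where you ended up.

```shell
# Move into a fresh, empty scratch directory (mktemp -d is this sketch's choice).
cd "$(mktemp -d)"

mkdir rwet   # make a new directory named rwet
cd rwet      # make rwet the current directory
pwd          # print the full path of the current directory; it ends in /rwet
```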

There are (broadly) two kinds of commands in UNIX: commands that work on lines of input/output, and commands that operate on files and directories. The mkdir and cd commands are examples of the latter. We’re primarily concerned with the former. Let’s start with cat:

$ cat

(Make sure to hit “return” after you type cat.) Now type. After you enter a line, cat will print out the same line. It’s the simplest text filter possible (one rule: let everything through).

When you’re done with cat, press Ctrl+D. Let’s try something more interesting, like grep:

$ grep foo

Now type some lines of text. Try typing, for example, “I like drink” and then “I like food.” The grep command only prints out lines that “match” the string of characters that follow the command (foo, in this case). Let’s try it again, this time with a different “pattern”:

$ grep you

If we cut and paste the poem above (“Sea Rose”) into the terminal application, the resulting output would look like this:

you are caught in the drift.
you are flung on the sand,
you are lifted
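You don’t have to type the test lines by hand every time. As a sketch, printf can play the part of the keyboard, and the pipe character (explained below under “Sorting and piping”) hands its output to grep. The variable name matches is only for illustration.

```shell
# printf prints each of its arguments on its own line, standing in for typing.
# grep keeps only the lines that contain the pattern "foo".
matches=$(printf '%s\n' 'I like drink' 'I like food' | grep foo)
echo "$matches"   # I like food
```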

The commands head and tail print out a certain number of lines from the beginning of a file and the end of a file, respectively. If you type in the following:

$ tail -3

… and then paste in the poem above and press Ctrl+D (tail can’t know which lines are the last ones until the input ends), you’d get:

Can the spice-rose
drip such acrid fragrance
hardened in a leaf?
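To see head and tail side by side, here is a sketch that saves the end of the poem to a file (the name sea_rose_end.txt is only for illustration; the > that creates it is explained below under “Using files”). It uses the -n form of the line count: tail -3 is an older shorthand for tail -n 3 that common systems still accept.

```shell
# Work in a scratch directory so the example file doesn't land somewhere important.
cd "$(mktemp -d)"

# Save the last lines of the poem to a file, one line of text per line of file.
cat > sea_rose_end.txt <<'EOF'
you are lifted
in the crisp sand
that drives in the wind.

Can the spice-rose
drip such acrid fragrance
hardened in a leaf?
EOF

head -n 2 sea_rose_end.txt   # first two lines: "you are lifted", "in the crisp sand"
tail -n 3 sea_rose_end.txt   # last three lines: the final stanza
```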

Structure of UNIX commands

UNIX commands generally follow this structure:

name_of_command [options] arguments

(The “[options]” part of that schema is usually one or more characters preceded by hyphens. The -3 in tail -3 tells it to print the last three lines; grep takes an option, -i, which tells it to be case insensitive.)

You can think of UNIX commands like commands in English, but with a funny syntax: “Fetch thoroughly my slippers!”

You can figure out which options and arguments a command supports by typing man name_of_command at the command line.
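Here is a small sketch of the -i option mentioned above (man grep describes it as ignoring case). The two lines are the poem’s title and first line; the variable name text is just for illustration.

```shell
# The title and first line of the poem, one per line.
text='SEA ROSE
Rose, harsh rose,'

printf '%s\n' "$text" | grep rose      # case-sensitive: only "Rose, harsh rose,"
printf '%s\n' "$text" | grep -i rose   # -i ignores case: both lines match
```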

Sorting and piping

The sort command takes every line of input and prints them back out, in alphabetical order. Try:

$ sort

… paste in the poem, and hit Ctrl+D. You’ll get something like:






Can the spice-rose 
drip such acrid fragrance 
hardened in a leaf?
in the crisp sand 
marred and with stint of petals, 
meagre flower, thin, 
more precious 
Rose, harsh rose, 
SEA ROSE
single on a stem -- 
spare of leaf,
Stunted, with small leaf, 
than a wet rose 
that drives in the wind.
you are caught in the drift.
you are flung on the sand, 
you are lifted

(why the blank lines at the beginning?)
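Two things shape that output. An empty line compares as smaller than any non-empty line, so the blank lines between stanzas float to the top. And the exact order of the other lines depends on your system’s locale settings: the listing above looks like the case-insensitive order of a locale such as en_US, while the plain byte-by-byte “C” locale puts every uppercase letter before every lowercase one. This sketch pins the locale with LC_ALL=C so the result is predictable, and also shows sort -f, which folds lowercase to uppercase before comparing.

```shell
# A handful of lines from the poem, including one blank line.
lines='Rose, harsh rose,
meagre flower, thin,

SEA ROSE
drip such acrid fragrance'

# Byte order: the blank line first, then uppercase lines, then lowercase lines.
printf '%s\n' "$lines" | LC_ALL=C sort

# -f compares as if everything were uppercase, so case no longer matters.
printf '%s\n' "$lines" | LC_ALL=C sort -f
```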

So far, we’ve just been sending these commands input (by typing, or cutting and pasting), then letting the output be printed back to the screen. UNIX provides a means by which we can send the output of one program as the input of another program. (Kind of like hooking objects up in a Max/MSP patch.) We do this using the pipe character (| … usually shift+backslash). For example:

$ grep leaf | sort

… takes lines from input, displays only those that contain the string “leaf,” and then passes them to sort, which displays those lines in alphabetical order. The output from the poem:

hardened in a leaf?
spare of leaf,
Stunted, with small leaf,
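The same pipeline can be run without pasting anything. In this sketch printf supplies a few lines of the poem, grep filters them, and the pipe hands the survivors to sort. LC_ALL=C is added only so the order doesn’t depend on your locale; in that byte order, “Stunted” sorts before the lowercase lines, unlike the listing above.

```shell
# A few lines from the poem; three of them contain "leaf".
lines='spare of leaf,
more precious
Stunted, with small leaf,
hardened in a leaf?'

# grep keeps only the "leaf" lines; | passes grep's output to sort as its input.
printf '%s\n' "$lines" | grep leaf | LC_ALL=C sort
```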

cut and tr

The cut command breaks up a line of text into its constituent parts. For example:

$ cut -d ' ' -f 1

… prints out the first word of every line of input (more precisely, everything before the first space). Here cut is given two options, each followed by its own parameter. The -d option is followed by the “delimiter” character (i.e., what you want to split the line on); the -f option is followed by the number of the field you want. The output:

SEA

Rose,
marred
meagre
spare

more
than
single
you

Stunted,
you
you
in
that

Can
drip
hardened
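The -f option accepts more than a single number, and the blank lines in the output above come from cut passing through, unchanged, any line that doesn’t contain the delimiter at all. A short sketch of both behaviors, using a line from the poem; the -s flag (suppress lines without a delimiter) is a standard cut option not covered above.

```shell
line='you are caught in the drift.'

printf '%s\n' "$line" | cut -d ' ' -f 2     # are
printf '%s\n' "$line" | cut -d ' ' -f 1,3   # you caught  (fields rejoined with the delimiter)
printf '%s\n' "$line" | cut -d ' ' -f 2-4   # are caught in

# Lines with no space in them (a one-word line, a blank line) are printed
# whole by default; -s drops them instead.
printf 'SEA\n\nRose, harsh\n' | cut -s -d ' ' -f 1   # Rose,
```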

The tr command “translates” a set of characters in the original line to another set of characters. The source character set is the first parameter, and the second parameter is the characters you want them to be translated to. For example:

$ tr aeiou eioua
hello there, how are you?
hillu thiri, huw eri yua?

You can specify a range of characters with a hyphen:

$ tr a-z A-Z
hello there, how are you?
HELLO THERE, HOW ARE YOU?
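tr can do more than swap one set of letters for another. As a sketch: -d deletes the characters in a set rather than translating them, and two ranges per argument give the classic ROT13 cipher (rotate every letter 13 places). Neither trick is covered above; both are standard tr usage.

```shell
# -d deletes every character in the set instead of translating it.
printf 'hello there, how are you?\n' | tr -d aeiou      # hll thr, hw r y?

# ROT13: A-M swap with N-Z, in both upper and lower case.
printf 'Sea Rose\n' | tr 'A-Za-z' 'N-ZA-Mn-za-m'       # Frn Ebfr
```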

Multiple pipes

Of course, you can include more than one command in a “pipeline”:

$ sort | tail -6 | tr aeiou e

… which, if you send it our venerable poem, outputs the following:

Stented, weth smell leef,
then e wet rese
thet dreves en the wend.
yee ere ceeght en the dreft.
yee ere fleng en the send,
yee ere lefted

What happened? The input went to sort, which sorted the lines in alphabetical order. Then tail -6 kept only the last six lines of sort’s output and passed them along to tr, which replaced every lowercase vowel with an e. (You can build pipelines of arbitrary length using this technique.)
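Here is one more three-stage pipeline, sketched with the commands from this section: grep picks out the “you” lines, cut keeps the third word of each, and tr capitalizes what’s left. The input is a short excerpt fed in with printf, not the whole poem.

```shell
lines='Stunted, with small leaf,
you are flung on the sand,
you are lifted
in the crisp sand'

# grep -> cut -> tr: each command's output becomes the next command's input.
printf '%s\n' "$lines" | grep you | cut -d ' ' -f 3 | tr a-z A-Z
```

With this excerpt the pipeline should print FLUNG and LIFTED, one per line.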

Using files (“redirection”)

So far, we’ve been building “programs” that can only read from the keyboard (or from cut-and-paste) and can only send their output to the screen. What if we want to read from an existing file, and then output to a file?

No complicated code is needed. UNIX provides a method for us. It’s called “redirection.” Here’s how it works:

$ sort <sea_rose.txt

The < character means “instead of taking input from the keyboard, take input from this file.” Likewise:

$ grep were >some_file.txt

The > means “instead of sending your output to the screen, send it to this file.” You can use them both at the same time:

$ grep rose <sea_rose.txt >some_file.txt

… in which case some_file.txt will end up with every line from sea_rose.txt that contains the string “rose.” (If the output file doesn’t already exist, it will be created. If it does already exist, it will be overwritten, so be careful!)
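The commands above assume a file named sea_rose.txt already exists. This sketch creates one (shortened here to the title and first stanza) and then reads and writes it with redirection. It also shows >>, the append form of >, which adds to the end of a file instead of overwriting it.

```shell
cd "$(mktemp -d)"   # scratch directory for this example

# Create sea_rose.txt: everything up to the EOF line goes into the file.
cat > sea_rose.txt <<'EOF'
SEA ROSE

Rose, harsh rose,
marred and with stint of petals,
meagre flower, thin,
spare of leaf,
EOF

grep rose <sea_rose.txt >some_file.txt    # overwrite: some_file.txt now holds 1 line
grep leaf <sea_rose.txt >>some_file.txt   # append: a second line is added at the end
cat some_file.txt
```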

Other helpful commands

wc -w foo will print out the number of words in the file named foo. (wc -l counts lines; wc -c counts bytes, which is the same as the number of characters for plain ASCII text; wc -m counts characters.)
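A sketch of wc on a two-line file (the name two_lines.txt is only for illustration). Reading the file with < instead of naming it as an argument makes wc print just the number, without the file name after it; some systems pad that number with leading spaces.

```shell
cd "$(mktemp -d)"   # scratch directory for this example

printf 'Rose, harsh rose,\nmarred and with stint of petals,\n' > two_lines.txt

wc -l <two_lines.txt   # 2 lines
wc -w <two_lines.txt   # 9 whitespace-separated words
wc -c <two_lines.txt   # 51 bytes, counting the two newline characters
```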

curl -s http://some.url/ fetches the web page at the given URL and prints its content to standard output.

(We’ll go into more detail with these next week.)

The ls command will give you a list (“ls” stands for “list”) of files in your current directory. If you give it a parameter, it will give you a listing of the files in the directory you gave to it. (On OS X, for example, try ls /Users/your_user_name/Desktop)

Type pwd to find out what your current directory is.

The cp command will make a copy (“cp” for “copy”) of a file. It takes two parameters: the first is the source file name, the second is the destination file name.

Type rm foo to delete the file named foo. (Note: this is permanent! The file won’t go to your Trash, so be careful.)
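The file commands can be tried safely in a scratch directory, so that rm can’t touch anything you care about. In this sketch foo and foo_copy are just placeholder names.

```shell
cd "$(mktemp -d)"   # scratch directory, so nothing real gets deleted

printf 'Rose, harsh rose,\n' > foo   # make a small file to play with
pwd                                 # print the current directory
cp foo foo_copy                     # copy: source first, destination second
ls                                  # lists foo and foo_copy
rm foo                              # delete foo (permanently!)
ls                                  # lists only foo_copy
```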

Suggested exercises

This is just to grep. Use a combination of the UNIX commands discussed in class (along with any other commands that you discover) to compose a text. Your “source code” for this exercise will simply consist of what you executed on the command line. Indicate what kind of source text the “program” expects, and give an example of what text it generates. Use man to discover command line options that you might not have known about (grep -i is a good one).

Physical therapy. Take a text on paper (a newspaper, a restaurant menu, a book, whatever) and perform a transformation on it equivalent to the way one of the UNIX text commands we discussed transforms digital text. For example, to grep a book, you might highlight or cut out all of the lines in the book that match a particular string.

Helpful links

UNIX Tutorial for beginners: covers many of the above topics in more detail
egrep for Linguists: in-depth tutorial not just of grep, but of many other useful commands for manipulating text in UNIX

decontextualize