Introduction and UNIX tutorial

The structure of non-interactive computer programs

Computer programs are made to work in many different ways. To simplify a bit, let’s say that there are two kinds of computer programs: interactive programs and non-interactive programs. If you learned to program in an environment like Processing, then you probably learned interactive programming. Programs made in this paradigm can be schematically represented like this:

Do some initialization stuff.
Keep doing something over and over again.
Get some user input.
Go to step 2.

These programs are centered around responding to user actions. But what if we don’t care about that? What if we just want to munge some data?

Most computer programs are non-interactive, meaning that they don’t respond to user input (or respond to it only indirectly). The model for the programs that we’ll be making in this class is this:

input -> mungeing procedure -> output

We don’t care about how our program was run, or what the user’s doing while the program is running, or even (ideally) where the input is coming from or where the output is going. We’re concerned solely with the procedure: the code that, given data as input, transforms it into output.

Appropriative Poetics: Cut-ups

From a poetic standpoint, we’re doing “appropriative process writing”: writing algorithms that take existing texts and mess them up, recombining them with themselves or other texts, juxtaposing their elements in unexpected (or completely expected) ways. Much of the conceptual material of the course focuses on this idea.

Units: Character, Line, File

Our process, however, needs a unit to operate on. Text can be divided into any number of different, overlapping units (document, page, section, subsection, chapter, clause, sentence, ascender, descender, act, stanza, syllable, foot…) but only some of these are easy for computers to work with. (It’s harder than you think to teach a computer what a “sentence” is, for example.)

The two most obvious units of text in a computer are:

the character, i.e., the byte (or series of bytes) that represents a single element of written language (e.g., A through Z in English, any one of many glyphs in Chinese…)
the file, i.e., an ordered collection of characters

Somewhere in between these two is the line, a formal unit of text that has been part of written language from the beginning. (Cuneiform with lines.) The line arises in written text because writing transcribes speech, which is a one-dimensional medium, onto two-dimensional surfaces (paper, clay, stone…). Line breaks in text are, fundamentally, a way of using up all of the space allotted.

But line breaks also serve syntactic, semantic, and metrical functions, as in poetry:

this is just to say

i have eaten
the plums
that were in
the icebox

and which
you were probably
saving
for breakfast

forgive me
they were delicious
so sweet
and so cold

In computer text, the line is often used as a “record marker.” This is how a text file can be used as a rudimentary database. (Take these NBA stats, for example.) When your Arduino sends a new line character after a group of data, that’s exactly what it’s doing.

Perhaps because of these parallelisms (text layout/poetic structure/database structure), many programs that operate on text use the line as their fundamental unit–especially those in UNIX (coming right up). The programs that we write in this class will do the same.

The UNIX Command Line

In this class, we’ll be doing most of our work from the UNIX command line. It’s not because we’re hard-asses, or bad-asses, or just asses. Some claim that the command line is “better,” or “more fundamental.” It isn’t. But it was built to do the kind of work that we’re doing in this class.

Getting started: Logging In

If you’re already familiar with UNIX and have access to a UNIX machine, then you’re good to go, and you can skip to the next section.

For the duration of this course, there is a UNIX server (well, Ubuntu Linux) available online for student use. In order to log into it, though, you’ll need an SSH client. (SSH means “secure shell”: it’s a protocol that allows you to log in to remote machines with secure data encryption.)

If you’re using OS X, then you’ve already got a good SSH client. Open Terminal (Applications > Utilities > Terminal); at the prompt, type the following:

ssh your_user_name@sandbox.decontextualize.com

(Replace your_user_name with the first letter of your first name followed by the first seven letters of your last name—for me, that would be aparrish: the “a” from Adam plus all seven letters of my last name.)

You’ll be prompted for a password, which I gave in class. Come see me if you missed it, or if you forgot your password when you changed it in class.

If you’re using Windows, you’ll need to download an SSH client. I recommend PuTTY. (See me if you need more detailed instructions for how to set this up.)

If you’re using Linux, you probably already know the drill. Open your terminal emulator and SSH to the server, using the same command for OS X given above.

When you’ve successfully reached the command line (another line!), you should see something like this:

aparrish@li99-59:~$

This is the “prompt” (because it “prompts” you to do something). It’s telling you your username, the server you’re logged into, and the current directory. More on that later.

The two keystrokes you absolutely must know:

Ctrl+D: “I’m done typing things in. kthxbye.”
Ctrl+C: “You’re doing something I don’t like. Please stop.”

Your first UNIX commands

First off, we’re going to create a directory, so that you can find it later and you don’t risk overwriting something:

$ mkdir dwwp $ cd dwwp

(don’t type the $! That’s just there to indicate that those commands should be typed at the command line.)

The mkdir command means “make directory”–”directory” is just UNIX speak for “folder.” (You can probably find the folder in the Finder right now, if you’re on OS X.) When you’re using the command line, there’s one directory on your machine that is considered your “current” directory, i.e., the directory you’re doing stuff in. The cd (“change directory”) command makes the directory you give it (dwwp in this case) the current directory.

There are (broadly) two kinds of commands in UNIX: commands that work on lines of input/output, and commands that operate on files and directories. The mkdir and cd commands are examples of the latter. We’re primarily concerned with the former. Let’s start with cat:

$ cat

(Make sure to hit “return” after you type cat.) Now type. After you enter a line, cat will print out the same line. It’s the simplest text filter possible (one rule: let everything through).

When you’re done with cat, press Ctrl+D. Let’s try something more interesting, like grep:

$ grep foo

Now type some lines of text. Try typing, for example, “I like drink” and then “I like food.” The grep command only prints out lines that “match” the string of characters that follow the command (foo, in this case). Let’s try it again, this time with a different “pattern”:

$ grep were

If we cut and paste the poem above (“This is just to say”) into the terminal application the resulting output would look like this:

that were in you were probably they were delicious

The commands head and tail print out a certain number of lines from the beginning of a file and the end of a file, respectively. If you type in the following:

$ tail -4

… and then paste in the poem above, you’d get:

forgive me they were delicious so sweet and so cold

Structure of UNIX commands

UNIX commands generally follow this structure:

name_of_command [options] arguments

(The “[options]” part of that schema is usually one or more characters preceded by hyphens. The -4 in tail tells it to print the last four lines; grep takes an option, -i, which tells it to be case insensitive.)

You can think of UNIX commands like commands in English, but with a funny syntax: “Fetch thoroughly my slippers!”

You can figure out which options and arguments a command supports by typing man name_of_command at the command line.

Sorting and piping

The sort command takes every line of input and prints them back out, in alphabetical order. Try:

$ sort

… paste in the poem, and hit Ctrl+D. You’ll get something like:

and so cold and which for breakfast forgive me i have eaten saving so sweet that were in the icebox the plums they were delicious this is just to say you were probably

(why the blank lines at the beginning?)

So far, we’ve just been sending these commands input (by typing, or cutting and pasting), then letting the output be printed back to the screen. UNIX provides a means by which we can send the output of one program as the input of another program. (Kind of like hooking objects up in a Max/MSP patch.) We do this using the pipe character (| … usually shift+backslash). For example:

$ grep were | sort

… takes lines from input, displays only those that contain the string “were,” and then passes them to sort, which displays those lines in alphabetical order. The output from the poem:

that were in they were delicious you were probably

cut and tr

The cut command breaks up a line of text into its constituent parts. For example:

$ cut -d ' ' -f 1

… prints out the first word of all lines of input. The cut command takes two options, both of which themselves have parameters. The -d option is followed by the “delimiter” string (i.e., what you want to split the line on); the -f option is followed by which field you want.

The tr command “translates” a set of characters in the original line to another set of characters. The source character set is the first parameter, and the second parameter is the characters you want them to be translated to. For example:

$ tr aeiou eioua hello there, how are you? hillu thiri, huw eri yua?

You can specify a range of characters with a hyphen:

$ tr a-z A-Z hello there, how are you? HELLO THERE, HOW ARE YOU?

Multiple pipes

Of course, you can include more than one command in a “pipeline”:

$ sort | tail -6 | tr aeiou e

… which, if you send it our venerable poem, outputs the following:

thet were en the ecebex the plems they were deleceees thes es jest te sey yee were prebebly

What happened? The input went to sort, which sorted the lines in alphabetical order. Then tail -6 grabbed only the last six lines of the output of sort, which sent those lines through to tr. (You can build pipelines of infinite length using this technique.)

Using files (“redirection”)

So far, we’ve been building “programs” that can only read from the keyboard (or from cut-and-paste) and can only send their output to the screen. What if we want to read from an existing file, and then output to a file?

No complicated code is needed. UNIX provides a method for us. It’s called “redirection.” Here’s how it works:

$ sort <this_is_just.txt

The < character means “instead of taking input from the keyboard, take input from this file.” Likewise:

$ grep were >some_file.txt

The > means “instead of sending your output to the screen, send it to this file.” You can use them both at the same time:

$ grep were <this_is_just.txt >some_file.txt

… in which case some_file.txt will end up with every line from this_is_just.txt that contains the string “were.” (If the output file doesn’t already exist, it will be created. If it does already exist, it will be overwritten, so be careful!)

Other helpful commands

wc -w foo will print out the number of words in the file named foo. (wc -l will count the number of lines; wc -c will count the number of characters.)

curl -O http://some.url/ downloads the file at the given URL into your current directory.

(We’ll go into more detail with these next week.)

The ls command will give you a list (“ls” stands for “list”) of files in your current directory. If you give it a parameter, it will give you a listing of the files in the directory you gave to it. (On OS X, for example, try ls /Users/your_user_name/Desktop)

Type pwd to find out what your current directory is.

The cp command will make a copy (“cp” for “copy”) of a file. It takes two parameters: the first is the source file name, the second is the destination file name.

Type rm foo to delete the file named foo. (Note: this is permanent! The file won’t go to your Trash… so be careful)

Helpful links

UNIX Tutorial for beginners: covers many of the above topics in more detail
egrep for Linguists: in-depth tutorial not just of grep, but of many other useful commands for manipulating text in UNIX

Reading for next week

We’ll discuss this reading on July 6th. Questions to guide your reading: How can a computer program demonstrate a point of view? What points of view are behind the grep command? What do you think about Glazier’s “grep works”? What’s more important to the outcome, the procedure or the source text? How to UNIX utilities rely on or reinforce the substance of digital text (ASCII, lines, etc.)?

Friedman and Nissenbaum. Bias in Computer Systems.
Glazier, L.P. Grep: A Grammar. In Witz 7.1 (Spring 1999). Examples of Glazier’s “grep works” can be found here.
An Annotated History of Character Codes
(optional) Rubenstein, R. Gathered, Not Made: A Brief History of Appropriative Writing

No comments

Comments feed for this article

Trackback link: http://www.decontextualize.com/teaching/dwwp/introduction-and-unix-tutorial/trackback/

decontextualize