Excerpt from "I Waded in Clear Water"

Last November, I participated in National Novel Generation Month (NaNoGenMo), an event in which participants are encouraged to write a computer program that generates a novel. Originally conceptualized by Darius Kazemi as a cheeky alternative to NaNoWriMo, the event has inspired programmers and writers to create some really beautiful work.

My contribution is a procedurally generated novel called I Waded in Clear Water. The primary source text for the novel is Gustavus Hindman Miller’s Ten Thousand Dreams Interpreted, with footnotes provided by information gleaned from ConceptNet and WordNet. You can read more about the process I used to generate the novel, and see the Python source code, at my NaNoGenMo 2014 Github repository.

I read some excerpts from the novel and gave a presentation about it at WordHack on January 15th. Here’s the presentation deck in PDF format.



The Infinity

I made a new Twitter bot: @eventuallybot. It generates short, silent films in GIF format, based on randomly-selected snippets of YouTube videos. As of this writing, the bot has generated nearly 300 tiny films!

The code is written in Python and makes heavy use of Connor Mendenhall’s wgif program and ImageMagick. I used my new Python library, My Dinosaur, to generate an RSS feed for the bot (a first for me!), which you can subscribe to here.

I’ve had the idea for this bot for a while. I’ve been interested since my undergraduate linguistics days in the idea of textual cohesion—the methods and strategies that language speakers employ to make the units of the text (lines, sentences, stanzas, paragraphs, etc.) come together as a whole. In particular, I’m interested in how just mimicking the surface forms of cohesion (by, e.g., pronoun substitution, anaphoric/cataphoric demonstratives, or even just lexical repetition in the form of anaphora) can make generative text feel like it’s telling a story, even if the text doesn’t have any kind of underlying semantic model.

With @eventuallybot, I wanted to experiment with some of these concepts. The experiment, specifically, was this: if you take random bits of video, and splice them together with titles that suggest the contour of a story, how often will you get a result that feels at least sort of cohesive?

So I made a big list of transition words—essentially, conjunctions and phrases that function as conjunctions—and (inspired by Labov’s narrative analysis) lightly categorized them like so:

  • beginning phrases (phrases that start a story, like “once upon a time”)
  • “and-then” phrases (phrases that move the story along a bit in time, like “after that”)
  • continuing phrases (phrases that introduce a second situation or complicating factor, like “meanwhile” or “nearby”)
  • concluding phrases (phrases that introduce an explanation of how the story is resolved, like “therefore” or “to summarize…”)
  • ending phrases (like “The End”)
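
A minimal sketch of how such a categorized list might drive a title sequence: the phrase lists and function below are my own illustrative stand-ins, not the bot's actual code, but they follow the Labov-inspired ordering described above.

```python
import random

# Hypothetical phrase lists, one per narrative slot described above.
TRANSITIONS = {
    "beginning": ["once upon a time", "one day", "long ago"],
    "and_then": ["after that", "later", "eventually"],
    "continuing": ["meanwhile", "nearby", "at the same time"],
    "concluding": ["therefore", "to summarize", "in the end"],
    "ending": ["The End", "Fin"],
}

def title_sequence(num_middle_titles=3):
    """Pick one phrase per slot: a beginning, a few middle transitions,
    a conclusion, and an ending."""
    titles = [random.choice(TRANSITIONS["beginning"])]
    for _ in range(num_middle_titles):
        category = random.choice(["and_then", "continuing"])
        titles.append(random.choice(TRANSITIONS[category]))
    titles.append(random.choice(TRANSITIONS["concluding"]))
    titles.append(random.choice(TRANSITIONS["ending"]))
    return titles
```

Interleaving titles like these between randomly chosen clips is what gives the films their suggested story contour.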

As Mark Sample pointed out on Twitter, filmmakers are already familiar with the “Kuleshov Effect,” which describes how viewers will tend to see two shots juxtaposed in montage as being narratively related. To be sure, the titles in @eventuallybot’s films are a bit less subtle than straight-up cuts between shots. But I kind of enjoy how @eventuallybot (at its most coherent) feels like it’s telling an anecdote with its clips, not just implying a narrative connection among them.

One reason I wanted to have an RSS feed for this bot is the way Twitter handles the GIF format. Twitter “supports” GIFs, but transparently converts them after upload to a different video format, and (as far as I can tell) throws away the original GIF data. This is probably the right move on Twitter’s part, since GIFs aren’t (byte-for-byte) a very efficient format for storing video, but I wanted people to be able to save and share the GIFs as they were originally generated. So the RSS feed updates at the same time as the bot itself, and it links to the original GIFs.

If it hasn’t already happened by the time you read this, it will happen soon: @everyword’s seven-year mission to tweet “every word in the English language” has come to an end. I hope you’ve all enjoyed the ride!

My plan is to write a more complete post-mortem on the project later. In the meantime, this post contains some links to things that followers of @everyword might find interesting or useful.

The future of @everyword

But first, a word about what’s next for @everyword. Don’t unfollow just yet! My plan at the moment is to let the account rest for a bit, and then run “@everyword Season 2,” starting over from the beginning of the alphabet. Before I do that, I’d like to find a more thorough word list, and also do some programming work so that the bot is less likely to experience failures that interrupt service.

Writing about @everyword

Here’s some writing about @everyword, by me and others.

Writing about Twitter bots

@everyword is a Twitter bot—an automated agent that makes Twitter posts. There are a lot of interesting Twitter bots out there. Here’s some interesting writing by and about bot-makers:

What to follow

Here are some Twitter bots that I think followers of @everyword might enjoy.

Thank you!

The response to @everyword has been overwhelming. When I started the project in 2007, I never would have dreamed that the account would one day have close to 100k followers. And if you’re one of those followers, thank you! It’s a great feeling to have made something that so many people have decided to make a daily (or, uh, half-hourly) part of their lives.

I view @everyword as a success, and I want to note here that I owe this success to all of my friends and family who encouraged me along the way and helped to make @everyword a topic of conversation. I am very bad at finding value in the things I make, and I’m especially bad at self-promotion. Without the help of the people close to me, I’m sure that @everyword would have completed its task in obscurity—if it completed its task at all.


I gave a talk at !!Con a few weeks ago. The talk was called “Scrabble Sucks! Toward higher-order word games.” The talk is about some problems I have with Scrabble, and some of the games I’ve made in response to those problems. Download the slides and notes here. The PDF also includes a few slides I didn’t get to in my actual presentation, comparing a sizable corpus of Scrabble games to Lexcavator’s list of all words that players have ever found.

I had a lot of fun participating in !!Con. I was a little nervous talking right before Mark-Jason Dominus, whom I venerated back in my Perl-slinging days, and whose Higher-Order Perl is what I was riffing off of with the subtitle of my talk (except I wasn’t talking about higher-order functions; I was talking about higher-order n-grams.) But everything worked out okay, and I’m glad I got to give my talk to such an enthusiastic and receptive crowd.

Here’s a list of links to student projects made for Reading and Writing Electronic Text, along with a brief description. I thought all of my students did great work this year, and it was a pleasure to teach them!

Centrality by Aankit Patel. I guess I could describe this as “lexicographical dataglitch-punk.” There is an online version available.

Bing Huang’s TextFinal is an audiovisual meditation on the Declaration of Independence.

Caitlin Weaver produced Susan Scratched, a poem glitched through with a distinctive kind of repetition. A lovely performance of the poem is available on SoundCloud.

Clara Santamaria Vargas made Gertrude’STime, which takes the phrase “in the morning there is meaning, in the evening there is feeling” from Tender Buttons literally, and generates poems accordingly.

Dan Melancon’s L SYSTEM POEM SYSTEM POEM L POEM L SYSTEM does what it says in the title: applies a Lindenmayer System to an input text. The output exhibits a strangely hypnotic form of uncanny alien repetition.

Eamon O’Connor’s final project uses the CMU pronouncing dictionary to produce metrical verse. Several examples are included on the page.


Hellyn Teng’s final project was Kepler-186, a multimedia poem about exoplanets and home economics. Documentation includes sound snippets and screenshots.

Jason Sigal’s write-up of his final project, The Phrases and Pronunciation, is fantastic—he goes into detail about his process and the technical and conceptual decisions that he made along the way.

John Ferrell made a Twitter bot called the Rambling Taxidermist, whose inspiration and inner workings he has written up here. The bot responds to tweets about marriage with ill-advised, mashed-up advice composed partially from a taxidermy handbook.

Michelle Chandra created a lovely poem about loneliness, drawing upon a corpus of well-known quotes. I love the repetition and alliteration in this one; it’s well worth reading.

Ran Mo’s Birdy News juxtaposes Twitter jokes with NY Times headlines to often humorous effect.


Robert Dionne’s final project, Reading Between the Lines, generates multidimensional poems from distributed word vectors.

Salem Al-Mansoori’s final project, @_all_of_us, is a machine for creating random platitudes and aphorisms. It takes the form of a Twitter bot and a generative comic strip.

Sam Lavigne’s program to Transform any text into a patent application has attracted a lot of press attention, so you might have already seen it! Original and well-executed. I love this project.

Sheri Manson created a series of three poems, based on words and phrases drawn randomly from an interesting selection of source texts.

Uttam Grandhi’s project The Baptized Pixel uses image data to generate poems with binary-like repetition.

Vicci Ho made a Twitter bot, @onetruewiseman, that combines the social media wisdom of several conservative luminaries.


Friday May 9th, 2014
721 Broadway, New York, NY
Tisch Common Room (ground floor)

Please join us as a semester of experimentation with procedural text culminates in a one-night-only performance of computer-generated poetry. Seventeen students at NYU’s Interactive Telecommunications Program will take the stage and, with their voices, set aloft poems and prose produced by programs of their own design. You are likely to encounter: poems made from pixels, automated propaganda, lexicographical cut-ups, twitter bots, and more.

Reading and Writing Electronic Text is a course offered at NYU’s Interactive Telecommunications Program (http://itp.nyu.edu/itp/). The course is an introduction to both the Python programming language and contemporary techniques in electronic literature. See the syllabus and examples of student work here: http://rwet.decontextualize.com/

Poster design by Caitlin Weaver (http://www.phasesofsputnik.com/). Print-quality poster art can be downloaded here.

Odds and ends

Here are a few things I made recently that didn’t make it to my blog.

Pizza Clones


Here’s yet another Twitter bot: Pizza Clones. Every two hours it generates a joke in the form of “Every {NOUN} is a(n) {ADJECTIVE} {NOUN} when/if/as long as {SUBORDINATE-CLAUSE}.”

It’s an attempt to imitate and elaborate on the joke “Every pizza is a personal pizza if you try hard enough and believe in yourself.” That joke in particular is hard to attribute to one person (appearing on Twitter here as early as 2010, more recently here and here), but the general syntactic pattern is found in well-known bits by Mitch Hedberg (“every book is a children’s book if the kid can read”) and Demetri Martin (“every fight is a food fight when you’re a cannibal”).

The bot works by first searching Twitter for tweets containing a phrase in the format “this isn’t a(n) {ADJECTIVE} {NOUN}” and then using a Pattern search to identify and extract the ADJECTIVE and NOUN. It then searches Twitter for phrases that match the string “{NOUN} if” (and “{NOUN} unless”, “{NOUN} as long as”, etc.), and extracts the rest of the clause following the “if.” (There’s some more NLP behind the scenes to ensure that the “if” clause will fit the joke syntax.) Then it jams the NOUN, ADJECTIVE and subordinate clause into the format of the joke and tweets it out to the world. It does this every two hours. Links to the tweets from which the substance of the joke was extracted are included in the body of the tweet, for attribution purposes. The bot keeps a small database of previously used clauses to prevent it from repeating itself too frequently.
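
Here’s a rough sketch of that pipeline. I’ve used plain regular expressions as a stand-in for the Pattern-based search and the live Twitter queries described above, and the sample tweets, function names, and template details are illustrative assumptions rather than the bot’s actual code.

```python
import re

def extract_adj_noun(tweet):
    """Pull ADJECTIVE and NOUN out of a tweet matching the pattern
    "this isn't a(n) {ADJECTIVE} {NOUN}"."""
    m = re.search(r"this isn'?t an? (\w+) (\w+)", tweet, re.IGNORECASE)
    return m.groups() if m else None

def extract_clause(tweet, noun):
    """Grab the subordinate clause following "{NOUN} if" (or "unless",
    "as long as", etc.) in a second, unrelated tweet."""
    m = re.search(re.escape(noun) + r" (if|unless|as long as) ([^.!?]+)",
                  tweet, re.IGNORECASE)
    return (m.group(1), m.group(2)) if m else None

def make_joke(adj, noun, conj, clause):
    # Simplified: always uses "a" rather than choosing between a/an.
    return "Every %s is a %s %s %s %s" % (noun, adj, noun, conj, clause)

adj, noun = extract_adj_noun("this isn't a personal pizza")
conj, clause = extract_clause("any pizza if you try hard enough", noun)
print(make_joke(adj, noun, conj, clause))
# -> Every pizza is a personal pizza if you try hard enough
```

The noun extracted from the first tweet is what links the two halves of the joke, which is the "semantic anchor" effect discussed below.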

I’ll admit that this is a pretty obscure joke, but I’m really happy with the output. The noun—occurring both in the adjective/noun pair and adjacent to the “if” clause—gives the jokes a semantic anchor, but the fact that the text is grabbed from two different tweets (and two different contexts) keeps the jokes surprising and weird. I’m also pleased at how almost all of the tweets feel grammatical, given the limited degree of NLP involved in the procedure. Follow Pizza Clones on Twitter!


I made a Twitter bot called Voynich Tech News. Every few hours it posts a link to a technology news blog along with a random snippet of text from a transcription of the Voynich manuscript, interspersed with randomly extracted noun phrases from the title of the blog post. The intention is to make it seem like the Voynich text is reporting/commenting on/elaborating on the blog post it links to.

I made this weird bot for a number of reasons.

First off, I discovered this transcript of the Voynich manuscript recently and found it fascinating. (Here’s a link to the full, uncompressed text.) The manuscript itself is beautiful, of course, but because the symbols are unfamiliar, it’s hard to get a sense of what the text “feels” like, from a structural and statistical perspective. Looking at the transcript, you instantly understand that the symbols aren’t just random gibberish—there’s a discernible, language-like structure. I wanted to bring the transcript to other people’s attention, and I wanted to do something creative with it so I could understand it better.

Second, and this is a bit of a stretch, I wanted to help decipher the script. The Voynich manuscript is famously undeciphered, and it’s debatable whether it even represents language at all. It could be steganography, or an elaborate cipher, or graphic glossolalia. (Wikipedia has a good overview of the various theories.) Deciphering a script is easier when you find a sample of that script in a new context. Although there’s no “new” text being posted to @VoynichTechNews—it’s all drawn verbatim from the transcript, except for the interjected bits of tech news—my hope is that seeing the text do something new, juxtaposed with something unexpected, might jostle something loose in someone’s brain and bring about an epiphany about how the script is structured, and what it might mean.

Oh, and I wanted to have some fun at the expense of people who obsess over technology news on Twitter.

Technical notes: I used TextBlob to extract “noun phrases” from blog post titles. The parsing is imperfect so sometimes it looks like it’s just “random substrings” rather than “noun phrases,” but that serves the aesthetic just as well. My procedure for extracting text from the Voynich transcript was to take the first complete transcription of each line, strip whitespace (marked as ‘-’ and ‘=’ etc. in the transcript), and put it into a big text file. To compose the tweet, I choose a random stretch of words from that file. The tweet may therefore consist of stretches of text that aren’t contiguous in the actual manuscript (if, for example, two lines next to each other in my text file are actually found on different pages in the MS).
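
A sketch of that cleaning-and-sampling procedure might look like the following. The function names are my own, and the treatment of ‘.’, ‘-’, and ‘=’ as separators is an assumption about the transcription’s notation; the sample lines resemble the EVA transcription of the manuscript’s first page.

```python
import random
import re

def clean_line(transcribed):
    """Collapse the transcript's separator marks ('.', '-', '=') into
    plain spaces, leaving a run of space-separated words."""
    return re.sub(r"[.\-=]+", " ", transcribed).strip()

def random_stretch(words, n=12):
    """Choose n consecutive words starting at a random offset."""
    start = random.randrange(max(1, len(words) - n))
    return " ".join(words[start:start + n])

# Join cleaned lines into one running word sequence, then sample from it.
lines = ["fachys.ykal.ar.ataiin-shol", "sory.ckhar.or=y.kair"]
words = " ".join(clean_line(line) for line in lines).split()
snippet = random_stretch(words, n=4)
```

Because the stretch is drawn from the joined file rather than line by line, a snippet can span a line (or page) boundary, as noted above.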

Zzt ebooks


I made another twitter bot! This one’s called Zzt ebooks. Several times a day, it posts a tiny snippet of text drawn from a corpus of all games from the archive of ZZT games on zzt.org.

ZZT, in case you don’t know, is a game creation system for DOS dating back to the 1990s. I basically spent all my teen years making ZZT games and hanging out (online) with people—mostly teenagers—who made ZZT games. During its heyday, hundreds of new ZZT games were released every year, and a culture and language of conventions developed surrounding how ZZT games were made and distributed. It was weird and charming and angst-ridden and amazing all at once. Anna Anthropy is writing a book about ZZT that I am very excited to read.

ZZT includes a simple programming language for scripting in-game objects. Programs you write in this language are stored in the game file itself as plaintext. One feature of this scripting language is that if, in the course of running a program, the interpreter finds any text that isn’t a programming language construct, that text will appear on the screen, usually in a modal dialog that allows you to read at your own pace. This text is used for exposition, dialog, room descriptions, etc. The corpus for the bot comes from extracting all such non-program text from a (relatively) recent scrape of all the ZZT games from zzt.org.
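
A sketch of what that extraction might look like. The prefix characters below reflect a simplified reading of ZZT-OOP syntax (commands start with ‘#’, labels with ‘:’, the object’s name with ‘@’, movement directives with ‘/’ or ‘?’, comments with “'”), not a full parser.

```python
def extract_display_text(program):
    """Keep only the lines of a ZZT-OOP program that aren't scripting
    constructs -- i.e., the lines that would be shown to the player."""
    displayed = []
    for line in program.splitlines():
        if line and line[0] not in "#:@/?'":
            displayed.append(line)
    return displayed
```

Run over every object in every board of every game, this yields the corpus of exposition, dialog, and room descriptions the bot draws from.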

To generate the text that the bot tweets, I randomly selected a line of text from the corpus, then appended each succeeding line in the corpus, up until I reached 140 characters. (There’s a random chance that the algorithm will stop after each line, to give the tweets some variation in length.) No Markov chains involved, which is rare for me! The tweets you’ll read amount to tiny text-dumps of ZZT games, clipped to resemble the style of Horse ebooks.
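
That selection procedure can be sketched like so; the function name and the stop probability are assumptions, but the logic follows the description above.

```python
import random

def make_tweet(corpus_lines, limit=140, stop_chance=0.25):
    """Start from a random line, then append each succeeding line until
    the character limit would be exceeded. A random early stop gives
    the tweets some variation in length."""
    i = random.randrange(len(corpus_lines))
    tweet = corpus_lines[i]
    for line in corpus_lines[i + 1:]:
        if len(tweet) + 1 + len(line) > limit:
            break
        tweet += " " + line
        if random.random() < stop_chance:
            break
    return tweet
```

Since consecutive corpus lines usually come from the same game, the result reads like a clipped text-dump of a single ZZT game rather than a mashup.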


My program for extracting program text from ZZT games is imperfect, and it sometimes leaves in non-program garbage text, so I combed through the initial tweet list and removed any that had special characters or weird formatting. I also removed anything that seemed grossly sexist, homophobic, or otherwise violent. (The ZZT scene was always predominately populated by adolescent boys, so there’s a surprising amount of this.) And I removed anything that seemed to reference real, individual people—some ZZT games were made about the ZZT scene itself, and those games sometimes got a little mean (to my eye, at least). I don’t want to be in the business of digging up and re-airing anything hurtful.

I have mixed feelings about whitelisting tweets in this manner, but in the end I decided that I’d rather make something whimsical and nostalgic than (what would have amounted to) an often-offensive satire of teenage behavior. I ended up with several thousand tweets, which should be enough to keep the bot going for a year or two.

If you’re an author of a ZZT game and you see text from your game included in Zzt ebooks that you feel has been misappropriated, let me know and I’ll remove it immediately. Otherwise, enjoy!
