Odds and ends

Here are a few things I made recently that didn’t make it to my blog.

Pizza Clones


Here’s yet another Twitter bot: Pizza Clones. Every two hours it generates a joke in the form of “Every {NOUN} is a(n) {ADJECTIVE} {NOUN} when/if/as long as {SUBORDINATE-CLAUSE}.” Some sample output:

It’s an attempt to imitate and elaborate on the joke “Every pizza is a personal pizza if you try hard enough and believe in yourself.” That joke in particular is hard to attribute to one person (appearing on Twitter here as early as 2010, more recently here and here), but the general syntactic pattern is found in well-known bits by Mitch Hedberg (“every book is a children’s book if the kid can read”) and Demetri Martin (“every fight is a food fight when you’re a cannibal”).

The bot works by first searching Twitter for tweets containing a phrase in the format “this isn’t a(n) {ADJECTIVE} {NOUN}” and then using a Pattern search to identify and extract the ADJECTIVE and NOUN. It then searches Twitter for phrases that match the string “{NOUN} if” (and “{NOUN} unless”, “{NOUN} as long as”, etc.), and extracts the rest of the clause following the “if.” (There’s some more NLP behind the scenes to ensure that the “if” clause will fit the joke syntax.) Then it jams the NOUN, ADJECTIVE and subordinate clause into the format of the joke and tweets it out to the world. It does this every two hours. Links to the tweets from which the substance of the joke was extracted are included in the body of the tweet, for attribution purposes. The bot keeps a small database of previously used clauses to prevent it from repeating itself too frequently.
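A rough sketch of that pipeline, with plain regular expressions standing in for the Twitter searches and the Pattern queries (the patterns and joke template below are simplified approximations, not the bot’s actual code):

```python
import re

# Simplified stand-ins for the real pipeline: the bot uses Twitter search
# plus the Pattern NLP library; these regexes are rough approximations.
ADJ_NOUN = re.compile(r"this isn't an? (\w+) (\w+)", re.IGNORECASE)

def extract_adj_noun(tweet):
    """Pull the ADJECTIVE/NOUN pair out of a 'this isn't a(n) ...' tweet."""
    match = ADJ_NOUN.search(tweet)
    return match.groups() if match else None

def extract_clause(tweet, noun):
    """Grab the connective and clause that follow '{NOUN} if/unless/...'."""
    match = re.search(noun + r" (if|unless|as long as) (.+)", tweet,
                      re.IGNORECASE)
    return (match.group(1), match.group(2)) if match else None

def make_joke(adj, noun, conn, clause):
    article = "an" if adj[0].lower() in "aeiou" else "a"
    return "Every %s is %s %s %s %s %s" % (noun, article, adj, noun,
                                           conn, clause)

adj, noun = extract_adj_noun("ugh, this isn't a personal pizza")
conn, clause = extract_clause("no pizza if you believe in yourself", noun)
print(make_joke(adj, noun, conn, clause))
# prints: Every pizza is a personal pizza if you believe in yourself
```

The real bot layers more NLP on top of this to reject clauses that won’t fit the joke syntax.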

I’ll admit that this is a pretty obscure joke, but I’m really happy with the output. The noun—occurring both in the adjective/noun pair and adjacent to the “if” clause—gives the jokes a semantic anchor, but the fact that the text is grabbed from two different tweets (and two different contexts) keeps the jokes surprising and weird. I’m also pleased at how almost all of the tweets feel grammatical, given the limited degree of NLP involved in the procedure. Follow Pizza Clones on Twitter!

Voynich Tech News


I made a Twitter bot called Voynich Tech News. Every few hours it posts a link to a technology news blog along with a random snippet of text from a transcription of the Voynich manuscript, interspersed with randomly extracted noun phrases from the title of the blog post. The intention is to make it seem like the Voynich text is reporting/commenting on/elaborating on the blog post it links to.

I made this weird bot for a number of reasons.

First off, I discovered this transcript of the Voynich manuscript recently and found it fascinating. (Here’s a link to the full, uncompressed text.) The manuscript itself is beautiful, of course, but because the symbols are unfamiliar, it’s hard to get a sense of what the text “feels” like, from a structural and statistical perspective. Looking at the transcript, you instantly understand that the symbols aren’t just random gibberish—there’s a discernible, language-like structure. I wanted to bring the transcript to other people’s attention, and I wanted to do something creative with it so I could understand it better.

Second, and this is a bit of a stretch, I wanted to help decipher the script. The Voynich manuscript is famously undeciphered, and it’s debatable whether it even represents language at all. It could be steganography, or an elaborate cipher, or graphic glossolalia. (Wikipedia has a good overview of the various theories.) Deciphering a script is easier when you find a sample of that script in a new context. Although there’s no “new” text being posted to @VoynichTechNews—it’s all drawn verbatim from the transcript, except for the interjected bits of tech news—my hope is that seeing the text do something new, juxtaposed with something unexpected, might jostle something loose in someone’s brain and bring about an epiphany about how the script is structured, and what it might mean.

Oh, and I wanted to have some fun at the expense of people who obsess over technology news on Twitter.

Technical notes: I used TextBlob to extract “noun phrases” from blog post titles. The parsing is imperfect, so sometimes it looks like it’s just “random substrings” rather than “noun phrases,” but that serves the aesthetic just as well. My procedure for extracting text from the Voynich transcript was to take the first complete transcription of each line, strip whitespace (marked as ‘-’ and ‘=’ etc. in the transcript), and put it into a big text file. To compose the tweet, I choose a random stretch of words from that file. A tweet may therefore consist of a stretch of text that isn’t contiguous in the actual manuscript (if, for example, two lines adjacent in my text file are actually found on different pages in the MS).
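The “random stretch of words” step might look something like this minimal sketch, which assumes the transcript has already been cleaned and flattened into a single list of words (the sample words below are just illustrative Voynich-transcription tokens):

```python
import random

# A minimal sketch; assumes the transcript lines have already been cleaned
# (markers like '-' and '=' stripped) and flattened into one list of words.
def random_stretch(words, min_len=4, max_len=12):
    """Pick a random contiguous run of words from the flattened transcript."""
    n = random.randint(min_len, max_len)
    start = random.randrange(len(words) - n + 1)
    return " ".join(words[start:start + n])

corpus = "daiin shol chol qokeey okaiin shey qokal dal chedy otedy".split()
print(random_stretch(corpus, 3, 5))
```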

Zzt ebooks

ANSI twitter egg surrounded by yellow border

I made another Twitter bot! This one’s called Zzt ebooks. Several times a day, it posts a tiny snippet of text drawn from a corpus built from the archive of ZZT games on zzt.org.

ZZT, in case you don’t know, is a game creation system for DOS dating back to the 1990s. I basically spent all my teen years making ZZT games and hanging out (online) with people—mostly teenagers—who made ZZT games. During its heyday, hundreds of new ZZT games were released every year, and a culture and language of conventions developed surrounding how ZZT games were made and distributed. It was weird and charming and angst-ridden and amazing all at once. Anna Anthropy is writing a book about ZZT that I am very excited to read.

ZZT includes a simple programming language for scripting in-game objects. Programs you write in this language are stored in the game file itself as plaintext. One feature of this scripting language is that if, in the course of running a program, the interpreter finds any text that isn’t a programming language construct, that text will appear on the screen, usually in a modal dialog that allows you to read at your own pace. This text is used for exposition, dialog, room descriptions, etc. The corpus for the bot comes from extracting all such non-program text from a (relatively) recent scrape of all the ZZT games from zzt.org.
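As an illustration of that extraction rule, here’s a deliberately simplified sketch. The prefix list is an assumption on my part, and real ZZT-OOP also has movement and hyperlink syntax that would need more careful handling:

```python
# A deliberately simplified sketch of the extraction rule. The prefix list
# is an assumption, and real ZZT-OOP also has movement and hyperlink syntax
# that needs more careful handling.
COMMAND_PREFIXES = ("@", "#", ":", "/", "?", "'")

def display_text(program):
    """Keep only the lines a running ZZT object would print to the screen."""
    return [line for line in program.splitlines()
            if line and not line.startswith(COMMAND_PREFIXES)]

sample = "@guard\n:touch\nHalt! Who goes there?\n#end"
print(display_text(sample))
# prints: ['Halt! Who goes there?']
```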

To generate the text that the bot tweets, I randomly selected a line of text from the corpus, then appended each succeeding line in the corpus, up until I reached 140 characters. (There’s a random chance that the algorithm will stop after each line, to give the tweets some variation in length.) No Markov chains involved, which is rare for me! The tweets you’ll read amount to tiny text-dumps of ZZT games, clipped to resemble the style of Horse ebooks.
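Roughly, that assembly step looked like this sketch (the stop probability here is a stand-in value, not the one I actually used):

```python
import random

# A rough sketch of the assembly step. STOP_CHANCE is a stand-in value;
# the real script's stop probability isn't recorded here.
STOP_CHANCE = 0.3

def make_tweet(lines, limit=140):
    """Start at a random line, then append following lines until the limit
    (or an early random stop) is reached."""
    i = random.randrange(len(lines))
    tweet = lines[i]
    for line in lines[i + 1:]:
        if len(tweet) + 1 + len(line) > limit or random.random() < STOP_CHANCE:
            break
        tweet += " " + line
    return tweet

corpus = ["You awaken in a dark cave.", "A purple key glimmers nearby.",
          "The guard eyes you warily.", "Press P to take the key."]
print(make_tweet(corpus))
```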

zzt ebooks screencap

My program for extracting program text from ZZT games is imperfect, and it sometimes leaves in non-program garbage text, so I combed through the initial tweet list and removed any that had special characters or weird formatting. I also removed anything that seemed grossly sexist, homophobic, or otherwise violent. (The ZZT scene was always predominantly populated by adolescent boys, so there’s a surprising amount of this.) And I removed anything that seemed to reference real, individual people—some ZZT games were made about the ZZT scene itself, and those games sometimes got a little mean (to my eye, at least). I don’t want to be in the business of digging up and re-airing anything hurtful.

I have mixed feelings about whitelisting tweets in this manner, but in the end I decided that I’d rather make something whimsical and nostalgic than (what would have amounted to) an often-offensive satire of teenage behavior. I ended up with several thousand tweets, which should be enough to keep the bot going for a year or two.

If you’re an author of a ZZT game and you see text from your game included in Zzt ebooks that you feel has been misappropriated, let me know and I’ll remove it immediately. Otherwise, enjoy!

generate this


Thursday, May 9th, 2013
721 Broadway, New York, NY
Ground floor (Common room)

On this evening, fifteen students of NYU’s Interactive Telecommunications Program will read aloud their experiments in generative and procedural electronic text. These experiments, built using the Python programming language, have been brewing and bubbling for the duration of the semester. Examples of what you may encounter: the Bible meeting the Kama Sutra, while the New York Times site meets its own comments. Markov chains of many sizes and varieties; otherworldly haiku; accidental hiphop; attempts to pronounce the unpronounceable. The corpus meets the body. One night only!

Reading and Writing Electronic Text is a course offered at NYU’s Interactive Telecommunications Program (http://itp.nyu.edu/itp/). The course is an introduction to both the Python programming language and contemporary techniques in electronic literature. See the syllabus and examples of student work here: http://rwet.decontextualize.com/

Poster design by Hiye Shin.

Recent movement

This is just a quick notice that I’ve moved around/updated a few archival items on the site.

  • Filthy Ditty now lives on decontextualize.com. It was previously hosted on Posterous, which is shutting down on April 30th 2013. Filthy Ditty is an archive of a “poem-a-day” effort I completed in April 2011.
  • The syllabus and notes for Expressive Computing are now a static HTML site (at the same URL as before). Previously these files had been hosted on a poorly secured MediaWiki installation. Expressive Computing was a course I taught at Hunter College in 2008 (the first university-level course I ever taught!).

Power Vocab Tweet

I made a new Twitter bot. It’s called @PowerVocabTweet. Here’s a screenshot:


Every few hours, the bot posts a randomly generated word along with a randomly generated definition for that word. It’s a procedural exploration in a genre I like to call “speculative lexicography”—basically, @everyword’s dada cousin.

On the surface, Power Vocab Tweet is a parody of “word-of-the-day” blogs and Twitter accounts. My real inspiration, though, comes from the novel Native Tongue by Suzette Haden Elgin. In that book, a group of underground linguists invent a language (Láadan) that “encodes” in its lexicon concepts that aren’t otherwise assigned to words in human languages. Elgin’s contention is that the manner in which a language “chunks” the universe of human perception into words reflects and reinforces structures of power; therefore, to break the world up into words differently is a means of counteracting the status quo. (In a much less rigorous way, I attempted to explore similar issues in my design of Pey Shkoy—except in the systems of grammatical roles and thematic relations, instead of in the lexicon.)

Elgin’s protagonists (and Elgin herself, in the design of the Láadan language) did this with laser focus; Power Vocab Tweet does it in a more scattershot way. It claims random (and potentially nonsensical) patches of semantic space (which may already be partially claimed), assigns words to them, and hopes that something sticks. Try using one or two Power Vocab Tweet words in a sentence. It’s fun!

The definitions are generated via Markov chain from the definition database in WordNet. The words themselves are generated with a simple “portmanteau” algorithm: each word is a combination of two “real” English words of the appropriate part of speech. (The word’s form and the text used to generate its definition aren’t related.) A previous version of this code was featured on Filthy Ditty.
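The portmanteau step, sketched under the assumption of a simple front-half/back-half split (the actual splitting rule may differ, and the word list here is just illustrative):

```python
import random

# An illustrative guess at the word-construction step: the actual splitting
# rule isn't documented, so this sketch joins the front half of one word to
# the back half of another.
def portmanteau(word_a, word_b):
    return word_a[:len(word_a) // 2] + word_b[len(word_b) // 2:]

nouns = ["lexicon", "percept", "cascade", "quagmire"]
a, b = random.sample(nouns, 2)
print(portmanteau(a, b))
```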

Update: PVT on MetaFilter, New York Review of Bots.

GIF and circumstance

Only nine months too late to truly capture the zeitgeist, I made an automated parody of #whatshouldwecallme-style tumblrs. It’s called GIF and Circumstance. (Warning: potentially NSFW.)

A “#whatshouldwecallme-style tumblr” is one in which animated GIFs are paired with a title expressing a circumstance or mood—usually a clause beginning with “when.” I wrote a Python script to make these kinds of posts automatically. Here’s what it does:

(1) Search Twitter for tweets containing the word “when.”
(2) Extract the “when” clause from such tweets.
(3) Use Pattern to identify “when” clauses with suitable syntax (i.e., clauses in which a subject directly follows “when”; plus some other heuristic fudging).
(4) Post the “when” clause as the title of a tumblr post, along with an animated GIF randomly chosen from the imgur gallery.
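Here’s a crude stand-in for steps (2) and (3), using a fixed pronoun list in place of Pattern’s part-of-speech tagging (the pronoun list and regex are my simplification, not the script’s actual heuristics):

```python
import re

# A crude stand-in for steps (2) and (3): the real script uses the Pattern
# library's part-of-speech tagging; this fixed pronoun list is an assumed,
# much weaker subject test.
SUBJECTS = r"(i|you|we|they|he|she|it|my|your|someone)"
WHEN_CLAUSE = re.compile(r"\bwhen (" + SUBJECTS + r"\b.+)", re.IGNORECASE)

def extract_when_clause(tweet):
    """Return the 'when ...' clause if its subject looks plausible."""
    match = WHEN_CLAUSE.search(tweet)
    return "when " + match.group(1) if match else None

print(extract_when_clause("that feeling when you finally fix the bug"))
# prints: when you finally fix the bug
```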

The results range from nonsensical to eerily appropriate. Not bad for a weekend hack.

UPDATE: GIF and Circumstance on Metafilter!

Lexcavator

Lexcavator is an experimental-ish retro arcade/word game that I’ve been working on since last March, and it’s finally ready for prime time. I’m really excited for people to play! Download it here. (Pay what you want, even if you want to pay $0.)

Some notes about the game:

  • It’s programmed entirely in Python, using processing.py.
  • I use Markov chains to keep related sequences of letters adjacent in the game board. The goal is to make a faster-paced word game where it’s easy to find meaty words.
  • The global leaderboard is completely anonymous (I didn’t want to deal with user authentication), but it does a few things that not many other games do. First, you’re given a percentile rank for each score, which gives a better sense of how you’re doing relative to other players than a global high score or ranking alone. Second, after each game, you get a list of words that no other player had ever found before (example from @robdubbin).
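The letter-adjacency idea from the second bullet above can be sketched as a letter-bigram model; the training words and the column-growing rule here are illustrative stand-ins, not the game’s actual code:

```python
import random
from collections import defaultdict

# A sketch of the letter-adjacency idea: build a letter-bigram model from a
# word list (the training words here are stand-ins), then grow a board
# column by sampling a likely successor of the previous letter.
def bigram_model(words):
    followers = defaultdict(list)
    for word in words:
        for a, b in zip(word, word[1:]):
            followers[a].append(b)
    return followers

def next_letter(model, prev):
    options = model.get(prev)
    return random.choice(options) if options else random.choice("etaoinsh")

model = bigram_model(["stone", "stare", "onset", "tones"])
column = ["s"]
for _ in range(4):
    column.append(next_letter(model, column[-1]))
print("".join(column))
```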

For more announcements about Lexcavator, follow @lexcavator on Twitter.

The home-grown chiptune soundtrack is available from Bandcamp. I’ve embedded the title screen track below.



An evening of poetry, performance, and experimental text design from NYU/ITP’s Reading and Writing Electronic Text

Friday, May 4th, 2012
721 Broadway, New York, NY
Ground floor (Common room)

Over the course of Spring semester, eighteen NYU students have engaged in intense electro-textual experiments: composing, mangling, generating and remixing electronic text using the Python programming language. For one night only, these students will gather to present and perform their experiments to the general public.

What to expect: innovative poetic forms, bizarre textual interfaces, generative satire, advanced natural language processing techniques, and more!

Reading and Writing Electronic Text is a course offered at NYU’s Interactive Telecommunications Program (http://itp.nyu.edu/itp/). The course is an introduction to both the Python programming language and contemporary techniques in electronic literature. See the syllabus and examples of student work here: http://rwet.decontextualize.com/

Poster design by Inessah Selditz. (Download the full-size version here.)
