Python and CGI

Python and the web: CGI

The simplest way to make your Python script available on the web is to use CGI (“Common Gateway Interface”), a basic protocol that a web server can use to send input to a program on the server and get output from it. CGI is an aging standard, and has largely been replaced (especially for Python web applications) by more sophisticated techniques (like WSGI). Still, for simple cases, CGI is a reasonable choice: it’s easy to understand and requires no external software and little server configuration. If you’re interested in serious web applications, though, you might look into a WSGI-based web framework instead.

Requirements: server setup and code

In order to run Python programs as CGI scripts, your server must be set up correctly. Instructions for doing this are largely beyond the scope of this tutorial (if you’re using shared hosting, contact your help desk). In Apache, the following two directives, if placed in a <Directory> block or .htaccess file, should get things going:

Options ExecCGI
AddHandler cgi-script .py

These lines tell Apache to treat .py files as CGI scripts. When a client (such as a web browser) requests a file with a .py extension, Apache will run the program (using the interpreter named on the program’s first line) and return its output.
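To make the request cycle concrete, here is a rough sketch of what a server does with a CGI program: it passes request details in environment variables, runs the program, and splits the program's output into headers and body at the first blank line. This is an illustration written for Python 3, not how Apache is actually implemented; the run_cgi function and the inline SCRIPT are made up for the example.

```python
import os
import subprocess
import sys

def run_cgi(script_source, query_string=""):
    """Run a CGI program roughly the way a server would and split its output.

    script_source is Python source handed to the interpreter with -c,
    a stand-in for a real script file on disk.
    """
    env = dict(os.environ)
    # CGI passes request details to the program as environment variables.
    env["GATEWAY_INTERFACE"] = "CGI/1.1"
    env["REQUEST_METHOD"] = "GET"
    env["QUERY_STRING"] = query_string
    result = subprocess.run(
        [sys.executable, "-c", script_source],
        env=env, capture_output=True, text=True, check=True,
    )
    # Headers and body are separated by the first blank line.
    head, _, body = result.stdout.partition("\n\n")
    headers = {}
    for line in head.splitlines():
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers, body

# A tiny CGI program: one header line, a blank line, then the body.
SCRIPT = '''
import os
print("Content-Type: text/plain")
print()
print("query was: " + os.environ["QUERY_STRING"])
'''

headers, body = run_cgi(SCRIPT, "n=3")
print(headers["content-type"])
print(body)
```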

Your program must meet the following requirements:

  • The first line of the program must be #!/usr/bin/python (or whatever the path to Python is on your server); without this line, Apache won’t know how to run the program, and you’ll get an error.
  • The permissions on the file must be correct; run chmod a+x your_script.py on the command line, or use your SFTP client to set the permissions such that all users can execute the program.
  • Before your program produces any other output, it must print Content-Type: text/html followed by two newlines (e.g. print "Content-Type: text/html\n"). Without this header line and the blank line after it, Apache and the client’s web browser won’t be able to interpret your program’s output. (Other content types are valid, such as text/plain or application/xml, depending on what your program is outputting. Here’s a list of valid content types.)
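The header requirement can be sketched as a small helper that always emits the header line, then a blank line, then the body. This is written for Python 3 (where print is a function); the cgi_response name is made up for illustration.

```python
import sys

def cgi_response(body, content_type="text/html"):
    """Return a complete CGI response: header line, blank line, then body."""
    # "\n\n" ends the header line and adds the blank line that
    # separates the headers from the body.
    return "Content-Type: " + content_type + "\n\n" + body

response = cgi_response("<p>Hello from CGI</p>\n")
sys.stdout.write(response)
```

Passing a different content_type (for example "text/plain") changes only the header line; the blank line after it is still required.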

Your program must not:

  • Use any user-supplied data to operate on the server’s file system. Providing a field that (e.g.) would allow the user to open the file named by the field is a bad idea: it opens the door to security problems. If you must use user-supplied data in this manner, carefully validate and filter it first.
  • Send any user input to output without escaping it first. Displaying unfiltered user-supplied data is an easy way to facilitate cross-site scripting attacks. Always use the cgi module’s escape function to convert HTML in the user’s input to plain text.
  • Write to files on the server. If your application is sophisticated enough to require storing user data, you’d be better off using a database, which will provide better structure, performance and security than storing data in flat files.
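If you do have to turn user input into a file name, one way to validate it is to resolve the requested path and check that it still lies inside a directory you control. The following is a minimal sketch written for Python 3, not a complete defense; the resolve_safely function and the /srv/texts directory are hypothetical.

```python
import os

def resolve_safely(base_dir, user_name):
    """Return the real path of user_name inside base_dir.

    Raises ValueError if the resolved path falls outside base_dir,
    e.g. because of ".." components or an absolute path.
    """
    base = os.path.realpath(base_dir)
    candidate = os.path.realpath(os.path.join(base, user_name))
    # commonpath equals base only when candidate is base or lies beneath it.
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError("path escapes the allowed directory")
    return candidate

BASE_DIR = "/srv/texts"  # hypothetical directory of files users may read

for name in ["sonnets.txt", "../../etc/passwd"]:
    try:
        print("ok:", resolve_safely(BASE_DIR, name))
    except ValueError as err:
        print("rejected:", repr(name), "-", err)
```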

The simplest possible CGI script

This script (test.py) simply outputs some HTML.

#!/usr/bin/python
import cgi

print "Content-type: text/html"
print ""

print "<h1>You are making text</h1>"
print "<p>This is a paragraph tag!</p>"
print "<ul>"
for ch in "this is a test":
	print "<li>" + ch + "</li>"
print "</ul>"

A slightly more sophisticated example

A random sonnet script. Available here.

#!/usr/bin/python

import cgi
import cgitb
cgitb.enable()
import markov

mark = markov.MarkovGenerator(3, 80)
for line in open("/home/aparrish/texts/sonnets.txt"):
  mark.feed(line.strip())

print """Content-Type: text/html

<html>
<head>
  <title>Test</title>
</head>
<body>

<h1>A Sonnet.</h1>
"""

for i in range(14):
  print mark.generate()
  print "<br/>"

print """</body>
</html>"""

You’ll notice there’s nothing particularly special about this program: it uses the MarkovGenerator class to generate Markov-chain text from a given file, then prints it out. There’s a small amount of HTML surrounding the output, and the required Content-Type line, but otherwise nothing out of the ordinary. The script also enables cgitb, which displays a formatted traceback in the browser if the program raises an exception; the cgi module itself is imported but not otherwise used.
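The markov module isn't shown in this tutorial. If you want to experiment without it, the following is a hypothetical stand-in, written for Python 3, that offers the same three calls used above (the constructor, feed and generate). It assumes a character-level model in which the first argument is the n-gram order and the second is the maximum output length; the real module may well work differently.

```python
import random

class MarkovGenerator(object):
    """Hypothetical stand-in for markov.MarkovGenerator (character-level)."""

    def __init__(self, n, max_length):
        if n < 1:
            raise ValueError("n must be at least 1")
        self.n = n
        self.max_length = max_length
        self.ngrams = {}      # n-gram -> list of characters seen after it
        self.beginnings = []  # first n-gram of every line fed in

    def feed(self, text):
        """Record the n-grams of one line of source text."""
        if len(text) < self.n:
            return
        self.beginnings.append(text[:self.n])
        for i in range(len(text) - self.n):
            gram = text[i:i + self.n]
            self.ngrams.setdefault(gram, []).append(text[i + self.n])

    def generate(self):
        """Return one generated line, at most max_length characters long."""
        if not self.beginnings:
            return ""
        output = random.choice(self.beginnings)
        while len(output) < self.max_length:
            followers = self.ngrams.get(output[-self.n:])
            if not followers:
                break  # this n-gram was only ever seen at the end of a line
            output += random.choice(followers)
        return output

mark = MarkovGenerator(3, 80)
for line in ["shall i compare thee", "thou art more lovely"]:
    mark.feed(line)
print(mark.generate())
```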

Using forms

Getting data back from the user is moderately more complicated. This program allows the user to provide the source text for the Markov chain and specify the order of the n-gram and the number of lines to generate. It does so using the cgi module’s FieldStorage class, which gives us a dictionary-like object holding whatever information the user submitted to the program.

If there’s been any information submitted to the script, either using POST or GET (from, e.g., a form submission), then the program displays the Markov chain output. Otherwise, it displays a form to solicit this information:

#!/usr/bin/python

import cgi
import cgitb
cgitb.enable()
import markov

print """Content-Type: text/html

<html>
<head>
  <title>Test</title>
</head>
<body>

<h1>Markov Chain: BYOT.</h1>
"""

# if user supplied input...
form = cgi.FieldStorage()
if len(form) > 0:
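  # fall back to defaults when a field is missing
  n = int(form.getfirst('n', '3'))
  linecount = int(form.getfirst('linecount', '14'))
  mark = markov.MarkovGenerator(n, 255)
  for line in form.getfirst('sourcetext', '').split("\n"):
    mark.feed(line)
  for i in range(linecount):
    # escape the output: it is built from user-supplied text
    print cgi.escape(mark.generate())
    print "<br/>"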

# if no input supplied
else:
  print """
<form method="POST">
  N-gram order: <input type="text" name="n"/><br/>
  Lines to generate: <input type="text" name="linecount"/><br/>
  Source text:<br/>
  <textarea rows=15 cols=65 name="sourcetext"></textarea><br/>
  <input type="submit" value="Generate!"/>
</form>"""

print """</body>
</html>"""

When the user visits this program initially, it simply displays the HTML form (the code after the else). If the user clicks the “submit” button, the web browser sends the information in the form to the server, which runs the program again. This time, the cgi.FieldStorage() call returns an object that gives us access to the values that the user submitted. The getfirst method returns whatever the user entered into the form field with the corresponding name (e.g., form.getfirst("linecount") returns the value that the user typed in the field whose name attribute is linecount).
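To see roughly what happens behind getfirst, here is a sketch for Python 3 that parses a simple form submission by hand with urllib.parse instead of the cgi module. The names read_form, getfirst and bounded_int are made up for this example, and the sketch assumes a plain URL-encoded form (no file uploads). It also shows one way to turn a submitted number into an integer without crashing on bad input.

```python
import io
from urllib.parse import parse_qs

def read_form(environ, stdin):
    """Parse URL-encoded form data from a GET query string or a POST body."""
    if environ.get("REQUEST_METHOD", "GET").upper() == "POST":
        # For POST, the server supplies the body on standard input and
        # its size in CONTENT_LENGTH (URL-encoded bodies are ASCII).
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stdin.read(length)
    else:
        raw = environ.get("QUERY_STRING", "")
    # parse_qs maps each field name to a list of submitted values.
    return parse_qs(raw, keep_blank_values=True)

def getfirst(form, name, default=None):
    """Return the first value submitted for name, or default if absent."""
    values = form.get(name)
    return values[0] if values else default

def bounded_int(value, default, low, high):
    """Convert value to an int clamped to [low, high]; default on bad input."""
    try:
        number = int(value)
    except (TypeError, ValueError):
        return default
    return max(low, min(high, number))

# Simulate a form submission with an invalid line count.
body = "n=3&linecount=abc&sourcetext=hello+world"
environ = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": str(len(body))}
form = read_form(environ, io.StringIO(body))

n = bounded_int(getfirst(form, "n"), default=3, low=1, high=10)
linecount = bounded_int(getfirst(form, "linecount"), default=14, low=1, high=100)
print(n, linecount, getfirst(form, "sourcetext"))
```

In a real script you would pass os.environ and sys.stdin to read_form; the limits used with bounded_int here are arbitrary.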

An even simpler example using forms

The simplest possible program for processing user input. You could consider this the CGI equivalent of cat: all it does is take whatever the user input from a form and return it to the browser.

#!/usr/bin/python

import cgi

print "Content-type: text/html"
print ""

form = cgi.FieldStorage()
if len(form) > 0:
	stuff = form.getfirst('stuff')
	safe_stuff = cgi.escape(stuff)
	print "<p>" + safe_stuff + "</p>"
else:
	print '<form method="POST"><input type="text" name="stuff"><input type="submit"></form>'

Note here the use of cgi.escape(), a function in the cgi module that transforms special characters in HTML (like angle brackets) to their HTML entity equivalents, thereby making it more difficult to enact cross-site scripting attacks. (Read more about cross-site scripting attacks.)
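In Python 3, cgi.escape() is no longer available (it was removed in Python 3.8); html.escape() does the same job and additionally escapes quote characters by default. Here is a minimal sketch of the same echo logic for Python 3; the render_echo name is made up for illustration.

```python
import html

def render_echo(stuff):
    """Wrap user input in a paragraph tag after escaping HTML characters."""
    return "<p>" + html.escape(stuff) + "</p>"

print(render_echo("<script>alert(1)</script>"))
# prints: <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>
```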
