Python and the web: CGI
The simplest way to make your Python script available on the web is to use CGI (“common gateway interface”), a basic protocol that a web server can use to send input to a program on the server and get output from it. CGI is an aging standard, and has largely been replaced (especially for Python web applications) by more sophisticated techniques (like WSGI). Still, for simple cases, CGI is the way to go: it’s easy to understand and requires no external software and little server configuration. If you’re interested in serious web applications, though, you might look into some of the frameworks listed below.
- CodePoint Web Python Tutorial has good examples for CGI and WSGI.
- Django, web.py and Pylons are all good frameworks for building web applications with Python.
- You might also be interested in Google App Engine.
Requirements: server setup and code
In order to run Python programs as CGI scripts, your server must be set up correctly. Instructions for doing this are largely beyond the scope of this tutorial (if you’re using shared hosting, contact your help desk). In Apache, the following two directives, if placed in a <Directory> block or .htaccess file, should get things going:
    Options ExecCGI
    AddHandler cgi-script .py
These lines tell Apache to treat .py files as CGI scripts. When a client (such as a web browser) requests a file with a .py extension, Apache will run the program and return its output.
Your program must meet the following requirements:
- The first line of the program must be #!/usr/bin/python (or whatever the path to Python is on your server); without this line, Apache won’t know how to run the program, and you’ll get an error.
- The permissions on the file must be correct; run chmod a+x your_script.py on the command line, or use your SFTP client to set the permissions such that all users can execute the program.
- Before your program produces any other output, it must print Content-Type: text/html followed by two newlines (e.g. print "Content-Type: text/html\n"; the print statement adds the second newline). Without this header and the blank line after it, Apache and the client’s web browser won’t be able to interpret your program’s output. (Other content types are valid, such as text/plain or application/xml, depending on what your program is outputting. Here’s a list of valid content types.)
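The header requirement above can be sketched as a small helper. This is only an illustration under a couple of assumptions: the name build_response is hypothetical, and the sketch uses Python 3 syntax, whereas the scripts in this tutorial use the Python 2 print statement.

```python
import sys


def build_response(body, content_type="text/html"):
    """Return a complete CGI response: header line, blank line, then the body.

    build_response is a hypothetical helper name used only for illustration.
    """
    # The first newline ends the header line; the second one produces the
    # blank line that separates the headers from the body.
    return "Content-Type: " + content_type + "\n\n" + body


if __name__ == "__main__":
    sys.stdout.write(build_response("<p>Hello from CGI</p>\n"))
```

Keeping the header and the body in one place like this makes it harder to accidentally print something before the Content-Type line.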
Your program must not:
- Use any user-supplied data to operate on the server’s file system. Providing a field that (e.g.) would allow the user to open the file named by the field is a bad idea: it opens the door to security problems. If you must use user-supplied data in this manner, carefully validate and filter it first.
- Echo any user input back in its output without escaping it first. Displaying unfiltered user-supplied data is an easy way to facilitate cross-site scripting attacks (more information). Always use the cgi module’s escape function to convert HTML in the user’s input to plain text.
- Write to files on the server. If your application is sophisticated enough to require storing user data, you’d be better off using a database, which will provide better structure, performance and security than storing data in flat files.
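If you really do have to map user-supplied data onto the file system, the "carefully validate and filter" advice above might look something like the following. Treat it as a minimal sketch, not a complete defense: the directory /var/www/texts, the pattern, and the function name resolve_user_file are all assumptions made up for this example, and the code uses Python 3.

```python
import os
import re

# Hypothetical directory that the script is allowed to read from.
ALLOWED_DIR = "/var/www/texts"

# Accept only short names made of letters, digits, underscores, hyphens and dots.
SAFE_NAME = re.compile(r"[A-Za-z0-9_.-]{1,64}")


def resolve_user_file(name, base_dir=ALLOWED_DIR):
    """Return a path inside base_dir for a user-supplied name, or None if unsafe."""
    # fullmatch rejects slashes, newlines and anything else outside the pattern;
    # names starting with a dot (".." or hidden files) are refused as well.
    if not SAFE_NAME.fullmatch(name) or name.startswith("."):
        return None
    base = os.path.realpath(base_dir)
    candidate = os.path.realpath(os.path.join(base, name))
    # Reject anything that resolves outside the allowed directory,
    # for example through a symbolic link.
    if os.path.dirname(candidate) != base:
        return None
    return candidate
```

The idea is to accept only names that match a narrow whitelist and then double-check where the resulting path actually points, rather than trying to strip out "bad" characters.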
The simplest possible CGI script
This script (test.py) simply outputs some HTML.
    #!/usr/bin/python
    import cgi

    print "Content-type: text/html"
    print ""
    print "<h1>You are making text</h1>"
    print "<p>This is a paragraph tag!</p>"
    print "<ul>"
    for ch in "this is a test":
        print "<li>" + ch + "</li>"
    print "</ul>"
A slightly more sophisticated example
A random sonnet script. Available here.
    #!/usr/bin/python
    import cgi
    import cgitb
    cgitb.enable()
    import markov

    mark = markov.MarkovGenerator(3, 80)
    for line in open("/home/aparrish/texts/sonnets.txt"):
        mark.feed(line.strip())

    print """Content-Type: text/html

    <html>
    <head>
    <title>Test</title>
    </head>
    <body>
    <h1>A Sonnet.</h1>
    """

    for i in range(14):
        print mark.generate()
        print "<br/>"

    print """</body>
    </html>"""
You’ll notice there’s nothing particularly special about this program: it uses the MarkovGenerator class to generate Markov-chain text from a given file, then prints it out. There’s a small amount of HTML surrounding the output, and the required Content-Type line, but otherwise nothing out of the ordinary. We import the cgi module but don’t use it yet; the only CGI-specific helper here is cgitb, which displays a traceback in the browser if the script raises an error.
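The markov module itself isn’t shown in this tutorial. So that the example is easier to follow, here is a rough sketch of what a class with the same interface could look like. Everything about it is an assumption: it works on characters rather than words, it treats the second constructor argument as a maximum line length, and it is written so that it also runs on Python 3. The real MarkovGenerator may work quite differently.

```python
import random


class MarkovGenerator(object):
    """Hypothetical character-level Markov chain generator.

    n   -- order of the n-gram (number of characters of context)
    max -- maximum length of a generated line (an assumed meaning)
    """

    def __init__(self, n, max):
        self.n = n
        self.max = max
        self.ngrams = {}      # maps each n-gram to the characters seen after it
        self.beginnings = []  # the first n-gram of every line fed in

    def feed(self, text):
        # Lines shorter than n cannot contribute an n-gram.
        if len(text) < self.n:
            return
        self.beginnings.append(text[:self.n])
        for i in range(len(text) - self.n):
            gram = text[i:i + self.n]
            following = text[i + self.n]
            self.ngrams.setdefault(gram, []).append(following)

    def generate(self):
        if not self.beginnings:
            return ""  # nothing has been fed in yet
        # Start from the opening n-gram of a randomly chosen source line.
        output = random.choice(self.beginnings)
        while len(output) < self.max:
            choices = self.ngrams.get(output[-self.n:])
            if not choices:
                break  # this context was only ever seen at the end of a line
            output += random.choice(choices)
        return output
```

With a class like this saved as markov.py next to the script, the calls mark.feed(line) and mark.generate() in the example would behave as the surrounding text describes.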
Getting data back from the user is moderately more complicated. The next program allows the user to provide the source text for the Markov chain and specify the order of the n-gram and the number of lines to generate. It does so using the cgi module’s FieldStorage class, which gives you a dictionary-like object holding whatever information the user submitted to the program.
If there’s been any information submitted to the script, either using POST or GET (from, e.g., a form submission), then the program displays the Markov chain output. Otherwise, it displays a form to solicit this information:
    #!/usr/bin/python
    import cgi
    import cgitb
    cgitb.enable()
    import markov

    print """Content-Type: text/html

    <html>
    <head>
    <title>Test</title>
    </head>
    <body>
    <h1>Markov Chain: BYOT.</h1>
    """

    # if user supplied input...
    form = cgi.FieldStorage()
    if len(form) > 0:
        n = int(form.getfirst('n'))
        linecount = int(form.getfirst('linecount'))
        mark = markov.MarkovGenerator(n, 255)
        for line in form.getfirst('sourcetext').split("\n"):
            mark.feed(line)
        for i in range(linecount):
            print mark.generate()
            print "<br/>"
    # if no input supplied
    else:
        print """
    <form method="POST">
    N-gram order: <input type="text" name="n"/><br/>
    Lines to generate: <input type="text" name="linecount"/><br/>
    Source text:<br/>
    <textarea rows=15 cols=65 name="sourcetext"></textarea><br/>
    <input type="submit" value="Generate!"/>
    </form>"""

    print """</body>
    </html>"""
When the user visits this program initially, it simply displays the HTML form (the code after the else). If the user clicks the “submit” button, then the web browser sends the information in the form to the server, which runs the program again. In this case, cgi.FieldStorage() returns an object that gives us access to the values that the user submitted. The getfirst method will return whatever the user input into the form field with the corresponding name (e.g., form.getfirst("linecount") will return the value that the user typed in the field whose name attribute is linecount).
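To make the idea of "the first value submitted under a name" concrete, here is a small sketch that decodes URL-encoded form data with Python 3’s urllib.parse.parse_qs. It does not use cgi.FieldStorage itself: the get_first helper and the sample data are hypothetical stand-ins, chosen only to mirror the field names in the form above.

```python
from urllib.parse import parse_qs


def get_first(fields, name, default=None):
    """Return the first value submitted for name, or default if it is absent."""
    values = fields.get(name)
    return values[0] if values else default


# URL-encoded form data of the kind a browser sends when the form is submitted.
query = "n=3&linecount=14&sourcetext=hello+world%0Asecond+line"
fields = parse_qs(query)  # maps each field name to a list of values

n = int(get_first(fields, "n"))
linecount = int(get_first(fields, "linecount"))
lines = get_first(fields, "sourcetext").split("\n")
```

Every value arrives as a string, which is why the script has to call int() on the n-gram order and the line count before using them.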
An even simpler example using forms
The simplest possible program for processing user input. You could consider this the CGI equivalent of cat: all it does is take whatever the user input from a form and return it to the browser.
    #!/usr/bin/python
    import cgi

    print "Content-type: text/html"
    print ""

    form = cgi.FieldStorage()
    if len(form) > 0:
        stuff = form.getfirst('stuff')
        safe_stuff = cgi.escape(stuff)
        print "<p>" + safe_stuff + "</p>"
    else:
        print '<form method="POST"><input type="text" name="stuff"><input type="submit"></form>'
Note here the use of cgi.escape(), a function in the cgi module that transforms special characters in HTML (like angle brackets) to their HTML entity equivalents, thereby making it more difficult to enact cross-site scripting attacks. (Read more about cross-site scripting attacks.)
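Here is a quick look at what this kind of escaping does to a hostile input. One caveat: cgi.escape() was removed in Python 3.8, so this sketch uses its standard-library replacement, html.escape(), which also escapes double quotes by default. The sample input is made up for illustration.

```python
import html

# A made-up example of input that would run a script if echoed back unescaped.
user_input = '<script>alert("xss")</script>'

# Angle brackets, ampersands and quotes are replaced with HTML entities.
safe = html.escape(user_input)

print("<p>" + safe + "</p>")
# prints: <p>&lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;</p>
```

The browser displays the escaped text as literal characters instead of interpreting it as markup.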