Functions and modules.

Functions

Functions are a basic building block of almost every programming language. So far, we’ve been using a lot of Python’s built-in functions (e.g., len, re.search, etc.). Now we’re going to learn how to define our own.

We use functions to (a) make a piece of code reusable and (b) simplify the layout of code: sometimes it’s easier to read programs if we can replace big blocks of code with a single function call.

If you find yourself using a piece of code over and over again, it’s probably a good candidate for being converted into a function. That way, you can use it all over your program without retyping (or cutting and pasting); if you ever have to change the way the function works, you only have to change it in one place.

Anatomy of a function

Functions in Python have a name, a return value, and arguments. Both the arguments and the return value are optional. The def keyword starts a function definition, and everything indented underneath forms the body of the function (i.e., the statements that will be executed when the function is called). Here’s an example of a function with no arguments or return value:

import random

def print_random_word():
  print random.choice(['fuzz', 'quinoa', 'jalapeno', 'giraffe', 'scheme'])

The name can be anything that would be a valid variable name (in fact, a function definition is just a funny way of defining a variable; see below). Function names should (preferably) be descriptive of what the function actually does.
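The idea that a function definition is just a way of defining a variable can be checked directly. Here is a minimal sketch; the names shout and yell are made up for illustration:

```python
def shout(text):
    return text.upper() + "!"

# The name shout is an ordinary variable whose value is a function object,
# so that value can be assigned to a second name and called through it.
yell = shout
loud = yell("quinoa")   # 'QUINOA!'
```

Both names now refer to the same function, in the same way that two variables can refer to the same list.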

Functions can take one or more arguments, like so:

>>> def schoolyard(t1, t2):
...     print t1 + " and " + t2 + " sitting in a tree, K-I-S-S-I-N-G!"
... 
>>> schoolyard("Sarah Palin", "David Letterman")
Sarah Palin and David Letterman sitting in a tree, K-I-S-S-I-N-G!

Arguments are variables that are given to the function when it’s called, which the function can then use to do interesting things (like concatenate them, as above, or otherwise mash up, or use in some kind of mathematical formula, or whatever).

The function we wrote above isn’t ideal, since the only thing it knows how to do is print to output. If we wanted to use the result of that function to do other calculations, we’d be out of luck. The solution to this problem is to use a return statement. This allows us to perform some kind of calculation in our function, then return that value back to the code that called the function. That code can then do whatever it wants with that value, such as assigning it to a variable, calling methods on it, etc.

>>> def schoolyard(t1, t2):
...     output = t1 + " and " + t2 + " sitting in a tree, K-I-S-S-I-N-G!"
...     return output
... 
>>> taunt = schoolyard("Sarah Palin", "David Letterman")
>>> taunt
'Sarah Palin and David Letterman sitting in a tree, K-I-S-S-I-N-G!'
>>> taunt.upper()
'SARAH PALIN AND DAVID LETTERMAN SITTING IN A TREE, K-I-S-S-I-N-G!'
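Return values are just as useful for numbers as for strings, because the result can feed straight into another calculation. A minimal sketch; rectangle_area is a made-up function, not one of the examples above:

```python
def rectangle_area(width, height):
    # Hypothetical helper: compute a value and hand it back to the caller.
    return width * height

# Each call evaluates to its return value, so the calls can be combined
# in a larger expression.
total = rectangle_area(3, 4) + rectangle_area(2, 5)   # 12 + 10 = 22
```

If rectangle_area only printed its result, there would be nothing to add together.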

You can give arguments a default value by including the value in the function definition, like so:

>>> def schoolyard(t1, t2="Sarah Palin"):
...     output = t1 + " and " + t2 + " sitting in a tree, K-I-S-S-I-N-G!"
...     return output
... 
>>> schoolyard("Bob", "Sally")
'Bob and Sally sitting in a tree, K-I-S-S-I-N-G!'
>>> schoolyard("Bill Murray")
'Bill Murray and Sarah Palin sitting in a tree, K-I-S-S-I-N-G!'

In the first call to the function, we provide both arguments, so the default value for the second one is ignored. In the second call, we provide only one argument, so Python uses the default we specified (“Sarah Palin”) for the second.
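Defaults work the same way for any kind of value. Python also lets you name an argument when you call the function, which can make a call easier to read. A small sketch, assuming a made-up function repeat_word:

```python
def repeat_word(word, times=2):
    # Hypothetical example: times falls back to 2 when the caller omits it.
    return " ".join([word] * times)

default_call = repeat_word("fuzz")            # 'fuzz fuzz'
positional_call = repeat_word("fuzz", 3)      # 'fuzz fuzz fuzz'
named_call = repeat_word("fuzz", times=1)     # 'fuzz'
```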

Exercises
  1. Define a function that takes a number and returns its square (n*n).
  2. Define a function that extracts all the words from a string and returns a list of only those words that have a specified length. (Both the string and the length should be parameters.)

In context: rand_replace.py

Here’s an example program that uses a function. The function is called get_words_from_file, and it takes a single parameter: a filename. The function then opens the file, reads in all of the lines, splits them up into words, and appends them to a list; at the end of the function, that list of words is returned.

import sys
import random

def get_words_from_file(filename):
  all_words = list()
  for line in open(filename):
    line = line.strip()
    words = line.split(" ")
    for word in words:
      all_words.append(word)
  return all_words

words_from_file = get_words_from_file("sowpods.txt")

for line in sys.stdin:
  line = line.strip()
  words = line.split(" ")
  output = ""
  for word in words:
    if random.randrange(8) == 0:
      output += random.choice(words_from_file)
    else:
      output += word
    output += " "
  print output

The function gets called on line 13, and its return value—a list of words from the file that we passed in as a parameter—is assigned to a variable called words_from_file. We then use those words in a simple random transformation of standard input. (Bonus exercise: Explain what this transformation does.)
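If you don’t have sowpods.txt handy, you can still try the function on its own. The sketch below repeats get_words_from_file with one assumed change (the file is opened in a with block so that it gets closed promptly) and runs it on a small temporary file. The file contents and the name sample_words are made up for illustration:

```python
import os
import tempfile

def get_words_from_file(filename):
    all_words = list()
    # Assumed change from the listing above: a with block closes the file.
    with open(filename) as source:
        for line in source:
            line = line.strip()
            words = line.split(" ")
            for word in words:
                all_words.append(word)
    return all_words

# Create a small temporary file to stand in for sowpods.txt.
handle, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(handle, "w") as sample:
    sample.write("fuzz quinoa\ngiraffe\n")

sample_words = get_words_from_file(path)   # ['fuzz', 'quinoa', 'giraffe']
os.remove(path)                            # clean up the temporary file
```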

Scope

With the introduction of functions, an important issue arises: where are variables visible? (The “visibility” of variables is known in computer programming as “scope.”) If I create a variable inside my function, will that same variable be visible to the rest of the program? Conversely, can variables defined outside the function be visible within it?

Here’s a transcript from an interactive interpreter session to demonstrate how Python deals with these questions:

>>> def foo():
...     harold = "hey there"
...     print harold
... 
>>> foo()
hey there
>>> harold
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'harold' is not defined
>>> harold = "new value!"
>>> foo()
hey there
>>> harold
'new value!'

Here we’ve created a function foo that assigns to a variable harold within it. A call to foo displays the expected value. But if we try to refer to harold outside the function, Python complains, saying that it doesn’t know of any variables by that name. Likewise, if we assign to a variable harold outside the function, then call the function again, we can see that the function retains its own idea of what harold is, regardless of what assignments we’ve made outside of the function.

To sum up, Python’s rules of scope:

  1. Variables that you define inside a function are only visible inside that function.
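  2. Variables defined outside a function can be read from inside the function, but assigning to that name inside the function creates a separate local variable instead of changing the outer one (unless you do extra work, such as using a global statement).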
  3. If you want to make a variable visible inside a function, design your function so the relevant information can be passed in as a variable. If you want to make a variable visible outside your function, make that variable the return value of your function.

Rules #1 and #2 exist so that you can use your functions anywhere, without fear that you’ll accidentally overwrite important values in the program you bring them into. (My function’s variable called words shouldn’t overwrite a variable called words elsewhere.) Rule #3 isn’t so much a rule as a rule of thumb: if you’re writing your functions right, you shouldn’t have to refer to values outside your function, except for values that are explicitly provided as parameters, or explicitly returned as return values.
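The three rules can be seen side by side in a short sketch. All of the names here (greeting, read_outer, assign_inner, add_exclamation) are made up for illustration:

```python
greeting = "hello"   # defined outside any function

def read_outer():
    # Rule 2: reading a variable defined outside the function works.
    return greeting + " from inside"

def assign_inner():
    # Rules 1 and 2: this assignment creates a new local variable named
    # greeting; the outer greeting is left untouched.
    greeting = "goodbye"
    return greeting

def add_exclamation(text):
    # Rule 3: take the value in as a parameter and return the result.
    return text + "!"

inside = read_outer()                  # 'hello from inside'
local_value = assign_inner()           # 'goodbye'
still_outer = greeting                 # 'hello'
greeting = add_exclamation(greeting)   # the caller decides to update it: 'hello!'
```

The only place the outer greeting changes is the last line, where the calling code assigns the return value explicitly.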

Modules

Once you have a collection of functions, it’s helpful to put those functions in a single file, so that you can easily re-use them in many different programs. In Python, such a file is called a module.

You’ve already been using modules, actually: whenever we’ve used import, we’ve been using one of Python’s built-in modules.

Defining your own module is easy: all you need to do is put a bunch of stuff in a file, and save it with the extension .py. Here’s conjunctions.py, for example:

import random

conjunctions = ['and', 'or', 'so', 'but', 'yet', 'until', 'before', 'after']

# join together the list in 'parts' with the given conjunction, human-style
# (e.g., "one, two and three" instead of "one, two, three")
def human_join(parts, conjunction="and"):
  output = ""
  for i in range(len(parts)):
    output += parts[i]
    if i == len(parts) - 2:
      output += " " + conjunction + " "
    elif i < len(parts) - 1:
      output += ", "
  return output

# join the two parts together using one of the conjunctions defined in this
# module
def random_conjoin(part1, part2):
  return part1 + " " + random.choice(conjunctions) + " " + part2

In any other Python file (or from the interactive interpreter) in the same directory, you can now use import conjunctions to bring those functions into your code. Here’s an example:

>>> import conjunctions
>>> conjunctions.human_join(['one', 'two', 'three'], 'or')
'one, two or three'
>>> conjunctions.random_conjoin("I ate a hamburger", "Larry laughed")
'I ate a hamburger before Larry laughed'
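To see how the index checks in human_join cope with lists of different lengths, here is a self-contained sketch that repeats the function from conjunctions.py (so no import is needed) and calls it with zero, one, two, and three parts. The variable names holding the results are made up:

```python
def human_join(parts, conjunction="and"):
    output = ""
    for i in range(len(parts)):
        output += parts[i]
        if i == len(parts) - 2:
            # second-to-last item: follow it with the conjunction
            output += " " + conjunction + " "
        elif i < len(parts) - 1:
            # any earlier item: follow it with a comma
            output += ", "
    return output

none = human_join([])                               # ''
single = human_join(['one'])                        # 'one'
pair = human_join(['one', 'two'])                   # 'one and two'
triple = human_join(['one', 'two', 'three'], 'or')  # 'one, two or three'
```

The last item never matches either condition, so nothing is appended after it.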

Note that we can use dir to find out what’s in our custom module, just like we can with built-in modules:

>>> dir(conjunctions)
['__builtins__', '__doc__', '__file__', '__name__', '__package__', 'conjunctions', 'human_join', 'random', 'random_conjoin']
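The names that begin with underscores are added by Python itself; the rest are the things the module defines or imports. A small sketch that filters the underscore names out, using the standard random module so that it runs without conjunctions.py (the variable public_names is made up):

```python
import random

# Keep only the names that do not start with an underscore.
public_names = [name for name in dir(random) if not name.startswith("_")]

# public_names includes entries such as 'choice' and 'randrange'
has_choice = "choice" in public_names
```

Applied to the dir(conjunctions) listing above, the same filter would leave conjunctions, human_join, random, and random_conjoin.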
