Wednesday, April 29, 2009

Functions, Pattern Matching, and Types

Haskell is strongly typed, which means you can't compile ill-typed expressions (using a string when a number is expected and so on). Strong typing implies that types are resolved at compile time rather than runtime... that is, it is a statically typed language. Although the language is statically typed, you rarely need to "finger type" your code because Haskell will infer the types from your definitions. With type inference you can write nice compact programs and still have the performance and safety of static typing.

Before we look at some type inference we'll need to learn a few things about writing Haskell code. Since this language is functional you are primarily writing functions. You define a Haskell function very much like you do in a Mathematics class, as a series of equations. Consider this set of equations

fib(0) = 0
fib(1) = 1
fib(n) = fib(n-2) + fib(n-1)

The top equation says that the function named 'fib' reduces to 1 when the input is 0, the next says the reduction is 1 when the input is 1, and the last equation says that when the input is some number 'n', then the reduction is the sum of the previous two... this should look familiar, it's the generating function for the famous Fibonacci sequence. Let's have a look at this function written in Haskell:

fib :: Int -> Int
fib 0 = 0
fib 1 = 1
fib n = fib (n-2) + fib (n-1)

It's a pretty straightforward transition to Haskell code from the Mathematical definition... this is generally true in most programming languages, but particularly evident with Haskell. The first difference you see is the function signature, this tells you about the function. Just read :: as "has type" and know that 'Int -> Int' means you've got function of one integer argument that returns an integer and this first line reads as:

The function named 'fib' has type of a unary function from Int to Int.

The next three lines are the Fibonacci equations written in code. When the program executes it does pattern matching from the top to the bottom, so if Haskell sees a function call like the following (notice that in Haskell you don't use ( and ) when calling a function):

fib 3

It says "hey is 3 the same as 0?" and the answer is no, so it goes down a line and says "is 3 the same as 1?" again this is a no, so it goes to the next line and says "is 3 the same as n?" and this is actually a match this time (n is the argument name here, not a character literal). So the third equation is picked and then the n is bound to the value 3*. Next the right-side of the equation executes and you get the following after substitution:

fib (3-2) + fib (3-1)

which is really the same as

fib 1 + fib 2

and then you ask the same questions: "is 1 the same as 0?", no... "is 1 the same as 1?", yes so we use that equation this time and it reduces to 1. Rinse, wash, repeat and in the end you've just computed the 4th fibonacci number (2). Now let's do something crazy:

fib 0 = 0
fib 1 = 1
fib n = fib (n-2) + fib (n-1)

I've removed the function signature and left the equations, and that's cool because Haskell just says "hey man, I see what you're doing here, I'll just write the signature for you". This isn't done by some magical little programmer in your computer, it is type inference. If you think about it, it kinda makes sense... there is enough information in the equation set for you and I to figure out the types so why can't a computer do it too? That is what Haskell is doing here and it can do it pretty well all the time.

So why does Haskell even have a way for you to write the type signature then? Well, even though it can be inferred, you might still want to write it out. It's a good way for beginners to get familiar with the type system and also serves as a nice "self documenting" part of the language (you can learn a lot about what the function does by looking at the signature).

So this is just the tip of the Haskell type system iceberg but it's a good starting point. We'll take off from here next time and look into the available types with more detail.

* This is actually a lie... Haskell doesn't bind that n to the value 3 it binds to something called a "thunk". The thunk is a procedure that will produce a 3 when it is called. This means that if the n was never used in the equation, then the thunk is never called, and the input is never evaluated. This isn't a big deal when we've got a literal value being used like we do here but it can be in other scenarios (think infinite). This is a feature called "non-strict evaluation" and it's a big deal in Haskell. We'll be talking about this much more in subsequent posts. I lied at first so that the details don't get in the way truth and you get to understand how Haskell functions work... at least I came clean and gave you a better idea of what the real deal is.