This website contains a detailed description of how Haskell programs, compiled with GHC, are executed. It is maintained by Tom Primožic, and its content is in public domain (uncopyrighted).
This description is based on my own research and information gathered from various academic papers, GHC source code, and the compiled code of Haskell programs. There might be certain bits of the runtime I don't yet understand completely - if you notice any discrepancies or you believe I made errors, please inform me, preferably by editing this Wiki.
The compilation of Haskell programs
Haskell is a language quite unlike many others. It is extremely high level, and provides powerful, yet complex abstractions of the underlying machine. It features a Garbage Collection, which abstracts the computer memory, and provides higher-order functions, function closures and partial application, which abstract the CPU execution model. Finally, it is a lazy language, which means that computation only happens when it is required. For example, one can easily define infinite lists, which don't consume infinite memory, but only as much memory as is needed by the rest of the program.
nats = 1 : map (+1) nats
fibs = 1 : 1 : zipWith (+) fibs (tail fibs)
Here, nats is a list starting with 1, and followed by itself with the function (+1) applied to every element, which means that it is a list of all natural numbers. Similarly, fibs is a list starting with 1 and 1, which are followed by a list, where each element is generated by summing the corresponding elements in fibs and tail fibs. So, the third element of fibs is the sum of the first element of fibs and the first element of tail fibs, which is the second element of fibs. The fourth element is the sum of the second and third elements of fibs. Therefore, fibs is an infinite list of Fibonacci numbers. These list don't yet exist - their elements only get computed when we need to access them.
print (fibs !! 15)
If we want to print the 15th element, the fibs list gets evaluated up to the element number 15. When we will need the rest of it, more will be evaluated.
Laziness is of course a very interesting and powerful feature of a programming language, but is significantly complicates the compilation and generation of efficient code. Other Haskell features, functional closures and partial application, make the complication even more complex. Let's consider two equivalent programs, the first in C, and the second in Haskell.
int example1(bool (*f)(int x), int a)
{
bool b = (*f)(a);
if(b) {
return 1;
}
else {
return 0;
}
}
The compilation of function example1 is rather straightforward. The function receives two parameters, one pointer and one non-pointer. It calls the function pointed to by the first parameter and passes the second parameter as the function argument. It then examines the resulting boolean and returns the appropriate value. The compiled code looks like this (assuming R1 is a machine register and disregarding the actual call and return conventions - we will deal with that later on):
int example(*f, a):
push a
R1 = call *f
if(R1 == 0) jump else
ret 1
else:
ret 0
The equivalent Haskell code is:
example :: (Integer -> Bool) -> Integer -> Integer
example f a =
let b = f a
in
if b then 1 else 0
The compilation of Haskell code is not so straightforward. The are 2 main issues to consider. The first issue is laziness - the function f might not really be a function, and the parameter a might not really be an integer; any or both of them could be a suspended evaluation, waiting to be needed by the rest of the program. We don't really care about a, as we don't technically need it - the function f might need to know its value, but not necessarily. But, we must be sure that f is indeed a function, because we cannot call a suspended computation with one parameter a (lazy thunks are parameterless functions). Also, the return value of function f might not be a real return value after all - it might again be suspended computation. Since the value of our function depends on the value of b, we must also evaluate b to find its real value.
The second issue is the fact that the function f can actually be a function closure - e.g. f = (==) 3. Here, f is actually the function (==), applied to a value 3. Since (==) actually takes two parameters, f is actually a paritial application - it is the function (==) applied to the parameter 3 and waiting for another parameter, and it is represented by a heap object containing two fields - the pointer to the function (==) and the first parameter, 3. When f is passed to example, and is applied to another argument a, the the function (==) must be called with parameter 3 loaded from the heap and parameter a passed along. Since the function example has no idea what kind of function to expect, it must be able to deal with everything. In practice, this is implemented by making all function objects have the same structure - the first field of the heap object points to the function code, and the rest represent the arguments that were already applied to the function (0 or more of them).
The Haskell function example would be compiled roughly like this (the handling of the function closure f is abstracted with a call to fun_apply - we will deal with calling a function later) :
int example(f, a):
R1 = evaluate f
push a
R2 = fun_apply(R1)
R3 = evaluate R2
if(R3 == 0) jump else
ret 1
else:
ret 0
This is still very much simplified. In the rest of this document, we will try to reconstruct exactly how the function example is compiled. We will start with the layout of the Haskell heap objects and how it relates to laziness, then we will deal with the call/return semantics of Haskell programs, using simpler examples, and finally we will consider some more general and more complicated example programs.
How objects are represented in Haskell
RTS commentary does a good job explaining this at [http://hackage.haskell.org/trac/ghc/wiki/Commentary/Rts/Storage/HeapObjects])
- Objects can be unboxed or boxed. Unboxed objects (e.g. unboxed ints Int#, unboxed tuples (#…#), etc. ) are represented by themselves and live on the stack, in registers, or in the fields of other heap objects.
- Boxed objects live on the heap. They are represented by pointers to data on the heap.
- Boxed objects can belong to lifted or unlifted types. Unlifted types (e.g. ByteArray#) contain only the final objects, and cannot contain bottom _|_.
- Lifted types can include _|_ and lazy, unevaluated bits of computation (thunks).
- Every heap object is actually a closure - it can be "entered" and "evaluated".
Call and return conventions
(From Adaptive evaluation of non-strict programs, PhD thesis of Robert Ennals, page 97.)
Since all returns are actually calls of the corresponding continuations, there are no return conventions, only call conventions.
- Calling a heap closure (evaluating a lazy thunk) is done by jumping to the thunk's entry code. The register R1 points to the heap closure.
- Calling a function is done by jumping to the body code of the function. If there are any free variables (e.g. the function is a closure), the R1 register is set to point to the heap object representing the function. The rest of the arguments are passed in any remaining free registers, or on the stack.
- Returning to a stack frame is done by jumping to the stack frame's entry code, with R1 pointing to the value being returned (the register Sp points to the stack frame).
Haskell execution model
(explained at http://hackage.haskell.org/trac/ghc/wiki/Commentary/Rts/HaskellExecution, but I feel many topics are missing)
Laziness is causing problems. Haskell code cannot assume that a value of type Bool is either True or False - it may also be a lazy thunk that has not evaluated yet, or an indirection, or an error condition encountered when evaluating the thunk, or a black-hole, which is formed when another Haskell thread starts evaluating a thunk. Therefore, when Haskell code is compiled to Core and then to Cmm, individual functions are compiled to many basic blocks, which represent different parts of the function. E.g. a function
let f x = if x then 1 else 0
would be compiled into something like:
let f x = case x of
Thunk t -> let v = evaluate t in f(v)
| Blackhole b -> let v = waitFor b in f(v)
| Indirection r -> let v = followIndirection i in f(x)
| True -> 1
| False -> 0
Notice that every time than a non-final value is encountered, function f must be called again, as one cannot assume that the resulting value will be a boolean (e.g. we can have Indirection (Indirection True)).
Every time that a value is used, such a case matching has to be performed. In order to reduce the executable size (and to improve efficiency ?!?), GHC uses a different mechanism from the one outlined above. Every heap object is in fact a closure, and can be called/ entered / evaluated. So, the above function is actually compiled this way:
let f x = let v = evaluate x in
case v of
True -> 1
| False -> 0
Each heap object has an associated entry code, which gets executed when the object is evaluated. The entry codes for final objects simply returns the object itself. The entry code for an indirection evaluates the object it points to (so Indirection (Indirection True) is no longer a special case). The entry code for a thunk evaluates the lazy computation and returns the resulting value, and so on.
The GHC calling convention is to put the first N parameters into the registers and the rest on the stack (which stack ?!? Haskell TSO stack or C stack?), and jumping to the entry code. If the function is a heap closure, the pointer to the heap closure itself is passed in the R1 register. The functions return simply by putting the return value into the R1 register and jumping to the entry code of the next stack frame.
Let's clarify this with an example. Consider the function
let example b n f p =
if b then
n + 1
else
f p
This function is compiled as follows (the sp pointer represents the top stack frame, and R1-R4 are registers):
example_main b n f p:
push [entry_code: example_1; n; f; p]
R1 = b
jump R1->entry_code
example_1:
if R1 then
R1 = sp->n
push [entry_code: example_2]
jump R1->entry_code
else
R2 = sp->f
R3 = sp->p
push [entry_code: example_4; R2; R3]
R1 = 2 // We want to allocate a heap value of size 2
jump heap_allocate
example_2:
R1 = R1 + 1 // R1 contains the final value of n
jump (sp - 1)->entry_code // Return
example_4:
R2 = sp->f
R3 = sp->p
R1->header = THUNK_HEADER
R1[1] = R2
R1[2] = R3
jump (sp - 1)->entry_code // Return





