Understanding the Stack
Wednesday, 21 May 2014, by Edsko de Vries.
Filed under coding.
One of our clients sent us a bug report, which consisted of a 700 line Haskell program with a complaint that “it deadlocks”. After studying the code I concluded that, for the sake of the bug report, it could be summarized as
print a bunch of stuff, then crash with a stack overflow
So I wanted to replace the original code with code that did just that: print a bunch of stuff, then crash with a stack overflow. No big deal:
{-# LANGUAGE CPP #-}
module Main (main) where

go :: Int -> IO Int
go n = do
  if (n `rem` THRESHOLD == 0)
    then putChar '.'
    else return ()
  n' <- go (n + 1)
  return (n + n')

main :: IO ()
main = print =<< go 0
The function go prints a dot every THRESHOLD recursive calls; we have a (dummy, since it can never be reached) addition after the recursive call to ensure that go is not tail recursive and will eventually run out of stack space. The THRESHOLD macro variable is there so that we can tweak how quickly the program runs out of stack space, or in other words, how many dots it prints before it crashes. For example, if we compile the code with

ghc -O1 PrintThenCrash.hs -fforce-recomp -DTHRESHOLD=23

it prints 22,126 dots before crashing; if we compile with

ghc -O1 PrintThenCrash.hs -fforce-recomp -DTHRESHOLD=26

it prints 19,483 dots before crashing. But as I was experimenting to find the right value for this parameter, I noticed something very strange. If we compile with

ghc -O1 PrintThenCrash.hs -fforce-recomp -DTHRESHOLD=24

it never crashes! It just prints, and prints, and prints (with GHC 7.6.3). This blog post documents my attempt to understand this behaviour, and explains some aspects of ghc’s runtime, and in particular, its stack, as it goes.
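As a quick sanity check that the two crashing runs are consistent with hitting a fixed stack limit, note that they both die after roughly the same number of recursive calls (the helper below is hypothetical, not part of the bug report):

-- Roughly how many recursive calls were made before the crash:
-- one call per iteration, THRESHOLD iterations per dot printed.
callsBeforeCrash :: Int -> Int -> Int
callsBeforeCrash dots threshold = dots * threshold

-- callsBeforeCrash 22126 23 == 508898
-- callsBeforeCrash 19483 26 == 506558

Both runs make roughly half a million non-tail calls before crashing, i.e. they reach roughly the same stack depth.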
Checking core
After type checking, ghc translates Haskell to an intermediate language called core, which is the “real life” version of the more “academic” language System FC. We can ask ghc to output the (optimized) core version of our program with

ghc -O1 PrintThenCrash.hs -fforce-recomp -DTHRESHOLD=24 -dsuppress-all -ddump-simpl

This is useful, because it allows us to verify that even with a THRESHOLD value of 24 the optimizer did not somehow manage to make the function tail recursive:
$wa =
  \ ww_s1SW w_s1SY ->
    case remInt# ww_s1SW 24 of _ {
      __DEFAULT ->
        case $wa (+# ww_s1SW 1) w_s1SY of _ { (# ipv_aEw, ipv1_aEx #) ->
        (# ipv_aEw,
           case ipv1_aEx of _ { I# y_axO -> I# (+# ww_s1SW y_axO) } #)
        };
      0 ->
        case $wa5 stdout '.' w_s1SY of _ { (# ipv_aEw, _ #) ->
        case $wa (+# ww_s1SW 1) ipv_aEw of _ { (# ipv2_XET, ipv3_XEV #) ->
        (# ipv2_XET,
           case ipv3_XEV of _ { I# y_axO -> I# (+# ww_s1SW y_axO) } #)
        }
        }
    }
This is a bit difficult to read; don’t worry about the details for now, we will start dealing with low level details soon enough. For now, it’s enough to note that $wa is the translation of go, and that the recursive calls to $wa are not in tail position; the optimizer did not somehow manage to make the function tail recursive. Ok. Then what? Why is this function not crashing with a stack overflow?
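For contrast, here is a sketch (not part of the bug report) of a genuinely tail-recursive variant: with an accumulator the addition happens before the recursive call, the recursive call is the last thing the function does, and the program loops in constant stack space (never returning a result to print):

goTail :: Int -> Int -> IO Int
goTail acc n = do
  if (n `rem` THRESHOLD == 0)
    then putChar '.'
    else return ()
  goTail (acc + n) (n + 1)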
Simplifying the problem
If we cannot understand the behaviour of the code by looking at core then we need to drop all the way down to assembly language. However, if we want to have any hope of being able to step through the assembly language we need to simplify that call to putChar. Sadly, replacing it with

foreign import ccall "putchar" c_putchar :: Char -> IO ()

made the problem go away: the program now always crashed. So the strange behaviour of our program had something to do with the implementation of putChar. The real putChar is more involved than it might seem; it deals with buffering, character encodings, concurrent access to Handles, etc.

Unfortunately, since I had no idea what aspect of the implementation of putChar was causing the behaviour I was seeing, I could think of no other approach than to inline putChar and the functions it calls, and start simplifying it bit by bit until the strange behaviour disappeared.
Many, many hours later I ended up with this code:
{-# LANGUAGE MagicHash, UnboxedTuples #-}
module Main (main) where

import GHC.Prim  (State#, RealWorld, maskAsyncExceptions#)
import GHC.Types (IO(..))

hPutChar :: State# RealWorld -> (# State# RealWorld, () #)
hPutChar w0 = case stdout of () -> (# w0, () #)

go :: State# RealWorld -> (# State# RealWorld, () #)
go w0 =
    case maskAsyncExceptions# hPutChar w0 of
      (# w1, _ #) ->
        case go w1 of
          (# w2, () #) -> (# w2, () #)

stdout :: ()
stdout = ()

main :: IO ()
main = IO go
In order to understand this code, you first have to realize that the IO monad is a state monad with the RealWorld as the state:
newtype IO a = IO (State# RealWorld -> (# State# RealWorld, a #))
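To see how the monadic bind threads this state token, here is a small sketch; it is essentially how bindIO is defined in GHC.Base, but the module name below is made up for illustration:

{-# LANGUAGE UnboxedTuples #-}
module BindIOSketch where

import GHC.Types (IO(..))

-- Run the first action with the current RealWorld token, then feed the
-- updated token (and the result) to the second action.
bindIO :: IO a -> (a -> IO b) -> IO b
bindIO (IO m) k = IO (\w0 ->
  case m w0 of
    (# w1, a #) -> case k a of IO f -> f w1)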
I have written the code with explicit state passing, instead of using the monad, so that the Haskell code is as close as possible to the generated core and beyond.

That out of the way, let’s discuss the actual code. First of all, the real putChar is defined as

putChar = hPutChar stdout

where stdout has type Handle. As it turns out, there are only two aspects of hPutChar that are relevant:
1. hPutChar does a case analysis on something (actually, of course, it does case analyses on lots of things); as it turns out, all that matters for our example is that it does a case analysis at all, so we have modelled stdout simply by ().
2. This case analysis happens while asynchronous exceptions have been masked. This happens because a Handle contains an MVar, and most of the I/O operation happens while we have taken the value from this MVar (see withHandle' in GHC.IO.Handle.Internals).
At this point we can simplify no further. Although we are not actually printing any dots anymore, if we run this code it will run forever until you interrupt it manually; but if you remove either the case analysis from hPutChar or the call to maskAsyncExceptions# from go the program will exit with a stack overflow almost immediately. What gives?
Understanding hPutChar
When ghc compiles your program, after lexical analysis and syntax analysis it first removes syntactic sugar and does scope analysis; then, after type checking, it translates your code to core, as we saw. The core gets optimized and then translated to stg (for “Spineless Tagless G-machine”); stg code is very similar to core, but with a handful of additional restrictions; most importantly, all constructor applications must be fully applied (and if not, an explicit lambda must be constructed). Finally, the stg code gets translated to C--, which is a portable assembly language similar in intent to llvm, and the C-- code is then translated to assembly language.
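Each of these intermediate forms can be inspected; for reference, a plausible set of invocations (using the same dump flags that appear elsewhere in this post, plus -ddump-asm for the final assembly; Weird.hs stands for whatever your test module is called):

ghc -O1 Weird.hs -fforce-recomp -ddump-simpl    # optimized core
ghc -O1 Weird.hs -fforce-recomp -ddump-stg      # stg
ghc -O1 Weird.hs -fforce-recomp -ddump-opt-cmm  # optimized C--
ghc -O1 Weird.hs -fforce-recomp -ddump-asm      # assembly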
We wrote our code in such a low level way that the core and stg translations are not very interesting; they all look very much alike. The C-- code however is a big step down in level. Recall that we simplified hPutChar to
hPutChar :: State# RealWorld -> (# State# RealWorld, () #)
hPutChar w0 = case stdout of () -> (# w0, () #)
The C-- code for this “hPutChar” is (-ddump-opt-cmm):
Main.hPutChar_info:
    if ((Sp + -8) < SpLim) goto c1ae;
    R1 = PicBaseReg + IO.stdout_closure;
    I64[Sp - 8] = PicBaseReg + s19G_info;
    Sp = Sp - 8;
    if (R1 & 7 != 0) goto c1ah;
    jump I64[R1]; // [R1]
  c1ae:
    R1 = PicBaseReg + Main.hPutChar_closure;
    jump (I64[BaseReg - 8]); // [R1]
  c1ah: jump s19G_info; // [R1]
s19G_info:
    R1 = PicBaseReg + GHC.Tuple.()_closure + 1;
    Sp = Sp + 8;
    jump (I64[Sp + 0]); // [R1]
This might look rather frightening, especially when you are used to Haskell, so let’s take it step by step. We will be pushing a value to the stack, so we first check if there is enough room on the stack to do so:
if ((Sp + -8) < SpLim) goto c1ae;
If not, we will call a function from the GHC runtime called __stg_gc_fun, which will extend the stack, if possible. We will come back to this in detail later.
In order to do case analysis on stdout we first have to make sure that it is in weak head normal form (in this case, it will always be, but it might not be in general). In order to do this, we need to call its definition; but before we can do that, we need to do two more things.
First, in general stdout might be a thunk, with free variables in the closure, and hence it needs to know where it can find the values of those free variables. The convention is therefore that when we call a closure, the address of the closure can always be found in register R1:
R1 = PicBaseReg + IO.stdout_closure;
Secondly, when stdout completes, it needs to know what to do next. It does this by looking at the stack: when it finishes, it loads a “continuation address” from the top of the stack and calls it. Hence we need to push this address to the stack so that stdout will be able to find it later:
I64[Sp - 8] = PicBaseReg + s19G_info;
Sp = Sp - 8;
At this point we could call stdout, but actually we can do slightly better. ghc implements an optimization called pointer tagging. Since addresses are always word aligned, the lower 3 bits (or, on 32-bit machines, the lower 2 bits) of addresses are always 0. For datatype constructors ghc uses these bits to encode, as part of the pointer itself, which constructor of the datatype it is (non-zero value) or whether the thunk is not yet in weak head normal form (zero value). So we can check, just by looking at the pointer, whether there is any point in calling stdout at all:
if (R1 & 7 != 0) goto c1ah;
c1ah: jump s19G_info; // [R1]
If the constructor is already in weak head normal form, we call the continuation directly (note that s19G_info is the same address that we pushed to the stack as the continuation address for stdout). Finally, if it turns out that we do still need to evaluate stdout, we call it:
jump I64[R1]; // [R1]
In the continuation we now know that stdout is in weak head normal form; technically speaking, we should now pattern match on it to find out which constructor it is, but of course () only has a single constructor, so we can skip that step. This means that all that is left to do is to call our continuation with the result. We are returning

(# w0, () #)

In general, unboxed tuples are represented as a pair of pointers, either as two consecutive memory locations or, ideally, in two registers. However, real world tokens disappear from generated code, so all we have to return is (). By convention the first few arguments are passed in registers; in this case, that means that we need to load the address of () into register R1 before calling the continuation:
R1 = PicBaseReg + GHC.Tuple.()_closure + 1;
Sp = Sp + 8;
jump (I64[Sp + 0]); // [R1]
The + 1 part is pointer tagging at work: () is already in weak head normal form, and it is the first (and only) constructor of the () type.
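To make the tagging convention a bit more concrete, here is a small sketch in ordinary Haskell of what the two tricks above amount to on a 64-bit machine (illustration only; GHC does this directly in the generated code, and the names below are made up):

module PointerTagSketch where

import Data.Bits ((.&.), (.|.))
import Data.Word (Word64)

-- The check `R1 & 7 != 0`: a non-zero tag in the low three bits means the
-- closure is already in weak head normal form, so there is no point in
-- entering it.
isInWHNF :: Word64 -> Bool
isInWHNF ptr = ptr .&. 7 /= 0

-- The `+ 1` on GHC.Tuple.()_closure: tag the pointer with the (1-based)
-- index of the constructor, here the first and only constructor of ().
tagFirstConstructor :: Word64 -> Word64
tagFirstConstructor ptr = ptr .|. 1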
Understanding go
At this point you might realize why I said at the start that it would be completely unfeasible to step through the real hPutChar; our simplified version just does a case analysis (on something of unit value, no less) and then returns unit, and it is already complicated enough! What about go?
go :: State# RealWorld -> (# State# RealWorld, () #)
go w0 =
    case maskAsyncExceptions# hPutChar w0 of
      (# w1, _ #) ->
        case go w1 of
          (# w2, () #) -> (# w2, () #)
Thankfully, the code was carefully written to make the translation to lower level code as simple as possible; the C-- translation of go does not introduce any concepts beyond those we already used in hPutChar:
Main.go_info():
  c1aM:
    if ((Sp + -8) < SpLim) goto c1aO;
    R1 = PicBaseReg + Main.hPutChar_closure + 1;
    I64[Sp - 8] = PicBaseReg + s19O_info;
    Sp = Sp - 8;
    jump stg_maskAsyncExceptions#; // [R1]
  c1aO:
    R1 = PicBaseReg + Main.go_closure;
    jump (I64[BaseReg - 8]); // [R1]
s19O_info:
    I64[Sp + 0] = PicBaseReg + s19N_info;
    jump Main.go_info; // []
s19N_info:
    I64[Sp + 0] = PicBaseReg + s19M_info;
    if (R1 & 7 != 0) goto c1aI;
    jump I64[R1]; // [R1]
  c1aI:
    jump s19M_info; // [R1]
s19M_info:
    R1 = PicBaseReg + GHC.Tuple.()_closure + 1;
    Sp = Sp + 8;
    jump (I64[Sp + 0]); // [R1]
The structure is very similar to what we saw before, except that we have two (actually, three) case statements:
1. First we do a stack overflow check again, as before.
2. We need to evaluate the scrutinee of the first case statement: the call to the primop maskAsyncExceptions# (the equivalent of mask_ in Haskell-land), which expects its argument in R1. In this case, the argument is hPutChar, except that as before we use pointer tagging (in this case, to indicate that the function has been evaluated and that its arity is 1).
3. The continuation after we are finished with the scrutinee is s19O_info; this is a second case statement, so we push a second continuation (s19N_info) onto the stack and recursively call go.
4. The second continuation (which is never reached) makes sure that go indeed returned unit, and then returns unit; if you look at the stg translation of go (-ddump-stg) you will notice that go is actually a triply nested case statement:
case maskAsyncExceptions# [Main.hPutChar w0_s19t] of _ {
  (#,#) ipv_s19x _ ->
      case Main.go ipv_s19x of _ {
        (#,#) ipv2_s19D ipv3_s19B ->
            case ipv3_s19B of _ { () -> (#,#) [ipv2_s19D GHC.Tuple.()]; };
      };
};
Three case statements, hence three continuations (s19O_info, s19N_info, s19M_info).
Running the code
All this is pure theory, so far. Let’s confirm it by actually running our code. We can load up the code into a debugger; make sure to compile the code with -debug to link it against the version of the Haskell RTS that has debugging symbols. Since I am working on OSX Mavericks I will be using lldb; gdb will work in a very similar way.
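For reference, a plausible invocation to build the test program for such a session (the exact flags used originally are not shown; Weird.hs stands for whatever the test module is called):

ghc -debug -O1 Weird.hs -o weird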
One of the abstractions that C-- offers over real assembly language is virtual registers. When we translate C-- to assembly these are mapped to machine registers by a register allocator in ghc’s native code generator. On my machine, Sp and SpLim are mapped to rbp and r15, and R1 is mapped to rbx.
Let’s load up our code, set a breakpoint in go, ask it to print Sp and SpLim whenever we hit a breakpoint, and then start the program (which, during my debugging, I called “weird”, which is why you will see many references to “weird” below).
# lldb weird
Current executable set to 'weird' (x86_64).
(lldb) breakpoint set -n Main_go_info
Breakpoint 1: where = weird`Main_go_info, address = 0x0000000100000f98
(lldb) target stop-hook add --one-liner "register read rbp r15"
Stop hook #1 added.
(lldb) run
Process 35298 launched: '/Users/dev/wt/weird/weird' (x86_64)
rbp = 0x0000000000000000
r15 = 0x0000000000000000
Process 35298 stopped
* thread #1: tid = 0xebc474, 0x0000000100000f98 weird`Main_go_info, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
frame #0: 0x0000000100000f98 weird`Main_go_info
weird`Main_go_info:
-> 0x100000f98: leaq -0x8(%rbp), %rax
0x100000f9c: cmpq %r15, %rax
0x100000f9f: jb 0x100000fc0 ; Main_go_info + 40
0x100000fa1: leaq 0xd8818(%rip), %rax ; Main_hPutChar_closure
rbp = 0x0000000100405370
r15 = 0x00000001004050c0
lldb started the program, stopped at the breakpoint, and told us the values of Sp and SpLim, as asked. It also showed us a disassembly of the code. The C-- code started with
if ((Sp + -8) < SpLim) goto c1aO;
which, if you recall, was to check if we might run into a stack overflow. This translates into
leaq -0x8(%rbp), %rax
cmpq %r15, %rax
jb 0x100000fc0 ; Main_go_info + 40
in Intel assembly language. The details don’t matter very much, and are beyond the scope of this blog post. If you haven’t seen Intel assembly language before: leaq is “load effective address”, cmpq is “compare” and jb is “jump if below”. We will not explain such details any further; hopefully you will be able to squint a bit and see that it resembles the C-- rather closely.
Before we do anything else, let’s see what’s on the stack before we start:
(lldb) memory read -format A -count 3 $rbp
0x100405370: 0x000000010009eb68 weird`stg_catch_frame_info
0x100405378: 0x0000000000000000
0x100405380: 0x00000001000dbf72 base_GHCziTopHandler_runIO2_closure + 2
So, the stack frame immediately above us when we start is a catch frame. Catch frames are pushed onto the stack when you use catch, and are used by the runtime to find which exception handler to run when an exception occurs. The catch frame has two additional fields in its payload: the handler to run (in this case, the default top-level handler), and a mask which indicates if asynchronous exceptions have been blocked or not (in this case, they haven’t).
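In source terms, a catch frame is what an ordinary use of Control.Exception.catch pushes; here is a minimal sketch (the names are made up for illustration) of code that would put one on the stack:

module CatchFrameSketch where

import Control.Exception (SomeException, catch)

-- While the protected action runs, a catch frame pointing at `handler`
-- sits on the stack; it is popped when the action completes normally.
protected :: IO ()
protected = putStrLn "protected action" `catch` handler
  where
    handler :: SomeException -> IO ()
    handler e = putStrLn ("caught: " ++ show e)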
We can use the step command from lldb to start stepping through the execution of the code. We check for stack overflow, find that there is none, and then set things up for the first case statement in go: we push a continuation address onto the stack, put the argument to maskAsyncExceptions# in rbx (the machine equivalent of R1), and then call maskAsyncExceptions#. At this point the stack therefore looks like
0x100405368: 0x0000000100000f68 weird`s19O_info
0x100405370: 0x000000010009eb68 weird`stg_catch_frame_info
0x100405378: 0x0000000000000000
0x100405380: 0x00000001000dbf72 base_GHCziTopHandler_runIO2_closure + 2
and we can confirm that rbx contains the tagged address of hPutChar:
(lldb) image lookup --address $rbx
Address: weird[0x00000001000d97c1] (weird.__DATA.__data + 1)
Summary: Main_hPutChar_closure + 1
You can find the implementation of maskAsyncExceptions# as stg_maskAsyncExceptionszh in Exception.cmm in the rts/ directory of the ghc source (“zh” is the z-encoding of “#”). The details are not so important, however. It masks async exceptions by setting a flag in a register, pushes a frame onto the stack to unmask exceptions when we are done, and then calls the function (whose address is in R1). This means that we will end up in hPutChar, at which point the stack looks like
0x100405360: 0x000000010009e540 weird`stg_unmaskAsyncExceptionszh_ret_info
0x100405368: 0x0000000100000f68 weird`s19O_info
0x100405370: 0x000000010009eb68 weird`stg_catch_frame_info
0x100405378: 0x0000000000000000
0x100405380: 0x00000001000dbf72 base_GHCziTopHandler_runIO2_closure + 2
This continues on. hPutChar executes, completes, and runs the continuation, which happens to be stg_unmaskAsyncExceptionszh_ret_info; this unmasks asynchronous exceptions, and continues with the continuation above it, which in this case is the original continuation from go. And after some more steps we end up back in go for the next recursive call, and the whole process repeats. However, the stack now looks like
0x100405368: 0x0000000100000f38 weird`s19N_info
0x100405370: 0x000000010009eb68 weird`stg_catch_frame_info
0x100405378: 0x0000000000000000
0x100405380: 0x00000001000dbf72 base_GHCziTopHandler_runIO2_closure + 2
0x100405388: 0x00000001000a46c0 weird`stg_stop_thread_info
Remember that go is not tail recursive, so once this recursive call completes, we still need to call the continuation from the previous invocation. And this repeats; if we run until the breakpoint again (using cont) and print the stack, we will see
0x100405360: 0x0000000100000f38 weird`s19N_info
0x100405368: 0x0000000100000f38 weird`s19N_info
0x100405370: 0x000000010009eb68 weird`stg_catch_frame_info
0x100405378: 0x0000000000000000
0x100405380: 0x00000001000dbf72 base_GHCziTopHandler_runIO2_closure + 2
0x100405388: 0x00000001000a46c0 weird`stg_stop_thread_info
and so on. This is precisely the behaviour that we were expecting: since the function is not tail recursive, we are using up more and more stack space and should eventually run out, and crash with a stack overflow. So why don’t we?
Stack overflow
After precisely 84 recursive calls, the stack looks like
0x1004050d0: 0x0000000100000f38 weird`s19N_info
...
0x100405368: 0x0000000100000f38 weird`s19N_info
0x100405370: 0x000000010009eb68 weird`stg_catch_frame_info
0x100405378: 0x0000000000000000
0x100405380: 0x00000001000dbf72 base_GHCziTopHandler_runIO2_closure + 2
with 84 s19N_info pointers in total. Moreover, the top of the stack (virtual register Sp, real register rbp) is now dangerously close to the stack limit (virtual register SpLim, real register r15):
(lldb) register read rbp r15
rbp = 0x00000001004050d0
r15 = 0x00000001004050c0
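As a quick check on these numbers: at the first breakpoint Sp was 0x100405370, so we have by now used 0x100405370 - 0x1004050d0 = 0x2a0 = 672 bytes of stack, which is exactly 84 frames of 8 bytes each.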
That is, we have 16 bytes left, or space for two addresses. This means that we have just enough stack space to make it to the entry point for hPutChar; we get there in the same way as before, and the stack now looks like
0x1004050c0: 0x000000010009e540 weird`stg_unmaskAsyncExceptionszh_ret_info
0x1004050c8: 0x0000000100000f68 weird`s19O_info
0x1004050d0: 0x0000000100000f38 weird`s19N_info
...
0x100405368: 0x0000000100000f38 weird`s19N_info
0x100405370: 0x000000010009eb68 weird`stg_catch_frame_info
0x100405378: 0x0000000000000000
0x100405380: 0x00000001000dbf72 base_GHCziTopHandler_runIO2_closure + 2
Note that we pushed two additional addresses onto the stack, which is therefore now full. Remember that hPutChar starts with a check for stack overflow:
if ((Sp + -8) < SpLim) goto c1ae;
c1ae:
R1 = PicBaseReg + Main.hPutChar_closure;
jump (I64[BaseReg - 8]); // [R1]
Actually, if we look at the unoptimized C-- instead (-ddump-cmm) we will see
c1ae:
R1 = Main.hPutChar_closure;
jump stg_gc_fun; // [R1]
instead, which is a lot clearer: we load the address of hPutChar into register R1 and then run the garbage collector. In the optimized C-- code we find the address of the garbage collector somewhere else, but we can verify in lldb that it’s the same thing:
(lldb) memory read -format A -count 1 $r13-8
0x1000e1290: 0x000000010009fcb0 weird`__stg_gc_fun
Either way, the garbage collector runs. It notices that we are out of stack space (as opposed to out of heap space; we carefully avoided any heap allocation in this test code), creates a new, bigger stack, copies the old stack over, and then calls back into our function. In other words, we end up back at the start of hPutChar, but now the stack looks like:
0x1004fcd30: 0x000000010009e540 weird`stg_unmaskAsyncExceptionszh_ret_info
0x1004fcd38: 0x0000000100000f68 weird`s19O_info
0x1004fcd40: 0x0000000100000f38 weird`s19N_info
...
0x1004fcfd8: 0x0000000100000f38 weird`s19N_info
0x1004fcfe0: 0x000000010009eb68 weird`stg_catch_frame_info
0x1004fcfe8: 0x0000000000000000
0x1004fcff0: 0x00000001000dbf72 base_GHCziTopHandler_runIO2_closure + 2
which is the same stack as before, except at a different location; and, crucially, we have space on the stack again:
(lldb) register read rbp r15
rbp = 0x00000001004fcd30
r15 = 0x00000001004f50c0
so we can continue recursing.
Running out of stack space
Of course, we cannot continue increasing the stack forever; eventually we should reach the maximum stack size (8 MB by default in ghc 7.6). So why don’t we? Well, now that we understand the problem at this level of detail, a quick Google search will reveal that this is due to a “bug” in ghc (depending on your definition of bug, I suppose) prior to version 7.8: when a stack overflow happens while asynchronous exceptions are blocked, the stack will be grown no matter what the limit on the stack size is.

In this example, it just so happens that we run out of stack space when we try to push the continuation address in hPutChar, at which point asynchronous exceptions are indeed masked. Hence, this program will continue growing the stack unpunished, until we run out of machine memory completely. If we removed the case statement from hPutChar (or indeed, if we modified the program in any of a myriad of other, minor ways) we would detect the stack overflow outside of the call to maskAsyncExceptions# and hence crash with a stack overflow exception when we reach the maximum stack size.
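Concretely, the “remove the case statement” variation mentioned above amounts to replacing the simplified hPutChar with a version that returns unit without forcing stdout (a sketch of the variation, not code from the original report):

hPutChar :: State# RealWorld -> (# State# RealWorld, () #)
hPutChar w0 = (# w0, () #)

With this definition the stack overflow is detected outside the masked region, and the program crashes with a stack overflow once the limit is reached, as described above.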
In the original program that we started with, the value of THRESHOLD determines how many continuation addresses for go are on the stack when we call (the real) putChar. Depending on that, we run out of stack space either while asynchronous exceptions are masked (somewhere deep inside the bowels of the real hPutChar) or outside it, and hence the program either runs forever or crashes almost immediately, respectively. This is clearly not what one would expect, and with GHC 7.8 the problem has been resolved: a stack overflow is now treated like any other asynchronous exception, and the program will crash with a stack overflow as soon as asynchronous exceptions are no longer masked.
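As an aside (not from the original report): when experimenting with the crashing variants, the stack limit the program runs into can be adjusted at run time, assuming the program was compiled with -rtsopts, for example:

./PrintThenCrash +RTS -K1m -RTS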
Further Reading
If you want to understand ghc’s runtime execution model, How to make a fast curry: push/enter vs eval/apply is essential reading, although you can ignore the comparison with push/enter, which is not used in ghc. Faster Laziness Using Dynamic Pointer Tagging explains the pointer tagging trick we discussed above. Apart from these two papers, there are various other references that might be helpful. Be aware however that ghc is constantly evolving and many of these references may no longer precisely match what is in ghc right now.
- System F with Type Equality Coercions describes System FC, the idealized version of core; Chapter 25 (Profiling and Optimization) of Real World Haskell discusses reading core and using it for optimization purposes.
- I know kung fu: learning STG by example on the ghc wiki describes the compilation from Haskell down to assembly language in quite some detail.
- Edward Z. Yang has a number of blog posts that discuss the execution of Haskell code at a low level of detail; they are all worth reading: Unraveling the mystery of the IO monad, Tracing the compilation of Hello Factorial!, and Interrupting GHC.
- The blog post Thunks and lazy blackholes: an introduction to GHC at runtime by Keegan McAllister discusses more about the heap, which we did not cover at all here.
- For C-- it is unfortunately hard to find definitive references; ghc’s version of C-- (cmm) differs significantly from the official standard. The description on the ghc wiki is probably a good starting point; the comment at the top of CmmParse.y in the ghc sources is also very helpful. The paper C--: a portable assembly language that supports garbage collection (Abstract) describes some of the underlying motivations but is somewhat outdated. Hoopl: A Modular, Reusable Library for Dataflow Analysis and Transformation describes a library that is used to optimize C--.
- Runtime Support for Multicore Haskell describes how Haskell runs on multiple cores; Section 4 contains some information about the GHC runtime, and in particular, how it relates to foreign calls (calls into the C world). Extending the Haskell Foreign Function Interface with Concurrency is an older paper that describes this in some detail, too.
- Debugging GHC-compiled code with gdb on the ghc wiki contains some useful information about using gdb on compiled Haskell code; GDB and LLDB Command Examples is a nice reference for lldb.
- Finally, David Terei has a presentation about ghc where he gives a bird’s eye view of the entire thing, and his bachelor’s thesis Low Level Virtual Machine for Glasgow Haskell Compiler talks about the llvm backend for ghc.
Postscript
One of the many variations that I played with while tracking this bug down looked like
module Main (main) where

import Control.Exception
import Data.IORef
import System.IO.Unsafe (unsafePerformIO)

counter :: IORef Int
{-# NOINLINE counter #-}
counter = unsafePerformIO $ newIORef 0

hPutChar :: IO ()
hPutChar = mask_ $ do
  n <- readIORef counter
  writeIORef counter (n + 1)

go :: Int -> IO Int
go n =
  case n `rem` 2 of
    0 -> do hPutChar
            go (n + 1)
    _ -> do n' <- go (n + 1)
            return (n + n')

main :: IO ()
main = do _ <- go 0 ; return ()
When you run this in 7.8, this program will crash with
Weird: internal error: scavenge_stack: weird activation record found on stack: 415597384
(GHC version 7.8.2 for x86_64_apple_darwin)
Please report this as a GHC bug: http://www.haskell.org/ghc/reportabug
Abort trap: 6
This turns out to be due to an unrelated bug in 7.8.2; see #9045 (and #8866). It has already been fixed and should be released as part of 7.8.3.