Simple SMT solver for use in an optimizing compiler
Wednesday, 31 December 2014, by Edsko de Vries.
Filed under coding.
This is a second blog post in a series about engineering optimizing compilers; the previous was Quasi-quoting DSLs for free. In this blog we will see how we might write a very simple SMT solver that will allow us to write an optimization pass that can turn something like
if a == 0 then
if !(a == 0) && b == 1 then
write 1
else
write 2
else
write 3
into
if a == 0 then
write 2
else
write 3
without much effort at all.
Preliminaries
For the sake of this blog post we will consider a very simple imperative object language, defined as
data Expr =
-- Arithmetic expressions
ExprInt Integer -- ^ Integer constants
| ExprAdd Expr Expr -- ^ Addition
-- Boolean expressions
| ExprBool Bool -- ^ Boolean constants
| ExprEq Expr Expr -- ^ Equality
| ExprNot Expr -- ^ Negation
| ExprAnd Expr Expr -- ^ Logical conjunction
| ExprOr Expr Expr -- ^ Logical disjunction
-- Variables
| ExprVar VarName -- ^ Read from a variable
| ExprAssign VarName Expr -- ^ Write to a variable
-- Control flow
| ExprSeq Expr Expr -- ^ Sequential composition
| ExprIf Expr Expr Expr -- ^ Conditional
-- Input/output
| ExprRead -- ^ Read an integer from the console
| ExprWrite Expr -- ^ Write an integer to the console
We will assume the existence of a quasi-quoter for this language so that we can write Haskell fragments such as
[expr| if a == 0 then read else write b |]
instead of
ExprIf (ExprEq (ExprVar a) (ExprInt 0))
ExprRead
(ExprWrite (ExprVar b))
How you can write such a quasi-quoter was the topic of the previous blog post. You should however be able to read this blog post without having read the previous post; hopefully the mapping from the concrete syntax to the constructors of Expr
is pretty obvious.
Simplifying assumption
Our goal will be to write a function
provable :: Expr -> Bool
Let’s consider some examples:
The expression
a || True
should be provable: no matter what value variablea
has,a || True
is alwaysTrue
.Likewise, the expression
a || !a
should be provable, for the same reason.However, expression
a
should not be provable (a
might beFalse
)Likewise, expression
!a
should also not be provable (a
might beTrue
)
Note that this means that it’s perfectly possible for both an expression and the negation of that expression to be unprovable.
What about an expression !(n == 0 && n == 1)
? Morally speaking, this expression should be provable. However, we will be making the following very important simplifying assumption:
provable
is allowed to give up: whenprovable
returnsFalse
, this should be interpreted as “failed to prove”, rather than “there exist a counterexample”.
From a compiler perspective, if something is not statically provable, that simply means that an optimization may not be applied even though it could: that is, we miss an opportunity to make the program go faster, but we will not break the code.
An evaluator
We don’t want (or need) to embed a full blown theorem prover into our compiler: ideally we write something simple that will still catch a lot of the common cases. Moreover, we would prefer to reuse as much of our existing compiler infrastructure as we can. Our optimizer is likely to contain an interpreter for the object language, so that we can attempt to statically evaluate expressions. We are going to adapt this interpreter so that we can also use it in our solver. In fact, as we shall see, the solver will be a one-liner.
But we’re getting ahead of ourselves. Let’s consider how to write the interpreter first. The interpreter will be running in an Eval
monad; for our first attempt, let’s define this monad as a a simple wrapper around the list monad:
newtype Eval a = Eval { unEval :: [a] }
deriving ( Functor
, Applicative
, Alternative
, Monad
, MonadPlus
)
runEval :: Eval a -> [a]
runEval act = unEval act
We will use the monad for failure:
throwError :: Eval a
throwError = Eval []
We can provide error recovery through
onError :: Eval a -> Eval a -> Eval a
onError (Eval act) (Eval handler) = Eval $
case act of
[] -> handler
rs -> rs
We will see why we need the list monad when we discuss the evaluator for boolean expressions; but let’s consider the evaluator for integer expressions first:
evalInt :: Expr -> Eval Integer
evalInt = go
where
go (ExprInt i) = return i
go (ExprAdd a b) = (+) <$> evalInt a <*> evalInt b
go (ExprIf c a b) = do cond <- evalBool c
evalInt (if cond then a else b)
go _ = throwError
Hopefully this should be pretty self explanatory; our toy language only has a few integer-valued expressions, so there isn’t much to do. The interpreter for boolean expressions is more interesting:
evalBool :: Expr -> Eval Bool
evalBool = \e -> go e `onError` guess e
where
go (ExprBool a) = return a
go (ExprEq a b) = (==) <$> evalInt a <*> evalInt b
go (ExprNot a) = not <$> evalBool a
go (ExprAnd a b) = (&&) <$> evalBool a <*> evalBool b
go (ExprOr a b) = (||) <$> evalBool a <*> evalBool b
go (ExprIf c a b) = do cond <- evalBool c
evalBool (if cond then a else b)
go _ = throwError
guess _e = return True <|> return False
The definition of go
contains no surprises, and follows the definition of go
in evalInt
very closely. However, the top-level definition
evalBool = \e -> eval e `onError` guess e
is more interesting. If for some reason we fail to evaluate a boolean expression (for example, because it contains a variable) then guess
returns both True
and False
. Let’s consider some examples:
runEval $ evalBool [expr| True |]
evalutes to [True]
;
runEval $ evalBool [expr| a |]
evaluates to [True, False]
because we don’t know what value a
has, but
runEval $ evalBool [expr| a || True |]
evaluates to [True, True]
: we still have two guesses for a
, but no matter what we guess a || True
always evaluates to True
.
Satisfiability
We can now write our SMT solver; as promised, it will be a single line:
satisfiable :: Expr -> Bool
satisfiable = or . runEval . evalBool
Function satisfiable
(the “S” in SMT) checks if the expression is true for some values of the variables in the expression. How do we check this? Well, we just run our evaluator which, when it encounters a variable, will return both values for the variable. Hence, if any of the values returned by the evaluator is True
, then the expression is true at least for one value for each variable.
Once we have an implementation of satisfiability, we can implement provable
very easily: an expression is provable if its negation is not satisfiable:
provable :: Expr -> Bool
provable = not . satisfiable . ExprNot
If we consider the three examples from the previous section, we will find that both True
and a || True
are provable, but a
by itself is not, as expected.
Inconsistent guesses
The careful reader might at this point find his brow furrowed, because something is amiss:
runEval $ evalBool [expr| a || !a |]
evaluates to
[True, True, False, True]
This happens because the evaluator will make two separate independent guesses about the value of a
. As a consequence, a || !a
will be considered not provable.
Is this a bug? No, it’s not. Recall that we made the simplifying assumption that provable
is allowed to give up: it’s allowed to say that an expression is not provable even when it morally is. The corresponding (dual) simplifying assumption for satisfability, and hence the interpreter, is:
The interpreter, and hence satisfiability, is allowed to make inconsistent guesses.
Making an inconsistent guess is equivalent to assuming False
: anything follows from False
and hence any expression will be considered satisfiable once we make an inconsistent guess. As a consequence, this means that once we make inconsistent guesses, we will consider the expression as not provable.
More precision
We can however do better: whenever we make a guess that a particular expression evaluates to True
or False
, then if we see that same expression again we can safely make the same guess, rather than making an independent guess. To implement this, we extend our Eval
monad with some state to remember the guesses we made so far:
newtype Eval a = Eval { unEval :: StateT [(Expr, Bool)] [] a }
deriving ( Functor
, Applicative
, Alternative
, Monad
, MonadPlus
, MonadState [(Expr, Bool)]
)
runEval :: Eval a -> [a]
runEval act = evalStateT (unEval act) []
throwError :: Eval a
throwError = Eval $ StateT $ \_st -> []
onError :: Eval a -> Eval a -> Eval a
onError (Eval act) (Eval handler) = Eval $ StateT $ \st ->
case runStateT act st of
[] -> runStateT handler st
rs -> rs
The implementation of evalInt
does not change at all. The implementation of evalBool
also stays almost the same; the only change is the definition of guess
:
guess e = do
st <- get
case lookup e st of
Just b -> return b
Nothing -> (do put ((e, True) : st) ; return True)
<|> (do put ((e, False) : st) ; return False)
This implements the logic we just described: when we guess the value for an expression e
, we first look up if we already made a guess for this expression. If so, we return the previous guess; otherwise, we make a guess and record that guess.
Once we make this change
runEval $ evalBool [expr| a || !a |]
will evaluate to [True,True]
and consequently a || !a
will be considered provable.
Example: folding nested conditionals
As an example of how one might use this infrastructure, we will consider a simple pass that simplifies nested conditionals (if-statements). We can use provable
to check if one expression implies the other:
(~>) :: Expr -> Expr -> Bool
(~>) a b = provable [expr| ! $a || $b |]
The simplifier is now very easy to write:
simplifyNestedIf :: Expr -> Expr
simplifyNestedIf = everywhere (mkT go)
where
go [expr| if $c1 then
if $c2 then
$e1
else
$e2
else
$e3 |]
| c1 ~> c2 = [expr| if $c1 then $e1 else $e3 |]
| c1 ~> [expr| ! $c2 |] = [expr| if $c1 then $e2 else $e3 |]
go e = e
The auxiliary function go
pattern matches against any if-statement that has another if-statement as its “then” branch. If we can prove that the condition of the outer if-statement implies the condition of the inner if-statement, we can collapse the inner if-statement to just its “then” branch. Similarly, if we can prove that the condition of the outer if-statement implies the negation of the condition of the inner if-statement, we can collapse the inner if-statement to just its “else” branch. In all other cases, we return the expression unchanged. Finally, we use the Scrap Your Boilerplate operators everywhere
and mkT
to apply this transformation everywhere in an expression (rather than just applying it top-level). For example,
simplifyNestedIf [expr|
if a == 0 then
if !(a == 0) && b == 1 then
write 1
else
write 2
else
write 3
|]
evaluates to
if a == 0 then
write 2
else
write 3
as promised. Incidentally, perhaps you are wondering why such an optimization would be useful; after all, which self respecting programmer writes such code? However, code like the above may result from other compiler optimizations such as inlining: imagine that the inner if-statement came from a inlined function. A lot of compiler optimizations are designed to clean up other compiler optimizations.
Conclusion
We can implement a very simple but useful SMT solver for use in an optimizing compiler by making a small change to the interpreter, which we need anyway.
Note that the increase in precision gained from recording previous guesses is a gradual increase only; satisfiability may still make some inconsistent guesses. For example
runEval $ evalBool [expr| !(n == 0 && n == 1) |]
will evaluate to
[False,True,True,True]
because it is making independent guesses about n == 0
and n == 1
; consequently !(n == 0 && n == 1)
will not be considered provable. We can increase the precision of our solver by making guess
smarter (the “MT” or “modulo theories” part of SMT). For example, we could record some information about the guesses about integer valued variables.
At the extreme end of the scale we would be implementing a full decision procedure for first order arithmetic, which is probably optimization gone too far. However, the approach we took above where we start from the basis that we are allowed to make inconsistent guesses gives us a way to implement a simple solver that addresses the most important cases, even if it misses a few—without requiring a lot of complicated infrastructure in the compiler. And as long as we make sure that our evaluator never returns too few values (even if it may return too many) we don’t have to worry that we might generate invalid code: the worst that can happen is that we miss an optimization.