By far the most widely used style of testing today is unit testing. The recipe usually looks something like this:
Invent a "state of the world".
Run the unit we're testing (e.g. a function).
Check the modified state of the world to see if it looks like it should.
Profit!
If you're used to unit testing, this may look reasonable.
public class TestAdder {
    public void testSum() {
        Adder adder = new AdderImpl();
        assert(adder.add(1, 1) == 2);
        assert(adder.add(1, 2) == 3);
        assert(adder.add(2, 2) == 4);
        assert(adder.add(0, 0) == 0);
        assert(adder.add(-1, -2) == -3);
        assert(adder.add(-1, 1) == 0);
        assert(adder.add(1234, 988) == 2222);
    }
}
The "real world" adds tons of cruft: mock objects, preprocessor abuse to expose "test seams", etc.
The example from the previous slide contains 7 tests.
It's not hard to imagine a world in which we lose the will to continue inventing new unit tests long before we've exhausted our search of the space of possible bugs.
mergeSort :: (a -> a -> Ordering) -> [a] -> [a]
mergeSort cmp = go
  where
    go []  = []
    go [x] = [x]
    go xs  = merge (go xs1) (go xs2)
      where (xs1,xs2) = split xs

    merge xs [] = xs
    merge [] ys = ys
    merge (x:xs) (y:ys)
      | cmp x y /= GT = x : merge xs (y:ys)
      | otherwise     = y : merge (x:xs) ys
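A quick sanity check at the ghci prompt (assuming the module above is loaded):
>> mergeSort compare [3,1,2]
[1,2,3]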
The purpose of splitting is to produce two sublists of roughly equal length, so that they can be merged (where the sorting occurs).
split :: [a] -> ([a],[a])
split []       = ([],[])
split [x]      = ([x],[])
split (x:y:zs) = (x:xs, y:ys)
  where (xs,ys) = split zs
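Note that split deals the elements out alternately rather than cutting the list in half; for example:
>> split [1,2,3,4,5]
([1,3,5],[2,4])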
The obvious first step is to write a few tests by hand:
mergeSort compare [1,2,3,4] == [1,2,3,4]
mergeSort compare [4,3,2,1] == [1,2,3,4]
mergeSort compare [1,4,2,3] == [1,2,3,4]
...
This gets a little dull.
A better approach is to state properties: a property is nothing more than a predicate that should always hold.
What's an obvious property for sorts?
t_idempotent = sort . sort == sort
We can't define this, since we can't compare functions for equality.
However, we can cheat:
t_idempotent cmp xs =
    sort (sort xs) == sort xs
  where sort = mergeSort cmp
t_idempotent compare [3,2,1]
Okay! Let's mechanize some exhaustive testing.
To exhaustively test over all lists containing the above elements, we need some help.
import Data.List (permutations)
t_permute :: ([a] -> Bool) -> [a] -> Bool
t_permute prop = all prop . permutations
Over what list sizes is this practical?
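Not very large. The number of permutations grows as n!, so even a ten-element list is already expensive (a rough check at the ghci prompt):
>> length (permutations [1..10])
3628800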
Clearly, exhaustive testing is impractical for most interesting purposes.
The insight of QuickCheck:
Combine property-based testing with randomly generated data.
This obviously can't give us the same assurance as exhaustive testing, but it's practical.
We can choose the amount of data to throw at our properties.
This lets us tune the tradeoff between degree of assurance and cost.
At the ghci prompt:
>> import Test.QuickCheck
>> quickCheck (t_idempotent compare)
+++ OK, passed 100 tests.
As the output suggests, QuickCheck generated 100 random lists, and tested our property over them.
Yay!
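By default QuickCheck runs 100 test cases per property. If we want a stronger (or quicker) check, we can tune this; for example, using quickCheckWith and stdArgs:
>> quickCheckWith stdArgs { maxSuccess = 1000 } (t_idempotent compare)
+++ OK, passed 1000 tests.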
The type of our property, t_idempotent compare, is polymorphic:
>> :type t_idempotent compare
t_idempotent compare :: Ord a => [a] -> Bool
So how are we able to generate a list?
Haskell's usual defaulting rules take each group of constraints (C1 a, C2 a, ..., Cn a) for each type variable a, and default the type variable if all of the following conditions hold:
The type variable a appears in no other constraints.
All the classes Ci are standard.
At least one of the classes Ci is numeric.
To reduce the number of types we're forced to specify by hand, ghci relaxes the standard rules (the changes are in the last two conditions):
The type variable a appears in no other constraints. (Unchanged.)
All the classes Ci are standard, and all are single-parameter type classes.
At least one of the classes Ci is numeric, or is Show, Eq, or Ord.
It also adds another critical step when defaulting: the unit type () becomes the first of the standard list of types tried when doing type defaulting.
We can use the verboseCheck function to see all the test data that QuickCheck is generating for us.
>> verboseCheck (t_idempotent compare)
Notice that every generated list is a list of ()?
Since all the elements are identical, the property holds no matter what our sort does: our supposedly reassuring test isn't very useful!
When we're testing a polymorphic property, we need to specify which concrete type we're testing at.
>> import Data.Word (Word8)
>> verboseCheck (t_idempotent (compare :: Word8 -> Word8 -> Ordering))
That's verbose and unhappy-making, right?
Here's one approach to reducing our work:
type Cmp a = a -> a -> Ordering
Trying this at the ghci prompt:
>> import Data.Word (Word8)
>> verboseCheck (t_idempotent (compare :: Cmp Word8))
Here's an alternative approach:
t_witnessed p a xs = mergeSort p (mergeSort p xs) == mergeSort p xs
  where _witness = a < head xs
What's that _witness variable for?
It's a type witness, a value that exists to express a constraint between several types (it "witnesses" the constraint).
Thanks to the use of <, this witness forces the type of a and the type of the elements of xs to be the same.
(We prefix the name with an underscore to tell GHC that it's an unused wild card.)
We can supply a value for a of the appropriate type to test over:
>> verboseCheck (t_witnessed compare 'a')
Of course, the value of a is never used.
As a result, we don't even need to supply a working value, provided the type of what we supply is correct:
>> verboseCheck (t_witnessed compare (undefined::Double))
To generate random values of some type, we must write an Arbitrary instance for it.
class Arbitrary a where
  arbitrary :: Gen a
Here's an example, making use of the fact that this unknown Gen type is an instance of Monad:
data Point = Point Int Int

instance Arbitrary Point where
  arbitrary = do
    x <- arbitrary
    y <- arbitrary
    return (Point x y)
We're hopefully by now familiar with the Functor class:
class Functor f where
  fmap :: (a -> b) -> f a -> f b
This takes a pure function and "lifts" it into the functor f.
In general, "lifting" takes a concept and transforms it to work in a different (sometimes more general) setting.
For instance, we can define lifting of functions with the Monad class too:
liftM :: (Monad m) => (a -> b) -> m a -> m b
liftM f action = do
  x <- action
  return (f x)
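For example, using Maybe and lists as the monad (liftM is exported by Control.Monad):
>> liftM (+1) (Just 2)
Just 3
>> liftM (+1) [1,2,3]
[2,3,4]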
Notice the similarities between the type signatures:
fmap :: (Functor f) => (a -> b) -> f a -> f b
liftM :: (Monad m) => (a -> b) -> m a -> m b
Every instance of Monad can, in principle, also be an instance of Functor. Ideally, they'd be defined in terms of each other:
class (Functor m) => Monad m where
  {- blah blah -}
For historical reasons, the two classes are separate, so we write separate instances for them and just reuse the code:
instance Monad MyThingy where
  {- whatever -}

instance Functor MyThingy where
  fmap = liftM
It turns out that lifting pure functions into monads is really common.
So common, in fact, that Control.Monad defines a bunch of extra combinators for us.
liftM2 :: (Monad m) => (a -> b -> c) -> m a -> m b -> m c
liftM2 f act1 act2 = do
  a <- act1
  b <- act2
  return (f a b)
These combinators go all the way up to liftM5.
Look familiar? Useful?
Before:
instance Arbitrary Point where
  arbitrary = do
    x <- arbitrary
    y <- arbitrary
    return (Point x y)
After:
import Control.Monad (liftM2)

instance Arbitrary Point where
  arbitrary = liftM2 Point arbitrary arbitrary
Suppose we want to verify that the sum of two odd integers is always even.
Here's a quick and obvious property:
p_sum_odd a b
  | odd a && odd b = even (a + b)
  | otherwise      = True
This looks a little crufty, though: the property returns True even in instances where it really isn't defined.
It would be nice if we could express the idea "check this property only if the inputs satisfy these constraints".
In fact, there's a combinator for that: ==>
p_sum_odd1 a b =
    odd a && odd b ==>
    even (a+b)
This specifies that the property on the right should hold whenever the Bool-valued test on the left succeeds.
QuickCheck will discard inputs for which the test fails.
Suppose we try to test this property:
p_odd3 a b c = odd a && odd b && odd c ==> odd (a+b+c)
We'll get a strange error from QuickCheck:
>> quickCheck p_odd3
*** Gave up! Passed only 79 tests.
None of our tests failed, but QuickCheck puts an upper limit on the number of test cases it will generate.
This avoids an infinite loop if the condition before the property never holds.
Here, we hit the limit because almost 90% of our inputs are discarded. Why? Each of a, b, and c is odd only about half the time, so all three are odd together only about one time in eight; the rest of the generated inputs fail the precondition and are thrown away.
Instead of filtering out data that isn't right for us, it's better to generate only data that is right.
newtype Odd a = Odd a
  deriving (Show)

instance (Integral a, Arbitrary a) => Arbitrary (Odd a) where
  arbitrary = do
    a <- arbitrary
    return $! Odd (if even a then a + 1 else a)
It's clear from inspection that the Arbitrary instance for Odd a will only generate odd-valued integers.
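We can also let QuickCheck confirm this for us (a quick sketch):
>> quickCheck (\(Odd n) -> odd (n :: Int))
+++ OK, passed 100 tests.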
We no longer need a conditional property, since all of our property's parameters must by necessity be odd:
p_odd1 x (Odd a) (Odd b) (Odd c) = odd (a+b+c)
  where _witness = x == a
Let's run it:
>> quickCheck $ p_odd1 (1::Int)
+++ OK, passed 100 tests.
Sweet! We saved ourselves from generating a ton of useless data, made the testing process faster, and cleaned up the property some too.
Test data generators have an implicit size parameter, hidden inside the Gen type.
QuickCheck starts by generating small test cases; it increases the size as testing progresses.
The meaning of "size" is specific to the needs of an Arbitrary instance. The Arbitrary instance for lists, for example, interprets it as "the maximum length of a list of arbitrary values".
We can find the current size using the sized function, and modify it locally using resize:
sized :: (Int -> Gen a) -> Gen a
resize :: Int -> Gen a -> Gen a
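For example, we can cap the size seen by the list generator; every list printed here should have at most three elements (sample, described shortly, prints a few randomly generated values):
>> sample (resize 3 (arbitrary :: Gen [Int]))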
Suppose we have a tree type:
data Tree a = Node (Tree a) (Tree a)
| Leaf a
deriving (Show)
Here's an obvious Arbitrary instance:
instance (Arbitrary a) => Arbitrary (Tree a) where
  arbitrary = oneof [
      liftM Leaf arbitrary
    , liftM2 Node arbitrary arbitrary
    ]
The oneof combinator picks one of the given generators at random, with equal probability.
oneof :: [Gen a] -> Gen a
Potential trouble: this generator may recurse without bound, since each choice has a 50% chance of producing a Node containing two more arbitrary trees.
It's likely to produce huge trees, and it may not terminate at all!
We can use the sample function to generate and print some arbitrary data.
sample :: (Show a) => Gen a -> IO ()
This helps us to explore what's going on.
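For instance, we could inspect what the naive Tree generator above produces (output is random, and may be enormous):
>> sample (arbitrary :: Gen (Tree Int))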
Here's where the sizing mechanism comes to the rescue.
instance (Arbitrary a) => Arbitrary (Tree a) where
  arbitrary = sized tree

tree :: (Arbitrary a) => Int -> Gen (Tree a)
tree 0 = liftM Leaf arbitrary
tree n = oneof [
    liftM Leaf arbitrary
  , liftM2 Node subtree subtree
  ]
  where subtree = tree (n `div` 2)
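Now the recursion is bounded: each Node halves the size passed to its children, and a size of 0 always yields a Leaf. Sampling again should print trees of manageable size:
>> sample (arbitrary :: Gen (Tree Int))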
A very desirable property of a sorting algorithm is stability: elements that compare as equal should appear in the output in the same relative order as they did in the input.
The built-in sort and sortBy functions are designed to be stable.
What about our mergeSort? Let's find out.
import Data.Function (on)
import Data.List (sortBy)
import Data.Word (Word8)

p_stable xs = merged == sorted
  where merged   = mergeSort (compare `on` fst) xs
        sorted   = sortBy (compare `on` fst) xs
        _witness = xs :: [(Word8, Word8)]
That on function is pretty neat:
on :: (b -> b -> c) -> (a -> b) -> a -> a -> c
(*) `on` f = \x y -> f x * f y
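For example, compare `on` fst compares pairs by their first components only:
>> (compare `on` fst) (1,'a') (2,'b')
LT
>> (compare `on` fst) (1,'a') (1,'b')
EQ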