Property-Based Testing
Typeclasses + Monads = Automatic Testing
Lets look at QuickCheck
- Developed by Koen Claessen and John Hughes in 2000
Ported to 40 other languages
PBT used in basically all kinds of software…
Outline
Properties
Random Test Generation
Case-Study
Property-based Testing
Don’t (only) write individual unit-tests
- a unit test only checks function behavior on a single input
Do write properties that the function must satisfy
- generate random inputs
- run function
- verify (or rather, try to falsify) the property
Testing with Specifications
PBT emphasizes the importance of specifying correct behavior
Makes you think about what the code should do
Finds corner-cases where the specification is violated (so code or spec are fixed)
Makes specs live on as machine-checked documentation.
Properties
A property is a function that returns a Bool
For example, the following property:
prop_revrev :: [Int] -> Bool
prop_revrev xs =
reverse (reverse xs) == xsstates that:
- for any integer list
xs, - reversing it twice should give the same list
By convention, we use the prefix "prop_" to name properties.
Checking Properties
Once we have written a property, we can quickCheck it!
>>> quickCheck prop_revrev
+++ OK, passed 100 tests.What just happened?
- QC has generated 100 random lists of integers
- it has evaluated
prop_revrevon each of them - it has checked that it returns
Truefor all of them
As good as 100 unit tests!
We can run more tests by specifying that as a parameter
quickCheckN n = quickCheckWith (stdArgs {maxSuccess = n})
quickCheckN 10000 prop_revapp
+++ OK, passed 10000 tests
QUIZ
Consider the property prop_revapp
prop_revapp :: [Int] -> [Int] -> Bool
prop_revapp xs ys =
reverse (xs ++ ys) == reverse xs ++ reverse ysIt says forall xs, ys. reverse (xs ++ ys) == reverse xs ++ reverse ys.
Is the property true?
A. Yes
B. No
Let’s Check!
>>> quickCheck prop_revapp
*** Failed! Falsifiable (after 6 tests and 9 shrinks):
[0]
[1]QC gives us a counterexample with xs = [0] and ys == [1]
reverse (xs ++ ys)
==> reverse ([0] ++ [1])
==> reverse ([0, 1])
==> [1, 0]but
reverse xs ++ reverse ys
==> reverse [0] ++ reverse [1]
==> [0] ++ [1]
==> [0, 1]
Fixing The Property
Can you modify prop_revapp to make it true ?
prop_revapp :: [Int] -> [Int] -> Bool
prop_revapp xs ys = reverse (xs ++ ys) == ???So that we get
>>> quickCheck prop_revapp
+++ OK, passed 100 tests.Lesson learned: sometimes the bug is in the property!
Example: QuickSort
Recall the quickSort function from earlier in the course:
qsort :: (Ord a) => [a] -> [a]
qsort [] = []
qsort (x:xs) = qsort ls ++ [x] ++ qsort rs
where
ls = [y | y <- xs, y < x] -- elems in xs < x
rs = [z | z <- xs, z > x] -- elems in xs > xSeems to work?
>>> [1,3..19] ++ [2,4..20]
[1,3,5,7,9,11,13,15,17,19,2,4,6,8,10,12,14,16,18,20]
>>> qsort ([1,3..19] ++ [2,4..20])
[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
Properties of Sorting
What are some properties that qsort should satisfy?
For example:
- Output is sorted
- (Special case of 1) First element of the output is the smallest
- Output has the same length as the input
- Output has the same set of elements as the input
- …
Let’s start with property 2, seems easy to write!
QUIZ
Lets check that the first element of the output is the smallest
prop_qsort_min :: [Int] -> Bool
prop_qsort_min xs = head (qsort xs) == minimum xsWhat is the result of
>>> quickCheck prop_qsort_minA. Pass 100 tests
B. Fail
Properties and Assumptions
prop_qsort_min :: [a] -> Bool
prop_qsort_min xs = head (qsort xs) == minimum xOops!
>>> quickCheck prop_qsort_min
*** Failed! Exception: 'Prelude.head: empty list' (after 1 test):
[]prop_qsort_min is not true for all Int lists
- Property only makes sense if
xsis not empty
We must clarity the assumptions under which the code is supposed to work
- also called preconditions
Conditional Properties
Lets modify prop_qsort_min so equality holds if input is non-null
prop_qsort_nn_min :: [Int] -> Property
prop_qsort_nn_min xs =
not (null xs) ==> head (qsort xs) == minimum xsInstead of Bool the function’s output is Property
- a special type built into the QC library
- provides implies operator
==>(among others)
quickCheck prop_qsort_min
+++ OK, passed 100 tests; 12 discarded.Allows us to differentiate between passing and invalid tests
- the latter are discarded
EXERCISE: Checking Sortedness
Write a property prop_qsort_isOrdered that checks that the output of qsort is sorted
Bad idea
prop_qsort_isOrdered1 :: [Int] -> Int -> Int -> Property
prop_qsort_isOrdered1 xs i j =
0 <= i && i < j && j < length out ==> out !! i <= out !! j
where
out = qsort xs
quickCheck prop_qsort_isOrdered1
*** Gave up! Passed only 27 tests; 1000 discarded tests. Too many inputs will be discarded!
Much better idea
prop_qsort_isOrdered2 :: [Int] -> Bool
prop_qsort_isOrdered2 xs = isOrdered (qsort xs)
where
isOrdered (x1:x2:xs) = x1 <= x2 && isOrdered (x2:xs)
isOrdered _ = True
>>> quickCheck prop_qsort_isOrdered
+++ OK, passed 100 tests.
QUIZ
Let’s check that the output of qsort has the same length as the input
qsort :: (Ord a) => [a] -> [a]
qsort [] = []
qsort (x:xs) = qsort ls ++ [x] ++ qsort rs
where
ls = [y | y <- xs, y < x] -- elems in xs < x
rs = [z | z <- xs, z > x] -- elems in xs > x
prop_qsort_length :: [Int] -> Bool
prop_qsort_length xs = length (qsort xs) == length xs What is the result of
>>> quickCheck prop_qsort_lengthA. OK after 100 tests
B. Failed! Falsifiable...
Lets Check!
>>> quickCheck prop_qsort_sort
*** Failed! Falsifiable (after 5 tests and 2 shrinks):
[1,1]Oops? What?
>>> qsort [1, 1]
[1]What happened?
qsort :: (Ord a) => [a] -> [a]
qsort [] = []
qsort (x:xs) = qsort ls ++ [x] ++ qsort rs
where
ls = [y | y <- xs, y < x] -- elems in xs < x
rs = [z | z <- xs, z > x] -- elems in xs > x
Specifying No-Duplicates
We assumed that the input has no duplicates
- Values equal to
xare thrown out fromlsandrs
Is this a bug? Maybe? Maybe not?
- But at least its something we should be aware of!
Lets specify that a list has no-duplicates
isDistinct :: (Eq a) => [a] -> Bool
isDistinct (x:xs) = not (x `elem` xs) && noDuplicates xs
isDistinct _ = True
Specifying a Pre-Condition
Now we can check qsort preserves length on inputs with no duplicates
prop_qsort_length :: [Int] -> Property
prop_qsort_length xs =
(isDistinct xs) ==> (length (qsort xs) == length xs)
-- >>> quickCheck prop_qsort_length
-- +++ OK, passed 100 tests; 224 discarded.
Specifying a Post-Condition
We can also check that qsort outputs a list with no-duplicates
prop_qsort_distinct :: [Int] -> Bool
prop_qsort_distinct xs = noDuplicates (qsort xs)
-- >>> quickCheck prop_qsort_distinct
-- +++ OK, passed 100 tests.
Properties of Sorting
- Output is sorted
- (Special case of 1) First element of the output is the smallest
- Output has the same length as the input
- Output has the same set of elements as the input
- …
Have we missed anything?
An alternative is to test that our qsort is identical to a trusted reference implementation
- may be too slow to deploy but ok to use for checking correctness
Lets use the standard library’s Data.List.sort function
- (Much faster than ours… but just for illustration!)
prop_qsort_sort :: [Int] -> Bool
prop_qsort_sort xs = qsort xs == sort xs
Outline
Properties [done]
- Properties are boolean functions
- Can be conditional, but be careful to not discard too many inputs
- Can compare to a reference implementation
Random Test Generation
Case-Study
Test Generation
Lets notice something about quickCheck
If you run it once …
>>> quickCheck prop_qsort_sort
*** Failed! Falsifiable (after 6 tests and 2 shrinks):
[5,5]and if you run it again …
>>> quickCheck prop_qsort_sort
*** Failed! Falsifiable (after 4 tests and 1 shrink):
[1,1]The falsifying tests are different!
How is this possible?
Generators
QC defines a special Generator data type
data Gen a = MkGen (StdGen -> Int -> a)A Gen a is a function that takes as input
- a random number generator
StdGen - a size parameter
Int(ignore this for now)
and returns as output
- a value of type
a
Basic Generators
The most basic generator is created by the QC function:
choose :: (Int, Int) -> Gen Intwhich
takes a pair of
(lo, hi)returns a random generator for values between
loandhi
Running Generators
To execute the Gen we need access to the system’s “randomness”
Done via IO “recipes”
-- | Generate a single random value
generate :: Gen a -> IO a
-- | Generate a list of random values (of some default length)
sample' :: Gen a -> IO [a]Which
- takes a Generator of
avalues and - returns a recipe that produces a list of (11)
avalues
We can run the recipe in ghci
>>> sample' (choose (1, 10))
[9,2,5,6,1,6,2,6,1,3,10]
Combining Generators
How do we create more complex generators?
Suppose I have a generator
digit :: Gen Int
digit = choose (0, 9)How can I create a generator of a pair of digits?
digitPair :: Gen (Int, Int)
Generator is a Monad!
You can see details here
This will let us combine generators (like combining parsers…)
QUIZ
Which of the below implements digitPair :: Gen (Int, Int) ?
- given
digit :: Gen Int
-- A
digitPair = (digit, digit)
-- B
digitPair = Gen digit digit
-- C
digitPair = do { x1 <- digit; x2 <- digit; return (x1, x2) }
-- D
digitPair = do { x <- digit; return (x, x) }
-- E
digitPair = (generate digit, generate digit)
digitPair = do { x1 <- digit; x2 <- digit; return (x1, x2) }
-- >>> sample' digitPair
-- [(9,1),(9,2),(7,7),(8,9),(3,4),(0,6),(7,4),(7,4),(8,2),(2,9),(2,9)]
EXERCISE
Lets write a function that
- takes as input a non-empty list
- returns a generator that produces a random element of the list
fromList :: [a] -> Gen a
fromList = ???
Generator Combinators
QC provides several combinators for creating generators:
oneOf :: [Gen a] -> Gen arandomly selects one of
g0,…gnrandomly generates a value from the chosen generator
>>> sample' (oneOf [choose (0,2), choose (10,12)])
[2,2,1,1,12,10,2,2,11,0,11]
oneof is generalized into the frequency combinator
frequency :: [(Int, Gen a)] -> Gen awhich builds weighted mixtures of individual generators
EXERCISE
Write a generator for lists, given a generator for elements
listGen :: Gen a -> Gen [a]
listGen = ???
Using Generators
Using our listGen is a little inconvenient
-- Always need to specify the generator for the elements!
listGen digit
listGen (listGen digit)
-- mapGen is a hypothetical generator for maps
listGen (mapGen digit (choose (100, 200)))How can we make QC pick a default generator based on the type of the elements?
Arbitrary Type Class
QC has a class for types whose values can be randomly Generated
class Arbitrary a where
arbitrary :: Gen aT is an instance of Arbitrary if there is a generator of Ts!
Now we could define
instance Arbitrary a => Arbitrary [a] where
arbitrary = listGen arbitraryand use it like so:
>>> sample' (arbitrary :: Gen [Int])
[[],[-2,-4,-2,-4],[3,5,6,-6,3,-4],[6,-5,-3],...]
Many standard types (including lists) already have Arbitrary instances:
>>> sample' (arbitrary :: Gen Int)
[0,-2,-2,0,-1,8,1,-14,-13,5,19]
>>> sample' (arbitrary :: Gen (Int, Bool))
[(0,True),(1,True),(0,True),(6,False),(-5,True),(4,False),(-12,False),(-8,False),(5,False),(-9,False),(-7,False)]
-
>>> sample' (arbitrary :: Gen String)
["","\a","\f","\779257W\SUBA","\84573","D\ACK\365059S","9W\554735G","g\SYN~W\62120\&4&[","\NULsc\18427fy(","Q`TI \n/TH","\461027\ESCZ`u\783094\&4B\SOHT\424692"]
Outline
Property-based Testing [done]
Random Test Generation [done]
Gen ais a monadic generator ofavaluesArbitraryis a class for types with generators
Case-Study
Case Study: Compiler Optimizations
Lets use QC to test compiler optimizations
Learn how to generate structure data (programs)
Learn how to specify fancy properties (equivalence)
Using the WHILE language from your HW assignment.
WHILE: Syntax
Recall the definition of Variable and Value
data Variable = V String
data Value = IntVal Int | BoolVal Boolwhich we used to define Expression
data Expression = Var Variable | Val Value
| Plus Expression Expression
| Lt Expression Expressionand Statement
data Statement
= Assign Variable Expression
| If Expression Statement Statement
| While Expression Statement
| Sequence Statement Statement
| Skip
WHILE: Semantics
Next, we defined the behavior of programs as
functions from starting store to final store
where store was defined as a map from
VariabletoValue
type Store = M.Map Variable Valueand then you wrote a function
execute :: Store -> Statement -> Store
execute s0 stmt = execState (evalS stmt) s0(We can skip the details of evalS because you wrote it… right?)
Generating WHILE Programs
Lets write a program generator
Generating Variables
instance Arbitrary Variable where
arbitrary = do
x <- elements ['A'..'Z']
return (V [x])and we get
>>> sample' (arbitrary :: Gen Variable)
[C,J,Y,K,Q,T,Q,K,P,B,W]
Generating Values
instance Arbitrary Value where
arbitrary = oneOf [ do {n <- arbitrary; return (IntVal n) }
, do {b <- arbitrary; return (BoolVal b)}
]and we get
>>> sample' (arbitrary :: Gen Value)
[False,2,True,False,True,-4,True,-12,False,10,True]
Aside: fmap
This is a very common pattern:
do {x <- step; return (f x) }Luckily, every Monad is also a Functor, and therefore defines:
fmap :: (a -> b) -> m a -> m bSo we can write the above as:
f `fmap` stepor with <$>, the infix synonym for fmap:
f <$> step
EXERCISE: Generating Expressions
Implement an Arbitrary instance for Expression
- Note: expressions need not be well-typed!
instance Arbitrary Expression where
arbitrary = expr
expr :: Gen Expression
expr = ???
expr, binary, atom :: Gen Expression
expr = oneof [atom, binary]
binary = do o <- elements [Plus, Lt]
e1 <- expr
e2 <- expr
return (o e1 e2)
atom = oneof [ Var <$> arbitrary, Val <$> arbitrary ]which gives us
>>> sample (arbitrary :: Gen Expression)
((((0 < 0 + False) < (E < False)) < 0) < V)
-1 + I + True
True + False + 2
-1
False
(J < True + True) + (V + True + A < D) + P + S + -9 + K + U + 2 + R + C
X
...
Generating States
QC already has an way to automatically generate Maps
instance (Ord k, Arbitrary k, Arbitrary v) => Arbitrary (M.Map k v) where
arbitrary = do {kvs <- arbitrary; return (M.fromList kvs) }So for free we get a generator for Store
>>> sample (arbitrary :: Gen Store)
...
Specification: Program Equivalence
We are interested in the following property:
(=~) :: Statement -> Statement -> Property
p1 =~ p2 = forAll arbitrary (\st -> execute st p1 == execute st p2)That is, for all input stores st
executing
p1fromstandexecuting
p2fromst
produce the same store
QUIZ
-- X := 10; Y := X
prog1 = Sequence
(Assign (V "X") (Val (IntVal 10)))
(Assign (V "Y") (Var (V "X")))
-- Y := 10; X := Y
prog2 = Sequence
(Assign (V "Y") (Val (IntVal 10)))
(Assign (V "X") (Var (V "Y")))What is the result of the following query?
>>> quickCheck (prog1 === prog2)A. PASS
B. FAIL
QUIZ
-- X := Y; Y := X
prog3 = Sequence
(Assign (V "X") (Var (V "Y")))
(Assign (V "Y") (Var (V "X")))
-- Y := X; X := Y
prog4 = Sequence
(Assign (V "Y") (Var (V "X")))
(Assign (V "X") (Var (V "Y")))What is the result of the following query?
>>> quickCheck (prog3 === prog4)A. PASS
B. FAIL
Checking Compiler Optimizations
A compiler optimization can be viewed as a pair of programs
- input written by human
p_in - output optimized by compiler
p_out
An optimization is correct if p_in =~ p_out
- Compiler should not change behavior of the code
Optimization 1: Zero-Add-Elimination
Here’s a simple optimization
| In | Out |
|---|---|
X := E + 0 |
X := E |
Should be correct because adding 0 doesn’t change anything…
We can write this as a QC property
prop_add_zero_elim :: Variable -> Expression -> Property
prop_add_zero_elim x e =
(x `Assign` (e `Plus` Val (IntVal 0))) =~ (x `Assign` e) So what does QC say?
-- >>> quickCheck prop_add_zero_elim
-- *** Failed! Falsifiable (after 1 test):
-- W
-- True
-- fromList []What’s that? Lets see is W := True equivalent to W := True + 0 ?
Forgot about those pesky boolean expressions!
If you recall W := True will just assign True to W
>>> execute M.empty ((V "W") `Assign` (Val (BoolVal True)))
fromList [(W,True)]but W := True + 0 will have a “type error” so will assign 0 to W!
>>> execute M.empty (V "W") `Assign` ((Val (BoolVal True) `Plus` Val (IntVal 0)))
fromList [(W,0)]
Fix: Restrict to Int Expressions
Problem was expressions like True
- Caused strange behaviors due to “type errors”
Lets avoid generating expressions with True, False, and Lt:
exprI, binaryI, atomI :: Gen Expression
exprI = oneof [atomI, binaryI]
binaryI = do e1 <- exprI
e2 <- exprI
return (Plus e1 e2)
atomI = oneof [ Var <$> arbitrary, Val . IntVal <$> arbitrary ]Now we can restrict the property to
prop_add_zero_elim' :: Variable -> Property
prop_add_zero_elim' x =
forAll intExpr (\e -> (x `Assign` (e `Plus` Val (IntVal 0))) =~ (x `Assign` e))
QUIZ
Consider the property
prop_add_zero_elim' :: Variable -> Property
prop_add_zero_elim' x =
forAll intExpr (\e -> (x `Assign` (e `Plus` Val (IntVal 0)))
=~
(x `Assign` e))What will be the result of
>>> quickCheck prop_add_zero_elim'A. PASS
B. FAIL
>>> quickCheck prop_add_zero_elim'
*** Failed! Falsifiable (after 11 tests):
Z
G
fromList [(B,False),(F,-4),(G,True),(K,8),(M,True),(N,False),(R,3),(T,False),(V,True)]Oops, the counterexample is Z := G and Z := G + 0
- but now the starting state maps
(G, True)
Pesky Bool sneaked right back in …
Moral: Even simple optimizations are really tricky without types!
Try at home Can you fix this property so it passes?
Optimization 2: Constant Folding
Lets try another optimization that doesn’t use any arithmetic
- So should not have any problems with
Int-vs-Bool
Suppose you have two back-to-back assignments
X := E; Y := EWhy recompute E? Result is already stored in X! So optimize the above to
X := E; Y := XWe can specify this transformation as the QC property
prop_const_prop :: Variable -> Variable -> Expression -> Property
prop_const_prop x y e =
((x `Assign` e) `Sequence` (y `Assign` e))
=~
((x `Assign` e) `Sequence` (y `Assign` Var x))Mighty QC, do you agree ?
>>> quickCheck prop_const_prop
*Testing> quickCheck prop_const_prop
*** Failed! Falsifiable (after 82 tests):
D
B
True + N + L + M + True + -45 + H + -9 + 70 + True + -68 + N + -29 + I + True + G + O + P + True + Q + False + False + True + True + True + True + X + I + False + 81 + -42 + False + 31 + -13 + T + 23 + True + S + True + I + M + True + True + True + Z + H + -65 + G + K + -22 + D
fromList [(A,True),(B,-72),(C,-19),(D,-34),(E,50),(F,True),(G,True),(H,-21),(I,5),(J,3),(K,True),(L,-20),(M,True),(N,-10),(O,-20),(P,False),(Q,-10),(R,-78),(S,True),(T,70),(U,False),(V,-55),(W,True),(X,True),(Y,True),(Z,-56)]
Shrinking Tests
The property fails … but the counterexample is very long!

QC has a test shrinking mechanism … also in the Arbitrary class
shrink :: a -> [a]shrink t
- takes as input a candidate
tand
- returns as output a list of smaller candidates
That QC will systematically search with to find a minimally failing test!
instance Arbitrary Expression where
arbitrary = expr
-- shrink :: Expression -> [Expression]
shrink (Plus e1 e2) = [e1, e2]
shrink (Minus e1 e2) = [e1, e2]
shrink _ = []We should just keep shrinking each Expression to its sub-Expressions.
>>> quickCheck prop_const_prop
*** Failed! Falsifiable (after 26 tests and 4 shrinks):
D
U
A + D
fromList [(D,-638),(G,256),(H,False),(K,False),(O,True),(R,True),(S,-81),(T,926)]Aha! Consider the two programs
D := A + D;
U := A + Dand
D := A + D;
U := Dare they equivalent? Pretty subtle, eh.
Recap: Property-Based Testing
- Property-based Testing
- Properties are boolean functions
- Can be conditional, but be careful to not discard too many inputs
- Can compare to a reference implementation
- Test Generation
Gen ais a monadic generator ofavaluesArbitraryis a class for types with generators
- Case-Study
- Compiler optimizations are very tricky
- QC-inspired methods have found serious bugs in many compilers, databases, and more!
Recap: Property-Based Testing
QuickCheck is awesome!
Simple: typeclasses + monads
Useful: Can find subtle bugs or inconsistencies in your code.
Lots of literature on QC and techniques it has inspired
- Can even use QC to generate test data systems in other languages