Property-Based Testing
Typeclasses + Monads = Automatic Testing
Lets look at QuickCheck
- Developed by Koen Claessen and John Hughes in 2000
Ported to 40 other languages
PBT used in basically all kinds of software…
Outline
Properties
Random Test Generation
Case-Study
Property-based Testing
Don’t (only) write individual unit-tests
- a unit test only checks function behavior on a single input
Do write properties that the function must satisfy
- generate random inputs
- run function
- verify (or rather, try to falsify) the property
Testing with Specifications
PBT emphasizes the importance of specifying correct behavior
Makes you think about what the code should do
Finds corner-cases where the specification is violated (so code or spec are fixed)
Makes specs live on as machine-checked documentation.
Properties
A property is a function that returns a Bool
For example, the following property:
prop_revrev :: [Int] -> Bool
=
prop_revrev xs reverse (reverse xs) == xs
states that:
- for any integer list
xs
, - reversing it twice should give the same list
By convention, we use the prefix "prop_"
to name properties.
Checking Properties
Once we have written a property, we can quickCheck
it!
>>> quickCheck prop_revrev
+++ OK, passed 100 tests.
What just happened?
- QC has generated 100 random lists of integers
- it has evaluated
prop_revrev
on each of them - it has checked that it returns
True
for all of them
As good as 100 unit tests!
We can run more tests by specifying that as a parameter
= quickCheckWith (stdArgs {maxSuccess = n})
quickCheckN n
10000 prop_revapp
quickCheckN +++ OK, passed 10000 tests
QUIZ
Consider the property prop_revapp
prop_revapp :: [Int] -> [Int] -> Bool
=
prop_revapp xs ys reverse (xs ++ ys) == reverse xs ++ reverse ys
It says forall xs
, ys
. reverse (xs ++ ys) == reverse xs ++ reverse ys
.
Is the property true?
A. Yes
B. No
Let’s Check!
>>> quickCheck prop_revapp
*** Failed! Falsifiable (after 6 tests and 9 shrinks):
0]
[1] [
QC gives us a counterexample with xs = [0]
and ys == [1]
reverse (xs ++ ys)
==> reverse ([0] ++ [1])
==> reverse ([0, 1])
==> [1, 0]
but
reverse xs ++ reverse ys
==> reverse [0] ++ reverse [1]
==> [0] ++ [1]
==> [0, 1]
Fixing The Property
Can you modify prop_revapp
to make it true ?
prop_revapp :: [Int] -> [Int] -> Bool
= reverse (xs ++ ys) == ??? prop_revapp xs ys
So that we get
>>> quickCheck prop_revapp
+++ OK, passed 100 tests.
Lesson learned: sometimes the bug is in the property!
Example: QuickSort
Recall the quickSort
function from earlier in the course:
qsort :: (Ord a) => [a] -> [a]
= []
qsort [] :xs) = qsort ls ++ [x] ++ qsort rs
qsort (xwhere
= [y | y <- xs, y < x] -- elems in xs < x
ls = [z | z <- xs, z > x] -- elems in xs > x rs
Seems to work?
>>> [1,3..19] ++ [2,4..20]
1,3,5,7,9,11,13,15,17,19,2,4,6,8,10,12,14,16,18,20]
[
>>> qsort ([1,3..19] ++ [2,4..20])
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20] [
Properties of Sorting
What are some properties that qsort
should satisfy?
For example:
- Output is sorted
- (Special case of 1) First element of the output is the smallest
- Output has the same length as the input
- Output has the same set of elements as the input
- …
Let’s start with property 2, seems easy to write!
QUIZ
Lets check that the first element of the output is the smallest
prop_qsort_min :: [Int] -> Bool
= head (qsort xs) == minimum xs prop_qsort_min xs
What is the result of
>>> quickCheck prop_qsort_min
A. Pass 100 tests
B. Fail
Properties and Assumptions
prop_qsort_min :: [a] -> Bool
= head (qsort xs) == minimum x prop_qsort_min xs
Oops!
>>> quickCheck prop_qsort_min
*** Failed! Exception: 'Prelude.head: empty list' (after 1 test):
[]
prop_qsort_min
is not true for all Int
lists
- Property only makes sense if
xs
is not empty
We must clarity the assumptions under which the code is supposed to work
- also called preconditions
Conditional Properties
Lets modify prop_qsort_min
so equality holds if input is non-null
prop_qsort_nn_min :: [Int] -> Property
=
prop_qsort_nn_min xs not (null xs) ==> head (qsort xs) == minimum xs
Instead of Bool
the function’s output is Property
- a special type built into the QC library
- provides implies operator
==>
(among others)
quickCheck prop_qsort_min +++ OK, passed 100 tests; 12 discarded.
Allows us to differentiate between passing and invalid tests
- the latter are discarded
EXERCISE: Checking Sortedness
Write a property prop_qsort_isOrdered
that checks that the output of qsort
is sorted
Bad idea
prop_qsort_isOrdered1 :: [Int] -> Int -> Int -> Property
=
prop_qsort_isOrdered1 xs i j 0 <= i && i < j && j < length out ==> out !! i <= out !! j
where
= qsort xs
out
quickCheck prop_qsort_isOrdered1 *** Gave up! Passed only 27 tests; 1000 discarded tests.
Too many inputs will be discarded!
Much better idea
prop_qsort_isOrdered2 :: [Int] -> Bool
= isOrdered (qsort xs)
prop_qsort_isOrdered2 xs where
:x2:xs) = x1 <= x2 && isOrdered (x2:xs)
isOrdered (x1= True
isOrdered _
>>> quickCheck prop_qsort_isOrdered
+++ OK, passed 100 tests.
QUIZ
Let’s check that the output of qsort
has the same length as the input
qsort :: (Ord a) => [a] -> [a]
= []
qsort [] :xs) = qsort ls ++ [x] ++ qsort rs
qsort (xwhere
= [y | y <- xs, y < x] -- elems in xs < x
ls = [z | z <- xs, z > x] -- elems in xs > x
rs
prop_qsort_length :: [Int] -> Bool
= length (qsort xs) == length xs prop_qsort_length xs
What is the result of
>>> quickCheck prop_qsort_length
A. OK
after 100 tests
B. Failed! Falsifiable...
Lets Check!
>>> quickCheck prop_qsort_sort
*** Failed! Falsifiable (after 5 tests and 2 shrinks):
1,1] [
Oops? What?
>>> qsort [1, 1]
1] [
What happened?
qsort :: (Ord a) => [a] -> [a]
= []
qsort [] :xs) = qsort ls ++ [x] ++ qsort rs
qsort (xwhere
= [y | y <- xs, y < x] -- elems in xs < x
ls = [z | z <- xs, z > x] -- elems in xs > x rs
Specifying No-Duplicates
We assumed that the input has no duplicates
- Values equal to
x
are thrown out fromls
andrs
Is this a bug? Maybe? Maybe not?
- But at least its something we should be aware of!
Lets specify that a list has no-duplicates
isDistinct :: (Eq a) => [a] -> Bool
:xs) = not (x `elem` xs) && noDuplicates xs
isDistinct (x= True isDistinct _
Specifying a Pre-Condition
Now we can check qsort
preserves length on inputs with no duplicates
prop_qsort_length :: [Int] -> Property
=
prop_qsort_length xs ==> (length (qsort xs) == length xs)
(isDistinct xs)
-- >>> quickCheck prop_qsort_length
-- +++ OK, passed 100 tests; 224 discarded.
Specifying a Post-Condition
We can also check that qsort
outputs a list with no-duplicates
prop_qsort_distinct :: [Int] -> Bool
= noDuplicates (qsort xs)
prop_qsort_distinct xs
-- >>> quickCheck prop_qsort_distinct
-- +++ OK, passed 100 tests.
Properties of Sorting
- Output is sorted
- (Special case of 1) First element of the output is the smallest
- Output has the same length as the input
- Output has the same set of elements as the input
- …
Have we missed anything?
An alternative is to test that our qsort
is identical to a trusted reference implementation
- may be too slow to deploy but ok to use for checking correctness
Lets use the standard library’s Data.List.sort
function
- (Much faster than ours… but just for illustration!)
prop_qsort_sort :: [Int] -> Bool
= qsort xs == sort xs prop_qsort_sort xs
Outline
Properties [done]
- Properties are boolean functions
- Can be conditional, but be careful to not discard too many inputs
- Can compare to a reference implementation
Random Test Generation
Case-Study
Test Generation
Lets notice something about quickCheck
If you run it once …
>>> quickCheck prop_qsort_sort
*** Failed! Falsifiable (after 6 tests and 2 shrinks):
5,5] [
and if you run it again …
>>> quickCheck prop_qsort_sort
*** Failed! Falsifiable (after 4 tests and 1 shrink):
1,1] [
The falsifying tests are different!
How is this possible?
Generators
QC defines a special Generator data type
data Gen a = MkGen (StdGen -> Int -> a)
A Gen a
is a function that takes as input
- a random number generator
StdGen
- a size parameter
Int
(ignore this for now)
and returns as output
- a value of type
a
Basic Generators
The most basic generator is created by the QC function:
choose :: (Int, Int) -> Gen Int
which
takes a pair of
(lo, hi)
returns a random generator for values between
lo
andhi
Running Generators
To execute the Gen
we need access to the system’s “randomness”
Done via IO
“recipes”
-- | Generate a single random value
generate :: Gen a -> IO a
-- | Generate a list of random values (of some default length)
sample' :: Gen a -> IO [a]
Which
- takes a Generator of
a
values and - returns a recipe that produces a list of (11)
a
values
We can run the recipe in ghci
>>> sample' (choose (1, 10))
9,2,5,6,1,6,2,6,1,3,10] [
Combining Generators
How do we create more complex generators?
Suppose I have a generator
digit :: Gen Int
= choose (0, 9) digit
How can I create a generator of a pair of digits?
digitPair :: Gen (Int, Int)
Generator is a Monad!
You can see details here
This will let us combine generators (like combining parsers…)
QUIZ
Which of the below implements digitPair :: Gen (Int, Int)
?
- given
digit :: Gen Int
-- A
= (digit, digit)
digitPair
-- B
= Gen digit digit
digitPair
-- C
= do { x1 <- digit; x2 <- digit; return (x1, x2) }
digitPair
-- D
= do { x <- digit; return (x, x) }
digitPair
-- E
= (generate digit, generate digit) digitPair
= do { x1 <- digit; x2 <- digit; return (x1, x2) }
digitPair
-- >>> sample' digitPair
-- [(9,1),(9,2),(7,7),(8,9),(3,4),(0,6),(7,4),(7,4),(8,2),(2,9),(2,9)]
EXERCISE
Lets write a function that
- takes as input a non-empty list
- returns a generator that produces a random element of the list
fromList :: [a] -> Gen a
= ??? fromList
Generator Combinators
QC provides several combinators for creating generators:
oneOf :: [Gen a] -> Gen a
randomly selects one of
g0
,…gn
randomly generates a value from the chosen generator
>>> sample' (oneOf [choose (0,2), choose (10,12)])
2,2,1,1,12,10,2,2,11,0,11] [
oneof
is generalized into the frequency
combinator
frequency :: [(Int, Gen a)] -> Gen a
which builds weighted mixtures of individual generators
EXERCISE
Write a generator for lists, given a generator for elements
listGen :: Gen a -> Gen [a]
= ??? listGen
Using Generators
Using our listGen
is a little inconvenient
-- Always need to specify the generator for the elements!
listGen digit
listGen (listGen digit)
-- mapGen is a hypothetical generator for maps
100, 200))) listGen (mapGen digit (choose (
How can we make QC pick a default generator based on the type of the elements?
Arbitrary Type Class
QC has a class
for types whose values can be randomly Generated
class Arbitrary a where
arbitrary :: Gen a
T
is an instance of Arbitrary
if there is a generator of T
s!
Now we could define
instance Arbitrary a => Arbitrary [a] where
= listGen arbitrary arbitrary
and use it like so:
>>> sample' (arbitrary :: Gen [Int])
-2,-4,-2,-4],[3,5,6,-6,3,-4],[6,-5,-3],...] [[],[
Many standard types (including lists) already have Arbitrary
instances:
>>> sample' (arbitrary :: Gen Int)
0,-2,-2,0,-1,8,1,-14,-13,5,19]
[
>>> sample' (arbitrary :: Gen (Int, Bool))
0,True),(1,True),(0,True),(6,False),(-5,True),(4,False),(-12,False),(-8,False),(5,False),(-9,False),(-7,False)]
[(-
>>> sample' (arbitrary :: Gen String)
"","\a","\f","\779257W\SUBA","\84573","D\ACK\365059S","9W\554735G","g\SYN~W\62120\&4&[","\NULsc\18427fy(","Q`TI \n/TH","\461027\ESCZ`u\783094\&4B\SOHT\424692"] [
Outline
Property-based Testing [done]
Random Test Generation [done]
Gen a
is a monadic generator ofa
valuesArbitrary
is a class for types with generators
Case-Study
Case Study: Compiler Optimizations
Lets use QC to test compiler optimizations
Learn how to generate structure data (programs)
Learn how to specify fancy properties (equivalence)
Using the WHILE
language from your HW assignment.
WHILE: Syntax
Recall the definition of Variable
and Value
data Variable = V String
data Value = IntVal Int | BoolVal Bool
which we used to define Expression
data Expression = Var Variable | Val Value
| Plus Expression Expression
| Lt Expression Expression
and Statement
data Statement
= Assign Variable Expression
| If Expression Statement Statement
| While Expression Statement
| Sequence Statement Statement
| Skip
WHILE: Semantics
Next, we defined the behavior of programs as
functions from starting store to final store
where store was defined as a map from
Variable
toValue
type Store = M.Map Variable Value
and then you wrote a function
execute :: Store -> Statement -> Store
= execState (evalS stmt) s0 execute s0 stmt
(We can skip the details of evalS
because you wrote it… right?)
Generating WHILE Programs
Lets write a program generator
Generating Variables
instance Arbitrary Variable where
= do
arbitrary <- elements ['A'..'Z']
x return (V [x])
and we get
>>> sample' (arbitrary :: Gen Variable)
C,J,Y,K,Q,T,Q,K,P,B,W] [
Generating Values
instance Arbitrary Value where
= oneOf [ do {n <- arbitrary; return (IntVal n) }
arbitrary do {b <- arbitrary; return (BoolVal b)}
, ]
and we get
>>> sample' (arbitrary :: Gen Value)
False,2,True,False,True,-4,True,-12,False,10,True] [
Aside: fmap
This is a very common pattern:
do {x <- step; return (f x) }
Luckily, every Monad
is also a Functor
, and therefore defines:
fmap :: (a -> b) -> m a -> m b
So we can write the above as:
`fmap` step f
or with <$>
, the infix synonym for fmap
:
<$> step f
EXERCISE: Generating Expressions
Implement an Arbitrary
instance for Expression
- Note: expressions need not be well-typed!
instance Arbitrary Expression where
= expr
arbitrary
expr :: Gen Expression
= ??? expr
atom :: Gen Expression
expr, binary,= oneof [atom, binary]
expr
= do o <- elements [Plus, Lt]
binary <- expr
e1 <- expr
e2 return (o e1 e2)
= oneof [ Var <$> arbitrary, Val <$> arbitrary ] atom
which gives us
>>> sample (arbitrary :: Gen Expression)
0 < 0 + False) < (E < False)) < 0) < V)
((((-1 + I + True
True + False + 2
-1
False
J < True + True) + (V + True + A < D) + P + S + -9 + K + U + 2 + R + C
(X
...
Generating States
QC already has an way to automatically generate Map
s
instance (Ord k, Arbitrary k, Arbitrary v) => Arbitrary (M.Map k v) where
= do {kvs <- arbitrary; return (M.fromList kvs) } arbitrary
So for free we get a generator for Store
>>> sample (arbitrary :: Gen Store)
...
Specification: Program Equivalence
We are interested in the following property:
(=~) :: Statement -> Statement -> Property
=~ p2 = forAll arbitrary (\st -> execute st p1 == execute st p2) p1
That is, for all input stores st
executing
p1
fromst
andexecuting
p2
fromst
produce the same store
QUIZ
-- X := 10; Y := X
= Sequence
prog1 Assign (V "X") (Val (IntVal 10)))
(Assign (V "Y") (Var (V "X")))
(
-- Y := 10; X := Y
= Sequence
prog2 Assign (V "Y") (Val (IntVal 10)))
(Assign (V "X") (Var (V "Y"))) (
What is the result of the following query?
>>> quickCheck (prog1 === prog2)
A. PASS
B. FAIL
QUIZ
-- X := Y; Y := X
= Sequence
prog3 Assign (V "X") (Var (V "Y")))
(Assign (V "Y") (Var (V "X")))
(
-- Y := X; X := Y
= Sequence
prog4 Assign (V "Y") (Var (V "X")))
(Assign (V "X") (Var (V "Y"))) (
What is the result of the following query?
>>> quickCheck (prog3 === prog4)
A. PASS
B. FAIL
Checking Compiler Optimizations
A compiler optimization can be viewed as a pair of programs
- input written by human
p_in
- output optimized by compiler
p_out
An optimization is correct if p_in =~ p_out
- Compiler should not change behavior of the code
Optimization 1: Zero-Add-Elimination
Here’s a simple optimization
In | Out |
---|---|
X := E + 0 |
X := E |
Should be correct because adding 0
doesn’t change anything…
We can write this as a QC property
prop_add_zero_elim :: Variable -> Expression -> Property
=
prop_add_zero_elim x e `Assign` (e `Plus` Val (IntVal 0))) =~ (x `Assign` e) (x
So what does QC say?
-- >>> quickCheck prop_add_zero_elim
-- *** Failed! Falsifiable (after 1 test):
-- W
-- True
-- fromList []
What’s that? Lets see is W := True
equivalent to W := True + 0
?
Forgot about those pesky boolean expressions!
If you recall W := True
will just assign True
to W
>>> execute M.empty ((V "W") `Assign` (Val (BoolVal True)))
W,True)] fromList [(
but W := True + 0
will have a “type error” so will assign 0
to W
!
>>> execute M.empty (V "W") `Assign` ((Val (BoolVal True) `Plus` Val (IntVal 0)))
W,0)] fromList [(
Fix: Restrict to Int Expressions
Problem was expressions like True
- Caused strange behaviors due to “type errors”
Lets avoid generating expressions with True
, False
, and Lt
:
atomI :: Gen Expression
exprI, binaryI,= oneof [atomI, binaryI]
exprI
= do e1 <- exprI
binaryI <- exprI
e2 return (Plus e1 e2)
= oneof [ Var <$> arbitrary, Val . IntVal <$> arbitrary ] atomI
Now we can restrict the property to
prop_add_zero_elim' :: Variable -> Property
=
prop_add_zero_elim' x -> (x `Assign` (e `Plus` Val (IntVal 0))) =~ (x `Assign` e)) forAll intExpr (\e
QUIZ
Consider the property
prop_add_zero_elim' :: Variable -> Property
=
prop_add_zero_elim' x -> (x `Assign` (e `Plus` Val (IntVal 0)))
forAll intExpr (\e =~
`Assign` e)) (x
What will be the result of
>>> quickCheck prop_add_zero_elim'
A. PASS
B. FAIL
>>> quickCheck prop_add_zero_elim'
*** Failed! Falsifiable (after 11 tests):
Z
G
B,False),(F,-4),(G,True),(K,8),(M,True),(N,False),(R,3),(T,False),(V,True)] fromList [(
Oops, the counterexample is Z := G
and Z := G + 0
- but now the starting state maps
(G, True)
Pesky Bool
sneaked right back in …
Moral: Even simple optimizations are really tricky without types!
Try at home Can you fix this property so it passes?
Optimization 2: Constant Folding
Lets try another optimization that doesn’t use any arithmetic
- So should not have any problems with
Int
-vs-Bool
Suppose you have two back-to-back assignments
X := E; Y := E
Why recompute E
? Result is already stored in X
! So optimize the above to
X := E; Y := X
We can specify this transformation as the QC property
prop_const_prop :: Variable -> Variable -> Expression -> Property
=
prop_const_prop x y e `Assign` e) `Sequence` (y `Assign` e))
((x =~
`Assign` e) `Sequence` (y `Assign` Var x)) ((x
Mighty QC, do you agree ?
>>> quickCheck prop_const_prop
*Testing> quickCheck prop_const_prop
*** Failed! Falsifiable (after 82 tests):
D
B
True + N + L + M + True + -45 + H + -9 + 70 + True + -68 + N + -29 + I + True + G + O + P + True + Q + False + False + True + True + True + True + X + I + False + 81 + -42 + False + 31 + -13 + T + 23 + True + S + True + I + M + True + True + True + Z + H + -65 + G + K + -22 + D
A,True),(B,-72),(C,-19),(D,-34),(E,50),(F,True),(G,True),(H,-21),(I,5),(J,3),(K,True),(L,-20),(M,True),(N,-10),(O,-20),(P,False),(Q,-10),(R,-78),(S,True),(T,70),(U,False),(V,-55),(W,True),(X,True),(Y,True),(Z,-56)] fromList [(
Shrinking Tests
The property fails … but the counterexample is very long!
QC has a test shrinking mechanism … also in the Arbitrary
class
shrink :: a -> [a]
shrink t
- takes as input a candidate
t
and
- returns as output a list of smaller candidates
That QC will systematically search with to find a minimally failing test!
instance Arbitrary Expression where
= expr
arbitrary
-- shrink :: Expression -> [Expression]
Plus e1 e2) = [e1, e2]
shrink (Minus e1 e2) = [e1, e2]
shrink (= [] shrink _
We should just keep shrinking each Expression
to its sub-Expressions
.
>>> quickCheck prop_const_prop
*** Failed! Falsifiable (after 26 tests and 4 shrinks):
D
U
A + D
D,-638),(G,256),(H,False),(K,False),(O,True),(R,True),(S,-81),(T,926)] fromList [(
Aha! Consider the two programs
D := A + D;
U := A + D
and
D := A + D;
U := D
are they equivalent? Pretty subtle, eh.
Recap: Property-Based Testing
- Property-based Testing
- Properties are boolean functions
- Can be conditional, but be careful to not discard too many inputs
- Can compare to a reference implementation
- Test Generation
Gen a
is a monadic generator ofa
valuesArbitrary
is a class for types with generators
- Case-Study
- Compiler optimizations are very tricky
- QC-inspired methods have found serious bugs in many compilers, databases, and more!
Recap: Property-Based Testing
QuickCheck is awesome!
Simple: typeclasses + monads
Useful: Can find subtle bugs or inconsistencies in your code.
Lots of literature on QC and techniques it has inspired
- Can even use QC to generate test data systems in other languages