cse230

Property-Based Testing

Typeclasses + Monads = Automatic Testing

Lets look at QuickCheck

Developed by Koen Claessen and John Hughes in 2000

Ported to 40 other languages

JSVerify, JUNIT-quickcheck, hypothesis

PBT used in basically all kinds of software…

Outline

Properties
Random Test Generation
Case-Study

Property-based Testing

Don’t (only) write individual unit-tests

a unit test only checks function behavior on a single input

Do write properties that the function must satisfy

generate random inputs
run function
verify (or rather, try to falsify) the property

Testing with Specifications

PBT emphasizes the importance of specifying correct behavior

Makes you think about what the code should do
Finds corner-cases where the specification is violated (so code or spec are fixed)
Makes specs live on as machine-checked documentation.

Properties

A property is a function that returns a Bool

For example, the following property:

prop_revrev :: [Int] -> Bool
prop_revrev xs =
  reverse (reverse xs) == xs

states that:

for any integer list xs,
reversing it twice should give the same list

By convention, we use the prefix "prop_" to name properties.

Checking Properties

Once we have written a property, we can quickCheck it!

>>> quickCheck prop_revrev
+++ OK, passed 100 tests.

What just happened?

QC has generated 100 random lists of integers
it has evaluated prop_revrev on each of them
it has checked that it returns True for all of them

As good as 100 unit tests!

We can run more tests by specifying that as a parameter

quickCheckN n = quickCheckWith (stdArgs {maxSuccess = n})

quickCheckN 10000 prop_revapp
+++ OK, passed 10000 tests

QUIZ

Consider the property prop_revapp

prop_revapp :: [Int] -> [Int] -> Bool
prop_revapp xs ys = 
    reverse (xs ++ ys) == reverse xs ++ reverse ys

It says forall xs, ys. reverse (xs ++ ys) == reverse xs ++ reverse ys.

Is the property true?

A. Yes

B. No

Let’s Check!

>>> quickCheck prop_revapp
*** Failed! Falsifiable (after 6 tests and 9 shrinks):
[0]
[1]

QC gives us a counterexample with xs = [0] and ys == [1]

reverse (xs ++ ys) 
==> reverse ([0] ++ [1]) 
==> reverse ([0, 1]) 
==> [1, 0]

but

reverse xs ++ reverse ys
==> reverse [0] ++ reverse [1]
==> [0] ++ [1]
==> [0, 1]

Fixing The Property

Can you modify prop_revapp to make it true ?

prop_revapp :: [Int] -> [Int] -> Bool
prop_revapp xs ys = reverse (xs ++ ys) == ???

So that we get

>>> quickCheck prop_revapp
+++ OK, passed 100 tests.

Lesson learned: sometimes the bug is in the property!

Example: QuickSort

Recall the quickSort function from earlier in the course:

qsort        :: (Ord a) => [a] -> [a]
qsort []     = []
qsort (x:xs) = qsort ls ++ [x] ++ qsort rs
  where 
    ls       = [y | y <- xs, y < x]  -- elems in xs < x 
    rs       = [z | z <- xs, z > x]  -- elems in xs > x

Seems to work?

>>> [1,3..19] ++ [2,4..20]
[1,3,5,7,9,11,13,15,17,19,2,4,6,8,10,12,14,16,18,20]

>>> qsort ([1,3..19] ++ [2,4..20])
[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]

Properties of Sorting

What are some properties that qsort should satisfy?

For example:

Output is sorted
(Special case of 1) First element of the output is the smallest
Output has the same length as the input
Output has the same set of elements as the input
…

Let’s start with property 2, seems easy to write!

QUIZ

Lets check that the first element of the output is the smallest

prop_qsort_min :: [Int] -> Bool
prop_qsort_min xs = head (qsort xs) == minimum xs

What is the result of

>>> quickCheck prop_qsort_min

A. Pass 100 tests

B. Fail

Properties and Assumptions

prop_qsort_min :: [a] -> Bool
prop_qsort_min xs = head (qsort xs) == minimum x

Oops!

>>> quickCheck prop_qsort_min
*** Failed! Exception: 'Prelude.head: empty list' (after 1 test):
[]

prop_qsort_min is not true for all Int lists

Property only makes sense if xs is not empty

We must clarity the assumptions under which the code is supposed to work

also called preconditions

Conditional Properties

Lets modify prop_qsort_min so equality holds if input is non-null

prop_qsort_nn_min    :: [Int] -> Property
prop_qsort_nn_min xs =
  not (null xs) ==> head (qsort xs) == minimum xs

Instead of Bool the function’s output is Property

a special type built into the QC library
provides implies operator ==> (among others)

quickCheck prop_qsort_min 
+++ OK, passed 100 tests; 12 discarded.

Allows us to differentiate between passing and invalid tests

the latter are discarded

EXERCISE: Checking Sortedness

Write a property prop_qsort_isOrdered that checks that the output of qsort is sorted

Bad idea

prop_qsort_isOrdered1 :: [Int] -> Int -> Int -> Property
prop_qsort_isOrdered1 xs i j = 
  0 <= i && i < j && j < length out  ==> out !! i  <= out !! j
  where
    out = qsort xs

quickCheck prop_qsort_isOrdered1 
*** Gave up! Passed only 27 tests; 1000 discarded tests.

Too many inputs will be discarded!

Much better idea

prop_qsort_isOrdered2 :: [Int] -> Bool
prop_qsort_isOrdered2 xs = isOrdered (qsort xs)
  where
    isOrdered (x1:x2:xs) = x1 <= x2 && isOrdered (x2:xs)
    isOrdered _          = True


>>> quickCheck prop_qsort_isOrdered
+++ OK, passed 100 tests.

QUIZ

Let’s check that the output of qsort has the same length as the input

qsort        :: (Ord a) => [a] -> [a]
qsort []     = []
qsort (x:xs) = qsort ls ++ [x] ++ qsort rs
  where 
    ls       = [y | y <- xs, y < x]  -- elems in xs < x 
    rs       = [z | z <- xs, z > x]  -- elems in xs > x

prop_qsort_length :: [Int] -> Bool
prop_qsort_length xs = length (qsort xs) == length xs

What is the result of

>>> quickCheck prop_qsort_length

A. OK after 100 tests

B. Failed! Falsifiable...

Lets Check!

>>> quickCheck prop_qsort_sort
*** Failed! Falsifiable (after 5 tests and 2 shrinks):
[1,1]

Oops? What?

>>> qsort [1, 1]
[1]

What happened?

qsort        :: (Ord a) => [a] -> [a]
qsort []     = []
qsort (x:xs) = qsort ls ++ [x] ++ qsort rs
  where 
    ls       = [y | y <- xs, y < x]  -- elems in xs < x 
    rs       = [z | z <- xs, z > x]  -- elems in xs > x

Specifying No-Duplicates

We assumed that the input has no duplicates

Values equal to x are thrown out from ls and rs

Is this a bug? Maybe? Maybe not?

But at least its something we should be aware of!

Lets specify that a list has no-duplicates

isDistinct :: (Eq a) => [a] -> Bool
isDistinct (x:xs) = not (x `elem` xs) && noDuplicates xs
isDistinct _      = True

Specifying a Pre-Condition

Now we can check qsort preserves length on inputs with no duplicates

prop_qsort_length :: [Int] -> Property 
prop_qsort_length xs = 
  (isDistinct xs) ==> (length (qsort xs) == length xs)

-- >>> quickCheck prop_qsort_length
-- +++ OK, passed 100 tests; 224 discarded.

Specifying a Post-Condition

We can also check that qsort outputs a list with no-duplicates

prop_qsort_distinct :: [Int] -> Bool 
prop_qsort_distinct xs = noDuplicates (qsort xs)  

-- >>> quickCheck prop_qsort_distinct
-- +++ OK, passed 100 tests.

Properties of Sorting

Output is sorted
(Special case of 1) First element of the output is the smallest
Output has the same length as the input
Output has the same set of elements as the input
…

Have we missed anything?

An alternative is to test that our qsort is identical to a trusted reference implementation

may be too slow to deploy but ok to use for checking correctness

Lets use the standard library’s Data.List.sort function

(Much faster than ours… but just for illustration!)

prop_qsort_sort    :: [Int] -> Bool
prop_qsort_sort xs =  qsort xs == sort xs

Outline

Properties [done]
- Properties are boolean functions
- Can be conditional, but be careful to not discard too many inputs
- Can compare to a reference implementation
Random Test Generation
Case-Study

Test Generation

Lets notice something about quickCheck

If you run it once …

>>> quickCheck prop_qsort_sort
*** Failed! Falsifiable (after 6 tests and 2 shrinks):
[5,5]

and if you run it again …

>>> quickCheck prop_qsort_sort
*** Failed! Falsifiable (after 4 tests and 1 shrink):
[1,1]

The falsifying tests are different!

How is this possible?

Generators

QC defines a special Generator data type

data Gen a = MkGen (StdGen -> Int -> a)

A Gen a is a function that takes as input

a random number generator StdGen
a size parameter Int (ignore this for now)

and returns as output

a value of type a

Basic Generators

The most basic generator is created by the QC function:

choose :: (Int, Int) -> Gen Int

which

takes a pair of (lo, hi)
returns a random generator for values between lo and hi

Running Generators

To execute the Gen we need access to the system’s “randomness”

Done via IO “recipes”

-- | Generate a single random value
generate :: Gen a -> IO a

-- | Generate a list of random values (of some default length)
sample' :: Gen a -> IO [a]

Which

takes a Generator of a values and
returns a recipe that produces a list of (11) a values

We can run the recipe in ghci

>>> sample' (choose (1, 10))
[9,2,5,6,1,6,2,6,1,3,10]

Combining Generators

How do we create more complex generators?

Suppose I have a generator

digit :: Gen Int
digit = choose (0, 9)

How can I create a generator of a pair of digits?

digitPair :: Gen (Int, Int)

Generator is a Monad!

You can see details here
This will let us combine generators (like combining parsers…)

QUIZ

Which of the below implements digitPair :: Gen (Int, Int) ?

given digit :: Gen Int

-- A
digitPair = (digit, digit)

-- B
digitPair = Gen digit digit

-- C
digitPair = do { x1 <- digit; x2 <- digit; return (x1, x2) }

-- D
digitPair = do { x <- digit; return (x, x) }  

-- E 
digitPair = (generate digit, generate digit)

digitPair = do { x1 <- digit; x2 <- digit; return (x1, x2) }

-- >>> sample' digitPair
-- [(9,1),(9,2),(7,7),(8,9),(3,4),(0,6),(7,4),(7,4),(8,2),(2,9),(2,9)]

EXERCISE

Lets write a function that

takes as input a non-empty list
returns a generator that produces a random element of the list

fromList :: [a] -> Gen a
fromList = ???

Generator Combinators

QC provides several combinators for creating generators:

oneOf :: [Gen a] -> Gen a

randomly selects one of g0,…gn
randomly generates a value from the chosen generator

>>> sample' (oneOf [choose (0,2), choose (10,12)])
[2,2,1,1,12,10,2,2,11,0,11]

oneof is generalized into the frequency combinator

frequency :: [(Int, Gen a)] -> Gen a

which builds weighted mixtures of individual generators

EXERCISE

Write a generator for lists, given a generator for elements

listGen :: Gen a -> Gen [a]
listGen = ???

Using Generators

Using our listGen is a little inconvenient

-- Always need to specify the generator for the elements!
listGen digit

listGen (listGen digit)

-- mapGen is a hypothetical generator for maps
listGen (mapGen digit (choose (100, 200)))

How can we make QC pick a default generator based on the type of the elements?

Arbitrary Type Class

QC has a class for types whose values can be randomly Generated

class Arbitrary a where
  arbitrary :: Gen a

T is an instance of Arbitrary if there is a generator of Ts!

Now we could define

instance Arbitrary a => Arbitrary [a] where
  arbitrary = listGen arbitrary

and use it like so:

>>> sample' (arbitrary :: Gen [Int])
[[],[-2,-4,-2,-4],[3,5,6,-6,3,-4],[6,-5,-3],...]

Many standard types (including lists) already have Arbitrary instances:

>>> sample' (arbitrary :: Gen Int)
[0,-2,-2,0,-1,8,1,-14,-13,5,19]

>>> sample' (arbitrary :: Gen (Int, Bool))
[(0,True),(1,True),(0,True),(6,False),(-5,True),(4,False),(-12,False),(-8,False),(5,False),(-9,False),(-7,False)]
-
>>> sample' (arbitrary :: Gen String)
["","\a","\f","\779257W\SUBA","\84573","D\ACK\365059S","9W\554735G","g\SYN~W\62120\&4&[","\NULsc\18427fy(","Q`TI \n/TH","\461027\ESCZ`u\783094\&4B\SOHT\424692"]

Outline

Property-based Testing [done]
Random Test Generation [done]
- Gen a is a monadic generator of a values
- Arbitrary is a class for types with generators
Case-Study

Case Study: Compiler Optimizations

Lets use QC to test compiler optimizations

Learn how to generate structure data (programs)
Learn how to specify fancy properties (equivalence)

Using the WHILE language from your HW assignment.

WHILE: Syntax

Recall the definition of Variable and Value

data Variable = V String 

data Value = IntVal Int | BoolVal Bool

which we used to define Expression

data Expression = Var   Variable | Val   Value
                | Plus  Expression Expression
                | Lt    Expression Expression

and Statement

data Statement
  = Assign   Variable   Expression
  | If       Expression Statement  Statement
  | While    Expression Statement
  | Sequence Statement  Statement
  | Skip

WHILE: Semantics

Next, we defined the behavior of programs as

functions from starting store to final store
where store was defined as a map from Variable to Value

type Store = M.Map Variable Value

and then you wrote a function

execute ::  Store -> Statement -> Store
execute s0 stmt = execState (evalS stmt) s0

(We can skip the details of evalS because you wrote it… right?)

Generating WHILE Programs

Lets write a program generator

Generating Variables

instance Arbitrary Variable where
  arbitrary = do
    x <- elements ['A'..'Z'] 
    return (V [x])

and we get

>>> sample' (arbitrary :: Gen Variable)
[C,J,Y,K,Q,T,Q,K,P,B,W]

Generating Values

instance Arbitrary Value where
  arbitrary = oneOf [ do {n <- arbitrary; return (IntVal n) }
                    , do {b <- arbitrary; return (BoolVal b)} 
                    ]

and we get

>>> sample' (arbitrary :: Gen Value)
[False,2,True,False,True,-4,True,-12,False,10,True]

Aside: fmap

This is a very common pattern:

do {x <- step; return (f x) }

Luckily, every Monad is also a Functor, and therefore defines:

fmap :: (a -> b) -> m a -> m b

So we can write the above as:

f `fmap` step

or with <$>, the infix synonym for fmap:

f <$> step

EXERCISE: Generating Expressions

Implement an Arbitrary instance for Expression

Note: expressions need not be well-typed!

instance Arbitrary Expression where
  arbitrary = expr

expr :: Gen Expression
expr = ???

expr, binary, atom :: Gen Expression
expr     = oneof [atom, binary] 

binary  = do o  <- elements [Plus, Lt]
             e1 <- expr
             e2 <- expr 
             return (o e1 e2)

atom = oneof [ Var <$> arbitrary, Val <$> arbitrary ]

which gives us

>>> sample (arbitrary :: Gen Expression)
((((0 < 0 + False) < (E < False)) < 0) < V)
-1 + I + True
True + False + 2
-1
False
(J < True + True) + (V + True + A < D) + P + S + -9 + K + U + 2 + R + C
X
...

Generating States

QC already has an way to automatically generate Maps

instance (Ord k, Arbitrary k, Arbitrary v) =>  Arbitrary (M.Map k v) where
  arbitrary = do {kvs <- arbitrary; return (M.fromList kvs) }

So for free we get a generator for Store

>>> sample (arbitrary :: Gen Store)
...

Specification: Program Equivalence

We are interested in the following property:

(=~) ::  Statement -> Statement -> Property
p1 =~ p2 = forAll arbitrary (\st -> execute st p1 == execute st p2)

That is, for all input stores st

executing p1 from st and
executing p2 from st

produce the same store

QUIZ

-- X := 10; Y := X
prog1 = Sequence 
  (Assign (V "X") (Val (IntVal 10)))
  (Assign (V "Y") (Var (V "X")))

-- Y := 10; X := Y
prog2 = Sequence 
  (Assign (V "Y") (Val (IntVal 10)))
  (Assign (V "X") (Var (V "Y")))

What is the result of the following query?

>>> quickCheck (prog1 === prog2)

A. PASS

B. FAIL

QUIZ

-- X := Y; Y := X
prog3 = Sequence 
  (Assign (V "X") (Var (V "Y")))
  (Assign (V "Y") (Var (V "X")))

-- Y := X; X := Y
prog4 = Sequence 
  (Assign (V "Y") (Var (V "X")))
  (Assign (V "X") (Var (V "Y")))

What is the result of the following query?

>>> quickCheck (prog3 === prog4)

A. PASS

B. FAIL

Checking Compiler Optimizations

A compiler optimization can be viewed as a pair of programs

input written by human p_in
output optimized by compiler p_out

An optimization is correct if p_in =~ p_out

Compiler should not change behavior of the code

Optimization 1: Zero-Add-Elimination

Here’s a simple optimization

In	Out
`X := E + 0`	`X := E`

Should be correct because adding 0 doesn’t change anything…

We can write this as a QC property

prop_add_zero_elim :: Variable -> Expression -> Property
prop_add_zero_elim x e = 
   (x `Assign` (e `Plus` Val (IntVal 0))) =~ (x `Assign` e)

So what does QC say?

-- >>> quickCheck prop_add_zero_elim
-- *** Failed! Falsifiable (after 1 test):
-- W
-- True
-- fromList []

What’s that? Lets see is W := True equivalent to W := True + 0 ?

Forgot about those pesky boolean expressions!

If you recall W := True will just assign True to W

>>> execute M.empty ((V "W") `Assign` (Val (BoolVal True)))
fromList [(W,True)]

but W := True + 0 will have a “type error” so will assign 0 to W!

>>> execute M.empty (V "W") `Assign` ((Val (BoolVal True) `Plus` Val (IntVal 0)))
fromList [(W,0)]

Fix: Restrict to Int Expressions

Problem was expressions like True

Caused strange behaviors due to “type errors”

Lets avoid generating expressions with True, False, and Lt:

exprI, binaryI, atomI :: Gen Expression
exprI     = oneof [atomI, binaryI] 

binaryI  = do e1 <- exprI
              e2 <- exprI 
              return (Plus e1 e2)

atomI = oneof [ Var <$> arbitrary, Val . IntVal <$> arbitrary ]

Now we can restrict the property to

prop_add_zero_elim'   :: Variable -> Property
prop_add_zero_elim' x = 
  forAll intExpr (\e -> (x `Assign` (e `Plus` Val (IntVal 0))) =~ (x `Assign` e))

QUIZ

Consider the property

prop_add_zero_elim'   :: Variable -> Property
prop_add_zero_elim' x = 
  forAll intExpr (\e -> (x `Assign` (e `Plus` Val (IntVal 0))) 
                        =~ 
                        (x `Assign` e))

What will be the result of

>>> quickCheck prop_add_zero_elim'

A. PASS

B. FAIL

>>> quickCheck prop_add_zero_elim'
*** Failed! Falsifiable (after 11 tests):
Z
G
fromList [(B,False),(F,-4),(G,True),(K,8),(M,True),(N,False),(R,3),(T,False),(V,True)]

Oops, the counterexample is Z := G and Z := G + 0

but now the starting state maps (G, True)

Pesky Bool sneaked right back in …

Moral: Even simple optimizations are really tricky without types!

Try at home Can you fix this property so it passes?

Optimization 2: Constant Folding

Lets try another optimization that doesn’t use any arithmetic

So should not have any problems with Int-vs-Bool

Suppose you have two back-to-back assignments

X := E;   Y := E

Why recompute E? Result is already stored in X! So optimize the above to

X := E;   Y := X

We can specify this transformation as the QC property

prop_const_prop :: Variable -> Variable -> Expression -> Property
prop_const_prop x y e = 
  ((x `Assign` e) `Sequence` (y `Assign` e))
  =~
  ((x `Assign` e) `Sequence` (y `Assign` Var x))

Mighty QC, do you agree ?

>>> quickCheck prop_const_prop 
*Testing> quickCheck prop_const_prop
*** Failed! Falsifiable (after 82 tests):
D
B
True + N + L + M + True + -45 + H + -9 + 70 + True + -68 + N + -29 + I + True + G + O + P + True + Q + False + False + True + True + True + True + X + I + False + 81 + -42 + False + 31 + -13 + T + 23 + True + S + True + I + M + True + True + True + Z + H + -65 + G + K + -22 + D
fromList [(A,True),(B,-72),(C,-19),(D,-34),(E,50),(F,True),(G,True),(H,-21),(I,5),(J,3),(K,True),(L,-20),(M,True),(N,-10),(O,-20),(P,False),(Q,-10),(R,-78),(S,True),(T,70),(U,False),(V,-55),(W,True),(X,True),(Y,True),(Z,-56)]

Shrinking Tests

The property fails … but the counterexample is very long!

QC has a test shrinking mechanism … also in the Arbitrary class

shrink :: a -> [a]

shrink t

takes as input a candidate t and
returns as output a list of smaller candidates

That QC will systematically search with to find a minimally failing test!

instance Arbitrary Expression where
  arbitrary = expr

  -- shrink :: Expression -> [Expression]
  shrink (Plus e1 e2)  = [e1, e2]
  shrink (Minus e1 e2) = [e1, e2]
  shrink _             = []

We should just keep shrinking each Expression to its sub-Expressions.

>>> quickCheck prop_const_prop 
*** Failed! Falsifiable (after 26 tests and 4 shrinks):    
D
U
A + D
fromList [(D,-638),(G,256),(H,False),(K,False),(O,True),(R,True),(S,-81),(T,926)]

Aha! Consider the two programs

D := A + D;   
U := A + D

and

D := A + D; 
U := D

are they equivalent? Pretty subtle, eh.

Recap: Property-Based Testing

Property-based Testing
- Properties are boolean functions
- Can be conditional, but be careful to not discard too many inputs
- Can compare to a reference implementation
Test Generation
- Gen a is a monadic generator of a values
- Arbitrary is a class for types with generators
Case-Study
- Compiler optimizations are very tricky
- QC-inspired methods have found serious bugs in many compilers, databases, and more!

Recap: Property-Based Testing

QuickCheck is awesome!

Simple: typeclasses + monads
Useful: Can find subtle bugs or inconsistencies in your code.

Lots of literature on QC and techniques it has inspired

Can even use QC to generate test data systems in other languages