Algebraic Data Types

Recap: Haskell Crash Course

  • Core program element is an expression
  • Every valid expression has a type (determined at compile-time)
  • Every valid expression reduces to a value (computed at run-time)











Representing complex data

We’ve seen:

  • base types: Bool, Int, Integer, Float

  • some ways to build up types: given types T1, T2

    • functions: T1 -> T2
    • tuples: (T1, T2)
    • lists: [T1]



Algebraic Data Types: a single, powerful technique for building up types to represent complex data

  • lets you define your own data types
  • subsumes tuples and lists!







Algebraic Data Types

  1. Type Synonyms: naming existing types

  2. Product Types: bundling things together

  3. Sum Types: types with multiple variants

  4. Recursive Types: types that contain themselves

  5. Polymorphic Datatypes: datatypes with parameters










Type Synonyms

Synonyms are just names (“aliases”) for existing types

  • think typedef in C










A type to represent circles

A tuple (x, y, r) is a circle with center at (x, y) and radius r

type CircleT = (Double, Double, Double)











A type to represent cuboids

A tuple (length, depth, height) is a cuboid

type CuboidT = (Double, Double, Double)










Using Type Synonyms

We can now use synonyms by creating values of the given types

circ0 :: CircleT 
circ0 = (0, 0, 100)  -- circle at "origin" with radius 100

cub0 :: CuboidT
cub0 = (10, 20, 30)  -- cuboid with l=10, d=20, h=30 

And we can write functions over synonyms too

area :: CircleT -> Double
area (x, y, r) = pi * r * r  

volume :: CuboidT -> Double
volume (l, d, h) = l * d * h 

We should get this behavior

>>> area circ0 
31415.926535897932

>>> volume cub0
6000










QUIZ

Suppose we have the definitions

type CircleT = (Double, Double, Double)
type CuboidT = (Double, Double, Double)

circ0 :: CircleT
circ0 = (0, 0, 100)  -- circle at "origin" with radius 100

cub0 :: CuboidT
cub0 = (10, 20, 30)  -- cuboid with length=10, depth=20, height=30 

area :: CircleT -> Double
area (x, y, r) = pi * r * r  

volume :: CuboidT -> Double
volume (l, d, h) = l * d * h

What is the result of

>>> volume circ0

A. 0

B. Type error

Answer: A










Beware!

Type Synonyms

  • Do not create new types

  • Just name existing types

And hence, synonyms

  • Do not prevent confusing different values










Algebraic Data Types

  1. Type Synonyms: naming existing types [done]

  2. Product Types: bundling things together

  3. Sum Types: types with multiple variants

  4. Recursive Types: types that contain themselves

  5. Polymorphic Datatypes: datatypes with parameters










Creating New Data Types

We can avoid mixing up values by creating new data types

-- | A new type `Circle` with constructor `MkCircle`,
--   which takes three arguments of type `Double`
data Circle = MkCircle Double Double Double

-- | A new type `Cuboid` with constructor `MkCuboid`
--   which takes three arguments of type `Double`
data Cuboid = MkCuboid Double Double Double

We use constructors to build values of the new type:

circ1 :: Circle 
circ1 = MkCircle 0 0 100  -- circle at "origin" w/ radius 100

cub1 :: Cuboid
cub1 = MkCuboid 10 20 30  -- cuboid w/ len=10, dep=20, ht=30 










QUIZ

Suppose we create a new type with a data definition

-- | A new type `Circle` with constructor `MkCircle`
data Circle = MkCircle Double Double Double

What is the type of the MkCircle constructor?

A. MkCircle :: Circle

B. MkCircle :: Double -> Circle

C. MkCircle :: Double -> Double -> Circle

D. MkCircle :: Double -> Double -> Double -> Circle

E. MkCircle :: (Double, Double, Double) -> Circle

Answer: D









Aside: Record syntax

Haskell’s record syntax allows you to name the constructor parameters:

  • Instead of

    data Circle = MkCircle Double Double Double
  • you can write:

    data Circle = MkCircle { 
      center_x :: Double, 
      center_y :: Double,
      radius   :: Double  
    }
  • then you can do:

    circ1 = MkCircle { center_x = 0, center_y = 0, radius = 100 }
    -- same as: circ1 = MkCircle 0 0 100
    
    r = radius circ1 -- use field name as a function    










QUIZ

Suppose we have the definitions

type CuboidT = (Double, Double, Double)
data Cuboid  = MkCuboid Double Double Double     

volume :: CuboidT -> Double
volume (l, d, h) = l * d * h

What is the result of

>>> volume (MkCuboid 10 20 30)

A. 6000

B. Type error

Answer: B










Deconstructing Data

Constructors let us build values of new type … but how to use those values?

How can we implement a function

volume :: Cuboid -> Double
volume c = ???

such that

>>> volume (MkCuboid 10 20 30)
6000










Deconstructing Data by Pattern Matching

Haskell lets us deconstruct data via pattern-matching

volume :: Cuboid -> Double
volume (MkCuboid l d h) = l * d * h

area :: Circle -> Double
area (MkCircle x y r) = pi * r * r

same as for tuples, just using different constructors!










Algebraic Data Types

  1. Type Synonyms: naming existing types [done]

  2. Product Types: bundling things together [done]

  3. Sum Types: types with multiple variants

  4. Recursive Types: types that contain themselves

  5. Polymorphic Datatypes: datatypes with parameters










Defining new types with data prevents us from mixing up values:

area :: Circle -> Double
area (MkCircle x y r) = pi * r * r

>>> area (MkCuboid 10 20 30) -- TYPE ERROR!

But … what if we need to mix up values?









A List of Shapes

Suppose I need to represent a list of shapes

  • Some Circles
  • Some Cuboids

What is the problem with shapes as defined below?

shapes = [circ1, cub1]

Where we have defined

circ1 :: Circle 
circ1 = MkCircle 0 0 100  -- circle at "origin" with radius 100

cub1 :: Cuboid
cub1 = MkCuboid 10 20 30  -- cuboid with length=10, depth=20, height=30 










This does not type-check!

All list elements must have the same type!





Solution?










Sum types

Lets create a single type that can represent both kinds of shapes!

data Shape 
  = MkCircle Double Double Double   -- Circle at x, y with radius r 
  | MkCuboid Double Double Double   -- Cuboid with length, depth, height

A sum type is a data type with multiple constructors

  • aka enum in Rust
  • aka (tagged) union in C










QUIZ

With the definition:

data Shape 
  = MkCircle Double Double Double   -- Circle at x, y with radius r 
  | MkCuboid Double Double Double   -- Cuboid with length, depth, height

What is the type of MkCircle 0 0 100 ?

A. Shape

B. Circle

C. MkCircle

D. (Double, Double, Double)

Answer: A













List of Shapes: Take 2

Now we can define

circ2 :: Shape
circ2 = MkCircle 0 0 100  -- circle at "origin" with radius 100

cub2 :: Shape 
cub2 = MkCuboid 10 20 30  -- cuboid with length=10, depth=20, height=30 

and then define collections of Shapes

shapes :: [Shape]
shapes = [circ2, cub2]













2D Shapes

Lets define a type for 2D shapes

data Shape2D 
  = MkRect Double Double -- rectangle with width and height
  | MkCirc Double        -- circle with radius
  | MkPoly [Vertex]      -- polygon with a list of vertices

type Vertex = (Double, Double)

Different constructors can have different types!











Tagged Boxes

A values of type Shape2D is either two doubles tagged with Rectangle

>>> :type (Rectangle 4.5 1.2)
(Rectangle 4.5 1.2) :: Shape

or a single double tagged with Circle

>>> :type (Circle 3.2)
(Circle 3.2) :: Shape

or a list of pairs of d tagged with Polygon

ghci> :type (Polygon [(1, 1), (2, 2), (3, 3)])
(Polygon [(1, 1), (2, 2), (3, 3)]) :: Shape




Data values are Tagged Boxes





How do we get the value out of the box?













EXERCISE

Write a function to compute the perimeter of a Shape2D

data Shape2D = MkRect Double Double | MkCirc Double | MkPoly [Vertex]
type Vertex = (Double, Double)

perimeter :: Shape2D -> Double
perimeter s = ???

HINT

You may want to use this helper that computes the distance between two points

distance :: Vertex -> Vertex -> Double
distance (x1, y1) (x2, y2) = sqrt ((x2 - x1) ** 2 + (y2 - y1) ** 2)







Algebraic Data Types

  1. Type Synonyms: naming existing types [done]

  2. Product Types: bundling things together [done]

  3. Sum Types: types with multiple variants [done]

  4. Recursive Types: types that contain themselves

  5. Polymorphic Datatypes: datatypes with parameters











Recursive Data Types

We can define datatypes recursively too

data IntList 
  = INil                -- empty list
  | ICons Int IntList   -- list with "hd" Int and "tl" IntList











QUIZ

data IntList 
  = INil                -- empty list
  | ICons Int IntList   -- list with "hd" Int and "tl" IntList

What is the type of ICons ?

A. IntList

B. Int -> IntList

C. (Int, IntList)

D. Int -> IntList -> IntList

E. IntList -> IntList

Answer: D











EXERCISE

With the definition:

data IntList = INil | ICons Int IntList

Write down an IntList representation of the list [1,2,3]

list_1_2_3 :: IntList
list_1_2_3 = ???











Recursion means boxes within boxes

Recursively Nested Boxes











Recursive functions over recursive data

Lets rewrite the count0 function from last lecture to work on IntLists:

count0 :: IntList -> Int
count0 xs = ???










count0 :: IntList -> Int
count0 INil         = 0
count0 (ICons 0 xs) = 1 + count0 xs
count0 (ICons _ xs) = count0 xs










Trees

Lists are unary trees:

Unary tree (aka list)




How do we represent binary trees?

Binary tree










QUIZ: Binary trees

How do you represent this binary tree as a recursive datatype?

Binary tree

(A) data ITree = ILeaf | INode Int ITree

(B) data ITree = ILeaf | INode ITree ITree

(C) data ITree = ILeaf | INode Int ITree ITree

(D) data ITree = ILeaf Int | INode ITree ITree

(E) data ITree = ILeaf Int | INode Int ITree ITree


Answer: C










Binary tree
-- | Binary tree of integers
data ITree = ILeaf | INode Int ITree ITree

t1234 = INode 1 
          (INode 2 (INode 3 ILeaf ILeaf) ILeaf) 
          (INode 4 ILeaf ILeaf)








Algebraic Data Types

  1. Type Synonyms: naming existing types [done]

  2. Product Types: bundling things together [done]

  3. Sum Types: types with multiple variants [done]

  4. Recursive Types: types that contain themselves [done]

  5. Polymorphic Datatypes: datatypes with parameters








Parameterized Types

Our IntList datatype can only store Ints :-(

What if we want to store Chars or Doubles?

data CharList 
  = CNil
  | CCons Char CharList

data DoubleList 
   = DNil
   | DCons Char DoubleList











Don’t Repeat Yourself!

Don’t repeat definitions - Instead reuse the list structure across all types!

Find abstract data patterns by

  • identifying the different parts and
  • refactor those into parameters








A Refactored List

Here are the three types: What is common? What is different?

data IList = INil | ICons Int    IList

data CList = CNil | CCons Char   CList

data DList = DNil | DCons Double DList





Common: Nil/Cons structure

Different: type of each “head” element





Refactored using Type Parameter

-- | A list of elements of type `a`
data List a = Nil | Cons a  (List a)

We can recover original types as instances of List:

iList :: List Int      -- list where 'a' = 'Int' 
iList = Cons 1 (Cons 2 (Cons 3 Nil))

cList :: List Char     -- list where 'a' = 'Char' 
cList = Cons 'a' (Cons 'b' (Cons 'c' Nil))

dList :: List Double   -- list where 'a' = 'Double' 
dList = Cons 1.1 (Cons 2.2 (Cons 3.3 Nil))















QUIZ

data List a = Nil | Cons a  (List a)

What is the type of Cons ?

A. Int -> List -> List

B. Int -> List Int -> List Int

C. Char -> List Char -> List Char

D. a -> List -> List

E. a -> List a -> List a

Answer: E











Polymorphic Data has Polymorphic Constructors

Look at the types of the constructors

>>> :type Nil 
Nil :: List a

That is, the Empty tag is a value of any kind of list, and

>>> :type Cons 
Cons :: a -> List a -> List a

Cons takes an a and a List a and returns a List a.















EXERCISE: Polymorphic Functions

Write an append function that appends two Lists:

data List a = Nil | Cons a  (List a)

append :: ??? -- What is the type of `append`?
append = ???

so that we can call:

-- Using built-in list syntax for readability:
>>> append [1.1, 2.2] [3.3, 4.4]
[1.1, 2.2, 3.3, 4.4]

>>> append "mmm, " "donuts!"
"mmm, donuts!"















Polymorphic Functions over Polymorphic Data

The append function on Lists is polymorphic:

append :: List a -> List a -> List a
append Nil ys         = ys
append (Cons x xs) ys = Cons x (append xs ys)

append doesn’t care about the actual values in the list

  • only manipulates the structure of the list

Hence append :: List a -> List a -> List a

  • we can call append on lists of any kind
  • as long as both lists are of the same kind
>>> append [1.1, 2.2] [3.3, 4.4]  -- a = Double
[1.1, 2.2, 3.3, 4.4]

>>> append "mmm, " "donuts!"      -- a = Char
"mmm, donuts!"

>>> append [1, 2] "donuts!"       -- a = ???
???

>>> append [[1], [1,2]] [[1,2,3]] -- a := ???
???















Built-in Lists?

This is exactly how Haskell’s “built-in” lists are defined:

data [a]    = [] | (:) a [a]

data List a = Nil | Cons a (List a)
  • Nil is called []
  • Cons is called :

Many list manipulating functions e.g. in Data.List are polymorphic - Can be reused across all kinds of lists.

(++) :: [a] -> [a] -> [a]
head :: [a] -> a
tail :: [a] -> [a]















Kinds

List a corresponds to lists of values of type a.

If a is the type parameter, then what is List?

A type-constructor that

  • takes as input a type a
  • returns as output the type List a

But wait, if List is a type-constructor then what is its “type”?

  • A kind is the “type” of a type constructor.
>>> :kind Int
Int :: *
>>> :kind Char
Char :: *
>>> :kind Bool
Bool :: *

Thus, List is a function from any “type” to any other “type”, and so

>>> :kind List
List :: * -> *















QUIZ

What is the kind of ->? That, is what does GHCi say if we type

>>> :kind (->) 

A. *

B. * -> *

C. * -> * -> *

Answer: C






If interested, see this for more information about kinds.